Forum: >>> Magnum BBS <<<

Minimal string rotation

From =?UTF-8?Q?Bj=C3=B6rn_Johansson?=@21:1/5 to All on Tue Mar 8 02:39:04 2022

Hi all, I am new to TCL and I wonder if anyone anywhere
implemented a lyndon word algorithm in TCL?

https://en.wikipedia.org/wiki/Lyndon_word

I need this to find the minimal string rotation for a circular string.
There are plenty of implementations for python like this:

https://gist.github.com/dvberkel/1950267

I know it does not seem complicated, but I am not proficient in TCL and I seek to motivate a TCL developer to add a feature to a piece of software I depend on.

Grateful for answers!
/bjorn

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Andreas Leitgeb@21:1/5 to [email protected] on Tue Mar 8 15:56:01 2022

Björn Johansson <[email protected]> wrote:

Hi all, I am new to TCL and I wonder if anyone anywhere
implemented a lyndon word algorithm in TCL? https://en.wikipedia.org/wiki/Lyndon_word
I need this to find the minimal string rotation for a circular string.
There are plenty of implementations for python like this: https://gist.github.com/dvberkel/1950267
I know it does not seem complicated, but I am not proficient in TCL
and I seek to motivate a TCL developer to add a feature to a piece
of software I depend on.

Some words may of course not have a Lyndon word, if they happen to
be a juxtaposition of multiple copies of a shorter string.

Can we assume a usual alphabet? (e.g. just latin chars, iso-latin-* or unicode?) Or does it need to support any "s-sized" alphabet?

Will the strings be very long? (That would impose the need of some optimisations)

For very small alphabets (compared to length of string), other
optimisations spring to mind.

I'll start with a quite unoptimized version:

proc findLyndon {s} {
set bsf $s; set csf 1; # best-so-far, count-so-far
set len [string length $s]; # length of string
append s $s; # duplicate string - makes rotations easier
for {set i 1} {$i < $len} {incr i} {
# each rotation is a substring of the duplicated string:
set c [string range $s $i [expr {$i+$len-1}]]
set cmp [string compare $c $bsf]
if {$cmp < 0} { set bsf $c; set csf 1 } elseif {$cmp == 0} { incr csf }
}
# if result is unique, return best-so-far, else the empty string:
if {$csf == 1} { return $bsf } else { return "" }
}

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From =?UTF-8?Q?Bj=C3=B6rn_Johansson?=@21:1/5 to Andreas Leitgeb on Wed Mar 9 09:09:17 2022

On Tuesday, March 8, 2022 at 3:56:06 PM UTC, Andreas Leitgeb wrote:

Björn Johansson <[email protected]> wrote:

Hi all, I am new to TCL and I wonder if anyone anywhere
implemented a lyndon word algorithm in TCL? https://en.wikipedia.org/wiki/Lyndon_word
I need this to find the minimal string rotation for a circular string. There are plenty of implementations for python like this: https://gist.github.com/dvberkel/1950267
I know it does not seem complicated, but I am not proficient in TCL
and I seek to motivate a TCL developer to add a feature to a piece
of software I depend on.

Some words may of course not have a Lyndon word, if they happen to
be a juxtaposition of multiple copies of a shorter string.

Can we assume a usual alphabet? (e.g. just latin chars, iso-latin-* or unicode?) Or does it need to support any "s-sized" alphabet?

Will the strings be very long? (That would impose the need of some optimisations)

For very small alphabets (compared to length of string), other
optimisations spring to mind.

I'll start with a quite unoptimized version:

proc findLyndon {s} {
set bsf $s; set csf 1; # best-so-far, count-so-far
set len [string length $s]; # length of string
append s $s; # duplicate string - makes rotations easier
for {set i 1} {$i < $len} {incr i} {
# each rotation is a substring of the duplicated string:
set c [string range $s $i [expr {$i+$len-1}]]
set cmp [string compare $c $bsf]
if {$cmp < 0} { set bsf $c; set csf 1 } elseif {$cmp == 0} { incr csf }
}
# if result is unique, return best-so-far, else the empty string:
if {$csf == 1} { return $bsf } else { return "" }
}

Thanks for your reply and for taking your time with the code.

Strings will be smaller that 30000 characters, they are DNA so the alphabet is small.
I need only ASCII as this covers the IUPAC extended DNA code (16 symbols):

Symbol Mnemonic Translation
A A (adenine)
C C (cytosine)
G G (guanine)
T T (thymine)
U U (uracil)
R puRine A or G (purines)
Y pYrimidine C or T/U (pyrimidines)
M aMino group A or C
K Keto group G or T/U
S Strong interaction C or G
W Weak interaction A or T/U
H not G A, C or T/U
B not A C, G or T/U
V not T/U A, C or G
D not C A, G or T/U
N aNy A, C, G or T/U

Ill try it out and get back here with the result.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Harald Oehlmann@21:1/5 to All on Wed Mar 9 18:19:58 2022

Am 09.03.2022 um 18:09 schrieb Björn Johansson:

On Tuesday, March 8, 2022 at 3:56:06 PM UTC, Andreas Leitgeb wrote:

Björn Johansson <[email protected]> wrote:

Hi all, I am new to TCL and I wonder if anyone anywhere
implemented a lyndon word algorithm in TCL?
https://en.wikipedia.org/wiki/Lyndon_word
I need this to find the minimal string rotation for a circular string.
There are plenty of implementations for python like this:
https://gist.github.com/dvberkel/1950267
I know it does not seem complicated, but I am not proficient in TCL
and I seek to motivate a TCL developer to add a feature to a piece
of software I depend on.

Some words may of course not have a Lyndon word, if they happen to
be a juxtaposition of multiple copies of a shorter string.

Can we assume a usual alphabet? (e.g. just latin chars, iso-latin-* or
unicode?) Or does it need to support any "s-sized" alphabet?

Will the strings be very long? (That would impose the need of some
optimisations)

For very small alphabets (compared to length of string), other
optimisations spring to mind.

I'll start with a quite unoptimized version:

proc findLyndon {s} {
set bsf $s; set csf 1; # best-so-far, count-so-far
set len [string length $s]; # length of string
append s $s; # duplicate string - makes rotations easier
for {set i 1} {$i < $len} {incr i} {
# each rotation is a substring of the duplicated string:
set c [string range $s $i [expr {$i+$len-1}]]
set cmp [string compare $c $bsf]
if {$cmp < 0} { set bsf $c; set csf 1 } elseif {$cmp == 0} { incr csf }
}
# if result is unique, return best-so-far, else the empty string:
if {$csf == 1} { return $bsf } else { return "" }
}

Thanks for your reply and for taking your time with the code.

Strings will be smaller that 30000 characters, they are DNA so the alphabet is small.
I need only ASCII as this covers the IUPAC extended DNA code (16 symbols):

Symbol Mnemonic Translation
A A (adenine)
C C (cytosine)
G G (guanine)
T T (thymine)
U U (uracil)
R puRine A or G (purines)
Y pYrimidine C or T/U (pyrimidines)
M aMino group A or C
K Keto group G or T/U
S Strong interaction C or G
W Weak interaction A or T/U
H not G A, C or T/U
B not A C, G or T/U
V not T/U A, C or G
D not C A, G or T/U
N aNy A, C, G or T/U

Ill try it out and get back here with the result.

Björn,

there is a considerable DNA TCL community, located in Strasbourg and Heidelberg. You may ask them for DNA optimized code also at this list.

Take care,
Harald

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From =?UTF-8?Q?Bj=C3=B6rn_Johansson?=@21:1/5 to Harald Oehlmann on Wed Mar 9 09:21:52 2022

On Wednesday, March 9, 2022 at 5:20:01 PM UTC, Harald Oehlmann wrote:

Am 09.03.2022 um 18:09 schrieb Björn Johansson:

On Tuesday, March 8, 2022 at 3:56:06 PM UTC, Andreas Leitgeb wrote:

Björn Johansson <[email protected]> wrote:

Hi all, I am new to TCL and I wonder if anyone anywhere
implemented a lyndon word algorithm in TCL?
https://en.wikipedia.org/wiki/Lyndon_word
I need this to find the minimal string rotation for a circular string. >>> There are plenty of implementations for python like this:
https://gist.github.com/dvberkel/1950267
I know it does not seem complicated, but I am not proficient in TCL
and I seek to motivate a TCL developer to add a feature to a piece
of software I depend on.

Some words may of course not have a Lyndon word, if they happen to
be a juxtaposition of multiple copies of a shorter string.

Can we assume a usual alphabet? (e.g. just latin chars, iso-latin-* or
unicode?) Or does it need to support any "s-sized" alphabet?

Will the strings be very long? (That would impose the need of some
optimisations)

For very small alphabets (compared to length of string), other
optimisations spring to mind.

I'll start with a quite unoptimized version:

proc findLyndon {s} {
set bsf $s; set csf 1; # best-so-far, count-so-far
set len [string length $s]; # length of string
append s $s; # duplicate string - makes rotations easier
for {set i 1} {$i < $len} {incr i} {
# each rotation is a substring of the duplicated string:
set c [string range $s $i [expr {$i+$len-1}]]
set cmp [string compare $c $bsf]
if {$cmp < 0} { set bsf $c; set csf 1 } elseif {$cmp == 0} { incr csf } >> }
# if result is unique, return best-so-far, else the empty string:
if {$csf == 1} { return $bsf } else { return "" }
}

Thanks for your reply and for taking your time with the code.

Strings will be smaller that 30000 characters, they are DNA so the alphabet is small.
I need only ASCII as this covers the IUPAC extended DNA code (16 symbols):

Symbol Mnemonic Translation
A A (adenine)
C C (cytosine)
G G (guanine)
T T (thymine)
U U (uracil)
R puRine A or G (purines)
Y pYrimidine C or T/U (pyrimidines)
M aMino group A or C
K Keto group G or T/U
S Strong interaction C or G
W Weak interaction A or T/U
H not G A, C or T/U
B not A C, G or T/U
V not T/U A, C or G
D not C A, G or T/U
N aNy A, C, G or T/U

Ill try it out and get back here with the result.

Björn,

there is a considerable DNA TCL community, located in Strasbourg and Heidelberg. You may ask them for DNA optimized code also at this list.

Take care,
Harald

Thanks! Very useful!

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Ian Braithwaite@21:1/5 to All on Thu Mar 10 13:25:01 2022

On 3/9/22 18:09, Björn Johansson wrote:

On Tuesday, March 8, 2022 at 3:56:06 PM UTC, Andreas Leitgeb wrote:

Björn Johansson <[email protected]> wrote:

Hi all, I am new to TCL and I wonder if anyone anywhere
implemented a lyndon word algorithm in TCL?
https://en.wikipedia.org/wiki/Lyndon_word
I need this to find the minimal string rotation for a circular string.
There are plenty of implementations for python like this:
https://gist.github.com/dvberkel/1950267
I know it does not seem complicated, but I am not proficient in TCL
and I seek to motivate a TCL developer to add a feature to a piece
of software I depend on.

Some words may of course not have a Lyndon word, if they happen to
be a juxtaposition of multiple copies of a shorter string.

Can we assume a usual alphabet? (e.g. just latin chars, iso-latin-* or
unicode?) Or does it need to support any "s-sized" alphabet?

Will the strings be very long? (That would impose the need of some
optimisations)

For very small alphabets (compared to length of string), other
optimisations spring to mind.

I'll start with a quite unoptimized version:

proc findLyndon {s} {
set bsf $s; set csf 1; # best-so-far, count-so-far
set len [string length $s]; # length of string
append s $s; # duplicate string - makes rotations easier
for {set i 1} {$i < $len} {incr i} {
# each rotation is a substring of the duplicated string:
set c [string range $s $i [expr {$i+$len-1}]]
set cmp [string compare $c $bsf]
if {$cmp < 0} { set bsf $c; set csf 1 } elseif {$cmp == 0} { incr csf }
}
# if result is unique, return best-so-far, else the empty string:
if {$csf == 1} { return $bsf } else { return "" }
}

Thanks for your reply and for taking your time with the code.

Strings will be smaller that 30000 characters, they are DNA so the alphabet is small.
I need only ASCII as this covers the IUPAC extended DNA code (16 symbols):

Symbol Mnemonic Translation
A A (adenine)
C C (cytosine)
G G (guanine)
T T (thymine)
U U (uracil)
R puRine A or G (purines)
Y pYrimidine C or T/U (pyrimidines)
M aMino group A or C
K Keto group G or T/U
S Strong interaction C or G
W Weak interaction A or T/U
H not G A, C or T/U
B not A C, G or T/U
V not T/U A, C or G
D not C A, G or T/U
N aNy A, C, G or T/U

Ill try it out and get back here with the result.

I had a stab at Booth's algorithm (https://en.wikipedia.org/wiki/Lexicographically_minimal_string_rotation).
I have no idea how it works...

I replaced the values for i and f[] with i+1 and f+1, since it seemed to simplify the code a bit.

Give it a try if Andreas' code is too slow.

proc lr {s} {
append S $s $s
set f [lrepeat [string length $S] 0]
set k 0
for {set j 1} {$j < [string length $S]} {incr j} {
set sj [string index $S $j]
set i [lindex $f [expr {$j-$k-1}]]
while {$i && $sj ne [string index $S $k+$i]} {
if {$sj < [string index $S $k+$i]} {
set k [expr {$j-$i}]
}
set i [lindex $f $i-1]
}
if {$sj ne [string index $S $k+$i]} {
if {$sj < [string index $S $k]} {
set k $j
}
lset f [expr {$j-$k}] 0
} else {
lset f [expr {$j-$k}] [expr {$i+1}]
}
}
string range $S $k [expr {$k+[string length $s]-1}]
}

-Ian

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

Who's Online
Recent Visitors
- Centurion
  Tue Jul 28 22:54:59 2026
  from Berea, Ohio via Telnet
- Bob Worm
  Tue Jul 28 16:01:18 2026
  from Wales, Uk via Telnet
- Rixter
  Tue Jul 28 13:42:46 2026
  from Madison, Nc via Telnet
- Krenn
  Tue Jul 28 11:59:57 2026
  from Sydney, Nsw via Telnet
- Rixter
  Tue Jul 28 01:23:48 2026
  from Madison, Nc via Telnet
- Centurion
  Mon Jul 27 22:50:42 2026
  from Berea, Ohio via Telnet
- Ataricrypt
  Mon Jul 27 19:19:17 2026
  from England via Telnet
- Bob Worm
  Mon Jul 27 15:19:55 2026
  from Wales, Uk via Telnet

System Info

Sysop:	Keyop
Location:	Huddersfield, West Yorkshire, UK
Users:	741
Nodes:	16 (2 / 14)
Uptime:	52:17:03
Calls:	12,445
Calls today:	5
Files:	15,192
Messages:	6,537,264

Minimal string rotation

Who's Online

Recent Visitors

System Info