Hi all, I am new to TCL and I wonder if anyone anywhere
implemented a lyndon word algorithm in TCL? https://en.wikipedia.org/wiki/Lyndon_word
I need this to find the minimal string rotation for a circular string.
There are plenty of implementations for python like this: https://gist.github.com/dvberkel/1950267
I know it does not seem complicated, but I am not proficient in TCL
and I seek to motivate a TCL developer to add a feature to a piece
of software I depend on.
Björn Johansson <[email protected]> wrote:
Hi all, I am new to TCL and I wonder if anyone anywhereSome words may of course not have a Lyndon word, if they happen to
implemented a lyndon word algorithm in TCL? https://en.wikipedia.org/wiki/Lyndon_word
I need this to find the minimal string rotation for a circular string. There are plenty of implementations for python like this: https://gist.github.com/dvberkel/1950267
I know it does not seem complicated, but I am not proficient in TCL
and I seek to motivate a TCL developer to add a feature to a piece
of software I depend on.
be a juxtaposition of multiple copies of a shorter string.
Can we assume a usual alphabet? (e.g. just latin chars, iso-latin-* or unicode?) Or does it need to support any "s-sized" alphabet?
Will the strings be very long? (That would impose the need of some optimisations)
For very small alphabets (compared to length of string), other
optimisations spring to mind.
I'll start with a quite unoptimized version:
proc findLyndon {s} {
set bsf $s; set csf 1; # best-so-far, count-so-far
set len [string length $s]; # length of string
append s $s; # duplicate string - makes rotations easier
for {set i 1} {$i < $len} {incr i} {
# each rotation is a substring of the duplicated string:
set c [string range $s $i [expr {$i+$len-1}]]
set cmp [string compare $c $bsf]
if {$cmp < 0} { set bsf $c; set csf 1 } elseif {$cmp == 0} { incr csf }
}
# if result is unique, return best-so-far, else the empty string:
if {$csf == 1} { return $bsf } else { return "" }
}
On Tuesday, March 8, 2022 at 3:56:06 PM UTC, Andreas Leitgeb wrote:
Björn Johansson <[email protected]> wrote:
Hi all, I am new to TCL and I wonder if anyone anywhereSome words may of course not have a Lyndon word, if they happen to
implemented a lyndon word algorithm in TCL?
https://en.wikipedia.org/wiki/Lyndon_word
I need this to find the minimal string rotation for a circular string.
There are plenty of implementations for python like this:
https://gist.github.com/dvberkel/1950267
I know it does not seem complicated, but I am not proficient in TCL
and I seek to motivate a TCL developer to add a feature to a piece
of software I depend on.
be a juxtaposition of multiple copies of a shorter string.
Can we assume a usual alphabet? (e.g. just latin chars, iso-latin-* or
unicode?) Or does it need to support any "s-sized" alphabet?
Will the strings be very long? (That would impose the need of some
optimisations)
For very small alphabets (compared to length of string), other
optimisations spring to mind.
I'll start with a quite unoptimized version:
proc findLyndon {s} {
set bsf $s; set csf 1; # best-so-far, count-so-far
set len [string length $s]; # length of string
append s $s; # duplicate string - makes rotations easier
for {set i 1} {$i < $len} {incr i} {
# each rotation is a substring of the duplicated string:
set c [string range $s $i [expr {$i+$len-1}]]
set cmp [string compare $c $bsf]
if {$cmp < 0} { set bsf $c; set csf 1 } elseif {$cmp == 0} { incr csf }
}
# if result is unique, return best-so-far, else the empty string:
if {$csf == 1} { return $bsf } else { return "" }
}
Thanks for your reply and for taking your time with the code.
Strings will be smaller that 30000 characters, they are DNA so the alphabet is small.
I need only ASCII as this covers the IUPAC extended DNA code (16 symbols):
Symbol Mnemonic Translation
A A (adenine)
C C (cytosine)
G G (guanine)
T T (thymine)
U U (uracil)
R puRine A or G (purines)
Y pYrimidine C or T/U (pyrimidines)
M aMino group A or C
K Keto group G or T/U
S Strong interaction C or G
W Weak interaction A or T/U
H not G A, C or T/U
B not A C, G or T/U
V not T/U A, C or G
D not C A, G or T/U
N aNy A, C, G or T/U
Ill try it out and get back here with the result.
Am 09.03.2022 um 18:09 schrieb Björn Johansson:
On Tuesday, March 8, 2022 at 3:56:06 PM UTC, Andreas Leitgeb wrote:
Björn Johansson <[email protected]> wrote:
Hi all, I am new to TCL and I wonder if anyone anywhereSome words may of course not have a Lyndon word, if they happen to
implemented a lyndon word algorithm in TCL?
https://en.wikipedia.org/wiki/Lyndon_word
I need this to find the minimal string rotation for a circular string. >>> There are plenty of implementations for python like this:
https://gist.github.com/dvberkel/1950267
I know it does not seem complicated, but I am not proficient in TCL
and I seek to motivate a TCL developer to add a feature to a piece
of software I depend on.
be a juxtaposition of multiple copies of a shorter string.
Can we assume a usual alphabet? (e.g. just latin chars, iso-latin-* or
unicode?) Or does it need to support any "s-sized" alphabet?
Will the strings be very long? (That would impose the need of some
optimisations)
For very small alphabets (compared to length of string), other
optimisations spring to mind.
I'll start with a quite unoptimized version:
proc findLyndon {s} {
set bsf $s; set csf 1; # best-so-far, count-so-far
set len [string length $s]; # length of string
append s $s; # duplicate string - makes rotations easier
for {set i 1} {$i < $len} {incr i} {
# each rotation is a substring of the duplicated string:
set c [string range $s $i [expr {$i+$len-1}]]
set cmp [string compare $c $bsf]
if {$cmp < 0} { set bsf $c; set csf 1 } elseif {$cmp == 0} { incr csf } >> }
# if result is unique, return best-so-far, else the empty string:
if {$csf == 1} { return $bsf } else { return "" }
}
Thanks for your reply and for taking your time with the code.
Strings will be smaller that 30000 characters, they are DNA so the alphabet is small.
I need only ASCII as this covers the IUPAC extended DNA code (16 symbols):
Symbol Mnemonic Translation
A A (adenine)
C C (cytosine)
G G (guanine)
T T (thymine)
U U (uracil)
R puRine A or G (purines)
Y pYrimidine C or T/U (pyrimidines)
M aMino group A or C
K Keto group G or T/U
S Strong interaction C or G
W Weak interaction A or T/U
H not G A, C or T/U
B not A C, G or T/U
V not T/U A, C or G
D not C A, G or T/U
N aNy A, C, G or T/U
Ill try it out and get back here with the result.
Björn,
there is a considerable DNA TCL community, located in Strasbourg and Heidelberg. You may ask them for DNA optimized code also at this list.
Take care,
Harald
On Tuesday, March 8, 2022 at 3:56:06 PM UTC, Andreas Leitgeb wrote:
Björn Johansson <[email protected]> wrote:
Hi all, I am new to TCL and I wonder if anyone anywhereSome words may of course not have a Lyndon word, if they happen to
implemented a lyndon word algorithm in TCL?
https://en.wikipedia.org/wiki/Lyndon_word
I need this to find the minimal string rotation for a circular string.
There are plenty of implementations for python like this:
https://gist.github.com/dvberkel/1950267
I know it does not seem complicated, but I am not proficient in TCL
and I seek to motivate a TCL developer to add a feature to a piece
of software I depend on.
be a juxtaposition of multiple copies of a shorter string.
Can we assume a usual alphabet? (e.g. just latin chars, iso-latin-* or
unicode?) Or does it need to support any "s-sized" alphabet?
Will the strings be very long? (That would impose the need of some
optimisations)
For very small alphabets (compared to length of string), other
optimisations spring to mind.
I'll start with a quite unoptimized version:
proc findLyndon {s} {
set bsf $s; set csf 1; # best-so-far, count-so-far
set len [string length $s]; # length of string
append s $s; # duplicate string - makes rotations easier
for {set i 1} {$i < $len} {incr i} {
# each rotation is a substring of the duplicated string:
set c [string range $s $i [expr {$i+$len-1}]]
set cmp [string compare $c $bsf]
if {$cmp < 0} { set bsf $c; set csf 1 } elseif {$cmp == 0} { incr csf }
}
# if result is unique, return best-so-far, else the empty string:
if {$csf == 1} { return $bsf } else { return "" }
}
Thanks for your reply and for taking your time with the code.
Strings will be smaller that 30000 characters, they are DNA so the alphabet is small.
I need only ASCII as this covers the IUPAC extended DNA code (16 symbols):
Symbol Mnemonic Translation
A A (adenine)
C C (cytosine)
G G (guanine)
T T (thymine)
U U (uracil)
R puRine A or G (purines)
Y pYrimidine C or T/U (pyrimidines)
M aMino group A or C
K Keto group G or T/U
S Strong interaction C or G
W Weak interaction A or T/U
H not G A, C or T/U
B not A C, G or T/U
V not T/U A, C or G
D not C A, G or T/U
N aNy A, C, G or T/U
Ill try it out and get back here with the result.
| Sysop: | Keyop |
|---|---|
| Location: | Huddersfield, West Yorkshire, UK |
| Users: | 715 |
| Nodes: | 16 (2 / 14) |
| Uptime: | 159:33:55 |
| Calls: | 12,094 |
| Calls today: | 2 |
| Files: | 15,000 |
| Messages: | 6,517,760 |