Minimal string rotation

Björn Johansson

unread,

Mar 8, 2022, 5:39:08 AM3/8/22

to

Hi all, I am new to TCL and I wonder if anyone anywhere
implemented a lyndon word algorithm in TCL?

https://en.wikipedia.org/wiki/Lyndon_word

I need this to find the minimal string rotation for a circular string.
There are plenty of implementations for python like this:

https://gist.github.com/dvberkel/1950267

I know it does not seem complicated, but I am not proficient in TCL and I seek to motivate a TCL developer to add a feature to a piece of software I depend on.

Grateful for answers!
/bjorn

Andreas Leitgeb

unread,

Mar 8, 2022, 10:56:06 AM3/8/22

to

Some words may of course not have a Lyndon word, if they happen to
be a juxtaposition of multiple copies of a shorter string.

Can we assume a usual alphabet? (e.g. just latin chars, iso-latin-* or
unicode?) Or does it need to support any "s-sized" alphabet?

Will the strings be very long? (That would impose the need of some
optimisations)

For very small alphabets (compared to length of string), other
optimisations spring to mind.

I'll start with a quite unoptimized version:

proc findLyndon {s} {
set bsf $s; set csf 1; # best-so-far, count-so-far
set len [string length $s]; # length of string
append s $s; # duplicate string - makes rotations easier
for {set i 1} {$i < $len} {incr i} {
# each rotation is a substring of the duplicated string:
set c [string range $s $i [expr {$i+$len-1}]]
set cmp [string compare $c $bsf]
if {$cmp < 0} { set bsf $c; set csf 1 } elseif {$cmp == 0} { incr csf }
}
# if result is unique, return best-so-far, else the empty string:
if {$csf == 1} { return $bsf } else { return "" }
}

Björn Johansson

unread,

Mar 9, 2022, 12:09:21 PM3/9/22

to

Thanks for your reply and for taking your time with the code.

Strings will be smaller that 30000 characters, they are DNA so the alphabet is small.
I need only ASCII as this covers the IUPAC extended DNA code (16 symbols):

Symbol Mnemonic Translation
A A (adenine)
C C (cytosine)
G G (guanine)
T T (thymine)
U U (uracil)
R puRine A or G (purines)
Y pYrimidine C or T/U (pyrimidines)
M aMino group A or C
K Keto group G or T/U
S Strong interaction C or G
W Weak interaction A or T/U
H not G A, C or T/U
B not A C, G or T/U
V not T/U A, C or G
D not C A, G or T/U
N aNy A, C, G or T/U

Ill try it out and get back here with the result.

Harald Oehlmann

unread,

Mar 9, 2022, 12:20:01 PM3/9/22

to

Björn,

there is a considerable DNA TCL community, located in Strasbourg and
Heidelberg. You may ask them for DNA optimized code also at this list.

Take care,
Harald

Björn Johansson

unread,

Mar 9, 2022, 12:21:56 PM3/9/22

to

Thanks! Very useful!

Ian Braithwaite

unread,

Mar 10, 2022, 7:25:08 AM3/10/22

to

I had a stab at Booth's algorithm
(https://en.wikipedia.org/wiki/Lexicographically_minimal_string_rotation).
I have no idea how it works...

I replaced the values for i and f[] with i+1 and f+1, since it seemed to
simplify the code a bit.

Give it a try if Andreas' code is too slow.

proc lr {s} {
append S $s $s
set f [lrepeat [string length $S] 0]
set k 0
for {set j 1} {$j < [string length $S]} {incr j} {
set sj [string index $S $j]
set i [lindex $f [expr {$j-$k-1}]]
while {$i && $sj ne [string index $S $k+$i]} {
if {$sj < [string index $S $k+$i]} {
set k [expr {$j-$i}]
}
set i [lindex $f $i-1]
}
if {$sj ne [string index $S $k+$i]} {
if {$sj < [string index $S $k]} {
set k $j
}
lset f [expr {$j-$k}] 0
} else {
lset f [expr {$j-$k}] [expr {$i+1}]
}
}
string range $S $k [expr {$k+[string length $s]-1}]
}

-Ian