More than 50 000 tags in a text widget ....

mou...@igbmc.u-strasbg.fr

Sep 21, 2005, 4:34:08 AM

Hello world !

I'm a researcher in bioinformatics, and I'm trying to write a Tk
application to view alignments of protein sequences.
Typically, I have to display in a text widget 100 to 1000 strings (all
the same length) of 1000-5000 characters. Each string (sequence) is made
of 20 different characters (amino acids), plus the "." sign. Each of the
20 different characters has to be tagged individually, so there are at
least something like 50 000 tags in the widget.

When I initialize the widget, following earlier remarks found in this
newsgroup, I append every character + (tag or not) to a variable that I
then insert into the text widget. This takes quite a lot of time !!!
I just tried another solution, which is to tag only the visible
characters. This doesn't give a "smooth" scroll, and if you scroll too
quickly, the display can't keep up ....

Does anyone have a hint?

Many thanks in advance !!

(I can send my test script if needed.)

-------------------------------
Luc Moulinier,
Laboratoire de Bioinformatique et Genomique Integratives
IGBMC
Illkirch, 67404
France

Arjen Markus

Sep 21, 2005, 4:40:20 AM

Have you considered using the canvas widget instead? I have seen
presentations of applications in similar fields where the canvas widget
was used.

http://wiki.tcl.tk/3043 is the portal to the details. The presentation I
was thinking of was by Bastien Chevreux.

Regards,

Arjen

davidh...@simplifiedlogic.com

Sep 21, 2005, 8:22:02 AM

You might want to look at this as a quick fix; a canvas might be
better in the long run.

http://mini.net/tcl/4134

mou...@igbmc.u-strasbg.fr

Sep 21, 2005, 8:44:18 AM

I forgot to mention that I must be able to edit the text .... A canvas
solution then becomes very difficult ...

Luc Moulinier

suchenwi

Sep 21, 2005, 10:12:03 AM

You could do it somewhat like Excel:
display the whole picture compactly in a canvas;
display the selected line in an entry widget for editing;
update the whole picture on <Return> in the entry.

bs

Sep 21, 2005, 11:01:18 AM

mou...@igbmc.u-strasbg.fr wrote:
> Hello world !
>
> I'm a researcher in bioinformatics, and I'm trying to write a Tk
> application to view alignments of protein sequences.
> Typically, I have to display in a text widget 100 to 1000 strings (all
> the same length) of 1000-5000 characters. Each string (sequence) is made
> of 20 different characters (amino acids), plus the "." sign. Each of the
> 20 different characters has to be tagged individually, so there are at
> least something like 50 000 tags in the widget.
>

Does each character, or each string, need to be tagged differently? And
why? How are you using the tags?

You might try Tktable... it might or might not be better, but it's
something to try...

--brett

Donal K. Fellows

Sep 21, 2005, 11:07:03 AM

mou...@igbmc.u-strasbg.fr wrote:
> I'm a researcher in bioinformatics, and I'm trying to write a Tk
> application to view alignments of protein sequences.
> Typically, I have to display in a text widget 100 to 1000 strings (all
> the same length) of 1000-5000 characters. Each string (sequence) is made
> of 20 different characters (amino acids), plus the "." sign. Each of the
> 20 different characters has to be tagged individually, so there are at
> least something like 50 000 tags in the widget.

That's an awful lot of tags, and "lots of tags" is the case that the
text widget isn't heavily optimized for.

> When I initialize the widget, following earlier remarks found in this
> newsgroup, I append every character + (tag or not) to a variable that I
> then insert into the text widget. This takes quite a lot of time !!!
> I just tried another solution, which is to tag only the visible
> characters. This doesn't give a "smooth" scroll, and if you scroll too
> quickly, the display can't keep up ....

Err, why is it necessary to tag every character differently? Wouldn't it
be easier to tag things only according to what they look like, and to
process mouse/keyboard activity in the widget directly? That'd certainly
be faster in this case...

Donal.

mou...@igbmc.u-strasbg.fr

Sep 21, 2005, 11:52:48 AM

There are only 20 letters. All "A" should be foreground white and
background pink, "D" fg white, bg green, "A" fg black, bg orange, etc
.... I defined 20 tags, and I applied them to each letter.
The code looks like this:

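set Lc [split [$wt get 1.0 end] ""]
$w delete 1.0 end

# start from an empty list (otherwise a second run keeps appending)
set Ldata {}
foreach c $Lc {
    # newlines and gaps stay untagged; every residue gets Tag<letter>
    if {$c eq "\n" || $c eq "."} {
        set t {}
    } else {
        set t "Tag$c"
    }
    lappend Ldata $c $t
}

eval $w insert end $Ldata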

Is there another way?
I don't understand "tag things according to what they look like" ....?

And can someone tell me where to download the ctext widget? Tklib
is not bundled with Tcllib anymore .... I found a tklib archive on
the net dated Feb 2003 ....

Many thanks again !

Luc Moulinier

Torsten Reincke

Sep 22, 2005, 4:31:49 AM

> There are only 20 letters. All "A" should be foreground white and
> background pink, "D" fg white, bg green, "A" fg black, bg orange, etc
> .... I defined 20 tags, and I applied them to each letter.
> The code looks like this:
>
> set Lc [split [$wt get 1.0 end] ""]
> $w delete 1.0 end
>
> foreach c $Lc {
> if {$c == "\n" || $c == "."} {
> set t {}
> } else {
> set t "Tag$c"
> }
> lappend Ldata $c $t
> }
>
> eval $w insert end $Ldata
>
>
> Is there another way?

There is no real other way, but you could avoid using eval here and
avoid building a huge list before you display the tagged data, which
speeds things up a bit. First insert your data into the text widget
without tags, and then loop through the data, tagging it line by line
afterwards (since your data is organized in lines):

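# data: the plain alignment text, one sequence per line
set lineNumber 1
foreach myText [split $data \n] {
    for {set i 0} {$i < [string length $myText]} {incr i} {
        set c [string index $myText $i]
        $w tag add Tag$c $lineNumber.$i
    }
    incr lineNumber
}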

This is quicker, since inserting text into the text widget is not the
problem. Try your code above without the eval (meaning: without tags);
it will be much quicker. But ultimately, 50,000 characters are a lot, and
the tagging is the slow part. I don't think ctext will help much. It's
just a wrapper around the text widget that adds a layer to implement
syntax highlighting. So the fundamental problem is still there. If you
want to give ctext a try, you need to get it from CVS on
SourceForge (these are the Unix commands needed):

cvs -d:pserver:anon...@cvs.sourceforge.net:/cvsroot/tcllib login
cvs -z3 -d:pserver:anon...@cvs.sourceforge.net:/cvsroot/tcllib co -P tklib

I never understood why there is no current tklib in the "File releases"
section.

> I don't understand "tag things according to what they look like" ....?

I think this is what you are doing. You tag the characters using a
different tag for each letter (not for each character). But that only
makes 20 different tags, not 50,000 tags, as the subject line suggests.
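Because there are only 20 tags, you might reduce the number of Tcl-level calls. The text widget's `tag add` accepts several index pairs in one call. So, for each line, you could collect the ranges covered by each run of identical residues and issue one `tag add` per tag per line. A minimal sketch follows. `lineTagRanges` is a hypothetical helper, and `dict` needs Tcl 8.5 or later:

```tcl
# Hypothetical helper: for one alignment line, return a dict mapping
# tag name -> flat list of start/end index pairs, one pair per run of
# identical residues. Gaps (".") stay untagged, as in Luc's loop.
proc lineTagRanges {lineNo seq} {
    set ranges [dict create]
    set len [string length $seq]
    set start 0
    while {$start < $len} {
        set c [string index $seq $start]
        # advance to the end of the run of identical characters
        set end [expr {$start + 1}]
        while {$end < $len && [string index $seq $end] eq $c} {
            incr end
        }
        if {$c ne "."} {
            dict lappend ranges Tag$c $lineNo.$start $lineNo.$end
        }
        set start $end
    }
    return $ranges
}

# Usage, once the plain text is already in the text widget $w:
#   set lineNo 1
#   foreach seq [split $data \n] {
#       dict for {tag idxs} [lineTagRanges $lineNo $seq] {
#           $w tag add $tag {*}$idxs
#       }
#       incr lineNo
#   }
```

The gain depends on how long the runs of identical residues are in real data. It has not been measured here.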

Hmm, perhaps you could insert tiny coloured letter bitmaps instead of
tagged letters. Would that speed things up??

Torsten

Arjen Markus

Sep 22, 2005, 5:05:05 AM

That does not solve his problem with actually editing the contents, but
it may be worthwhile to combine this with an overlaid widget to edit a
single line ....

Regards,

Arjen

Torsten Reincke

Sep 23, 2005, 3:26:38 AM

Another idea ...

Why not "virtualize" the tagging? You could just insert the protein
sequences as plain letters without tagging information first. This is
fast enough.

Then, start by tagging only the portion of the text widget that is
visible to the user (or perhaps one more page). This should be
reasonably fast. When the user scrolls through the text, a binding on
<ButtonRelease-1> will then trigger the tagging of the newly visible part.

The hard part here would be finding the "visible part" of the text. The
upcoming Tk 8.5 makes that much easier. And of course, if you start
scrolling all the way to the right end of the widget, that would
probably make the GUI very busy ...

Just a thought,

Torsten

Bryan Oakley

Sep 23, 2005, 9:30:09 AM

Torsten Reincke wrote:
> Another idea ...
>
> Why not "virtualize" the tagging? You could just insert the protein
> sequences as plain letters without tagging information first. This is
> fast enough.
>
> Then, start by tagging only the portion of the text widget that is
> visible to the user (or perhaps one more page). This should be
> reasonably fast. When the user scrolls through the text, a binding on
> <ButtonRelease-1> will then trigger the tagging of the newly visible part.
>

Personally, I'd create a loop that does the tagging in the background, a
screenful of lines (or columns) at a time. Most likely the loop would
always be ahead of any scrolling the user might do, but if not, the
tagging code could do sanity checks to make sure it tags the currently
visible region as soon as possible.

Roughly, like this (though with better error checking and such):

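# tag untagged residues from index $from up to (not including) $to
proc tag_range {w from to} {
    for {set i [$w index $from]} {[$w compare $i < $to]} {set i [$w index "$i+1c"]} {
        set c [$w get $i]
        if {[string is alpha -strict $c] && [$w tag names $i] eq ""} {
            $w tag add Tag$c $i
        }
    }
}

# start with: tag_a_chunk .textwidget 1.0
proc tag_a_chunk {w index} {
    # set magicnum to whatever can be done in a few tens of
    # milliseconds so that the UI stays responsive.
    set magicnum 2000
    set next [$w index "$index + $magicnum chars"]
    tag_range $w $index $next
    # if index is before the visible part of the screen,
    # tag the visible portion of the screen, too
    if {[$w compare $index < @0,0]} {
        scan [$w index @0,0] %d.%d l0 c0
        scan [$w index @[winfo width $w],[winfo height $w]] %d.%d l1 c1
        for {set l $l0} {$l <= $l1} {incr l} {
            tag_range $w $l.$c0 $l.[expr {$c1 + 1}]
        }
    }
    if {[$w compare $next < end]} {
        after 10 [list tag_a_chunk $w $next]
    }
}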

> The hard part here would be finding the "visible part" of the text. The
> upcoming Tk 8.5 makes that much easier. And of course, if you start
> scrolling all the way to the right end of the widget, that would
> probably make the GUI very busy ...

Finding the visible part is easy; it's just index @0,0 and index
@$width,$height.
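The sketch below turns those two corner indices into per-line ranges, so that only the visible columns of each visible line are touched. This matters with `-wrap none` and horizontal scrolling, where the text between the two corners also includes off-screen columns. It assumes, as in an alignment, that all lines have the same length. `visibleRanges` is a hypothetical name, and `lassign` needs Tcl 8.5:

```tcl
# Hypothetical helper: given "line.char" indices of the top-left and
# bottom-right visible characters, return start/end index pairs that
# cover the visible columns of every visible line.
proc visibleRanges {topLeft bottomRight} {
    lassign [split $topLeft .] firstLine firstCol
    lassign [split $bottomRight .] lastLine lastCol
    set ranges {}
    for {set line $firstLine} {$line <= $lastLine} {incr line} {
        # the end index is exclusive, hence lastCol + 1
        lappend ranges $line.$firstCol $line.[expr {$lastCol + 1}]
    }
    return $ranges
}

# Usage with a text widget $w:
#   set ranges [visibleRanges [$w index @0,0] \
#       [$w index @[winfo width $w],[winfo height $w]]]
```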

Eckhard Lehmann

Sep 25, 2005, 6:40:19 AM

mou...@igbmc.u-strasbg.fr wrote:
> I'm a researcher in bioinformatics, and I'm trying to write a Tk
> application to view alignments of protein sequences.
> Typically, I have to display in a text widget 100 to 1000 strings (all
> the same length) of 1000-5000 characters. Each string (sequence) is made
> of 20 different characters (amino acids), plus the "." sign. Each of the
> 20 different characters has to be tagged individually, so there are at
> least something like 50 000 tags in the widget.

I am also in bioinformatics, but, sorry, I cannot imagine the reason
to make alignments visible as text and editable by the user. Well, you
will have your reasons for this, but in case you don't know:
alignments *are* already displayed colorized and in text form by
ClustalX, which is available across platforms and for free.

IMHO, it is much more convenient for users to see the alignments
as simple lines or bars on a canvas, together with a mouseover or
Button-1 binding that displays statistics and the sequence (which could
be editable, of course) in another widget, text or canvas.
You could encode the alignment quality as colors, e.g. red for very good
and darker colors for worse alignment statistics. Then users get a
quick overview of the alignment and its quality, and if they are
interested in more details, they can hover over the bars or click on them
and get/edit the details.
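Below is a sketch of encoding quality as colors. As a stand-in quality measure it uses per-column conservation: the fraction of sequences that share the column's most common residue, with gaps never counted. That measure, and the names `columnScores` and `scoreColor`, are assumptions for illustration, not Eckhard's code. It needs Tcl 8.5 (`incr` on an unset array element):

```tcl
# Hypothetical sketch: score each alignment column by the fraction of
# sequences sharing the column's most common residue (gaps "." never
# count), as a simple stand-in for "alignment quality".
proc columnScores {seqs} {
    set n [llength $seqs]
    set ncol [string length [lindex $seqs 0]]
    set scores {}
    for {set col 0} {$col < $ncol} {incr col} {
        unset -nocomplain count
        set best 0
        foreach s $seqs {
            set c [string index $s $col]
            if {$c eq "."} continue
            # incr on an unset element starts from 0 (Tcl 8.5+)
            set k [incr count($c)]
            if {$k > $best} {set best $k}
        }
        lappend scores [expr {double($best) / $n}]
    }
    return $scores
}

# Map a score in [0,1] to a color: bright red for well conserved
# columns, darker red for poorly conserved ones.
proc scoreColor {score} {
    set red [expr {int(round(64 + 191 * $score))}]
    return [format "#%02x0000" $red]
}
```

Each score could then set the fill of a bar, e.g. `.c create rectangle $x0 $y0 $x1 $y1 -fill [scoreColor $score] -outline {}` on a hypothetical canvas `.c`.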

Displaying a huge amount of AA sequence text is not what most users want
(at least in my experience). There is too much information; the eye just
can't take it in.

I developed a canvas-based widget some time ago to visualize BLAST
results for vector screening. We wanted to know to what extent a
publicly available EST library (you can get it from NCBI) was
contaminated with vector. There were around 8000 ESTs, and you can
determine the vector contamination with BLAST and very restrictive
parameters, as described at NCBI
(http://www.ncbi.nlm.nih.gov/VecScreen/VecScreen.html).
I ran a standalone BLAST against the UniVec database and wrote a
parser/visualizer in Itcl/Itk for the results. It was a very convenient
way to look at the BLAST results; especially in conjunction with
programmatic grouping, it gave us a quick overview of the vector
contamination of each of the ESTs.

That is what I suggest ;-). If you are interested, I can dig out the
scripts and send them to you. Mail me:

Eckhard

Eckhard Lehmann

Sep 25, 2005, 8:58:38 AM

With protein alignments, the purpose is usually to determine conserved
domains within protein families or single proteins, isn't it? This could
be visualized well with bar alignments, after the domains have been
determined programmatically. The domains could be visualized in parallel
to the alignment, and for each protein a score could be computed that
shows how well the protein fits the domains. An interesting
challenge... ;-).

Eckhard

Jeff Hobbs

Sep 25, 2005, 12:55:33 PM

mou...@igbmc.u-strasbg.fr wrote:
> There are only 20 letters. All "A" should be foreground white and
> background pink, "D" fg white, bg green, "A" fg black, bg orange, etc
> .... I defined 20 tags, and I applied them to each letter.
> The code looks like this:

> set Lc [split [$wt get 1.0 end] ""]
> $w delete 1.0 end
>
> foreach c $Lc {
> if {$c == "\n" || $c == "."} {
> set t {}
> } else {
> set t "Tag$c"
> }
> lappend Ldata $c $t
> }
>
> eval $w insert end $Ldata

I hope that the above is actually done inside a 'proc'
statement, as that would be much faster.
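Below is a sketch of that loop moved into procs, so it is byte-compiled. It also merges runs of identical characters into one chunk, which gives the single insert call fewer arguments than one pair per character. `buildInsertArgs` and `tagFor` are hypothetical names. The `{*}` expansion in the usage comment needs Tcl 8.5; with 8.4, keep the `eval`:

```tcl
# Sketch: build the "chars tags chars tags ..." arguments for ONE
# text-widget insert call, merging runs of identical characters into a
# single chunk so there are fewer arguments than one pair per character.
proc buildInsertArgs {data} {
    set result {}
    set chunk ""
    set prev ""
    foreach c [split $data ""] {
        # a different character ends the current run
        if {$chunk ne "" && $c ne $prev} {
            lappend result $chunk [tagFor $prev]
            set chunk ""
        }
        append chunk $c
        set prev $c
    }
    if {$chunk ne ""} {
        lappend result $chunk [tagFor $prev]
    }
    return $result
}

# Newlines and gaps get no tag, as in the original loop.
proc tagFor {c} {
    if {$c eq "\n" || $c eq "."} {return {}}
    return Tag$c
}

# Usage (Tcl 8.5+ argument expansion instead of eval):
#   $w insert end {*}[buildInsertArgs [$wt get 1.0 end-1c]]
```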

> Is there another way?
> I don't understand "tag things according to what they look like" ....?
>
> And can someone tell me where to download the ctext widget? Tklib
> is not bundled with Tcllib anymore .... I found a tklib archive on
> the net dated Feb 2003 ....

You don't want ctext for this, as it is intended for building
up a text editor, and won't be any faster. If anything, you
need a more customized widget. You might try Tktable, as
someone else mentioned. It is part of ActiveTcl. Then you
would have the 20 letters, so 20 tags in all. Fill each
cell with one letter and the appropriate tag, and make all
the cells 1 character wide.

The table widget is designed to hold millions of cells
efficiently, but not necessarily all tagged. If you go this
route, I'd be interested to hear your performance results.
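Below is a sketch of the data side of this Tktable suggestion. It fills the table's array with one residue per cell and collects, for each tag, the cells it applies to. `fillTableData` is a hypothetical name. The Tktable calls appear only in the comments, assume the Tktable package is installed, and have not been tested here:

```tcl
# Hypothetical sketch: fill the array used as the table's -variable
# (keyed "row,col") with one residue per cell, and return a dict
# mapping each tag name to the list of cells it applies to.
proc fillTableData {arrName seqs} {
    upvar 1 $arrName cells
    set tagCells [dict create]
    set row 0
    foreach seq $seqs {
        set col 0
        foreach c [split $seq ""] {
            set cells($row,$col) $c
            # gaps stay untagged
            if {$c ne "."} {
                dict lappend tagCells Tag$c $row,$col
            }
            incr col
        }
        incr row
    }
    return $tagCells
}

# Usage with Tktable (assumed installed, e.g. via ActiveTcl):
#   package require Tktable
#   table .t -variable ali -rows [llength $seqs] \
#       -cols [string length [lindex $seqs 0]] -colwidth 1
#   .t tag configure TagA -fg white -bg pink
#   dict for {tag cellList} [fillTableData ali $seqs] {
#       .t tag cell $tag {*}$cellList
#   }
```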

--
Jeff Hobbs, The Tcl Guy
http://www.ActiveState.com/, a division of Sophos

mou...@igbmc.u-strasbg.fr

Sep 27, 2005, 9:45:18 AM

Hello everybody !

First of all, let me thank all of you for your
replies/comments/suggestions! I've learned a lot!

This post is to reply to some of your posts and to give you some news.

First, we really do need such a tool to edit/visualize multiple sequence
alignments, as the output of alignment programs is not of good enough
quality. Even though big progress has been made in the accuracy of such
programs (muscle, mafft, the very slow t-coffee), some errors remain.
Alignment quality, estimated through objective functions like NorMD,
sometimes clearly shows a deeply incorrect alignment. As some studies in
structural biology, phylogenetic footprinting for example, need
high-quality alignments, we must be able to edit and correct them by
hand. We used to use Seqlab, part of the GCG package. As GCG is now
commercial, we must change to something else.

So we are developing a new program, Ordali, to view/edit alignments and
all related information, like 3D structure, Pfam domains, conserved
blocks, etc .... in one go. My post is aimed at improving Ordali. Ordali
is 30 000 lines of code, and is part of our genomic/transcriptomic/
comparative genomic data analysis platform, Gscope (350 000 lines of
code ....), all in Tcl/Tk.

As we must edit the alignment in the context of all the sequences, we
can't use a "line editing" mode after a Button-1 click on a canvas
widget. It would not be practical at all!! When we edit an alignment,
we move lots of things around while correcting it, so an editing mode
like that of the text widget seems compulsory.

I tried the "virtualisation" just before Bryan's post. You can find the
little test code I wrote (it's test code!!!) and a real sample. It
works more or less OK. The funny thing is that there is no delay when
scrolling in the y dimension, but a real delay in the x dimension ...
You can download it at:
ftp://ftp-igbmc.u-strasbg.fr/pub/moumou/ttext.tar.gz
Run it with: ttext.tcl ali2.tfa

Jeff, I will try the Tktable trick. If it is really designed to handle
millions of cells, it may be worth the pain. There will be some good
coding fun for editing, but ...

If you have some time to look at my code, any comments are again
very welcome!!

luc

Torsten Reincke

Sep 27, 2005, 11:16:15 AM

> etc .... in one go. My post is aimed at improving Ordali. Ordali
> is 30 000 lines of code, and is part of our genomic/transcriptomic/
> comparative genomic data analysis platform, Gscope (350 000 lines of
> code ....), all in Tcl/Tk.

Wow. This is quite a lot of code!

> I tried the "virtualisation" just before Bryan's post.

> If you have some time to look at my code, any comments
> are again very welcome!!

Your code to determine the visible part of the text widget is far too
complicated. Bryan's tip on how to determine this region makes the
tagging quite easy. Thanks, Bryan! So if you just take this procedure:

proc TagRegion {win} {
    # make scrolled text show up before we start tagging:
    update idletasks
    # text index at upper left corner of text widget:
    foreach {la ca} [split [$win index @0,0] .] {break}
    # text index at lower right corner of text widget
    # (@x,y takes the x coordinate, i.e. the width, first):
    set w [winfo width $win]
    set h [winfo height $win]
    foreach {lb cb} [split [$win index @$w,$h] .] {break}
    # tag all letters that fall into this rectangular region:
    for {set i $la} {$i <= $lb} {incr i} {
        for {set j $ca} {$j <= $cb+1} {incr j} {
            set c [$win get $i.$j]
            # only tag a character when it has no tag yet:
            if {[llength [$win tag names $i.$j]]==0} {
                $win tag add "Tag$c" $i.$j
            }
        }
    }
}

and replace the lines

scrollbar .f3.sy -command "LuckyY $wt"
scrollbar .f3.sx -command "LuckyX $wt" -orient horizontal

in your proc InitWindow with these lines:

scrollbar .f3.sy -command "$wt yview"
scrollbar .f3.sx -command "$wt xview" -orient horizontal
bind $wx <ButtonRelease-1> [list TagRegion $wt]
bind $wy <ButtonRelease-1> [list TagRegion $wt]
bind $wt <Configure> [list TagRegion $wt]

then tagging will be quite smooth. But, of course, there is a little
delay when scrolling. Perhaps it would be a good idea to let this
tagging continue in the background as Bryan suggested ...

Your code often uses 'update'. Try to avoid this; make the thing
smoother by using 'after idle' instead. This also applies to my proc
above.
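Following the 'after idle' advice, here is a sketch of a small scheduler that defers the work instead of forcing a redraw with `update`. It also merges repeated requests into one: a burst of scroll or `<Configure>` events then costs a single tagging pass. `scheduleOnce` and `runScheduled` are hypothetical names:

```tcl
# Sketch: coalesce many requests into one deferred call. Each new request
# cancels the pending one, so a burst of events leads to a single run of
# the script once the event loop is idle, without calling "update".
proc scheduleOnce {key script} {
    global pendingJobs
    if {[info exists pendingJobs($key)]} {
        after cancel $pendingJobs($key)
    }
    set pendingJobs($key) [after idle [list runScheduled $key $script]]
}

proc runScheduled {key script} {
    global pendingJobs
    unset -nocomplain pendingJobs($key)
    # run the request at global level, like a binding script
    uplevel #0 $script
}

# Usage with the TagRegion proc above:
#   bind $wt <Configure> [list scheduleOnce tag$wt [list TagRegion $wt]]
```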

Good luck, Torsten

Jeff Hobbs

Sep 27, 2005, 11:41:22 AM

mou...@igbmc.u-strasbg.fr wrote:
> Jeff, I will try the Tktable trick. If it is really designed to handle
> millions of cells, it may be worth the pain. There will be some good
> coding fun for editing, but ...

The tktable is editable ... of course on a cell-by-cell basis.
You may be able to manage the transition from cell to cell in
an efficient manner for the user (overriding the default table
bindings).

Bryan Oakley

Sep 27, 2005, 11:44:21 AM

Torsten Reincke wrote:
> tagging quite easy. Thanks, Bryan! So if you just take this procedure:
>
> proc TagRegion {win} {

> ...

> }
>
> and replace the lines
>
> scrollbar .f3.sy -command "LuckyY $wt"
> scrollbar .f3.sx -command "LuckyX $wt" -orient horizontal
>
> in your proc InitWindow with these lines:
>
> scrollbar .f3.sy -command "$wt yview"
> scrollbar .f3.sx -command "$wt xview" -orient horizontal
> bind $wx <ButtonRelease-1> [list TagRegion $wt]
> bind $wy <ButtonRelease-1> [list TagRegion $wt]
> bind $wt <Configure> [list TagRegion $wt]
>

Binding on button events is rather imprecise. What if the user scrolls
with Page Down or the down arrow or some other binding?

I'd argue it's better to do something like this (sorry, I don't have the
original code handy...):

text .text -yscrollcommand [list customScrollCommand .text .f3.sy]
proc customScrollCommand {text scrollbar args} {
    # do the scroll command; note that args will be appended by
    # tk when calling this proc
    set result [eval $scrollbar set $args]

    # tag the newly visible area
    TagRegion $text

    return $result
}

Does that make sense? There's no rule that says a -yscrollcommand (or
-xscrollcommand) must directly call a scrollbar's set command. It can
call a proc that calls the scrollbar command and then does whatever it
needs to do when the view changes. This becomes impervious to how the
widget is scrolled.

Torsten Reincke

Sep 27, 2005, 4:14:51 PM

> Binding on button events is rather imprecise. What if the user scrolls
> with Page Down or the down arrow or some other binding?
>
> I'd argue it's better to do something like this (sorry, I don't have the
> original code handy...):
>
> text .text -yscrollcommand [list customScrollCommand .text .f3.sy]
> proc customScrollCommand {text scrollbar args} {
> # do the scroll command; note that args will be appended by
> # tk when calling this proc
> set result [eval $scrollbar set $args]
>
> # tag the newly visible area
> TagRegion $text
>
> return $result
> }
>
> Does that make sense?

Sure! I didn't think of the user scrolling with the keyboard, so of
course you are quite right. Thanks for your input, Bryan. But the
<Configure> binding is still needed.

Torsten