sort array by values

Offer Kaye

unread,

May 25, 2008, 3:02:17 AM5/25/08

to

Hi,
I have a problem requiring me to sort an array by it's values (instead
of it's keys). I tried the following but it didn't work:

foreach {k v} [lsort -index 1 [array get array_name]] {
puts "key=$k value=$v"
}

In Perl I can easily do for example a regular ascii (actually
alphanumeric) sort based on values:

foreach my $key (sort {$hash{$a} cmp $hash{$b}} keys %hash) {
print "key = $key , val = $hash{$key}\n"
}

This is possible because I can, in Perl, control what the sort
function sorts by. Is this not possible in Tcl??

Thanks,
Offer

bill...@alum.mit.edu

unread,

May 25, 2008, 3:26:16 AM5/25/08

to

You can't sort an array: arrays are by definition unordered. You can,
however, sort the indices of an array according to their values. The
key is to use the -command option to lsort, which allows you to
specify the comparison function to be used. This should take two
arguments and return -1, 0, or 1. You want a proc that compares the
values of the array elements, something like this:

proc avcmp {a b} {
return [cmp $::MyArray($a) $::MyArray($b)]
}

Here we pass the array indices and assume that we know what array is
to be used and that it is global. You would use it like this:

set SortedIndexList [lsort -command avcmp [array names MyArray]]

Of course, you may want a more general proc to which you also pass the
name of the array. This is a bit of a pain to do so if you're using
Tcl 8.5 you might be better of with a dict than an array.

Depending on what you are up to and whether the mapping from indices
to values is an injection, you could also go about this by inverting
the array, then sorting on the indices of the inverted array.

bill...@alum.mit.edu

unread,

May 25, 2008, 3:30:54 AM5/25/08

to

Oops, I neglected to say above that "cmp" is a placeholder for
whatever the appropriate comparison function is.

Aric Bills

unread,

May 25, 2008, 3:42:19 AM5/25/08

to

On May 24, 11:02 pm, Offer Kaye <offer.k...@gmail.com> wrote:
> Hi,
> I have a problem requiring me to sort an array by it's values (instead
> of it's keys). I tried the following but it didn't work:
>
> foreach {k v} [lsort -index 1 [array get array_name]] {
> puts "key=$k value=$v"
>
> }

Can you elaborate on what didn't work? Are you looking for:

foreach {k v} [lsort -index 1 -dictionary [array get array_name]]
{
puts "key=$k value=$v"
}

?

Offer Kaye

unread,

May 25, 2008, 4:11:51 AM5/25/08

to

On May 25, 10:26 am, billpo...@alum.mit.edu wrote:
>
> proc avcmp {a b} {
> return [cmp $::MyArray($a) $::MyArray($b)]
> }
>
> Here we pass the array indices and assume that we know what array is
> to be used and that it is global. You would use it like this:
>
> set SortedIndexList [lsort -command avcmp [array names MyArray]]
>
> Of course, you may want a more general proc to which you also pass the
> name of the array. This is a bit of a pain to do so if you're using
> Tcl 8.5 you might be better of with a dict than an array.
>

Hi Bill,
Thanks for the quick and helpful answer!
The above construct looks similar enough to Perl so that I'm now on
familiar ground :)

However I still have a couple of questions:
1. To do a alphanumeric sort I used in the avcmp the function "return
[string compare $::aaa($a) $:aaa($b)]". What would I used for a
numeric sort?
2. While the assumption that I know the name of the hash is good
enough for my current purposes I'd like to know for future reference,
how would I pass the array name to the avcmp proc? I'm afraid I don't
have access to 8.5 here so I'm stuck with array.

General question - is there a "Tcl sort tutorial" anywhere that covers
my questions and related ones?

Thanks again,
Offer

Fredderic

unread,

May 25, 2008, 4:13:31 AM5/25/08

to

On Sun, 25 May 2008 00:02:17 -0700 (PDT),
Offer Kaye <offer...@gmail.com> wrote:

> I have a problem requiring me to sort an array by it's values (instead
> of it's keys). I tried the following but it didn't work:
>
> foreach {k v} [lsort -index 1 [array get array_name]] {
> puts "key=$k value=$v"
> }

Yeah... This is something that's bugged me for ages...

You can sort and search based on a specific sub-item of a
list-of-lists, but you can't tell the sort and search functions to
consider every group of n items to be a sub-list. And yet, a flat list
of n-item groups is the native output format of several TCL built-in
commands. It's weird. *shrugs*

One nice generic solution I use is something like this: (not tested)

proc lgroup {n list} {
set out [list]; set ofs [expr {$n-1}]
for {set idx 0} {$idx < [llength $list]} {incr idx $n} {
lappend out [lrange $list $idx $idx+$ofs]
}
return $out
}

You can also play some tricks in there to speed it up, or build a more
restricted function along these lines:

proc lgroup2 {list} {
set out [list]
foreach {a b} {
lappend out [list $a $b]
}
return $out
}

Which is liable to be slightly faster. But in any case, from there you
can sort on any index you want. ie.

set l [lgroup 2 [array get array_name]]
foreach i [lsort -index 1 $l] }
lassign $i k v
puts "key=$k value=$v"
}

It's still kinda dodgy, really... Having [array] return a more useful
format, or [lsearch/lsort] able to understand that native format, would
be better.

Still another trick that's useful in some cases, is to invert the
array entirely:

foreach {k v} [array get array_name] {
lappend temp_array($v) $k
}
foreach v [lsort [array names temp_array]] {
foreach k $temp_array($v) {
puts "key=$k value=$v"
}
}

This works because TCL treats the key essentially as a binary blob, so
whatever can be in the value, can be in the key also. This method is
also reasonably good when you want to sort on values first, and then
keys.

There's other ways around it, but those are nice and straight forward.

Fredderic

Offer Kaye

unread,

May 25, 2008, 4:17:06 AM5/25/08

to

On May 25, 10:42 am, Aric Bills <aric.bi...@gmail.com> wrote:
>
> Can you elaborate on what didn't work? Are you looking for:
>
> foreach {k v} [lsort -index 1 -dictionary [array get array_name]]
> {
> puts "key=$k value=$v"
> }
>
> ?

Hi Aric,
The above code returns an error. Specifically, I tried:

array set aaa "
foo 1
bar 3
zooz 2
"
foreach {k v} [lsort -index 1 [array get aaa]] {
puts "key=$k value=$v"
}

And I got the error message:

element 1 missing from sublist "foo"
while executing
"lsort -index 1 [array get aaa]"
invoked from within
"foreach {k v} [lsort -index 1 [array get aaa]] {
puts "key=$k value=$v"
}
"

I guess the correct method to use is the one posted by Bill, i.e. use
a custom procedure to sort based on the values (or whatever is
required). It would have been nice if this could be done with an
anonymous proc, similar to the way Perl allows. But this is mostly
syntactic sugar and not really critical.

Regards,
Offer

Aric Bills

unread,

May 25, 2008, 5:04:23 AM5/25/08

to

Silly me. The result of [array get] is a flat list, not a list of
lists.

Bill's suggestion to use [lsort -command] probably comes as close as
possible to what you are looking for; note however that the -command
option is really slow, and not often necessary. Converting the array
into a nested list, as Fredderic has done, is more or less the
standard Tclsh way to deal with the problem of sorting arrays by
value; if you need to write a custom sorting command, converting the
array into a list of lists will allow you to write a more robust
command because you won't have to hard-code the array name into the
sort command or rely on an array that lives in a global variable.

It occurred to me that if you need to deal with array contents sorted
by value on a widespread basis, you could write your own control
structure for that purpose. While it's not the minimal solution you
were hoping for (in the sense that you have to write a proc to get the
functionality you want), it does provide some nice sugar once the
control proc is written:

package require Tcl 8.5

proc withSortedArray {arrayVar keyVar valueVar args} {

# All but the last element in $args will be
# passed to [lsort] as sort options;
# please don't specify -index in $args !
# The last item in $args is the code to execute
# for each element in the array.

if {[llength $args] < 1} {
error "wrong # args: should be arrayVar keyvar valueVar ?
sortOptions? body"
}
upvar $arrayVar array $keyVar key $valueVar value
foreach {k v} [array get array] {
lappend arrayvalues [list $k $v]
}
foreach item [lsort -index 1 {*}[lrange $args 0 end-1]
$arrayvalues] {
lassign $item key value
uplevel [lindex $args end]
}
}

array set aaa "
foo 1
bar 3
zooz 2
"

withSortedArray aaa key value -dictionary {
puts "key=$key value=$value"
}

Neil Madden

unread,

May 25, 2008, 7:59:57 AM5/25/08

to

Offer Kaye wrote:
> On May 25, 10:26 am, billpo...@alum.mit.edu wrote:
>> proc avcmp {a b} {
>> return [cmp $::MyArray($a) $::MyArray($b)]
>> }
>>
>> Here we pass the array indices and assume that we know what array is
>> to be used and that it is global. You would use it like this:
>>
>> set SortedIndexList [lsort -command avcmp [array names MyArray]]
>>
>> Of course, you may want a more general proc to which you also pass the
>> name of the array. This is a bit of a pain to do so if you're using
>> Tcl 8.5 you might be better of with a dict than an array.
>>
>
> Hi Bill,
> Thanks for the quick and helpful answer!
> The above construct looks similar enough to Perl so that I'm now on
> familiar ground :)
>
> However I still have a couple of questions:
> 1. To do a alphanumeric sort I used in the avcmp the function "return
> [string compare $::aaa($a) $:aaa($b)]". What would I used for a
> numeric sort?

return [expr {$::aaa($a) - $::aaa($b)}]

> 2. While the assumption that I know the name of the hash is good
> enough for my current purposes I'd like to know for future reference,
> how would I pass the array name to the avcmp proc? I'm afraid I don't
> have access to 8.5 here so I'm stuck with array.

Dict is available for 8.4 as an extension.

Use [upvar]:
proc avcmp {arName a b} {
upvar 1 $arName array
return [expr {$array($a) - $array($b)}]
}
lsort -command [list avcmp MyArray] [array names MyArray]

(Change the "1" to "#0" if the array is global).

In another post you asked about anonymous procedures. These are built-in
in Tcl 8.5, and can be simulated in earlier Tcl's (see
http://wiki.tcl.tk/lambda):

proc lambda {params body} { list ::apply [list $params $body] }
lsort -command [lambda {a b} {...}] $mylist

Generally, though, -command is a very slow option currently (even if the
-command is implemented in C). Pre-processing the array/dict into a
format suitable for [lsort] is faster in most cases:

proc zip2 xs {
set ys [list]
foreach {a b} $xs { lappend ys [list $a $b] }
return $ys
}
proc sortArray arName {
upvar 1 $arName arr
join [lsort -index 1 -integer [zip2 [array get arr]]]
}
foreach {k v} [sortArray MyArray] {
puts [format "%20s = %-s" $k $v]
}

>
> General question - is there a "Tcl sort tutorial" anywhere that covers
> my questions and related ones?

There isn't a tutorial specific to sorting in Tcl, that I know of, other
than the [lsort] manpage.

-- Neil

Donal K. Fellows

unread,

May 25, 2008, 8:03:03 AM5/25/08

to

Offer Kaye wrote:
> However I still have a couple of questions:
> 1. To do a alphanumeric sort I used in the avcmp the function "return
> [string compare $::aaa($a) $:aaa($b)]". What would I used for a
> numeric sort?

Something along the lines:
expr {$a<$b ? round(floor($a-$b)) : round(ceil($a-$b))}
(You can simplify to [expr {$a-$b}] if you're sorting integers; I was
just simulating round-away-from-zero. And you'll need to do extra
stuff to deal with the array; I assume you don't need your hand
holding for those bits.)

> 2. While the assumption that I know the name of the hash is good
> enough for my current purposes I'd like to know for future reference,
> how would I pass the array name to the avcmp proc? I'm afraid I don't
> have access to 8.5 here so I'm stuck with array.

The trick here is that the -command option to [lsort] takes a command
prefix, and not just a command name. This means you can do this:

proc avsort {arrayName a b} {
upvar 1 $arrayName ary
# Now compare $ary($a) and $ary($b)
}
lsort -command [list avsort MyArray] [array names MyArray]

Blend and season to taste. :-)

> General question - is there a "Tcl sort tutorial" anywhere that covers
> my questions and related ones?

That's a bit of an open-ended question; how related is "related"? :-)
The Tcl tutorial (http://www.tcl.tk/man/tcl8.5/tutorial/Tcl16.html is
the page that introduces [lsort]) or the Wiki (http://wiki.tcl.tk/1277
for the page on [lsort]) would be places I'd look. If the Wiki doesn't
have what you are looking for, we can change that of course...

Donal.

bill...@alum.mit.edu

unread,

May 25, 2008, 12:24:01 PM5/25/08

to

On May 25, 4:59 am, Neil Madden <n...@cs.nott.ac.uk> wrote:

>Generally, though, -command is a very slow option currently (even if the
>-command is implemented in C). Pre-processing the array/dict into a
>format suitable for [lsort] is faster in most cases:

It is quite true that in general in sorting it is more efficient to
create suitable keys by preprocessing than to use a special comparison
function, the reason being that you preprocess each key only once
while each key typically enters into a comparison many times.
Profiling of my fancy sort utility msort (http://billposer.org/
Software/msort.html), a C program (but with an optional Tcl/Tk GUI),
reveals that in typical cases more than 99% of the run time is spent
in preprocessing, less than 1% in the actual sort.

On the other hand, although -command does slow things down, in a great
many applications it doesn't matter - it is fast enough. I have used
this option without any problem to sort lists of thousands of items.
My advice is, give -command a try. If that proves to be too slow for
your purposes, then consider either preprocessing within Tcl or
outsourcing the sort to a faster sort utility.

bill...@alum.mit.edu

unread,

May 25, 2008, 12:44:43 PM5/25/08

to

Here's a little collection of examples of comparison functions:

proc NumericCompare {a b} {
return [expr $a - $b]
}

# Ignore leading + and - unless they are the only distinction between
the strings.
proc ClassCompare {a b} {
set cmp [string compare [string trimleft $a "+-"] [string trimleft
$b "+-"]]; if {$cmp != 0} {
return $cmp;
} else {
return [string compare $a $b];
}
}

#Compare using the fifth member of a list contained in an array.
proc FifthFieldOfArrayCompare {arrayName a b} {
upvar 1 $arrayName foo
return [string compare [lindex $foo($a) 4] [lindex $foo($b) 4]]
}

#Compare randomly (useful for unsorting)
proc RandomCompare {a b} {
set ::RandomSeed [expr ($::RandomSeed*9301 + 49297) % 233280]
return [expr int(3 * $::RandomSeed/double(233280)) -1]
}

#Compare first by string length (longer first), then lexicographically
proc lccomp {a b} {
set LenA [string length $a]
set LenB [string length $b]
set cmp [expr $LenB - $LenA]
if {$cmp != 0} {
return $cmp;
} else {
return [string compare $a $b];
}
}

#Sort on primary and secondary keys.
proc twocmp {a b} {
set r [$::PrimaryComparisonFunction $a $b]
if {$r == 0} {
set r [$::SecondaryComparisonFunction $a $b]
}
return $r
}

Kaitzschu

unread,

May 25, 2008, 7:41:11 PM5/25/08

to

On Sun, 25 May 2008, Fredderic wrote:

> One nice generic solution I use is something like this: (not tested)

Something you use but haven't tested? You are my kind of tester! :)

> proc lgroup {n list} {
> set out [list]; set ofs [expr {$n-1}]
> for {set idx 0} {$idx < [llength $list]} {incr idx $n} {
> lappend out [lrange $list $idx $idx+$ofs]
> }
> return $out
> }
>
> You can also play some tricks in there to speed it up

Sir, yes sir!

[code]

proc lgroupn {n list} {

set out [list]
set ofs [expr {$n-1}]

set ll [llength $list]
for {set idx 0} {$idx < $ll} {incr idx} {
lappend out [lrange $list $idx [incr idx $ofs]]
}
return $out
}

[/code]

Little bit of precomputing and avoiding index math (which has been tested
and denounced slow) gives remarkable 2/3 speed-up when splitting by 2, but
this degrades by split length, being mere 3 % when splitting by 10000.
Numbers follow (re-splittable list length = 300000, repeat counter 10,
times in microseconds):

split | lgroup 2 | lgroupn | factor
-------+-----------+----------+-------
1 | 1083257.9 | 331046.1 | 0.694
10 | 111723.2 | 37826.9 | 0.661
100 | 13946.0 | 6924.9 | 0.503
1000 | 5734.3 | 4312.7 | 0.248
10000 | 4102.7 | 3958.5 | 0.035
100000 | 3999.7 | 3897.4 | 0.026

> or build a more restricted function along these lines:

[snipped code for splitting by 2]

> Which is liable to be slightly faster.

Indeed. Still by a factor of 1.6 and over.

--
-Kaitzschu
set s TCL\ ;wh 1 {pu [set s [st ra $s 1 3][st in $s 0]]\033\[1A;af 99}

"Good thing programmers don't build bridges
[the same way as Kaitzschu writes code]."
--John Kelly in comp.lang.tcl

Kaitzschu

unread,

May 25, 2008, 7:44:26 PM5/25/08

to

On Mon, 26 May 2008, Kaitzschu wrote:

> split | lgroup 2 | lgroupn | factor

That "2" is of course a typing mistake, just testing your awareness :)

Uwe Klein

unread,

May 26, 2008, 4:15:37 AM5/26/08

to

Kaitzschu wrote:
> On Mon, 26 May 2008, Kaitzschu wrote:
>
>> split | lgroup 2 | lgroupn | factor
>
>
> That "2" is of course a typing mistake, just testing your awareness :)
>

would this help:

proc lgroupn {n list} {
set out list

incr n
while {[incr n -1]} {
lappend in $n
append out \ \$$n
}
puts stderr [list $in]
puts stderr [list $out]
foreach $in $list {
lappend ret [if 1 $out]
}
return $ret
}

set list [ array get ::env ]
puts $list

set list [ lgroun 2 $list ]
puts $list

uwe

Ralf Fassel

unread,

May 26, 2008, 5:41:46 AM5/26/08

to

* bill...@alum.mit.edu

| proc NumericCompare {a b} {
| return [expr $a - $b]
| }

ObNitpick:

proc NumericCompare {a b} {
return [expr {$a - $b}]
}

Always quote the arguments to 'expr', especially when handling
'untrusted' data.

R'

Larry W. Virden

unread,

May 27, 2008, 5:53:52 AM5/27/08

to

Over at http://wiki.tcl.tk/lsort has several suggestions for how to
use it to perform different sorts, plus there are references to other
wiki pages. If there are additional questions you have (or suggestions
that you have), you should add to that page.

Donal K. Fellows

unread,

May 27, 2008, 8:46:33 AM5/27/08

to

billposer wrote:
> proc NumericCompare {a b} {
> return [expr $a - $b]
> }

That's incorrectly named; you've got an integer comparator because the
result from the comparison command needs to be an integer (a
restriction of [lsort]).

Donal.

bill...@alum.mit.edu

unread,

May 28, 2008, 1:42:05 AM5/28/08

to

On May 27, 5:46 am, "Donal K. Fellows" <donal.k.fell...@man.ac.uk>
wrote:

True, but the name wasn't intended as a complete description of the
function. It came from a program that does several kinds of sorting
where the name was just intended to distinguish the routine used to
sort on unique ids, which are always integers, from those used to sort
in other ways.

Fredderic

unread,

Jun 3, 2008, 8:04:27 AM6/3/08

to

On Mon, 26 May 2008 02:41:11 +0300,
Kaitzschu <kait...@kaitzschu.cjb.net.nospam.plz.invalid> wrote:

> On Sun, 25 May 2008, Fredderic wrote:
>> One nice generic solution I use is something like this: (not
>> tested)
> Something you use but haven't tested? You are my kind of tester! :)

heh Actually, I mentioned it sometime last year when I was being
heckled about not providing real-life examples of stuff, but my
real-life instances of most code I might want to share are often full
of Fredderic-isms... Such things as a [proc] command with automatic
[upvar] for any argument whos name starts with & or *, and support for
partial continuations. I use my [keyed] ensemble (replaces [dict] with
a more complete set of sub-commands, and more consistent naming) which
I've presented on this newsgroup in the past. In addition to [set] I
also regularly use [get] and [take]. I have a semi-useful references
script, a modified [variable] command that's actually not insanely
irritating (it's one redeeming feature I've moved to a secondary
command with added usefulness), and an assortment of other bits and
pieces, most of the interesting ones I've mentioned in this newsgroup
at some time or another.

So cut'n'pasting from my own personal-use code (which is where just
about anything even remotely interesting generally resides) usually
leads to more head-scratching than just para-coding (as opposed to
para-phrasing) it on the spot with a "not tested" disclaimer... When I
do cut'n'paste, I have to then go through and "normalize" the code,
which has once or twice introduced a new bug (usually of the typo
variety) anyhow.

But at any rate... It's the thought that counts. ;)

>> You can also play some tricks in there to speed it up

> proc lgroupn {n list} {
> set out [list]
> set ofs [expr {$n-1}]
> set ll [llength $list]
> for {set idx 0} {$idx < $ll} {incr idx} {
> lappend out [lrange $list $idx [incr idx $ofs]]
> }
> return $out
> }

The trick with double-incrementing it is neat... I had a feeling there
was a way to get rid of $ofs, but couldn't quite pin it down on the
spot. Since you're not using $n anymore, you could just [incr n -1]
instead of creating the new variable (what I was hoping to do when I
wrote it).

proc lgroupn {n list} {
set out [list]

incr n -1

set ll [llength $list]
for {set idx 0} {$idx < $ll} {incr idx} {

lappend out [lrange $list $idx [incr idx $n]]
}
return $out
}

I thought [llength] was suppose to be bytecoded... I actually thought
it performed better than that... Hmmm.....

>> or build a more restricted function along these lines:
> [snipped code for splitting by 2]
>> Which is liable to be slightly faster.
> Indeed. Still by a factor of 1.6 and over.

Personally, I generally throw a [foreach] in-line to do that sort of
task, but I do have those functions for a couple smallish values of n
hanging around for some special call-back cases.

On Mon, 26 May 2008 10:15:37 +0200, Uwe Klein wrote:

> proc lgroupn {n list} {
> set out list

> incr n
> while {[incr n -1]} {
> lappend in $n
> append out \ \$$n
> }

> foreach $in $list {
> lappend ret [if 1 $out]
> }
> return $ret
> }

Now that's a trip... heh Took me a few moments to nut it out, but it
makes a frightening kind of sense... That [if 1] trick is something I
have to try and keep a little closer to mind, also...

What exactly is the go there, anyhow...?

Fredderic

Jonathan Bromley

unread,

Jun 3, 2008, 8:38:27 AM6/3/08

to

On Sun, 25 May 2008 18:13:31 +1000, Fredderic wrote:

> proc lgroup {n list} {
> set out [list]; set ofs [expr {$n-1}]
> for {set idx 0} {$idx < [llength $list]} {incr idx $n} {
> lappend out [lrange $list $idx $idx+$ofs]
> }
> return $out
> }

Anyone remember the "reshape" operator of APL? :-)
--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL * Verilog * SystemC * e * Perl * Tcl/Tk * Project Services

Doulos Ltd., 22 Market Place, Ringwood, BH24 1AW, UK
jonathan...@MYCOMPANY.com
http://www.MYCOMPANY.com

The contents of this message may contain personal views which
are not the views of Doulos Ltd., unless specifically stated.

Uwe Klein

unread,

Jun 3, 2008, 8:33:42 AM6/3/08

to

Fredderic wrote:

> On Mon, 26 May 2008 10:15:37 +0200, Uwe Klein wrote:
>
>
>>proc lgroupn {n list} {
>> set out list
>> incr n
>> while {[incr n -1]} {
>> lappend in $n
>> append out \ \$$n
>> }
>> foreach $in $list {
>> lappend ret [if 1 $out]
>> }
>> return $ret
>>}
>
>
> Now that's a trip... heh

you are welcome ;-)

> Took me a few moments to nut it out, but it
> makes a frightening kind of sense... That [if 1] trick is something I
> have to try and keep a little closer to mind, also...
>
> What exactly is the go there, anyhow...?

??
The body of [if] is supposed to get the benefit of bytecodecompilation.
With fixed body is is regularly _a lot_ faster than eval.

And i very much like building further code programmatically.

uwe