Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Fastest list sort for two criteria

154 views
Skip to first unread message

Alexandru

unread,
Jun 29, 2015, 5:15:16 AM6/29/15
to
Hi,

As far as I know there is no standard command in Tcl for this.

So what would be the fastest way in Tcl to sort a list for two criteria?

For example I have the list:

set l {1 a 2 a 1 c 1 b 3 b 4 c}

I want to sort first the integers and then the letters. So the sorted list looks like this:

1 a 1 b 1 c 2 a 3 b 4 c

My first approach is to sort the list using

set l [lsort -stride 2 $l]

which gives me

1 a 1 c 1 b 2 a 3 b 4 c

and then do a loop over all list elements and sort again according to the letters.

But is that the fastest algorithm?

Thank you!

P.S: My actual list is very big (millions of elements) and contains only real numbers instead of integers and letters.

Andreas Leitgeb

unread,
Jun 29, 2015, 5:32:23 AM6/29/15
to
Alexandru <alexandr...@meshparts.de> wrote:
> As far as I know there is no standard command in Tcl for this.
> So what would be the fastest way in Tcl to sort a list for two criteria?
> For example I have the list:
> set l {1 a 2 a 1 c 1 b 3 b 4 c}
> I want to sort first the integers and then the letters. So the sorted list looks like this:
> 1 a 1 b 1 c 2 a 3 b 4 c

try this:
set l [lsort -stride 2 [lsort -stride 2 -index 1 $l]]

The trick is to sort sequentially from least important criterion
to most important.
This works, because tcl's sort is stable, that means: elements
comparing as equal don't change their relative positions.

Alexandru

unread,
Jun 29, 2015, 6:35:03 AM6/29/15
to
> > 1 a 1 b 1 c 2 a 3 b 4 c
>
> try this:
> set l [lsort -stride 2 [lsort -stride 2 -index 1 $l]]
>

Hi Andreas,

your solution works perfectly and I'm pretty convinced, this is also the fastest one.

Thank you for your fast help!

heinrichmartin

unread,
Jun 30, 2015, 5:29:25 AM6/30/15
to
On Monday, June 29, 2015 at 12:35:03 PM UTC+2, Alexandru wrote:
> your solution works perfectly and I'm pretty convinced, this is also the fastest one.

My first thought was that the nested lsort involves
* unnecessary copies of the list and
* a possibly "bad" overall sorting order after the first iteration.
Both of these could significantly decrease performance.

However, this thought was wrong:

_______________
#!/usr/bin/env tclsh

puts [info patchlevel]

set n 1000000
set u [list]
for {set i 0} {$i < $n} {incr i} {
lappend u [expr {int(rand()*100)}] [format %c [expr {0x61+int(rand()*26)}]]
}

puts "lsort -stride: [time {set s1 [lsort -stride 2 [lsort -stride 2 -index 1 $u]]}]"
puts "lsort -command: [time {set s2 [lsort -stride 2 -command {apply {{a b} {lassign $a a1 a2; lassign $b b1 b2; if {$a1==$b1} then {return [string compare $a2 $b2]} else {return [expr {$a1-$b1}]}}}} $u]}]"

if {$s1 != $s2} {
puts stderr "ERROR: The results differ!"
}
_______________
8.6.3
lsort -stride: 1902518 microseconds per iteration
lsort -command: 17328756 microseconds per iteration
ERROR: The results differ!
_______________

CAUTION: The implementation with -command is not supported! -stride always defauts -index to 0 even if -command is given. Afaik there is no way to multi-sort with -command.

NOTE: apply seems to be as fast as a proc.

It looks like an extension to lsort like "-multisort {0 1}" would be a nice feature. If -multisort and -command are used, then the command shall accept more arguments, namely {a0 a1 b0 b1} in this case.

Alexandru

unread,
Jun 30, 2015, 6:33:13 AM6/30/15
to
Hi HeinrichMartin,

tank you for the benchmark.

Indeed, an extension to lsort for multisort would be the next logical step. I hope some core developers read this...

Andreas Leitgeb

unread,
Jun 30, 2015, 10:49:49 AM6/30/15
to
heinrichmartin <martin....@frequentis.com> wrote:
> puts "lsort -stride: [time {set s1 [lsort -stride 2 [lsort -stride 2 -index 1 $u]]}]"
> puts "lsort -command: [time {set s2 [lsort -stride 2 -command {apply {{a b} {lassign $a a1 a2; lassign $b b1 b2; if {$a1==$b1} then {return [string compare $a2 $b2]} else {return [expr {$a1-$b1}]}}}} $u]}]"
> if {$s1 != $s2} {
> puts stderr "ERROR: The results differ!"
> }
> ERROR: The results differ!
> CAUTION: The implementation with -command is not supported! -stride always
> defauts -index to 0 even if -command is given. Afaik there is no way to
> multi-sort with -command.

This differ is also explained by the different sorting criteria:
the first one sorts both elements in ascii order, thus placing "2"
after "10", whereas the command-variant would (even if it weren't
for the implied "-index 0") use integer comparison for the first
element.

This is however irrelevant to the OP (Alexandru), who wrote that
his list has really pairs of real numbers, thus he most likely
already has his relevant sort-order option in use.

Given that the command actually gets two single elements rather than
sublists, then of course the two "lassign"s in the command cause some
extra shimmering and thus even further worsen the performance.
Removing the lassign and just pretending that the respective
elements are each repeated for index 1, will drop the performance
quotient (msecs(command) / msecs(two sorts)) from above 10 to
about 7.

> It looks like an extension to lsort like "-multisort {0 1}" would be a
> nice feature. If -multisort and -command are used, then the command
> shall accept more arguments, namely {a0 a1 b0 b1} in this case.

I don't really see much value in the -multisort option. It seems to make
sense only where -command and -stride appear together, otherwise the
command already sees the complete sublist to compare. I think, there
must be another solution/fix to get these two options working together
even at the extra cost of preparing a pair of sublists for each comparison.

heinrichmartin

unread,
Jun 30, 2015, 5:14:46 PM6/30/15
to
On Tuesday, June 30, 2015 at 4:49:49 PM UTC+2, Andreas Leitgeb wrote:
> I don't really see much value in the -multisort option. It seems to make
> sense only where -command and -stride appear together, otherwise the
> command already sees the complete sublist to compare. I think, there
> must be another solution/fix to get these two options working together
> even at the extra cost of preparing a pair of sublists for each comparison.

Well, -stride and -multisort could go well together even without -command; it would solve Alexandru's request in a single call (probably in less time than nested lsorts).

For -command with -stride it might be enough to accept an empty list for -index {}, in which case the groups of elements are passed as lists.

lsort -stride 2 -integer [lsort -stride 2 -index 1 $u]
lsort -stride 2 -index {} -command {apply {{a b} {...}}} $u ;# should override the default -index 0

> Given that the command actually gets two single elements rather than
> sublists, then of course the two "lassign"s in the command cause some
> extra shimmering and thus even further worsen the performance.

Aaargh, so true. (Took me a while to understand what you meant.) The quotient is not very stable in my environment: it looks like the performance of the nested lsort highly depends on the values ...
0 new messages