Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Sorting a table

45 views
Skip to first unread message

Francogrex

unread,
Mar 31, 2011, 7:42:34 AM3/31/11
to
Hi, I'm looking at a way to sort a table (in lists form):

;; this is a rather simple question I have a list of lists (as a
table) where I need to sort the table while preserving the ties
between the elements in the lists. Example:

(defparameter *table*
(list
(list "B" "XC-01" 5)
(list "A" "XC-02" 27)
(list "B" "XB-01" 5)
(list "A" "XB-01" 15)
(list "A" "XC-01" 71)
(list "A" "XB-02" 15)))

;; If I sort on the first element then the next step would be to sort
on the
;; second element while preserving the ties

(setf *table* (sort *table* #'string-lessp :key #'car))

> (("A" "XC-02" 27) ("A" "XB-01" 15) ("A" "XC-01" 71) ("A" "XB-02" 15)
("B" "XC-01" 5) ("B" "XB-01" 5))

;; Now sorting on the second element but need to keep the order of the
first sort on the first element

;; The results I am after is this:

(("A" "XB-01" 15) ("A" "XB-02" 15) ("A" "XC-01" 71) ("A" "XC-02" 27)
("B" "XB-01" 5) ("B" "XC-01" 5))

Marco Antoniotti

unread,
Mar 31, 2011, 7:50:04 AM3/31/11
to

You need to use STABLE-SORT and you need to sort first on SECOND and then on FIRST.

CL-USER 5 > (defparameter *table*


(list
(list "B" "XC-01" 5)
(list "A" "XC-02" 27)
(list "B" "XB-01" 5)
(list "A" "XB-01" 15)
(list "A" "XC-01" 71)
(list "A" "XB-02" 15)))

*TABLE*

CL-USER 6 > (setf *table* (stable-sort *table* #'string-lessp :key #'second))
(("B" "XB-01" 5) ("A" "XB-01" 15) ("A" "XB-02" 15) ("B" "XC-01" 5) ("A" "XC-01" 71) ("A" "XC-02" 27))

CL-USER 7 > (setf *table* (stable-sort *table* #'string-lessp :key #'first))


(("A" "XB-01" 15) ("A" "XB-02" 15) ("A" "XC-01" 71) ("A" "XC-02" 27) ("B" "XB-01" 5) ("B" "XC-01" 5))

CL-USER 8 >


Marco

Zach Beane

unread,
Mar 31, 2011, 7:58:05 AM3/31/11
to
Francogrex <fra...@grex.org> writes:

I wrote a little bit about this idea here:

http://groups.google.com/group/comp.lang.lisp/msg/a14dcbfcf6cccd13

The general idea is to create a combined ordering predicate from any
number of pairs of key functions and ordering predicates.

Zach

Madhu

unread,
Mar 31, 2011, 10:53:16 AM3/31/11
to

* Francogrex <ba7a7c86-a8c8-4e2b...@ed10g2000vbb.googlegroups.com> :
Wrote on Thu, 31 Mar 2011 04:42:34 -0700 (PDT):

| Hi, I'm looking at a way to sort a table (in lists form):
|
| ;; this is a rather simple question I have a list of lists (as a
| table) where I need to sort the table while preserving the ties
| between the elements in the lists. Example:
|
| (defparameter *table*
| (list
| (list "B" "XC-01" 5)
| (list "A" "XC-02" 27)
| (list "B" "XB-01" 5)
| (list "A" "XB-01" 15)
| (list "A" "XC-01" 71)
| (list "A" "XB-02" 15)))
|
| ;; If I sort on the first element then the next step would be to sort
| on the
| ;; second element while preserving the ties


You may want to see S.E.Harris's comp.lang.lisp message
<News:q94vevs...@chlorine.gnostech.com>
dated "Mon, 06 Feb 2006 09:29:25 -0800",
<groups.google.com/group/comp.lang.lisp/msg/0aebec62d76ce650>


Just for lols, following the idea in that message (but without using
clos to dispatch the tests), we can define a mildly perverse function:


(defun make-lexicographic-comparator (&key (test #'<) ; XXX
(sort-keys (list 'identity))
(lhs-symbol 'x)
(rhs-symbol 'y))
;; docstring elided to honour kenzo
(labels ((make-simple-form (test lhs rhs func)
`(,test (,func ,lhs) (,func ,rhs)))
(make-ordering-form (lhs-sym rhs-sym &key (test '<) keys)
(when keys
(if (null (cdr keys))
(make-simple-form
(cond ((consp test)
(assert (null (cdr test)) nil
"The length of TESTS does not match that of SORT-KEYS.")
(car test))
(t test))
lhs-sym rhs-sym (car keys))
`(or ,(make-simple-form
(if (consp test) (car test) test)
lhs-sym rhs-sym (car keys))
(and (not ,(make-simple-form
(if (consp test) (car test) test)
rhs-sym lhs-sym (car keys)))
,(make-ordering-form lhs-sym rhs-sym
:test (if (consp test) (cdr test) test)
:keys (cdr keys))))))))
`(lambda (,lhs-symbol ,rhs-symbol)
,(make-ordering-form lhs-symbol rhs-symbol :test test
:keys sort-keys))))


| (setf *table* (sort *table* #'string-lessp :key #'car))
|
|> (("A" "XC-02" 27) ("A" "XB-01" 15) ("A" "XC-01" 71) ("A" "XB-02" 15)
| ("B" "XC-01" 5) ("B" "XB-01" 5))
|
| ;; Now sorting on the second element but need to keep the order of the
| first sort on the first element
|
| ;; The results I am after is this:
|
| (("A" "XB-01" 15) ("A" "XB-02" 15) ("A" "XC-01" 71) ("A" "XC-02" 27)
| ("B" "XB-01" 5) ("B" "XC-01" 5))

To get that result you would call

(sort (copy-seq *table*)
(coerce
(make-lexicographic-comparator :test 'string-lessp :sort-keys '(first second))
'function))

=> (("A" "XB-01" 15) ("A" "XB-02" 15) ("A" "XC-01" 71) ("A" "XC-02" 27)


("B" "XB-01" 5) ("B" "XC-01" 5))


As another example, if you wanted to sort numerically on the third
element and then string compare the first and second elements:

(sort (copy-seq *table*)
(coerce
(make-lexicographic-comparator
:test '(< string-lessp string-lessp) :sort-keys '(third first second))
'function))

=> (("B" "XB-01" 5) ("B" "XC-01" 5) ("A" "XB-01" 15) ("A" "XB-02" 15)
("A" "XC-02" 27) ("A" "XC-01" 71))

(This is what made the function perverse, since we want to pass multiple
TESTS along with the SORT-KEYS, but a happy syntax is just macro away)

---
Madhu

Raffael Cavallaro

unread,
Mar 31, 2011, 11:44:45 AM3/31/11
to
On 2011-03-31 07:58:05 -0400, Zach Beane said:

> I wrote a little bit about this idea here:
>
> http://groups.google.com/group/comp.lang.lisp/msg/a14dcbfcf6cccd13
>
> The general idea is to create a combined ordering predicate from any
> number of pairs of key functions and ordering predicates.

Nice! I must have missed this when you originally posted it. Using your
cascade-sort the OPs example becomes simply:

? (cascade-sort *table* #'string< #'car #'string< #'second)


(("A" "XB-01" 15) ("A" "XB-02" 15) ("A" "XC-01" 71) ("A" "XC-02" 27)
("B" "XB-01" 5) ("B" "XC-01" 5))

BTW, combining Marco's stable-sort idea with yours we can do your
cascade sort all in one go with no helper functions:

(defun cascade-sort-2 (sequence &rest tests-and-keys)
(let* ((funs
(reverse (loop for (test key) on tests-and-keys by #'cddr
collect (sort-fun test key)))))
(loop for fun in funs
for result = (stable-sort sequence fun)
then (stable-sort result fun)
finally (return result))))


an example:

(defparameter *test-data*


(list
(list "B" "XC-01" 5)
(list "A" "XC-02" 27)

(list "B" "XB-01" 2)


(list "B" "XB-01" 5)
(list "A" "XB-01" 15)

(list "A" "XB-01" 1)


(list "A" "XC-01" 71)
(list "A" "XB-02" 15)

(list "A" "XB-02" 3)))

? (cascade-sort-2 *test-data* #'string< #'car #'string< #'second #'< #'third)
(("A" "XB-01" 1) ("A" "XB-01" 15) ("A" "XB-02" 3) ("A" "XB-02" 15) ("A"
"XC-01" 71) ("A" "XC-02" 27) ("B" "XB-01" 2) ("B" "XB-01" 5) ("B"
"XC-01" 5))

Though this stable-sort based version will do more comparisons in some
implementations (e.g., ccl, sbcl). Lispworks seems to be clever here
with both versions running in the same time on large data sets (a
million 3 element lists like those above) and also turning in the
fastest times.

warmest regards,

Ralph

--
Raffael Cavallaro

WJ

unread,
Mar 31, 2011, 12:14:43 PM3/31/11
to
Marco Antoniotti wrote:

> On Thursday, March 31, 2011 1:42:34 PM UTC+2, Francogrex wrote:
> > Hi, I'm looking at a way to sort a table (in lists form):
> >
> > ;; this is a rather simple question I have a list of lists (as a
> > table) where I need to sort the table while preserving the ties
> > between the elements in the lists. Example:
> >

> > (defparameter table


> > (list
> > (list "B" "XC-01" 5)
> > (list "A" "XC-02" 27)
> > (list "B" "XB-01" 5)
> > (list "A" "XB-01" 15)
> > (list "A" "XC-01" 71)
> > (list "A" "XB-02" 15)))
> >
> > ;; If I sort on the first element then the next step would be to
> > sort on the
> > ;; second element while preserving the ties
> >

> > (setf table (sort table #'string-lessp :key #'car))


> >
> > > (("A" "XC-02" 27) ("A" "XB-01" 15) ("A" "XC-01" 71) ("A" "XB-02"
> > > 15)
> > ("B" "XC-01" 5) ("B" "XB-01" 5))
> >
> > ;; Now sorting on the second element but need to keep the order of
> > the first sort on the first element
> >
> > ;; The results I am after is this:
> >
> > (("A" "XB-01" 15) ("A" "XB-02" 15) ("A" "XC-01" 71) ("A" "XC-02" 27)
> > ("B" "XB-01" 5) ("B" "XC-01" 5))
>
> You need to use STABLE-SORT and you need to sort first on SECOND and
> then on FIRST.
>

> CL-USER 5 > (defparameter table


> (list
> (list "B" "XC-01" 5)
> (list "A" "XC-02" 27)
> (list "B" "XB-01" 5)
> (list "A" "XB-01" 15)
> (list "A" "XC-01" 71)
> (list "A" "XB-02" 15)))
> *TABLE*
>

> CL-USER 6 > (setf table (stable-sort table #'string-lessp :key


> #'second)) (("B" "XB-01" 5) ("A" "XB-01" 15) ("A" "XB-02" 15) ("B"
> "XC-01" 5) ("A" "XC-01" 71) ("A" "XC-02" 27))
>

> CL-USER 7 > (setf table (stable-sort table #'string-lessp :key


> #'first)) (("A" "XB-01" 15) ("A" "XB-02" 15) ("A" "XC-01" 71) ("A"
> "XC-02" 27) ("B" "XB-01" 5) ("B" "XC-01" 5))
>
> CL-USER 8 >
>
>
> Marco

Good idea. Of course, the first sort needn't be stable.

An even better idea is to use a high-level language like MatzLisp
instead of a low-level language like CL.

require "pp" # Pretty printing.

table = [
["B", "XC-01", 5],
["A", "XC-02", 27],
["B", "XB-01", 5],
["A", "XB-01", 15],
["A", "XC-01", 71],
["A", "XB-02", 15]]

pp table.sort_by{|list| list[0,2] }


[["A", "XB-01", 15],
["A", "XB-02", 15],
["A", "XC-01", 71],
["A", "XC-02", 27],
["B", "XB-01", 5],
["B", "XC-01", 5]]

Raffael Cavallaro

unread,
Mar 31, 2011, 12:19:02 PM3/31/11
to
On 2011-03-31 11:44:45 -0400, Raffael Cavallaro said:

> Lispworks seems to be clever here with both versions running in the
> same time on large data sets

Actually, further testing indicates that the stable-sort based version
is slightly slower even under lispworks, and it's significantly slower
under ccl and sbcl.

Pascal J. Bourguignon

unread,
Mar 31, 2011, 12:24:14 PM3/31/11
to
Raffael Cavallaro <raffaelc...@pas.despam.s.il.vous.plait.mac.com>
writes:

> BTW, combining Marco's stable-sort idea with yours we can do your
> cascade sort all in one go with no helper functions:
>
> (defun cascade-sort-2 (sequence &rest tests-and-keys)
> (let* ((funs
> (reverse (loop for (test key) on tests-and-keys by #'cddr
> collect (sort-fun test key)))))
> (loop for fun in funs
> for result = (stable-sort sequence fun)
> then (stable-sort result fun)
> finally (return result))))

O(k*n*log(n)) instead of O(n*log(n))

--
__Pascal Bourguignon__ http://www.informatimago.com/
A bad day in () is better than a good day in {}.

vanekl

unread,
Mar 31, 2011, 1:32:12 PM3/31/11
to
On Thursday, March 31, 2011 12:14:43 PM UTC-4, WJ wrote:
> Marco Antoniotti wrote:
snip

> > You need to use STABLE-SORT and you need to sort first on SECOND and
> > then on FIRST.
snip

> > Marco
>
> Good idea. Of course, the first sort needn't be stable.
>
snip

You're both wrong when it comes to stable-sort. But you're both right in a way, too, because the OP never asked for a stable sort so your results are still technically valid.

"while preserving the ties between the elements in the lists" was interpreted by Marco to mean the OP wanted a stable sort when -- as I interpret it -- all he wanted was a sort with both primary and secondary keys, without permuting the sublists.

Then WJ compounded the error by suggesting that a stable sort can be accomplished by making only the last sort stable. This is akin to saying that after one has sex they can still be a virgin if they remain celibate afterwards; once the damage has been done it cannot be (easily) fixed. If the OP required a stable sort (which I don't think he does) then EVERY intermediate sort operation would be required to be stable, not just the last.

WJ

unread,
Mar 31, 2011, 2:39:29 PM3/31/11
to
Raffael Cavallaro wrote:

> On 2011-03-31 07:58:05 -0400, Zach Beane said:
>
> > I wrote a little bit about this idea here:
> >
> > http://groups.google.com/group/comp.lang.lisp/msg/a14dcbfcf6cccd13
> >
> > The general idea is to create a combined ordering predicate from any
> > number of pairs of key functions and ordering predicates.
>
> Nice! I must have missed this when you originally posted it. Using
> your cascade-sort the OPs example becomes simply:
>

> ? (cascade-sort table #'string< #'car #'string< #'second)


> (("A" "XB-01" 15) ("A" "XB-02" 15) ("A" "XC-01" 71) ("A" "XC-02" 27)
> ("B" "XB-01" 5) ("B" "XC-01" 5))
>
> BTW, combining Marco's stable-sort idea with yours we can do your
> cascade sort all in one go with no helper functions:
>
> (defun cascade-sort-2 (sequence &rest tests-and-keys)
> (let* ((funs
> (reverse (loop for (test key) on tests-and-keys by #'cddr
> collect (sort-fun test key)))))
> (loop for fun in funs
> for result = (stable-sort sequence fun)
> then (stable-sort result fun)
> finally (return result))))
>
>
> an example:
>
> (defparameter *test-data*
> (list
> (list "B" "XC-01" 5)
> (list "A" "XC-02" 27)
> (list "B" "XB-01" 2)
> (list "B" "XB-01" 5)
> (list "A" "XB-01" 15)
> (list "A" "XB-01" 1)
> (list "A" "XC-01" 71)
> (list "A" "XB-02" 15)
> (list "A" "XB-02" 3)))
>

It's understandable that those with very little programming ability
love to hide behind the pomposity and prolixity of COBOL and CL.

Let's use Scheme instead:

(define (sort-fun test key) (lambda (a b) (test (key a) (key b))))

(define (cascade-sort seq . tests-and-keys)
(if (null? tests-and-keys)
seq
(apply cascade-sort
(stable-sort seq (apply sort-fun (take-right tests-and-keys 2)))
(drop-right tests-and-keys 2))))


guile> (pretty-print
(cascade-sort data string<? first string<? second < third))

Raffael Cavallaro

unread,
Mar 31, 2011, 3:07:34 PM3/31/11
to
On 2011-03-31 12:24:14 -0400, Pascal J. Bourguignon said:

> O(k*n*log(n)) instead of O(n*log(n))

Assuming merge sort, quicksort or some other O(n log n) sort, yes. But
for lispworks the multiplier is much smaller than the number of sorts -
it's about 1.2 for 3 sorts on 3 sets of tests-and-keys and a data list
of a million 3 item lists. For ccl and sbcl it's about 1.8.

Moreover, for smaller data (< 25000), the stable-sort based version is
actually *faster*. IOW, for smaller data that k is less than 1!

Raffael Cavallaro

unread,
Mar 31, 2011, 3:16:43 PM3/31/11
to
On 2011-03-31 14:39:29 -0400, WJ said:

> Let's use Scheme instead:
>
> (define (sort-fun test key) (lambda (a b) (test (key a) (key b))))
>
> (define (cascade-sort seq . tests-and-keys)
> (if (null? tests-and-keys)
> seq
> (apply cascade-sort
> (stable-sort seq (apply sort-fun (take-right tests-and-keys 2)))
> (drop-right tests-and-keys 2))))

Oh, I see; it's better to write two separate functions in the same
number of lines as the one function I wrote, and pollute your one
global function *and* variable namespace with a definition called
"sort-fun" in the process. Got it.

Pascal J. Bourguignon

unread,
Mar 31, 2011, 3:33:26 PM3/31/11
to
Raffael Cavallaro <raffaelc...@pas.despam.s.il.vous.plait.mac.com>
writes:

I thought it was obvious, but obviously it was not.

My k is the number of columns you have to sort on.

In your version you sort the whole array on each columns successively,
performing k stable sorts.

In Zach's version there's a single sort, with granted, an order function
slightly more complex. However, in comparing multiple columns, the
decision is often made soon, so I'd expect an even better advantage if
you count the calls to the elementary comparison functions.

Marco Antoniotti

unread,
Mar 31, 2011, 4:52:33 PM3/31/11
to
On Thursday, March 31, 2011 7:32:12 PM UTC+2, vanekl wrote:
> On Thursday, March 31, 2011 12:14:43 PM UTC-4, WJ wrote:
> > Marco Antoniotti wrote:
> snip
> > > You need to use STABLE-SORT and you need to sort first on SECOND and
> > > then on FIRST.
> snip
> > > Marco
> >
> > Good idea. Of course, the first sort needn't be stable.
> >
> snip
>
> You're both wrong when it comes to stable-sort. But you're both right in a way, too, because the OP never asked for a stable sort so your results are still technically valid.
>
> "while preserving the ties between the elements in the lists" was interpreted by Marco to mean the OP wanted a stable sort when -- as I interpret it -- all he wanted was a sort with both primary and secondary keys, without permuting the sublists.

AFAIK this is what a stable sort is. The first sort does not need to be stable (I'll grant you that) but the second (and subsequent ones) should.

>
> Then WJ compounded the error by suggesting that a stable sort can be accomplished by making only the last sort stable. This is akin to saying that after one has sex they can still be a virgin if they remain celibate afterwards; once the damage has been done it cannot be (easily) fixed. If the OP required a stable sort (which I don't think he does) then EVERY intermediate sort operation would be required to be stable, not just the last.

Yep. This is correct, but I don't understand what your interpretation is.

Marco


Marco Antoniotti

unread,
Mar 31, 2011, 4:59:22 PM3/31/11
to
On Thursday, March 31, 2011 6:19:02 PM UTC+2, Raffael Cavallaro wrote:
> On 2011-03-31 11:44:45 -0400, Raffael Cavallaro said:
>
> > Lispworks seems to be clever here with both versions running in the
> > same time on large data sets
>
> Actually, further testing indicates that the stable-sort based version
> is slightly slower even under lispworks, and it's significantly slower
> under ccl and sbcl.
>

That probably depends on which algorithm is used for stable-sort in the different implementations.

Marco

Raffael Cavallaro

unread,
Mar 31, 2011, 5:05:29 PM3/31/11
to
On 2011-03-31 15:33:26 -0400, Pascal J. Bourguignon said:

> I thought it was obvious, but obviously it was not.
>
> My k is the number of columns you have to sort on.

It is obvious. That's why I said:

"But for lispworks the multiplier is much smaller than the number of sorts"

i.e. the "number of sorts" is the "number of columns you have to sort on."


> In Zach's version there's a single sort, with granted, an order function
> slightly more complex. However, in comparing multiple columns, the
> decision is often made soon, so I'd expect an even better advantage if
> you count the calls to the elementary comparison functions.

Yes, I would have expected that as well. Nevertheless, for moderate
sized data (> 25k) the effect of the "order function slightly more
complex" completely overwhelms the time taken for multiple sorts on
simpler comparison functions. Doing 3 sorts on simple comparisons is
*faster* than doing one sort on the more complex compound comparison
function generated by cascaded-predicate when data are smaller than 25
thousand items. This is true in all three implementations, not just
lispworks.

My line "for smaller data that k is less than 1!" was an expression of
the fact that actual performance is the *reverse* of what analysis of
the algorithm would lead us to expect.

For larger data (~ a million), using Lispworks, the multiple sorts are
still competitive with the single sort on a more complex comparison
function, running in only 1.2x time.

vanekl

unread,
Mar 31, 2011, 5:15:37 PM3/31/11
to
On Mar 31, 4:52 pm, Marco Antoniotti <marc...@gmail.com> wrote:
> On Thursday, March 31, 2011 7:32:12 PM UTC+2, vanekl wrote:
> > On Thursday, March 31, 2011 12:14:43 PM UTC-4, WJ wrote:
> > > Marco Antoniotti wrote:
> > snip
> > > > You need to use STABLE-SORT and you need to sort first on SECOND and
> > > > then on FIRST.
> > snip
> > > > Marco
>
> > > Good idea.  Of course, the first sort needn't be stable.
>
> > snip
>
> > You're both wrong when it comes to stable-sort. But you're both right in a way, too, because the OP never asked for a stable sort so your results are still technically valid.
>
> > "while preserving the ties between the elements in the lists" was interpreted by Marco to mean the OP wanted a stable sort when -- as I interpret it -- all he wanted was a sort with both primary and secondary keys, without permuting the sublists.
>
> AFAIK this is what a stable sort is.  The first sort does not need to be stable (I'll grant you that) but the second (and subsequent ones) should.

This would be more convincing if you explained your reasoning.
I'm not saying that any of the sorts have to be stable. In fact, I'm
saying that it doesn't matter in this case because the OP didn't
request it. BUT IF HE DID, then all the sorts would have had to be
stable, because once you destroy the sort order by using a non-stable
sort it is difficult to restore.

For example, suppose you have the following collection of records:
'(("a" 123) ("a" 456))

If you use a stable sort using the CAR of each sublist as a key there
is only one valid return value.
On the other hand, if you use a non-stable sort there are two valid
answers to performing a sort op on this collection.


> > Then WJ compounded the error by suggesting that a stable sort can be accomplished by making only the last sort stable. This is akin to saying that after one has sex they can still be a virgin if they remain celibate afterwards; once the damage has been done it cannot be (easily) fixed. If the OP required a stable sort (which I don't think he does) then EVERY intermediate sort operation would be required to be stable, not just the last.
>
> Yep.  This is correct, but I don't understand what your interpretation is.
>
> Marco


A stable sort means that if key1 and key2 are identical (as far as key
comparison is concerned), the two records may not be swapped but MUST
remain in original order. Nothing else. It has nothing to do with
sublist permutation. Sublists will not be permuted whether you are
using a stable sort or non-stable sort. Stable sort is not required in
order to do multi-key sorting, at least in the context of the current
discussion.

Stable sort is not commonly used because 1) stability of the original
order is usually not critical, 2) it is not needed in order to do a
multi-key sort, and 3) in many cases it is slower.

The ONLY way stable-sort would be a necessary requirement is if the OP
had specified that the original record order HAD TO BE MAINTAINED even
if the sort keys were identical. Non-stable sort does not guarantee
this.

Raffael Cavallaro

unread,
Mar 31, 2011, 5:52:30 PM3/31/11
to
On 2011-03-31 17:05:29 -0400, Raffael Cavallaro said:

> moderate sized data (> 25k)

should be < 25k of course.

Pascal J. Bourguignon

unread,
Mar 31, 2011, 7:05:26 PM3/31/11
to
vanekl <lou....@gmail.com> writes:

>> AFAIK this is what a stable sort is.  The first sort does not need to
>> be stable (I'll grant you that) but the second (and subsequent ones)
>> should.
>
> This would be more convincing if you explained your reasoning.
> I'm not saying that any of the sorts have to be stable. In fact, I'm
> saying that it doesn't matter in this case because the OP didn't
> request it. BUT IF HE DID, then all the sorts would have had to be
> stable, because once you destroy the sort order by using a non-stable
> sort it is difficult to restore.
>
> For example, suppose you have the following collection of records:
> '(("a" 123) ("a" 456))
>
> If you use a stable sort using the CAR of each sublist as a key there
> is only one valid return value.
> On the other hand, if you use a non-stable sort there are two valid
> answers to performing a sort op on this collection.

Yes. So if you understand that, isn't it obvious why to implement a
multi-column sort, using successive sorts, you need to use stable sorts
for all but the first one?


So the reasoning is the following:

A multi-column sort on columns c1...cp of a vector v of size n is such
that:

∀i, i∈[0,n-2]
⇒ ∃j, ∀k, j∈[1,p] ∧ k∈[1,j-1] ⇒ ck(v(i)) = ck(v(i+1))
∧ cj(v(i)) < cj(v(i+1))
∨ ∀k, k∈[1,p] ⇒ ck(v(i)) = ck(v(i+1))


The first sort, done on the column cp, ensures that:

∀i, i∈[0,n-2] ⇒ cp(v(i)) ≤ cp(v(i+1))


The next stable sort, done on the column c(p-1), ensures that:

∀i, i∈[0,n-2]
⇒ ∃j, ∀k, j∈[p-1,p] ∧ k∈[p-1,j-1] (1)
⇒ ck(v(i)) = ck(v(i+1)) (2)
∧ cj(v(i)) < cj(v(i+1)) (3)
∨ ∀k, k∈[p-1,p] (4)
⇒ ck(v(i)) = ck(v(i+1)) (5)

If this second sort wasn't stable, we could have cj(v(i)) > cj(v(i+1))
instead of (3). Notice that (1) implies that j=p in (3).


So we can now assume that we've performed m sorts, m>1, (the m-1 last
being stable sorts), and that we have:

∀i, i∈[0,n-2]
⇒ ∃j, ∀k, j∈[p-m+1,p] ∧ k∈[p-m+1,j-1] ⇒ ck(v(i)) = ck(v(i+1))
∧ cj(v(i)) < cj(v(i+1))
∨ ∀k, k∈[1,p-m+1] ⇒ ck(v(i)) = ck(v(i+1))

If we perform another stable sort, this time on column c(p-m), then we
will have:

∀i, i∈[0,n-2]
⇒ ∃j, ∀k, j∈[p-m,p] ∧ k∈[p-m,j-1] (6)
⇒ ck(v(i)) = ck(v(i+1)) (7)
∧ cj(v(i)) < cj(v(i+1)) (8)
∨ ∀k, k∈[1,p-m] (9)
⇒ ck(v(i)) = ck(v(i+1)) (10)

thanks to the fact that this new sort is a stable sort. Otherwise we
could not ensure (8).

Eventually, when we will have performed p sorts, one on each column, we
will have the multi-column sort assertion stated above.

> A stable sort means that if key1 and key2 are identical (as far as key
> comparison is concerned), the two records may not be swapped but MUST
> remain in original order. Nothing else. It has nothing to do with
> sublist permutation. Sublists will not be permuted whether you are
> using a stable sort or non-stable sort.

I don't understand what you mean, because of the ambiguity of your use
of "sublist". Either it's wrong, or it's trivially true.


> Stable sort is not required in order to do multi-key sorting, at least
> in the context of the current discussion.

Stable sort is not required, Zach presented an algorithm not using it.
But you can implement a multicolumn sort using stable-sort, as
demonstrated by Marco's function.

> Stable sort is not commonly used because:

> 1) stability of the original order is usually not critical,

Indeed. Implementing multi-column sort using stable-sort, the stability
of the multi-colum sort depends only on the stability of the first sort,
which is otherwise irrelevant for the multi-column sort.


> 2) it is not needed in order to do a multi-key sort, and

Indeed. But it can be used. See above.

> 3) in many cases it is slower.

Irrelevant. Notably the fact that a stable-sort might be slower than
sort to sort the same array does not matter. What matters is that when
you use stable-sort to implement multi-column sort, you have to call it
as many times as you have key columns.


> The ONLY way stable-sort would be a necessary requirement is if the OP
> had specified that the original record order HAD TO BE MAINTAINED even
> if the sort keys were identical.

Yes. It seems to me you have some problem grasping the difference
between necessary and sufficient.

The use of stable-sort is not necessary, but it's sufficient.

> Non-stable sort does not guarantee this.

Do you think anybody participating in this thread doesn't understand
that?

Pascal J. Bourguignon

unread,
Mar 31, 2011, 7:09:23 PM3/31/11
to
Raffael Cavallaro <raffaelc...@pas.despam.s.il.vous.plait.mac.com>
writes:

That's surprizing. Perhaps we benefit from large cache sizes too.

vanekl

unread,
Mar 31, 2011, 7:37:42 PM3/31/11
to
Pascal, you're right, good sir. Thanks for the reasoning.


On Mar 31, 7:05 pm, "Pascal J. Bourguignon" <p...@informatimago.com>
wrote:

Thomas A. Russ

unread,
Mar 31, 2011, 6:11:22 PM3/31/11
to
vanekl <lou....@gmail.com> writes:

> On Thursday, March 31, 2011 12:14:43 PM UTC-4, WJ wrote:
> > Marco Antoniotti wrote:
> snip
> > > You need to use STABLE-SORT and you need to sort first on SECOND and
> > > then on FIRST.
> snip
> > > Marco
> >
> > Good idea. Of course, the first sort needn't be stable.
> >
> snip
>
> You're both wrong when it comes to stable-sort. But you're both right
> in a way, too, because the OP never asked for a stable sort so your
> results are still technically valid.

I have to disagree. When you perform multiple sorts, you do need to use
a stable sort on all but the first sort.

> "while preserving the ties between the elements in the lists" was
> interpreted by Marco to mean the OP wanted a stable sort when -- as I
> interpret it -- all he wanted was a sort with both primary and
> secondary keys, without permuting the sublists.

Right. Which is why you need to use a stable sort when performing the
multiple sort passes approach.

What you end up doing with the multiple sort approach is first sort on
the least significant key. So that means that the entire data set is
now in order by the (in this case) secondary key. The next phase moves
to the primary key, and it has to be stable because whenever the primary
keys are equivalent, you need to preserve the order from the secondary
keys.

The stable sort is needed to preserve the work already done in the
previous sorting rounds.



> Then WJ compounded the error by suggesting that a stable sort can be
> accomplished by making only the last sort stable. This is akin to
> saying that after one has sex they can still be a virgin if they
> remain celibate afterwards; once the damage has been done it cannot be
> (easily) fixed. If the OP required a stable sort (which I don't think
> he does) then EVERY intermediate sort operation would be required to
> be stable, not just the last.

It is correct that all but the first sort needs to be stable in order to
avoid disturbing the order of the partial results.

For example: Sorting ((2 2) (1 1) (2 1) (1 2)) by two keys you will end
up with one of the following intermediate states if you do a non-stable
sort:
a. ((1 1) (2 1) (2 2) (1 2))
b. ((1 1) (2 1) (1 2) (2 2))
c. ((2 1) (1 1) (2 2) (1 2))
d. ((2 1) (1 1) (1 2) (2 2))

For the sake of argument, let us suppose that we end up with state (c).
We then need to use a stable sort so that we can guarantee that (2 1)
will be sorted ahead of (2 2) and (1 1) ahead of (1 2), which will then
lead to the correct result
((1 1) (1 2) (2 1) (2 2))
If you use a non-stable sort for any but the first sort, then you cannot
have this guarantee and the non-stable sort can in essence ignore the
previous work and return
((1 2) (1 1) (2 1) (2 2))
or some other permutation where only the first key is respected.


--
Thomas A. Russ, USC/Information Sciences Institute

Mirko Vukovic

unread,
Apr 1, 2011, 1:44:17 PM4/1/11
to

This version does not use key or nested sorts. Instead it uses a function that compares to list items based on the first two items. I have no clue if this is more or less efficient:

(sort *table* #'(lambda (arg1 arg2)
(block top
(destructuring-bind (h1 r1 &rest c1) arg1
(declare (ignore c1))
(destructuring-bind (h2 r2 &rest c2) arg2
(declare (ignore c2))
(cond
((string< h1 h2) (return-from top t))
((and (string= h1 h2)
(string< r1 r2))
(return-from top t))
(t nil)))))))

Mirko

Pascal J. Bourguignon

unread,
Apr 1, 2011, 2:15:00 PM4/1/11
to
Mirko Vukovic <mirko....@gmail.com> writes:


> This version does not use key or nested sorts. Instead it uses a
> function that compares to list items based on the first two items. I
> have no clue if this is more or less efficient:
>
> (sort *table* #'(lambda (arg1 arg2)
> (block top
> (destructuring-bind (h1 r1 &rest c1) arg1
> (declare (ignore c1))
> (destructuring-bind (h2 r2 &rest c2) arg2
> (declare (ignore c2))
> (cond
> ((string< h1 h2) (return-from top t))
> ((and (string= h1 h2)
> (string< r1 r2))
> (return-from top t))
> (t nil)))))))

It is about the same as Zach's; it has the comparison function defined
at compilation time, so it'll be slightly faster to invoke (this could
be cached in Zach's solution).

The problem with your solution is that it lacks generality and
abstraction. If you need to sort another table with different columns
(even the same, on different columns), then you need to re-generate the
lambda yourself.

We're programming in Lisp, the #1=(programmable . #1#) programming
language, so instead of writing the programs yourself, you should try to
write programs that write programs for you. This is what Zach's
solution does.

Steven E. Harris

unread,
Apr 1, 2011, 3:59:53 PM4/1/11
to
Madhu <eno...@meer.net> writes:

> Just for lols, following the idea in that message (but without using
> clos to dispatch the tests), we can define a mildly perverse function:

[...]

It's nice to see your elaboration on the original idea. I happened to
have started with a CLOS mindset there, but your solution is more
general and more library-like.

--
Steven E. Harris

Mirko Vukovic

unread,
Apr 2, 2011, 9:20:42 AM4/2/11
to

Zach's code illustrates why common lisp has a steep learning curve (https://groups.google.com/d/topic/comp.lang.lisp/cfdVkK-Nil8/discussion) ;-)

It is a nice piece of code. It took me a while to digest it. Referring to a post in the quoted thread, the reason may be that I am waaay over 20's and 30's :-)

Mirko

0 new messages