sorting...

David Bakhash

unread,

Dec 14, 1998, 3:00:00 AM12/14/98

to

Just wanted to know...

suppose that you have a list. You want to sort it. suppose that you
knew ahead of time that the list would be much closer to being sorted
if you reversed it. Would it make sense to call sort on the reversed
list rather than the list itself? would people think that this would
save time, or take more time?

thanks,
dave

Gareth McCaughan

unread,

Dec 14, 1998, 3:00:00 AM12/14/98

to

David Bakhash wrote:

Totally dependent on the sorting method used by your implementation.
I'd expect it usually to take more time. (I just tried a single
test case using CMU CL, an array of 10^5 fixna with each being
on the order of 5 places out of reversed order, and the element
type not being known at compile time. In this particular case,
not reversing first was on the order of 10% faster.)

--
Gareth McCaughan Dept. of Pure Mathematics & Mathematical Statistics,
gj...@dpmms.cam.ac.uk Cambridge University, England.

Jens Kilian

unread,

Dec 14, 1998, 3:00:00 AM12/14/98

to

David Bakhash <ca...@bu.edu> writes:
> suppose that you have a list. You want to sort it. suppose that you
> knew ahead of time that the list would be much closer to being sorted
> if you reversed it. Would it make sense to call sort on the reversed
> list rather than the list itself? would people think that this would
> save time, or take more time?

Depends on the internals of the sorting algorithm. For example, some variants
of mergesort (the algorithm of choice for linked lists) wouldn't benefit.
The version I'm thinking of builds pre-sorted runs of elements and merges
them, like this:

Original list: (1 2 7 8 6 5 4 3)
Pre-sorted: (1 2 7 8) (3 4 5 6)
Merged: (1 2 3 4 5 6 7 8)

Pre-sorting and merging are both O(N); the number of merges is O(log M),
where M is the number of pre-sorted lists, which is always <= N/2.

Bye,
Jens.
--
mailto:j...@acm.org phone:+49-7031-14-7698 (HP TELNET 778-7698)
http://www.bawue.de/~jjk/ fax:+49-7031-14-7351
PGP: 06 04 1C 35 7B DC 1F 26 As the air to a bird, or the sea to a fish,
0x555DA8B5 BB A2 F0 66 77 75 E1 08 so is contempt to the contemptible. [Blake]

Kent M Pitman

unread,

Dec 14, 1998, 3:00:00 AM12/14/98

to

David Bakhash <ca...@bu.edu> writes:

> Just wanted to know...

>
> suppose that you have a list. You want to sort it. suppose that you
> knew ahead of time that the list would be much closer to being sorted
> if you reversed it. Would it make sense to call sort on the reversed
> list rather than the list itself? would people think that this would
> save time, or take more time?

Well, reversing a list is a linear time operation. If there's already
a linear factor involved, and it's hard to imagine a sort operation
faster than that unless the object to be sorted is maintained in a
format that knows it's pre-sorted (beyond the scope of any Lisp
datastructure or the predefined SORT function lisp offers, but not, I
suppose, beyond the scope of data structures and sort operators you
can define yourself).

So the question comes down to the one of whether being mostly (or even
completely) sorted is an aid to your sort function. As I recall, some
sort functions are really fast if the list is almost sorted and others
are not. It's possible, though I don't recall, that some are actually
slower if the list is almost or fully sorted. But the point is, it
probably makes sense from a theoretical efficiency perspective to
reverse the list before sorting if (a) the sort algorithm is know to
be positively affected by having done the reverse, and (b) the list is
usually long enough that algorithmic efficiency (rather than constant
factors) dominates. For the Lisp SORT function, I don't think you know
enough information to use either of these factoids, so I don't know what
to advise you--if performance isn't critical, the reverse is probably
not worth it. If it is critical, I'd either hand-code a sort function
whose characteristics I knew or else I'd contact the specific vendors
involved and get their assurance that it was doing what I thought it was.
Either way, I'd put in a BIG comment saying what I was doing. e.g.,

;; PERFORMANCE NOTE: The list X comes in mostly sorted in reverse.
;; If performance becomes a bottleneck here, it might be worth
;; reversing it before sorting it.
(SETQ X (SORT X PRED))

or

;; KLUDGE ALERT: The list X comes in mostly sorted in reverse.
;; Because we're running in the ACME implementation, we happen to
;; know SORT here will work faster if we first NREVERSE the list
;; here. If we move to other implementations, that assumption
;; should be reconsidered!
(SETQ X (SORT (NREVERSE X) PRED))

If you're looking for the overriding meta-rules, I'd say it's that
they're these:

(1) performance tuning is a science. there's a rule i think
from poker that might be useful here. they say that in
every hand of poker there's a sucker, and if you don't know
who it is, it's probably you. so if you don't find yourself
doing science to figure out your performance, you might
consider not trying. a lot of performance tuning by dabblers
situations worse and obfuscates code to no good end.

(2) good, clean documentation is often enough. and properly
applied it will allow you to later dig out of small messes
you may make. failure to add such will leave you with
unintelligible pieces of code that didn't make sense at all
in the first place, require mystic mumbo jumbo to explain,
and that you're afraid to change in the future.

Thomas A. Russ

unread,

Dec 14, 1998, 3:00:00 AM12/14/98

to

This really depends on the sorting algorithm used. If you call the
built-in SORT function, then you don't really have a lot of control or
knowledge about exactly what algorithm they used.

There are algorithms that work particularly well on nearly sorted
input. Straight insertion sort is one example -- but it is a very poor
choice unless the input is nearly sorted.

There are other algorithms that can work particularly poorly on
input that is in reverse sorted order and some that can be very poor on
sorted order input. A Quicksort that uses the first element as the
partition element will have N^2 behavior with sorted input.

There are other algorithms that aren't affected by input order.

--
Thomas A. Russ, USC/Information Sciences Institute t...@isi.edu

Erik Naggum

unread,

Dec 15, 1998, 3:00:00 AM12/15/98

to

* David Bakhash <ca...@bu.edu>

| suppose that you have a list. You want to sort it. suppose that you
| knew ahead of time that the list would be much closer to being sorted
| if you reversed it. Would it make sense to call sort on the reversed
| list rather than the list itself? would people think that this would
| save time, or take more time?

this is all wrong as an approach to improving performance. instead,
study the available sort algorithms in the considerable literature and
implement one that fits your data after you have discovered that the
sorting time is significant to the rest of your work and worth fixing.

I do (sort (cons <element> <sorted-list>) <test>) when adding one element
at a time to a priority list. profiling shows that it is irrelevant to
the execution time of the program, although I'd bet that some form of
programmer instinct would be triggered by this particular example and
would try to rewrite it.

#:Erik
--
man who cooks while hacking eats food that has died twice.

Bernhard Pfahringer

unread,

Dec 15, 1998, 3:00:00 AM12/15/98

to

In article <31226719...@naggum.no>, Erik Naggum <er...@naggum.no> wrote:
>
> I do (sort (cons <element> <sorted-list>) <test>) when adding one element
> at a time to a priority list. profiling shows that it is irrelevant to
> the execution time of the program, although I'd bet that some form of
> programmer instinct would be triggered by this particular example and
> would try to rewrite it.
>

Yep, why not use:

(merge 'list (list <element>) <sorted-list> <test>)

Not much of a rewrite, O(n), might not bite as early for larger priority lists.

Bernhard
--
--------------------------------------------------------------------------
Bernhard Pfahringer
Austrian Research Institute for http://www.ai.univie.ac.at/~bernhard/
Artificial Intelligence bern...@ai.univie.ac.at

Erik Naggum

unread,

Dec 15, 1998, 3:00:00 AM12/15/98

to

* Marco Antoniotti <mar...@copernico.parades.rm.cnr.it>
| Ahem!

:)

| Insertion in a priority queue should be O(lg n). Most likely, you are
| doing at least O(n lg n) with ...

the priority queues are supposed to be depleted very quickly, and have an
average length of 1.2 elements before sorting. when a particular queue
gets stuck, it normally gets unstuck before the length reaches 10. when
a queue is stuck for a long time, which happens to about 1% of the queues
over the course of a week, the cost of monitoring it dwarfes the cost of
the insertions. so the sorting time really is irrelevant.

it's useful as an example of when a simple-minded algorithm is sufficient
because other parts of the system profit more from optimization than this
particular part. a long queue length really is an important problem, and
making the priority queue insert fast is irrelevant in the big picture.
popping off the highest-priority element when the queue starts to drain,
however, needs to be very fast and simple because in the event that the
queue has gotten quite long, releasing it is a serious hit to the system
all at once, compared to the very well distributed O(n lg n) costs of the
insertions.

I actually found it quite amusing that I was somewhat unnerved by this
code and spent a lot of real time testing whether it was a bottle-neck,
returning to it time and again to see if it could be fixed. however, as
is quite often the case with optimizations, my hunch was all wrong, and
it was way more important to make deletion from the queue O(1).

| Here is the solution. It is old code and I will be willing to include
| any changes and to change the copyright.

thanks. I appreciate it.

Erik Naggum

unread,

Dec 15, 1998, 3:00:00 AM12/15/98

to

* bern...@korb.ai.univie.ac.at (Bernhard Pfahringer)
| Yep, why not use: (merge 'list (list <element>) <sorted-list> <test>).

| Not much of a rewrite, O(n), might not bite as early for larger priority
| lists.

that's actually very useful advice that doesn't cost anything to adopt.
thank you.

Pierre Mai

unread,

Dec 15, 1998, 3:00:00 AM12/15/98

to

Marco Antoniotti <mar...@copernico.parades.rm.cnr.it> writes:

> > I do (sort (cons <element> <sorted-list>) <test>) when adding one element
> > at a time to a priority list. profiling shows that it is irrelevant to
> > the execution time of the program, although I'd bet that some form of
> > programmer instinct would be triggered by this particular example and
> > would try to rewrite it.
>

> Ahem! Insertion in a priority queue should be O(lg n). Most likely,

> you are doing at least O(n lg n) with
>

> (sort (cons <element> <sorted-list>) <test>)

Whereby you have just lost Erik's bet ;) The point (I think) Erik was
trying to make is that without profiling, you won't know whether it is
even worth _thinking_ about performance-issues in a piece of code.
And since Erik's profiling has shown that sort is good enough for his
particular program, it doesn't make any sense implementing any form of
heap or other advanced data-structure.

The solution with sort is faster to code, safer, better to maintain,
easier to document, etc. Switching to a more advanced algorithm
and/or data-structure is only worth it, if profiling and/or
anticipated usage patterns indicate that sort will be a bottle-neck
here.

All of your other arguments are all true, and if one could disregard
implementation complexity, heaps (or splay-trees, or any of the other
100 solutions for priority-queues) would always be the better
solutions than sort. But one can't disregard implementation
complexity, since the effort spent on one part of a project will
always detract from the effort available for other parts of a project.

Regs, Pierre.

PS: Re-use does of course alleviate the problem of implementation
complexity somewhat. But re-use brings with it another can of
worms...

--
Pierre Mai <pm...@acm.org> http://home.pages.de/~trillian/
"One smaller motivation which, in part, stems from altruism is Microsoft-
bashing." [Microsoft memo, see http://www.opensource.org/halloween1.html]

R. Toy

unread,

Dec 15, 1998, 3:00:00 AM12/15/98

to

David Bakhash wrote:
>
> Just wanted to know...

>
> suppose that you have a list. You want to sort it. suppose that you
> knew ahead of time that the list would be much closer to being sorted
> if you reversed it. Would it make sense to call sort on the reversed
> list rather than the list itself? would people think that this would
> save time, or take more time?
>

How about this: rather than reversing the list, complement the sorting
predicate. So if you wanted the list in ascending order, sort the list
in descending order. If the list must be in ascending order, then this
doesn't help.

If in doubt, always profile.

--
---------------------------------------------------------------------------
----> Raymond Toy rt...@mindspring.com
http://www.mindspring.com/~rtoy

Marco Antoniotti

unread,

Dec 16, 1998, 3:00:00 AM12/16/98

to

Erik Naggum <er...@naggum.no> writes:

> * Marco Antoniotti <mar...@copernico.parades.rm.cnr.it>
> | Ahem!
>
> :)
>

> | Insertion in a priority queue should be O(lg n). Most likely, you are

> | doing at least O(n lg n) with ...
>
> the priority queues are supposed to be depleted very quickly, and have an
> average length of 1.2 elements before sorting. when a particular queue
> gets stuck, it normally gets unstuck before the length reaches 10. when
> a queue is stuck for a long time, which happens to about 1% of the queues
> over the course of a week, the cost of monitoring it dwarfes the cost of
> the insertions. so the sorting time really is irrelevant.

...

> is quite often the case with optimizations, my hunch was all wrong, and
> it was way more important to make deletion from the queue O(1).

...

Of course. This is what you really need to do. As a matter of fact,
is it worth paying the O(lg n) insertion/deletion price only if you
need a change-key operation.

>
> | Here is the solution. It is old code and I will be willing to include
> | any changes and to change the copyright.
>
> thanks. I appreciate it.
>

You are welcome.

Cheers

--
Marco Antoniotti ===========================================
PARADES, Via San Pantaleo 66, I-00186 Rome, ITALY
tel. +39 - (0)6 - 68 10 03 17, fax. +39 - (0)6 - 68 80 79 26
http://www.parades.rm.cnr.it