suppose that you have a list. You want to sort it. suppose that you knew ahead of time that the list would be much closer to being sorted if you reversed it. Would it make sense to call sort on the reversed list rather than the list itself? would people think that this would save time, or take more time?
David Bakhash wrote: > suppose that you have a list. You want to sort it. suppose that you > knew ahead of time that the list would be much closer to being sorted > if you reversed it. Would it make sense to call sort on the reversed > list rather than the list itself? would people think that this would > save time, or take more time?
Totally dependent on the sorting method used by your implementation. I'd expect it usually to take more time. (I just tried a single test case using CMU CL, an array of 10^5 fixna with each being on the order of 5 places out of reversed order, and the element type not being known at compile time. In this particular case, not reversing first was on the order of 10% faster.)
-- Gareth McCaughan Dept. of Pure Mathematics & Mathematical Statistics, gj...@dpmms.cam.ac.uk Cambridge University, England.
David Bakhash <ca...@bu.edu> writes: > suppose that you have a list. You want to sort it. suppose that you > knew ahead of time that the list would be much closer to being sorted > if you reversed it. Would it make sense to call sort on the reversed > list rather than the list itself? would people think that this would > save time, or take more time?
Depends on the internals of the sorting algorithm. For example, some variants of mergesort (the algorithm of choice for linked lists) wouldn't benefit. The version I'm thinking of builds pre-sorted runs of elements and merges them, like this:
Pre-sorting and merging are both O(N); the number of merges is O(log M), where M is the number of pre-sorted lists, which is always <= N/2.
Bye, Jens. -- mailto:j...@acm.org phone:+49-7031-14-7698 (HP TELNET 778-7698) http://www.bawue.de/~jjk/ fax:+49-7031-14-7351 PGP: 06 04 1C 35 7B DC 1F 26 As the air to a bird, or the sea to a fish, 0x555DA8B5 BB A2 F0 66 77 75 E1 08 so is contempt to the contemptible. [Blake]
David Bakhash <ca...@bu.edu> writes: > Just wanted to know...
> suppose that you have a list. You want to sort it. suppose that you > knew ahead of time that the list would be much closer to being sorted > if you reversed it. Would it make sense to call sort on the reversed > list rather than the list itself? would people think that this would > save time, or take more time?
Well, reversing a list is a linear time operation. If there's already a linear factor involved, and it's hard to imagine a sort operation faster than that unless the object to be sorted is maintained in a format that knows it's pre-sorted (beyond the scope of any Lisp datastructure or the predefined SORT function lisp offers, but not, I suppose, beyond the scope of data structures and sort operators you can define yourself).
So the question comes down to the one of whether being mostly (or even completely) sorted is an aid to your sort function. As I recall, some sort functions are really fast if the list is almost sorted and others are not. It's possible, though I don't recall, that some are actually slower if the list is almost or fully sorted. But the point is, it probably makes sense from a theoretical efficiency perspective to reverse the list before sorting if (a) the sort algorithm is know to be positively affected by having done the reverse, and (b) the list is usually long enough that algorithmic efficiency (rather than constant factors) dominates. For the Lisp SORT function, I don't think you know enough information to use either of these factoids, so I don't know what to advise you--if performance isn't critical, the reverse is probably not worth it. If it is critical, I'd either hand-code a sort function whose characteristics I knew or else I'd contact the specific vendors involved and get their assurance that it was doing what I thought it was. Either way, I'd put in a BIG comment saying what I was doing. e.g.,
;; PERFORMANCE NOTE: The list X comes in mostly sorted in reverse. ;; If performance becomes a bottleneck here, it might be worth ;; reversing it before sorting it. (SETQ X (SORT X PRED))
or
;; KLUDGE ALERT: The list X comes in mostly sorted in reverse. ;; Because we're running in the ACME implementation, we happen to ;; know SORT here will work faster if we first NREVERSE the list ;; here. If we move to other implementations, that assumption ;; should be reconsidered! (SETQ X (SORT (NREVERSE X) PRED))
If you're looking for the overriding meta-rules, I'd say it's that they're these:
(1) performance tuning is a science. there's a rule i think from poker that might be useful here. they say that in every hand of poker there's a sucker, and if you don't know who it is, it's probably you. so if you don't find yourself doing science to figure out your performance, you might consider not trying. a lot of performance tuning by dabblers situations worse and obfuscates code to no good end.
(2) good, clean documentation is often enough. and properly applied it will allow you to later dig out of small messes you may make. failure to add such will leave you with unintelligible pieces of code that didn't make sense at all in the first place, require mystic mumbo jumbo to explain, and that you're afraid to change in the future.
This really depends on the sorting algorithm used. If you call the built-in SORT function, then you don't really have a lot of control or knowledge about exactly what algorithm they used.
There are algorithms that work particularly well on nearly sorted input. Straight insertion sort is one example -- but it is a very poor choice unless the input is nearly sorted.
There are other algorithms that can work particularly poorly on input that is in reverse sorted order and some that can be very poor on sorted order input. A Quicksort that uses the first element as the partition element will have N^2 behavior with sorted input.
There are other algorithms that aren't affected by input order.
-- Thomas A. Russ, USC/Information Sciences Institute t...@isi.edu
* David Bakhash <ca...@bu.edu> | suppose that you have a list. You want to sort it. suppose that you | knew ahead of time that the list would be much closer to being sorted | if you reversed it. Would it make sense to call sort on the reversed | list rather than the list itself? would people think that this would | save time, or take more time?
this is all wrong as an approach to improving performance. instead, study the available sort algorithms in the considerable literature and implement one that fits your data after you have discovered that the sorting time is significant to the rest of your work and worth fixing.
I do (sort (cons <element> <sorted-list>) <test>) when adding one element at a time to a priority list. profiling shows that it is irrelevant to the execution time of the program, although I'd bet that some form of programmer instinct would be triggered by this particular example and would try to rewrite it.
#:Erik -- man who cooks while hacking eats food that has died twice.
In article <3122671946890...@naggum.no>, Erik Naggum <e...@naggum.no> wrote:
> I do (sort (cons <element> <sorted-list>) <test>) when adding one element > at a time to a priority list. profiling shows that it is irrelevant to > the execution time of the program, although I'd bet that some form of > programmer instinct would be triggered by this particular example and > would try to rewrite it.
Not much of a rewrite, O(n), might not bite as early for larger priority lists.
Bernhard -- -------------------------------------------------------------------------- Bernhard Pfahringer Austrian Research Institute for http://www.ai.univie.ac.at/~bernhard/ Artificial Intelligence bernh...@ai.univie.ac.at
* Marco Antoniotti <marc...@copernico.parades.rm.cnr.it> | Ahem!
:)
| Insertion in a priority queue should be O(lg n). Most likely, you are | doing at least O(n lg n) with ...
the priority queues are supposed to be depleted very quickly, and have an average length of 1.2 elements before sorting. when a particular queue gets stuck, it normally gets unstuck before the length reaches 10. when a queue is stuck for a long time, which happens to about 1% of the queues over the course of a week, the cost of monitoring it dwarfes the cost of the insertions. so the sorting time really is irrelevant.
it's useful as an example of when a simple-minded algorithm is sufficient because other parts of the system profit more from optimization than this particular part. a long queue length really is an important problem, and making the priority queue insert fast is irrelevant in the big picture. popping off the highest-priority element when the queue starts to drain, however, needs to be very fast and simple because in the event that the queue has gotten quite long, releasing it is a serious hit to the system all at once, compared to the very well distributed O(n lg n) costs of the insertions.
I actually found it quite amusing that I was somewhat unnerved by this code and spent a lot of real time testing whether it was a bottle-neck, returning to it time and again to see if it could be fixed. however, as is quite often the case with optimizations, my hunch was all wrong, and it was way more important to make deletion from the queue O(1).
| Here is the solution. It is old code and I will be willing to include | any changes and to change the copyright.
thanks. I appreciate it.
#:Erik -- man who cooks while hacking eats food that has died twice.
* bernh...@korb.ai.univie.ac.at (Bernhard Pfahringer) | Yep, why not use: (merge 'list (list <element>) <sorted-list> <test>). | Not much of a rewrite, O(n), might not bite as early for larger priority | lists.
that's actually very useful advice that doesn't cost anything to adopt. thank you.
#:Erik -- man who cooks while hacking eats food that has died twice.
Marco Antoniotti <marc...@copernico.parades.rm.cnr.it> writes: > > I do (sort (cons <element> <sorted-list>) <test>) when adding one element > > at a time to a priority list. profiling shows that it is irrelevant to > > the execution time of the program, although I'd bet that some form of > > programmer instinct would be triggered by this particular example and > > would try to rewrite it.
> Ahem! Insertion in a priority queue should be O(lg n). Most likely, > you are doing at least O(n lg n) with
> (sort (cons <element> <sorted-list>) <test>)
Whereby you have just lost Erik's bet ;) The point (I think) Erik was trying to make is that without profiling, you won't know whether it is even worth _thinking_ about performance-issues in a piece of code. And since Erik's profiling has shown that sort is good enough for his particular program, it doesn't make any sense implementing any form of heap or other advanced data-structure.
The solution with sort is faster to code, safer, better to maintain, easier to document, etc. Switching to a more advanced algorithm and/or data-structure is only worth it, if profiling and/or anticipated usage patterns indicate that sort will be a bottle-neck here.
All of your other arguments are all true, and if one could disregard implementation complexity, heaps (or splay-trees, or any of the other 100 solutions for priority-queues) would always be the better solutions than sort. But one can't disregard implementation complexity, since the effort spent on one part of a project will always detract from the effort available for other parts of a project.
Regs, Pierre.
PS: Re-use does of course alleviate the problem of implementation complexity somewhat. But re-use brings with it another can of worms...
> suppose that you have a list. You want to sort it. suppose that you > knew ahead of time that the list would be much closer to being sorted > if you reversed it. Would it make sense to call sort on the reversed > list rather than the list itself? would people think that this would > save time, or take more time?
How about this: rather than reversing the list, complement the sorting predicate. So if you wanted the list in ascending order, sort the list in descending order. If the list must be in ascending order, then this doesn't help.
If in doubt, always profile.
-- --------------------------------------------------------------------------- ----> Raymond Toy r...@mindspring.com http://www.mindspring.com/~rtoy
Erik Naggum <e...@naggum.no> writes: > * Marco Antoniotti <marc...@copernico.parades.rm.cnr.it> > | Ahem!
> :)
> | Insertion in a priority queue should be O(lg n). Most likely, you are > | doing at least O(n lg n) with ...
> the priority queues are supposed to be depleted very quickly, and have an > average length of 1.2 elements before sorting. when a particular queue > gets stuck, it normally gets unstuck before the length reaches 10. when > a queue is stuck for a long time, which happens to about 1% of the queues > over the course of a week, the cost of monitoring it dwarfes the cost of > the insertions. so the sorting time really is irrelevant.
...
> is quite often the case with optimizations, my hunch was all wrong, and > it was way more important to make deletion from the queue O(1).
...
Of course. This is what you really need to do. As a matter of fact, is it worth paying the O(lg n) insertion/deletion price only if you need a change-key operation.
> | Here is the solution. It is old code and I will be willing to include > | any changes and to change the copyright.