J-SORT

1283 views
Skip to first unread message

Sergey Erokhin

unread,
Nov 26, 1997, 3:00:00 AM11/26/97
to


Приветствую Вас всезнающий ALL

Поделитесь вашими впечатлениями о SUBJ, интересует его сравнение с
известными сортировками (quick, shell, ....) и оценка эффективности
аля как в книжке Вирта.

Допустим что памяти хватит ;-)

Это страничка из I-NET, адреса там не было :-(

=== Cut ===
A revolutionary new sort from John Cohen

Introduction

Yes, I know I'm not the first to claim I have created one of the
fastest sorts, however please do not dismiss me before at least
reading this paragraph. I have not reinvented the postal sort or
anything like that. I have invented a new sort for linked lists
which runs in O(n) = n ln n time as any good sort should. However,
although the big-O is similar to many other sorts, this runs
consistently faster. I have tested it against the bubble (hah),
selection, insertion, merge, quick, shell, and heap sorts.
The J sort does not need special keys; the only thing you need
is a comparison function which determines if one data point is
less then, equal to, or greater then another. Thus, the sort is
applicable in any sorting situation.

General Overview

Here's the general idea. We have a doubly-linked list of elements
which we would like to sort. If there are a trivial number of
elements (two or fewer), it's trivial to sort them. If there are
fewer then some fixed number (studies show this should be around 35),
then the Strand sort is exectued. This is a sort which I devised that
takes advantage of natural runs in the data, and is very fast for small
numbers of elements. However, it is an n-squared sort, so the speed
advantage is lost with a sizeable number of elements. Therefore, with
any more then that fixed number of elements, another sort of my own
devising called the Shuffle sort is executed. This extracts a subset
of the elements into a new list, and then inserts the other elements
into this list using a very fast insertion routine. Then, to sort the
new list we can break the chore into a great many smaller sorting taskes,
which are accomplished with recursion.

Advantages & Limitations

For every size of list, from 2 elements to 5000, the J sort preformed
faster by a factor of 2 or more. Also, because most of the memory-related
activity is simply rearranging pointers in a linked list, it is very
conservative when it comes to memory usage. However, I would say that
just about any other sort would take less memory. Also, because of the
dynamic nature of the algorithm, it is not fesible to use the J sort on
data which are not in RAM. Therefore, unless all the data points can fit
into memory at once, the J sort is probably not a good choice, although
if you were doing a merge or quick sort, you could use the J sort when
the number of elements gets small enough to squeeze them into memory.
An far as parallel processing goes, the J sort was devised on a serial
computer, but it is actually is quite suited for parallel processing.
If anyone tries this, please let me know.

The Algorithm

As outlined in the General Overview, There are two main parts to
the J sort: the Strand sort which runs with small numbers of elements,
and the Shuffle sort.

First the Strand sort. Studies show that about 30 to 40 is a good threshold
for using Strand over the Shuffle. However, this number could change on
different platforms, languages, and implementations. After dealing with
the trivial cases, the Strand sort pulls out the first element in the
list, and uses it as the first element of a new list called the sub-list.
Then the remainder of the original list is scanned, and every time an
element is found which is greater then the last element of the sub-list,
it is appended to the sub-list, and removed from the original. Now we have
extracted a sorted sub-list from the original list. Putting that list aside,
we then extract another sub-list in the same manner. Then the two sub-lists
are merged. The process of extracting a sub-list and merging with the
others is continued until the original list is exhausted. The Strand sort
takes advantage of natural runs, or many successive elements which are
already in order. Now the Shuffle sort. Let n be the number of elements.
First the first n/8 elements are removed and placed into another list called
the key list. This list is then sorted recursiviy Then an array of n/8
pointers is created, and these are set to point to the n/8 elements in the
key list, in order. Now the remaining 7n/8 elements of the list is inserted
into the key list by making a binary search using the array. Now we have n/8
unordered lists between the n/8 key elements which are sorted. These lists
are then sorted recursiviy. Then the key list is totally sorted. This is very
fast because the array makes the binary search very fast, and the n/8 lists
are all only 8 elements long on the average no matter what the size of the
original list. Also, the insertion algorithm preserves natural runs which
the Strand sort can take advantage of. Here is the algorithm:

The J Sort

I. Let L be the list to be sorted in place, and let n be the number of
elements in L.

II. IF n < StrandThreshold, use the Strand sort,
otherwise use the Shuffle sort.

III. Strand Sort
A. If n = 0, return
B. If n = 1, put the element in L into S and return
C. If n = 2, compare the elements, switch if necessary, and return
D. Let S be a list which will hold the sorted elements of L, and let B
be a list which will be the sub-list
E. Loop through the following while there are still elements in L:
1. Put the first element of L into B
2. Loop through the following letting p point to the first, second,..., last
element of L,
a. If p > last element of BД append p to B, removing from L.
3. Merge B into S
F. Put S into L, and return

IV. Shuffle Sort

Let K be a list of the first n/8 elements of L (remove from L)
Sort K recursivly
Let A be an array of n/8 pointers to elements, and set them to the
elements of K
D. Let B be an array of n/8+1 empty lists. These correspond to the
lists which are inbetween, proceed, and proceed the elements in K
E. For the remaining elements in L, append each to the appropriate
list in B as determined by a binary search in A.
F. recursiviy sort even'' list in B
G. Construct the original list by appending the elements of the first
list in B, then the first element of K, then the second list in B,
then the second element of K, ..., then the nth element of K,
then the n+lth element of B, and return


The necessary operations are appending an element to a list, removing
an element from a list, and having access to the first and last elements
in a list. To maximize speed, use a doubly-lin'ked list with pointers
to the first and last nodes, but not header or trailer nodes.
Possible Modifications
The Strand sort threshold can certainly be adjusted for a particular
programming language, implementation, and platform. Likewise, the n/8
pick in the Shuffle sort may be faster with n/16 or n/4. You could
even have n/k where k is not a power of two. but that will make K
unbalanced, and therefore inefficient. Of course, the time may be
more then made up for, so it's worth a try.

I've already tried some variants to increase speed, but the given
algorithm is the fastest. I've tried sweeping both forward and
backward on the Strand sort, and also checking if an element is
either greater then the last or smaller then the first element of
the sub-list for appending or prepending.

Current Usage

Although many have read the page, only a handful have responded.
Some have claimed that it is mathematically impossible that this
sort betters the quick sort. but they did not actually try to
implement the sort to make the comparison themselves. Others have
claimed that the J Sort is 30 times as fast as the quick sort. Still
others have used it in studies and seminars of sorting on parallel
processing computers. It is the sorting algorithm in Prometheus.

7. Conclusion

The J Sort has a wide variety of applications since elements are
often in a linked-list, and there are no restrictions on the data
type or key type. With this environment it consistently beats all other
typical sorts. Additionally, it is quite suited for parallel processing.
However it does have some restrictions. Please contact me if you have
any questions, comments, or suggestions. Also, if you find the J Sort
useful, please let me k-now about it. Thanks.

=== Cut ===


Good luck
Sergey.


Reply all
Reply to author
Forward
0 new messages