1283 views

Skip to first unread message

Nov 26, 1997, 3:00:00 AM11/26/97

to

Приветствую Вас всезнающий ALL

Поделитесь вашими впечатлениями о SUBJ, интересует его сравнение с

известными сортировками (quick, shell, ....) и оценка эффективности

аля как в книжке Вирта.

Допустим что памяти хватит ;-)

Это страничка из I-NET, адреса там не было :-(

=== Cut ===

A revolutionary new sort from John Cohen

Introduction

Yes, I know I'm not the first to claim I have created one of the

fastest sorts, however please do not dismiss me before at least

reading this paragraph. I have not reinvented the postal sort or

anything like that. I have invented a new sort for linked lists

which runs in O(n) = n ln n time as any good sort should. However,

although the big-O is similar to many other sorts, this runs

consistently faster. I have tested it against the bubble (hah),

selection, insertion, merge, quick, shell, and heap sorts.

The J sort does not need special keys; the only thing you need

is a comparison function which determines if one data point is

less then, equal to, or greater then another. Thus, the sort is

applicable in any sorting situation.

General Overview

Here's the general idea. We have a doubly-linked list of elements

which we would like to sort. If there are a trivial number of

elements (two or fewer), it's trivial to sort them. If there are

fewer then some fixed number (studies show this should be around 35),

then the Strand sort is exectued. This is a sort which I devised that

takes advantage of natural runs in the data, and is very fast for small

numbers of elements. However, it is an n-squared sort, so the speed

advantage is lost with a sizeable number of elements. Therefore, with

any more then that fixed number of elements, another sort of my own

devising called the Shuffle sort is executed. This extracts a subset

of the elements into a new list, and then inserts the other elements

into this list using a very fast insertion routine. Then, to sort the

new list we can break the chore into a great many smaller sorting taskes,

which are accomplished with recursion.

Advantages & Limitations

For every size of list, from 2 elements to 5000, the J sort preformed

faster by a factor of 2 or more. Also, because most of the memory-related

activity is simply rearranging pointers in a linked list, it is very

conservative when it comes to memory usage. However, I would say that

just about any other sort would take less memory. Also, because of the

dynamic nature of the algorithm, it is not fesible to use the J sort on

data which are not in RAM. Therefore, unless all the data points can fit

into memory at once, the J sort is probably not a good choice, although

if you were doing a merge or quick sort, you could use the J sort when

the number of elements gets small enough to squeeze them into memory.

An far as parallel processing goes, the J sort was devised on a serial

computer, but it is actually is quite suited for parallel processing.

If anyone tries this, please let me know.

The Algorithm

As outlined in the General Overview, There are two main parts to

the J sort: the Strand sort which runs with small numbers of elements,

and the Shuffle sort.

First the Strand sort. Studies show that about 30 to 40 is a good threshold

for using Strand over the Shuffle. However, this number could change on

different platforms, languages, and implementations. After dealing with

the trivial cases, the Strand sort pulls out the first element in the

list, and uses it as the first element of a new list called the sub-list.

Then the remainder of the original list is scanned, and every time an

element is found which is greater then the last element of the sub-list,

it is appended to the sub-list, and removed from the original. Now we have

extracted a sorted sub-list from the original list. Putting that list aside,

we then extract another sub-list in the same manner. Then the two sub-lists

are merged. The process of extracting a sub-list and merging with the

others is continued until the original list is exhausted. The Strand sort

takes advantage of natural runs, or many successive elements which are

already in order. Now the Shuffle sort. Let n be the number of elements.

First the first n/8 elements are removed and placed into another list called

the key list. This list is then sorted recursiviy Then an array of n/8

pointers is created, and these are set to point to the n/8 elements in the

key list, in order. Now the remaining 7n/8 elements of the list is inserted

into the key list by making a binary search using the array. Now we have n/8

unordered lists between the n/8 key elements which are sorted. These lists

are then sorted recursiviy. Then the key list is totally sorted. This is very

fast because the array makes the binary search very fast, and the n/8 lists

are all only 8 elements long on the average no matter what the size of the

original list. Also, the insertion algorithm preserves natural runs which

the Strand sort can take advantage of. Here is the algorithm:

The J Sort

I. Let L be the list to be sorted in place, and let n be the number of

elements in L.

II. IF n < StrandThreshold, use the Strand sort,

otherwise use the Shuffle sort.

III. Strand Sort

A. If n = 0, return

B. If n = 1, put the element in L into S and return

C. If n = 2, compare the elements, switch if necessary, and return

D. Let S be a list which will hold the sorted elements of L, and let B

be a list which will be the sub-list

E. Loop through the following while there are still elements in L:

1. Put the first element of L into B

2. Loop through the following letting p point to the first, second,..., last

element of L,

a. If p > last element of BД append p to B, removing from L.

3. Merge B into S

F. Put S into L, and return

IV. Shuffle Sort

Let K be a list of the first n/8 elements of L (remove from L)

Sort K recursivly

Let A be an array of n/8 pointers to elements, and set them to the

elements of K

D. Let B be an array of n/8+1 empty lists. These correspond to the

lists which are inbetween, proceed, and proceed the elements in K

E. For the remaining elements in L, append each to the appropriate

list in B as determined by a binary search in A.

F. recursiviy sort even'' list in B

G. Construct the original list by appending the elements of the first

list in B, then the first element of K, then the second list in B,

then the second element of K, ..., then the nth element of K,

then the n+lth element of B, and return

The necessary operations are appending an element to a list, removing

an element from a list, and having access to the first and last elements

in a list. To maximize speed, use a doubly-lin'ked list with pointers

to the first and last nodes, but not header or trailer nodes.

Possible Modifications

The Strand sort threshold can certainly be adjusted for a particular

programming language, implementation, and platform. Likewise, the n/8

pick in the Shuffle sort may be faster with n/16 or n/4. You could

even have n/k where k is not a power of two. but that will make K

unbalanced, and therefore inefficient. Of course, the time may be

more then made up for, so it's worth a try.

I've already tried some variants to increase speed, but the given

algorithm is the fastest. I've tried sweeping both forward and

backward on the Strand sort, and also checking if an element is

either greater then the last or smaller then the first element of

the sub-list for appending or prepending.

Current Usage

Although many have read the page, only a handful have responded.

Some have claimed that it is mathematically impossible that this

sort betters the quick sort. but they did not actually try to

implement the sort to make the comparison themselves. Others have

claimed that the J Sort is 30 times as fast as the quick sort. Still

others have used it in studies and seminars of sorting on parallel

processing computers. It is the sorting algorithm in Prometheus.

7. Conclusion

The J Sort has a wide variety of applications since elements are

often in a linked-list, and there are no restrictions on the data

type or key type. With this environment it consistently beats all other

typical sorts. Additionally, it is quite suited for parallel processing.

However it does have some restrictions. Please contact me if you have

any questions, comments, or suggestions. Also, if you find the J Sort

useful, please let me k-now about it. Thanks.

=== Cut ===

Good luck

Sergey.

Reply all

Reply to author

Forward

0 new messages

Search

Clear search

Close search

Google apps

Main menu