Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.

Dismiss

Sorting linked lists

2 views

Skip to first unread message

Martin

unread,

Jan 9, 1999, 3:00:00 AM1/9/99

I´m trying to learn computerprogramming and i think i come quite a bit
now. I have come to the linked lists and in the little program that i
try to do i want to sort the linked list. the algorithm i made for it
works like this

1. Search the smallest post.

2. Change the post that point to this post so it will point to the
same post as the smallest post point on.

3 Set the smallest posts pointer to be equal whit the last sorted post
pointer and the last sorted posts pointer to be equal whit the new
sorted post

But it be quite a lot of pointers to keep track on so is there any
better way to sort linked list?

Thanks Martin

fred smith

unread,

Jan 10, 1999, 3:00:00 AM1/10/99

Martin (Ja...@home.se) wrote:
: I´m trying to learn computerprogramming and i think i come quite a bit

: now. I have come to the linked lists and in the little program that i
: try to do i want to sort the linked list. the algorithm i made for it
: works like this

<snip>
: But it be quite a lot of pointers to keep track on so is there any

: better way to sort linked list?

Sorting a linked list which already exists can be tedious and resource-
intensive.

My preferred way is to make sure it is sorted when you build it. That is,
when constructing the list, make sure you only insert each node into what
will be the correct sorted place so you don't NEED to sort it after its
construction.

you can do that by making yourself a small group of routines which do
the following functions:

-one to traverse the list, returns the address of the first node it finds
whose key is GREATER THAN the key of the node you want to insert. Special
cases are: 1) empty list (no nodes in it yet), 2) the new item belongs at
the beginnng (front) of the list, 3) the new item belongs at the end of the
list (i.e., the last item).
-one to insert a node in front of the above-mentioned key (see special cases
mentioned above);
-one to delete the above-mentioned key (special cases: no nodes yet in the
list);

not terribly hard to write, but will save you tons of headaches and
years of CPU time.

Fred
--
---- Fred Smith -- fre...@fcshome.stoneham.ma.us -----------------------------
The eyes of the Lord are everywhere,
keeping watch on the wicked and the good.
----------------------------- Proverbs 15:3 (niv) -----------------------------

Chris Torek

unread,

Jan 10, 1999, 3:00:00 AM1/10/99

>Martin (Ja...@home.se) wrote:
>> ... is there any better way to sort linked list?

In article <F5BpF...@fcshome.stoneham.ma.us> fred smith

<fre...@fcshome.stoneham.ma.us> wrote:
>Sorting a linked list which already exists can be tedious and resource-
>intensive.

Actually, a merge sort is generally both simple and efficient.
The hardest part about a merge sort is breaking a list in half.
If you do not care about "stable" sorts, you can break it in half
the easy way, by alternation:

/*
* split the list (turning this into a function, or inserting
* it into mergesort() below, is left as an exercise to the reader,
* as are better methods of splitting).
*/
struct list *left, *right, **lp, **rp;

/* input: list != NULL; it has an even or odd number of entries */
lp = &left;
rp = &right;
do {
/* put one node on the left-hand list */
*lp = list;
lp = &list->next;
list = list->next;

/* that could be the last node, so check */
if (list == NULL)
break;

/* put another node on the right-hand list */
*rp = list;
rp = &list->next;
list = list->next;
/* and repeat until we run out of nodes */
} while (list != NULL);

/*
* Make sure the two lists are properly terminated. One of
* these two assignments is unnecessary, but so what?
*/
*lp = NULL;
*rp = NULL;

Once you have broken the list into two halves, the left and right
halves should be merge-sorted independently, and then the two lists
can be merged together. This is like taking a deck of cards that
has been shuffled, giving each half to a friend and having him sort
it, and then taking the two sorted piles and merging them pairwise:

/* merge left and right lists (of any length) */
struct list *merge(struct list *left, struct list *right,
int (*compare)(struct list *, struct list *)) {
struct list *new, **np;

/*
* Build the new list by putting in the smaller of each pair.
*/
np = &new;
while (left != NULL && right != NULL) {
if (compare(left, right) > 0) { /* left > right */
*np = right; /* so put the right node */
np = &right->next; /* onto the new list */
right = right->next;
} else { /* left <= right */
*np = left; /* so put the left node */
np = &left->next; /* onto the new list */
left = left->next;
}
}

/*
* Now at least one of "left" and "right" is NULL, but not both.
* If the left list is non-empty, the right one is empty, and we
* just need to tack the left one on to the total list. If the
* right list is non-empty, the left one is empty and we just need
* to tack on the right-hand one. If both are empty, we need to
* set *np = NULL, but in this case, "*np = right" will do the
* trick too, so one single ?: expression suffices.
*/
*np = left != NULL ? left : right;
return (new);
}

That leaves only the degenerate case of sorting an empty list at the
top of the overall sort routine, and you get something like this:

struct list *mergesort(struct list *list,
int (*compare)(struct list *, struct list *)) {
struct list *left, *right;

if (list == NULL) /* nothing to split */
return NULL;
split(list, &left, &right);
left = mergesort(left, compare);
right = mergesort(right, compare);
return merge(left, right, compare);
}

In other words, each friend who has to sort half the card-deck does
*his* job by splitting *that* half-deck in half, having two more
friends sort those, and then merging those back. (Given the
construction above, unlike with people, it is easier to hand off
"no cards" and take back "no cards" than it is to recognize that
"one card" is already sorted. Other merge-sort implementations
are possible, including doing an initial "count the nodes" pass;
in this case, you can stop at "count < 2".)

>My preferred way is to make sure it is sorted when you build it.

In some applications, this is a good approach; however, it tends to
be O(n^2), while merge sort is O(n log n). Generally, if something
should be kept sorted as you go along, some kind of tree structure
is appropriate.
--
In-Real-Life: Chris Torek, Berkeley Software Design Inc
El Cerrito, CA Domain: to...@bsdi.com +1 510 234 3167
http://claw.bsdi.com/torek/ (not always up) I report spam to abuse@.

Doug Spurrell

unread,

Jan 10, 1999, 3:00:00 AM1/10/99

fred smith wrote in message ...

>My preferred way is to make sure it is sorted when you build it. That is,
>when constructing the list, make sure you only insert each node into what
>will be the correct sorted place so you don't NEED to sort it after its
>construction.
>
>you can do that by making yourself a small group of routines which do
>the following functions:

I am also in the process of learning C and linked lists. I am developing a
program for class that handles an inventory for a store. On of the fields is
a Product Code, an integer value I have been using as the key.

My question is regarding saving the data file to the disk. I have been
appending new products to the end of the file and just saving those. Would
it be better to re-save the entire data file already sorted? Rather than
just add the changes?

Thanks

Doug

Quang Trinh

unread,

Jan 10, 1999, 3:00:00 AM1/10/99

to Doug Spurrell

If you already have the list sorted after you insert the new product, then
you should save the sorted list to file. This way, the next time when you
read the file, you don't have to sort the list again.

Chris Torek

unread,

Jan 11, 1999, 3:00:00 AM1/11/99

In article <77aai8$f...@elf.bsdi.com> I wrote a skeleton linked-list
merge sort routine:

> while (left != NULL && right != NULL)

...

> /*
> * Now at least one of "left" and "right" is NULL, but not both.

Oops, this comment is broken. (It started out a little differently,
and I rearranged it and inserted this bug.) The last line should
just read:

* Now at least one of "left" and "right" is NULL.

The rest of the comment asserts the possibility that both are NULL,
and claiming otherwise in the first part is clearly nonsense.

James Weisberg

unread,

Jan 13, 1999, 3:00:00 AM1/13/99

In article <F5BpF...@fcshome.stoneham.ma.us>,

fred smith <fre...@fcshome.stoneham.ma.us> wrote:
>Martin (Ja...@home.se) wrote:
>: I´m trying to learn computerprogramming and i think i come quite a bit
>: now. I have come to the linked lists and in the little program that i
>: try to do i want to sort the linked list. the algorithm i made for it
>: works like this
><snip>

>: But it be quite a lot of pointers to keep track on so is there any

>: better way to sort linked list?
>

>Sorting a linked list which already exists can be tedious and resource-
>intensive.
>

>-one to traverse the list, returns the address of the first node it finds
>whose key is GREATER THAN the key of the node you want to insert. Special
>cases are: 1) empty list (no nodes in it yet), 2) the new item belongs at
>the beginnng (front) of the list, 3) the new item belongs at the end of the
>list (i.e., the last item).
>-one to insert a node in front of the above-mentioned key (see special cases
>mentioned above);
>-one to delete the above-mentioned key (special cases: no nodes yet in the
>list);
>
>not terribly hard to write, but will save you tons of headaches and
>years of CPU time.

Bull! Simply traversing the list is an O(n) operation. You
want to do that for *every* element you add to the list? That's
n*O(n)! Horrible!
In addition, if you only use singly-linked lists, simply adding
values in the middle is a hassle. You *must* traverse the list just
to find the node prior to the place you need to enter the new value.
Finally, you have no flexibility in your sorting metric. What
if the list contains pointers to a struct and you want to sort on
more than one element of the struct? You can't have a list sorted
both by name *and* by age at the same time!

In my experience, gaurenteeing a sorted list for every new
element added is rare. Usually you do some operations on the list,
and then need to sort the result. Thus sorting can be left as an
operation to do *when it needs to be done* instead of every time
an element is added.
But thinking about the nature of your data set can significantly
increase the speed of your code. I see two general cases:

1) Data is constantly changing.
In this case, the list itself might grow to a certain fixed,
or near-fixed size, but the data in the list gets constantly
reshuffled. In this case, I prefer turning a list into an
array of data pointers, and then using a beefed up quicksort
to sort the data. If done carefully, the array can be re-used
so long as the list doesn't grow or shrink. However, it does
use more memory. But memory is cheap.

2) Size is constantly changing.
In this case, the nodes of the list are expanding and shrinking.
If a list is the data structure used, then a mergesort is the
way to go.

There is an additional concern if you happen to use double-
linked lists. The benefit is that you can quickly insert data
anywhere into the list, especially the tail, but a mergesort is
now going to have to re-establish twice as many node pointers.
By turning lists into arrays, you can avoid this problem.

--
World's Greatest Living Poster

William Boyle

unread,

Jan 14, 1999, 3:00:00 AM1/14/99

Well put! Converting a list into an array, sorting the array (qsort is
usually good for randomly ordered lists), then rebuilding the list, is
usually a reasonable approach for any but the smallest of lists. Keep a
list sorted when it is getting a lot of insertions is totally painful.
Better to use a tree or array. When I don't want the complexity of tree
structures I use arrays with some modified bsearch routines which return
the location of the insertion point in the array, allowing insertion by
moving the elements below it down (memmove() works good) and copying the
new element into its position. Worst situation is if there is no room to
move the bottom part of the array down, in which case you have to
realloc() the array. It's still incredibly faster than keeping a
linked-list sorted. The algorithm falls down when you are continuously
at the top of the array and the array is quite big. Then you spend a lot
of machine cycles moving memory around. On the average, however, it is
very effective, but it is still MUCH faster than finding the bottom of
an equally sized linked-list.

-Bill Boyle

Lawrence Kirby

unread,

Jan 14, 1999, 3:00:00 AM1/14/99

In article <77jaaa$d...@tekka.wwa.com>
chad...@news.wwa.com "James Weisberg" writes:

...

>>-one to traverse the list, returns the address of the first node it finds
>>whose key is GREATER THAN the key of the node you want to insert. Special
>>cases are: 1) empty list (no nodes in it yet), 2) the new item belongs at
>>the beginnng (front) of the list, 3) the new item belongs at the end of the
>>list (i.e., the last item).
>>-one to insert a node in front of the above-mentioned key (see special cases
>>mentioned above);
>>-one to delete the above-mentioned key (special cases: no nodes yet in the
>>list);
>>
>>not terribly hard to write, but will save you tons of headaches and
>>years of CPU time.
>
> Bull! Simply traversing the list is an O(n) operation. You
>want to do that for *every* element you add to the list? That's
>n*O(n)! Horrible!

Indeed this approach is rather inefficient compared to some of the
alternatives.

> In addition, if you only use singly-linked lists, simply adding
>values in the middle is a hassle. You *must* traverse the list just
>to find the node prior to the place you need to enter the new value.
> Finally, you have no flexibility in your sorting metric. What
>if the list contains pointers to a struct and you want to sort on
>more than one element of the struct? You can't have a list sorted
>both by name *and* by age at the same time!

In that case you'd need to create an extra index, but that would be
necessary however you do it.

> In my experience, gaurenteeing a sorted list for every new
>element added is rare. Usually you do some operations on the list,
>and then need to sort the result. Thus sorting can be left as an
>operation to do *when it needs to be done* instead of every time
>an element is added.
> But thinking about the nature of your data set can significantly
>increase the speed of your code. I see two general cases:
>
> 1) Data is constantly changing.
> In this case, the list itself might grow to a certain fixed,
> or near-fixed size, but the data in the list gets constantly
> reshuffled. In this case, I prefer turning a list into an
> array of data pointers, and then using a beefed up quicksort
> to sort the data. If done carefully, the array can be re-used
> so long as the list doesn't grow or shrink. However, it does
> use more memory. But memory is cheap.

a list merge sort is good for this, It is comparable with the best
comparison based array sort.

> 2) Size is constantly changing.
> In this case, the nodes of the list are expanding and shrinking.
> If a list is the data structure used, then a mergesort is the
> way to go.
>
>

> There is an additional concern if you happen to use double-
>linked lists. The benefit is that you can quickly insert data
>anywhere into the list, especially the tail, but a mergesort is
>now going to have to re-establish twice as many node pointers.
>By turning lists into arrays, you can avoid this problem.

Unfortunately the solution is worse than the problem. To merge sort a
doubly linked list you sort it as a singly linked list and finally
perform a single pass through the list reestablishing the reverse links.
This is a trivial O(N) step which is insignificant compared to the
sort as a whole (with any reasonable number of elements). If you use
arrays to sort you change this to 2 steps, one to create the array and
one to reestablish the linked list. If you mean dispense with the linked
list completely then fine, but this decision shouldn't be based on sorting
method (since arrays and linked lists can both be sorted efficiently)
but on what other access/update requirements are to be placed on the
datastructure.

The only practical advantage of array sorts is that C has the qsort()
function available in the standard library (unless you start going into
things like radix sorts). Given that you can grab pretty much stock
list merge sort code this isn't much of an advantage.

--
-----------------------------------------
Lawrence Kirby | fr...@genesis.demon.co.uk
Wilts, England | 7073...@compuserve.com
-----------------------------------------

James Weisberg

unread,

Jan 14, 1999, 3:00:00 AM1/14/99

In article <916321...@genesis.demon.co.uk>,

Lawrence Kirby <fr...@genesis.demon.co.uk> wrote:
>In article <77jaaa$d...@tekka.wwa.com>
> chad...@news.wwa.com "James Weisberg" writes:

expanding more on this topic...

>> Finally, you have no flexibility in your sorting metric. What
>>if the list contains pointers to a struct and you want to sort on
>>more than one element of the struct? You can't have a list sorted
>>both by name *and* by age at the same time!
>
>In that case you'd need to create an extra index, but that would be
>necessary however you do it.

Yes, but the original poster wanted to keep a list sorted as
it was built. That process is intrinsically ambiguous when your
list data has multiple members. That's why the approach is what
I would call bogus. It might be easy on the programmer for a simple
list of a simple data type, but the efficiency is poor as is the
versatility.

>> In my experience, gaurenteeing a sorted list for every new
>>element added is rare. Usually you do some operations on the list,
>>and then need to sort the result. Thus sorting can be left as an
>>operation to do *when it needs to be done* instead of every time
>>an element is added.
>> But thinking about the nature of your data set can significantly
>>increase the speed of your code. I see two general cases:
>>
>> 1) Data is constantly changing.
>> In this case, the list itself might grow to a certain fixed,
>> or near-fixed size, but the data in the list gets constantly
>> reshuffled. In this case, I prefer turning a list into an
>> array of data pointers, and then using a beefed up quicksort
>> to sort the data. If done carefully, the array can be re-used
>> so long as the list doesn't grow or shrink. However, it does
>> use more memory. But memory is cheap.
>
>a list merge sort is good for this, It is comparable with the best
>comparison based array sort.

True. However, there are tricks that can be built into qsort for
"almost" sorted arrays that switch to an insertion sort algorithm
which is almost linear. There are also ways to tailor array sorting
algorithms which beef up on swaps while reducing comparisions. Swaps
are always favorable because comparisions can take much longer.
Basically there's more flexibility in using array-based sorting
algorithms which can be exploited if you know something about the
nature of your data set. However, if you want a straightforward and
easy maintenance approach, the list merge sort is fine, as I had
written originally.

>> 2) Size is constantly changing.
>> In this case, the nodes of the list are expanding and shrinking.
>> If a list is the data structure used, then a mergesort is the
>> way to go.
>>
>>
>> There is an additional concern if you happen to use double-
>>linked lists. The benefit is that you can quickly insert data
>>anywhere into the list, especially the tail, but a mergesort is
>>now going to have to re-establish twice as many node pointers.
>>By turning lists into arrays, you can avoid this problem.
>
>Unfortunately the solution is worse than the problem. To merge sort a
>doubly linked list you sort it as a singly linked list and finally
>perform a single pass through the list reestablishing the reverse links.
>This is a trivial O(N) step which is insignificant compared to the
>sort as a whole (with any reasonable number of elements). If you use
>arrays to sort you change this to 2 steps, one to create the array and
>one to reestablish the linked list. If you mean dispense with the linked
>list completely then fine, but this decision shouldn't be based on sorting
>method (since arrays and linked lists can both be sorted efficiently)
>but on what other access/update requirements are to be placed on the
>datastructure.

I have a couple responses to this, which, could really fill up a
very long post; but I'll try to be brief.
When you say, "if you mean to dispense with the linked list
completely", that broaches on one aspect of what I had touched on.
If the list is basically of static size for a period of time, and
only the data is changing, then you can keep that array and reuse
it over and over again. In fact, you only have to create the array
once and unless you *need* the list itself sorted, you don't have
to go back and reset the list data pointers to match the array.
Even if you do want the list sorted, you might not always *need*
it sorted. That is to say, when the list is called upon in the
context where sorted contents need to be reached, then and only
then do you have to go back and "reestablish the linked list" as
you say. Meanwhile, you can enjoy the benefits of array-based
sorting and searching while the data is nicely stored in that
indexed form.
Yet, if you *really* want to get tricky, there is a way to
have both list and array sorted at the same time without the need
to go back and reestablish the list. That is to say, when the
array is sorted, so is the list. The way to do this is to have
double pointers to the data. In this way, there's an extra ring
as I like to think about it, between the list nodes and the
data, and while you're shuffling the data pointers around, you're
keeping the ring intact. Thus, both the array and the list see
the ring the same but the data hanging off each node in the ring
is now sorted. I admit this is rather befuddling, however, and
allocating this extra level of pointers to achieve such a
generality is itself time-consuming. Simply getting to the data
takes a double indirection. There may be some warped scenarios
where this approach could actually prove fastest, but I tried
it merely as a curious exercise. Anyone looking over your
shoulder at this mess could suffer serious brain damage. ;-)

>The only practical advantage of array sorts is that C has the qsort()
>function available in the standard library (unless you start going into
>things like radix sorts). Given that you can grab pretty much stock
>list merge sort code this isn't much of an advantage.

Heck, for some OSes, it's a decided *dis*advantage! I have
found the library qsort() to blow on most implementations. The
gnu version is superior if you want to incorporate that into
your code. Still, I have generally improved on that as well
because I don't need to worry about the size of the data that
I'm passing in there. I know it's always going to be a character
pointer and so I can dispense with anything that uses size.
In fact, this is a big gain because the SWAP routine in any
library implementation will actually copy the data at the
pointer locations if need be. My swaps are *always* pointer
swaps.
There are other things I've fiddled as well, but outlining
them isn't particularly within the scope of this post. I bring
this up because turning lists into arrays and back again can
be customized much better than simply being able to use the
library qsort(). If you're sitting there thinking that your
library implementation is superior and qsort() is the general
way to go, I'd caution you that there are qsorts and then there
are qsorts!

James Weisberg

unread,

Jan 14, 1999, 3:00:00 AM1/14/99

In article <369DA0BE...@mediaone.net>,

William Boyle <wob...@mediaone.net> wrote:
>Well put! Converting a list into an array, sorting the array (qsort is
>usually good for randomly ordered lists), then rebuilding the list, is
>usually a reasonable approach for any but the smallest of lists. Keep a
>list sorted when it is getting a lot of insertions is totally painful.
>Better to use a tree or array. When I don't want the complexity of tree
>structures I use arrays with some modified bsearch routines which return
>the location of the insertion point in the array, allowing insertion by
>moving the elements below it down (memmove() works good) and copying the
>new element into its position. Worst situation is if there is no room to
>move the bottom part of the array down, in which case you have to
>realloc() the array. It's still incredibly faster than keeping a
>linked-list sorted. The algorithm falls down when you are continuously
>at the top of the array and the array is quite big. Then you spend a lot
>of machine cycles moving memory around. On the average, however, it is
>very effective, but it is still MUCH faster than finding the bottom of
>an equally sized linked-list.

Yeah, I've had some experience coding up fancy arrays to
act like lists. This is sort of the reverse of what I was talking
about before in coding up fancy lists to act like arrays. Frankly,
I don't do enough work requiring heavy use of data structures to
tell you which is better and why. I tend to play with this stuff
as a hobby.
But anyway, like my lists, my arrays have a header node which
keeps track of some information about the data/size of the structure.
Each node knows where the head and tail of the array is at present,
so it's possible to do stack/queue-like operations. However, the
thing is also an array so I can also do stuff like pick out the
nth-member of the list and even sort it. However, sorting the
array first requires that elements hanging around at the end of
the array get copied down to the the beginning such that a[0..m-1]
is filled while a[m..n] are empty. Thus I can pass off the array of
size m to my standard library of sorting routines and have it act
on it just like it was a normal array.
I believe Perl uses dynamic arrays for most (all?) of its list-
like operations. At present, my arrays don't grow automatically. I
require that the array is initialized to a size when first created
(much like pre-extending an array in Perl). And then if it turns
out you need more space, you can call a resize function to allocate
more memory and keep all the other stuff intact. I could put this
into the routines which add elements to the array so resizing is
done automatically, but I figure if you chose to use this data
structure, you have some idea about the bounds of your data set.
Thus I don't envision calling malloc/realloc but once or twice
as those calls are expensive anyway. The worst thing to do in
both list and array implementations is to wind up calling memory
allocation routines for every node added. This more than anything
else will put the zap on a cpu. I've seen some rather poor
memory allocation libraries out there as well.

Lawrence Kirby

unread,

Jan 14, 1999, 3:00:00 AM1/14/99

In article <77l886$n...@tekka.wwa.com>
chad...@news.wwa.com "James Weisberg" writes:

>In article <916321...@genesis.demon.co.uk>,
>Lawrence Kirby <fr...@genesis.demon.co.uk> wrote:
>>In article <77jaaa$d...@tekka.wwa.com>
>> chad...@news.wwa.com "James Weisberg" writes:
>
>expanding more on this topic...
>
>>> Finally, you have no flexibility in your sorting metric. What
>>>if the list contains pointers to a struct and you want to sort on
>>>more than one element of the struct? You can't have a list sorted
>>>both by name *and* by age at the same time!
>>
>>In that case you'd need to create an extra index, but that would be
>>necessary however you do it.
>
> Yes, but the original poster wanted to keep a list sorted as
>it was built.

I can't find anything in the original article that implied that.

>That process is intrinsically ambiguous when your
>list data has multiple members. That's why the approach is what
>I would call bogus. It might be easy on the programmer for a simple
>list of a simple data type, but the efficiency is poor as is the
>versatility.

It depends on whether you need to look up on more than one member or not,
very often you don't. Again, there was nothing int the original article
of the thread that mentions multiple members.

...

> True. However, there are tricks that can be built into qsort for
>"almost" sorted arrays that switch to an insertion sort algorithm
>which is almost linear.

There are similar tricks for merge sorts.

>There are also ways to tailor array sorting
>algorithms which beef up on swaps while reducing comparisions. Swaps
>are always favorable because comparisions can take much longer.

Merge sort is already within a hair's breadth of the theoretical
minimum of average number of comparisons for a comparison sort. If you're
talking about mostly ordered data this is a similar issue to the last point.
Also I disagree that swaps are necessarily cheaper than comparisons.

> Basically there's more flexibility in using array-based sorting
>algorithms which can be exploited if you know something about the
>nature of your data set. However, if you want a straightforward and
>easy maintenance approach, the list merge sort is fine, as I had
>written originally.

I think you're underestimating the felxibility available in list based
sorting.

...

>>Unfortunately the solution is worse than the problem. To merge sort a
>>doubly linked list you sort it as a singly linked list and finally
>>perform a single pass through the list reestablishing the reverse links.
>>This is a trivial O(N) step which is insignificant compared to the
>>sort as a whole (with any reasonable number of elements). If you use
>>arrays to sort you change this to 2 steps, one to create the array and
>>one to reestablish the linked list. If you mean dispense with the linked
>>list completely then fine, but this decision shouldn't be based on sorting
>>method (since arrays and linked lists can both be sorted efficiently)
>>but on what other access/update requirements are to be placed on the
>>datastructure.
>
> I have a couple responses to this, which, could really fill up a
>very long post; but I'll try to be brief.
> When you say, "if you mean to dispense with the linked list
>completely", that broaches on one aspect of what I had touched on.
>If the list is basically of static size for a period of time, and
>only the data is changing, then you can keep that array and reuse
>it over and over again.

Similarly for the list elements.

>In fact, you only have to create the array
>once and unless you *need* the list itself sorted, you don't have
>to go back and reset the list data pointers to match the array.
>Even if you do want the list sorted, you might not always *need*
>it sorted. That is to say, when the list is called upon in the
>context where sorted contents need to be reached, then and only
>then do you have to go back and "reestablish the linked list" as
>you say. Meanwhile, you can enjoy the benefits of array-based
>sorting and searching while the data is nicely stored in that
>indexed form.

If you want an array you probably shouldn't have been using a list in
the first place. My point is that if you're having to convert between
forms like this there's probably something wrong with the program's
design.

> Yet, if you *really* want to get tricky, there is a way to
>have both list and array sorted at the same time without the need
>to go back and reestablish the list. That is to say, when the
>array is sorted, so is the list. The way to do this is to have
>double pointers to the data. In this way, there's an extra ring
>as I like to think about it, between the list nodes and the
>data, and while you're shuffling the data pointers around, you're
>keeping the ring intact. Thus, both the array and the list see
>the ring the same but the data hanging off each node in the ring
>is now sorted. I admit this is rather befuddling, however, and
>allocating this extra level of pointers to achieve such a
>generality is itself time-consuming. Simply getting to the data
>takes a double indirection. There may be some warped scenarios
>where this approach could actually prove fastest, but I tried
>it merely as a curious exercise. Anyone looking over your
>shoulder at this mess could suffer serious brain damage. ;-)

Now you would need to sort both the list and the array. You'd probably be
better off with some sort of tree datastructure.

>>The only practical advantage of array sorts is that C has the qsort()
>>function available in the standard library (unless you start going into
>>things like radix sorts). Given that you can grab pretty much stock
>>list merge sort code this isn't much of an advantage.
>
> Heck, for some OSes, it's a decided *dis*advantage! I have
>found the library qsort() to blow on most implementations. The
>gnu version is superior if you want to incorporate that into
>your code.

Doesn't that use merge sort as its primary algorithm? :-)

>Still, I have generally improved on that as well
>because I don't need to worry about the size of the data that
>I'm passing in there. I know it's always going to be a character
>pointer and so I can dispense with anything that uses size.
>In fact, this is a big gain because the SWAP routine in any
>library implementation will actually copy the data at the
>pointer locations if need be. My swaps are *always* pointer
>swaps.

Yes, there are unavoidable overheads in qsort() implementations, also
of course function calls to the comparison function.

> There are other things I've fiddled as well, but outlining
>them isn't particularly within the scope of this post. I bring
>this up because turning lists into arrays and back again can
>be customized much better than simply being able to use the
>library qsort(). If you're sitting there thinking that your
>library implementation is superior and qsort() is the general
>way to go, I'd caution you that there are qsorts and then there
>are qsorts!

That's certainly true.

James Weisberg

unread,

Jan 14, 1999, 3:00:00 AM1/14/99

In article <916352...@genesis.demon.co.uk>,

Lawrence Kirby <fr...@genesis.demon.co.uk> wrote:
>>> chad...@news.wwa.com "James Weisberg" writes:
>>expanding more on this topic...
>>
>>>> Finally, you have no flexibility in your sorting metric. What
>>>>if the list contains pointers to a struct and you want to sort on
>>>>more than one element of the struct? You can't have a list sorted
>>>>both by name *and* by age at the same time!
>>>
>>>In that case you'd need to create an extra index, but that would be
>>>necessary however you do it.
>>
>> Yes, but the original poster wanted to keep a list sorted as
>>it was built.
>
>I can't find anything in the original article that implied that.

The sentence which garnered my original reply was the following:

fred smith <fre...@fcshome.stoneham.ma.us> wrote:
>My preferred way is to make sure it is sorted when you build it. That is,
>when constructing the list, make sure you only insert each node into what
>will be the correct sorted place so you don't NEED to sort it after its
>construction.

And this approach *obviously* fails when the very definition of
sorted is ambiguous. A list that contains pointers to a struct with
various members cannot be kept sorted while built unless you pick
one of the members to sort on. So what happens when you want to sort
on a different key? You have to totally rebuild a new list. This is
an obvious flaw in the design philosophy of keeping a list sorted
while its being built (to say nothing of efficiency issues).

>> True. However, there are tricks that can be built into qsort for
>>"almost" sorted arrays that switch to an insertion sort algorithm
>>which is almost linear.
>
>There are similar tricks for merge sorts.

I'm sure. I'm not as experienced in that regard.

>Also I disagree that swaps are necessarily cheaper than comparisons.

They are when all you're doing is pointer swaps. Any and all swaps
I make inside a sorting algorithm is defined simply as:

#define SWAP(a, b) { swaptmp=(*a); (*a)=(*b); (*b)=swaptmp; }

That's about as clean as it gets. Simply calling the comparision
function is probably worse than calling the SWAP, even if it did nothing.
Since my library is generic, passing the comparision function is
required. A priori, that function can be quite complicated. That's
why it's always best to save comparisions in favor of swaps.

>> Basically there's more flexibility in using array-based sorting
>>algorithms which can be exploited if you know something about the
>>nature of your data set. However, if you want a straightforward and
>>easy maintenance approach, the list merge sort is fine, as I had
>>written originally.
>
>I think you're underestimating the felxibility available in list based
>sorting.

Could be. I'm still in the process of evaluating these
techniques.

[...]

>> I have a couple responses to this, which, could really fill up a
>>very long post; but I'll try to be brief.
>> When you say, "if you mean to dispense with the linked list
>>completely", that broaches on one aspect of what I had touched on.
>>If the list is basically of static size for a period of time, and
>>only the data is changing, then you can keep that array and reuse
>>it over and over again.
>
>Similarly for the list elements.

Yes, but once you got the array, you don't have to worry
about making these passes through it to assign the list data
pointers after you've done it once. However, for every merge
sort, you do have to worry about the list pointers. And if
you're using double lists, for every merge sort, you'll have
to go back and perform that O(n) operation on its prev pointers.

>>In fact, you only have to create the array
>>once and unless you *need* the list itself sorted, you don't have
>>to go back and reset the list data pointers to match the array.
>>Even if you do want the list sorted, you might not always *need*
>>it sorted. That is to say, when the list is called upon in the
>>context where sorted contents need to be reached, then and only
>>then do you have to go back and "reestablish the linked list" as
>>you say. Meanwhile, you can enjoy the benefits of array-based
>>sorting and searching while the data is nicely stored in that
>>indexed form.
>
>If you want an array you probably shouldn't have been using a list in
>the first place. My point is that if you're having to convert between
>forms like this there's probably something wrong with the program's
>design.

That's a better argument which I can't neccessarily answer.
What I've done in practice is simply put in the intelligence into
my list routines such that they can become arrays and indexed if
need be. The instructions to do this involve little overheard on
the basic list operations while the flexibility is there to do
array sorting and searching routines.
Anyone could sit back and say that for any particular application,
this or that data structure is superior; but when it comes to writing
said application, sometimes a more generic library is a timesaver.
Look at Perl -- one comfortable with associative arrays could use
them in almost any context even though they aren't always the best
way to arrange data.

>> Yet, if you *really* want to get tricky, there is a way to
>>have both list and array sorted at the same time without the need
>>to go back and reestablish the list. That is to say, when the
>>array is sorted, so is the list. The way to do this is to have
>>double pointers to the data. In this way, there's an extra ring
>>as I like to think about it, between the list nodes and the
>>data, and while you're shuffling the data pointers around, you're
>>keeping the ring intact. Thus, both the array and the list see
>>the ring the same but the data hanging off each node in the ring
>>is now sorted. I admit this is rather befuddling, however, and
>>allocating this extra level of pointers to achieve such a
>>generality is itself time-consuming. Simply getting to the data
>>takes a double indirection. There may be some warped scenarios
>>where this approach could actually prove fastest, but I tried
>>it merely as a curious exercise. Anyone looking over your
>>shoulder at this mess could suffer serious brain damage. ;-)
>
>Now you would need to sort both the list and the array. You'd probably be
>better off with some sort of tree datastructure.

?? No. Not at all. Maybe a picture would help. Here's part of a
document I wrote describing the procedure:

One thing to think about at this point is the way in which Perl handles
something like this. The sort operator in Perl is not an in place sorting
operator. Results of a sort must be copied back to another array (or
the same array). Hence, we could grow a new list and copy the data at
those pointers to the new list and then have a sorted list. However, I
consider this wasteful and it's also very slow. I want to sort a list in
place without doing any data copying! How do you do that?

The answer is to add a new level of indirection to the story. Let's
begin with a new picture:

(D1) -> "Hi" (D2) -> "There"
^ ^
| |
(P1) (P2)
^ ^
| |
(N0) <=> (N1) <======> (N2) <=> (N3) ... (NM)

The swap happens at the D1 and D2 level but the P1 and P2 level
remain the same. The resulting picture looks like this:

(D2) -> "There" (D1) -> "Hi"
^ ^
| |
(P1) (P2)
^ ^
| |
(N0) <=> (N1) <========> (N2) <=> (N3) ... (NM)

The list nodes are N0 thru NM. They aren't touched at all. Neither
are the P0..PM, which get stuck into the array. Thus, sorting the
array *also* sorts the list without any modification to the list
whatsoever.

Like I said, unless you want to inflict some damage to your
brain, I don't suggest using this method as a list merge sort will
do nicely. However, I was playing with the idea of having a duel-use
structure -- something that can perform as both a list and an array
simultaneously from the point of view of the user. Perl is pretty
good with this but it copies a lot of data around in the process.
What I outlined above is a way to have that duel use structure
without doing all the data copying.

>>Still, I have generally improved on that as well
>>because I don't need to worry about the size of the data that
>>I'm passing in there. I know it's always going to be a character
>>pointer and so I can dispense with anything that uses size.
>>In fact, this is a big gain because the SWAP routine in any
>>library implementation will actually copy the data at the
>>pointer locations if need be. My swaps are *always* pointer
>>swaps.
>
>Yes, there are unavoidable overheads in qsort() implementations, also
>of course function calls to the comparison function.

Well that's an unavoidable overhead unless you want to write
a qsort specifically for a dedicated application. But worrying
about the size of the data you're sorting is not really neccessary
if you gaurentee that you're always going to pass char pointers to
data. Those are all the same size.

>> There are other things I've fiddled as well, but outlining
>>them isn't particularly within the scope of this post. I bring
>>this up because turning lists into arrays and back again can
>>be customized much better than simply being able to use the
>>library qsort(). If you're sitting there thinking that your
>>library implementation is superior and qsort() is the general
>>way to go, I'd caution you that there are qsorts and then there
>>are qsorts!
>
>That's certainly true.

And so far the best generic implementation I've seen is the
gnu version. It's always beaten other library implementations I've
come across. I'd strongly suggest checking it out if you're doing
some serious sorting applications.

Richard Heathfield

unread,

Jan 27, 1999, 3:00:00 AM1/27/99

(long posting on list sorting omitted for brevity)

Pardon my ignorance, but why not just use a binary tree here?

0 new messages