
Priority queue efficiency


Dann Corbit

Oct 6, 1999
Nick Maclaren <nm...@cus.cam.ac.uk> wrote in message
news:7tgm26$j62$1...@pegasus.csx.cam.ac.uk...
> In article <V6OK3.231$hf5.4311@client>,
> Dann Corbit <dco...@solutionsiq.com> wrote:
> >What is the most efficient priority queue for "real-life" use which has
> >actually been tried?
>
> One of these days I really must find out what people are on about
> when they talk about 'priority queues' and all that. I know what a
> queue is, in the traditional sense, and priorities have nothing to
> do with it :-)
>
> I suspect that you mean a dynamic (i.e. updated) sorted list, and
> thinking in the latter terms may help.
Priority queues pointer:
http://members.xoom.com/killough/heaps.html

> >I have a subcase which is especially interesting to me. There will be a
> >small queue of only 50 or less items. The queue will never grow larger
> >than 50, though it may get smaller. Each operation on the queue will
> >consist of the following:
> >0. Remove the smallest item.
> >1. Insert an item in the queue from wherever the smallest came from,
> >unless that source is empty {the smallest item knows where it came from}.
> >2. Repeat until the queue is empty.
> >
> >The above sequence would be executed hundreds of millions of times. The
> >*only* operations I need are insert and delete min. Are there any special
> >queues that lend themselves to this particular application?
>
> Um. You are asking a very strange question. By specifying a small
> queue, you are necessarily not using complexity theory, and therefore
> the details of the computational primitives are critical.
>
> In practice, one very efficient way of handling this is to use a
> vector (or, rather, two) and simply insert the item in the appropriate
> place. The cost of a 50-item copy (in cache, in this context) is
> minimal compared to the cost of the logic on almost all modern
> systems.
Sorting stinks, though binary insertion might be acceptable with the right
data structure. There are supposed to be priority queues which have O(1)
for every operation except deletemin(), which is O(log(n)). This thing needs
to be *stupidly* fast. I have tried Fibonacci heaps, and they don't [cough]
perform as advertised. The fastest thing I have found is binomial priority
queues (as far as implementations go). There are supposed to be things that
are much, much faster. I was wondering "Where are they?" "Do they really
work as advertised, or do they only perform better as the queue size
approaches infinity?" ... things of that sort.

> But I don't think that is what you are asking for. Can you
> clarify the question?
Imagine a set of ordered lists of arbitrary length. The head of each list
is poked into a priority queue. The priority of an item is the value of the
item's key. I want to:
0. Extract the smallest item from the queue, which deletes the head of its
list and inserts the next item from that list into the priority queue. This
{replacement} item is not necessarily the smallest item in the priority
queue, but we know it is the same size or bigger than the one it replaced
(since the lists are ordered).
1. Go to 0 until all the ordered lists are empty {if ever}.

Since there will be an enormous number of items (hundreds of millions to
billions), the algorithm's efficiency is [cough] important.
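
For concreteness, here is roughly what that loop looks like in code. This is
only a sketch: it uses the C++ standard library's priority_queue rather than
any of the fancy heaps under discussion, and the in-memory lists and the
printf "consume" step are stand-ins for the real data sources.

#include <cstddef>
#include <cstdio>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Sketch of the merge loop described above.  Each queue entry is
// (key, index of the source list), ordered so the smallest key wins.
typedef std::pair<long, std::size_t> Entry;

void merge(std::vector< std::vector<long> > &lists)
{
    // Each lists[i] is assumed sorted ascending; pos[i] is the next
    // unread element of list i.
    std::vector<std::size_t> pos(lists.size(), 0);
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry> > q;

    // Poke the head of every non-empty list into the queue.
    for (std::size_t i = 0; i < lists.size(); ++i)
        if (!lists[i].empty())
            q.push(Entry(lists[i][pos[i]++], i));

    while (!q.empty()) {
        Entry e = q.top();               // step 0: the smallest item ...
        q.pop();
        std::printf("%ld\n", e.first);   // ... is consumed here (placeholder)

        std::size_t src = e.second;      // step 1: refill from the same list,
        if (pos[src] < lists[src].size())    // unless that source is empty
            q.push(Entry(lists[src][pos[src]++], src));
    }
}

Swapping in a different queue implementation only changes the push/top/pop
lines; the driver loop itself stays the same.
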
--
C-FAQ: http://www.eskimo.com/~scs/C-faq/top.html
"The C-FAQ Book" ISBN 0-201-84519-9
C.A.P. Newsgroup http://www.dejanews.com/~c_a_p
C.A.P. FAQ: ftp://38.168.214.175/pub/Chess%20Analysis%20Project%20FAQ.htm

Nick Maclaren

Oct 6, 1999
In article <V6OK3.231$hf5.4311@client>,
Dann Corbit <dco...@solutionsiq.com> wrote:
>What is the most efficient priority queue for "real-life" use which has
>actually been tried?

One of these days I really must find out what people are on about
when they talk about 'priority queues' and all that. I know what a
queue is, in the traditional sense, and priorities have nothing to
do with it :-)

I suspect that you mean a dynamic (i.e. updated) sorted list, and
thinking in the latter terms may help.

>I have a subcase which is especially interesting to me. There will be a
>small queue of only 50 or less items. The queue will never grow larger than
>50, though it may get smaller. Each operation on the queue will consist of
>the following:
>0. Remove the smallest item.
>1. Insert an item in the queue from wherever the smallest came from, unless
>that source is empty {the smallest item knows where it came from}.
>2. Repeat until the queue is empty.
>
>The above sequence would be executed hundreds of millions of times. The
>*only* operations I need are insert and delete min. Are there any special
>queues that lend themselves to this particular application?

Um. You are asking a very strange question. By specifying a small
queue, you are necessarily not using complexity theory, and therefore
the details of the computational primitives are critical.

In practice, one very efficient way of handling this is to use a
vector (or, rather, two) and simply insert the item in the appropriate
place. The cost of a 50-item copy (in cache, in this context) is
minimal compared to the cost of the logic on almost all modern
systems.

But I don't think that is what you are asking for. Can you
clarify the question?


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QG, England.
Email: nm...@cam.ac.uk
Tel.: +44 1223 334761 Fax: +44 1223 334679

Dann Corbit

Oct 6, 1999
In priority queues, it has been reported that it is possible to implement a
queue with all operations O(1), except delete which is O(log(n)).

My question is: "Is this ever realizable in practice, or is it just
mathematical goo, with the truth coming at infinity?"

The reason I ask is that I have tried a number of implementations of
Fibonacci heaps, etc. and find that binomial heaps stomp them in real
"measured" performance. By orders of magnitude for large sets of data.

What is the most efficient priority queue for "real-life" use which has
actually been tried?

I have a subcase which is especially interesting to me. There will be a
small queue of only 50 or less items. The queue will never grow larger than
50, though it may get smaller. Each operation on the queue will consist of
the following:
0. Remove the smallest item.
1. Insert an item in the queue from wherever the smallest came from, unless
that source is empty {the smallest item knows where it came from}.
2. Repeat until the queue is empty.

The above sequence would be executed hundreds of millions of times. The
*only* operations I need are insert and delete min. Are there any special
queues that lend themselves to this particular application?

--

Glenn

Oct 6, 1999
"Dann Corbit" <dco...@solutionsiq.com> writes:

>> >What is the most efficient priority queue for "real-life" use which has
>> >actually been tried?

>There are supposed to be priority queues which have O(1)
>for every operation except deletemin(), which is O(log(n)). This thing needs
>to be *stupidly* fast. I have tried Fibonacci heaps, and they don't [cough]
>perform as advertised.
>The fastest thing I have found is binomial priority
>queues (as far as implementations go). There are supposed to be things that
>are much, much faster. I was wondering "Where are they?" "Do they really
>work as advertised, or do they only perform better as the queue size
>approaches infinity?" ... things of that sort.

You can try pairing heaps. Their complexity is similar to Fibonacci heaps
but they are simpler and they are supposed to work well in practice (I've
never tried them).

If you are interested in the latest complexity results, I can direct
you to my officemate who has some new results on pairing heaps.
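
For reference, the core of a two-pass pairing heap is only a few lines. The
sketch below is min-ordered, leaves all memory management to the caller, and
is not tuned in any way; it just shows the shape of the thing.

#include <cstddef>

// Minimal two-pass pairing heap (min-ordered).  Nodes are owned by the
// caller; only insert and delete-min are sketched.
struct PNode {
    long   key;
    PNode *child;     // leftmost child
    PNode *sibling;   // next sibling to the right
};

// Link two heaps: the root with the larger key becomes a child of the other.
static PNode *meld(PNode *a, PNode *b)
{
    if (!a) return b;
    if (!b) return a;
    if (b->key < a->key) { PNode *t = a; a = b; b = t; }
    b->sibling = a->child;
    a->child = b;
    return a;
}

static PNode *insert(PNode *root, PNode *n)
{
    n->child = n->sibling = NULL;
    return meld(root, n);
}

// Pair up the children left to right, then meld the pairs right to left.
static PNode *merge_pairs(PNode *n)
{
    if (!n || !n->sibling) return n;
    PNode *second = n->sibling;
    PNode *rest = second->sibling;
    n->sibling = second->sibling = NULL;
    return meld(meld(n, second), merge_pairs(rest));
}

// The minimum is the root itself; this discards it and returns the new root.
static PNode *delete_min(PNode *root)
{
    return root ? merge_pairs(root->child) : NULL;
}

Insert and meld cost one comparison each; all the real work is deferred to
delete_min, which is where the good amortized bounds come from.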

Jakob Pagter

Oct 7, 1999
nm...@cus.cam.ac.uk (Nick Maclaren) writes:

> In article <W7RK3.231$hf5.5742@client>,
> Dann Corbit <dco...@solutionsiq.com> wrote:
> >>
> >> I suspect that you mean a dynamic (i.e. updated) sorted list, and
> >> thinking in the latter terms may help.
> >

> >Priority queues pointer:
> >http://members.xoom.com/killough/heaps.html
>

> Ah. Computer science terminology strikes again. Yes, those are
> simple updateable sorted lists, and are not really queues at all.
> The term was taken from probability theory and, as is so common,
> mangled somewhat on the way. But at least "priority queues" is
> not ambiguous, as there is no such thing in probability theory as
> far as I know.

No, "priority queues" are not simply "updateable sorted lists". (In
the comparison model) I can make N insertions into a priority queue
(of my choice) in time O(N), whereas this requires time \Omega(n\lg n)
for a sorted list.

BTW, I am sure that the term might have been taken from probability
theory, as well many other words used in computer science were not
invented by computer scientists (like computer, science, etc.), but it
is a rather standard expression that you'll find in basically
algorithms and data structures book.

-- Jakob

Dann Corbit

Oct 7, 1999
Nick Maclaren <nm...@cus.cam.ac.uk> wrote in message
news:7tj6qu$lc$1...@pegasus.csx.cam.ac.uk...
> In article <UX4L3.1337$XB5.3524@client>,
> Dann Corbit <dco...@solutionsiq.com> wrote:
> >Nick Maclaren <nm...@cus.cam.ac.uk> wrote in message
> >news:7til9k$fh4$1...@pegasus.csx.cam.ac.uk...
> >[snip]
> >> In particular, if you claim that you can locate an item amongst N
> >> sorted items in unit time, you are using a somewhat unusual
> >> computational model.
> >
> >In the case of a priority queue, you are looking at finding only the "top"
> >item, so it is the first item in the list. The item is not arbitrary.
>
> All right, make that finding the largest item in N unsorted items
> in unit time. I don't care. EITHER of them is an unusual computational
> model!
>
> Incidentally, I don't regard a thousand million insertion/removals
> as "enormous" on a Pentium II with 128 MB of RAM - that is the
> specification of my home computer. Do it right, and it doesn't
> take forever.

One interesting property of a priority queue is that if you never remove
anything, you can always find the most important item in O(1) [with certain
queue implementations] and never have *any* operation that is O(log(n)),
since there are {theoretical, at least} priority queues where every operation
except deletemin() is O(1). On the other hand, in just about any real case
we will delete the minimum item, or else the size will grow without bound.
So you have to pay the log(n) "somehow", either on insert or on delete or
somewhere. If you put n items through the queue, the real performance is
O(n*log(n)) {as expected}.

Which generates another question:
What about hashing queues or distribution queues? Has anyone done research
in this area? I don't see why you could not make one.

Nick Maclaren

Oct 7, 1999
In article <W7RK3.278$hf5.5742@client>,
Dann Corbit <dco...@solutionsiq.com> wrote:
>>
>> I suspect that you mean a dynamic (i.e. updated) sorted list, and
>> thinking in the latter terms may help.
>
>Priority queues pointer:
>http://members.xoom.com/killough/heaps.html

Ah. Computer science terminology strikes again. Yes, those are
simple updateable sorted lists, and are not really queues at all.
The term was taken from probability theory and, as is so common,
mangled somewhat on the way. But at least "priority queues" is
not ambiguous, as there is no such thing in probability theory as
far as I know.

>Sorting stinks, though binary insertion might be acceptable with the right
>data structure. There are supposed to be priority queues which have O(1)
>for every operation except deletemin(), which is O(log(n)). This thing needs
>to be *stupidly* fast. I have tried Fibonacci heaps, and they don't [cough]
>perform as advertised. The fastest thing I have found is binomial priority
>queues (as far as implementations go). There are supposed to be things that
>are much, much faster. I was wondering "Where are they?" "Do they really
>work as advertised, or do they only perform better as the queue size
>approaches infinity?" ... things of that sort.
>

>Since there will be an enormous number of items (hundreds of millions to
>billions) the algorithm's efficiency is [cough] important.

That's enormous? Not by supercomputing standards, it isn't, nor by
the standards of modern workstations. My point is that, with very
short lists of less than 50, the performance is going to be dominated
by the details of the code, language, compiler and hardware and not
by the algorithmic efficiency.

If you think about it, the ability to perform every operation in O(1)
immediately gives you an O(N) search and an O(N) sort for general
data, which is well-known to be impossible. So any such claims are
either nonsense, or based on one of the usual misanalyses that lose
factors of O(log N).

Consider the following:

Your old data is sorted in a vector A(1...N). Locate the new
item by binary chop, using at most 8 (sic) tests, and assume that
it is located after item K.

Copy A(2...K), the new item, and A(K+1...N) to a vector B(1...N)
and repeat using B as the old vector.

You will be using at most 1K bytes of workspace, which will stay
well within the primary cache, and should get the code down to
fewer than 20 conditional branches. It should go like the clappers
on most modern CPUs, and the time will be dominated by the time
obtaining the new item and servicing the one that is removed.
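
In C++ terms, the same idea can be sketched with a std::vector kept sorted;
vector::insert and erase do the in-cache copying described above. The names
below are illustrative only, not an existing library.

#include <algorithm>
#include <vector>

// Sorted-vector queue for a small, bounded item count (say <= 50).
// The smallest element is always at the front; insertion uses a binary
// chop and lets vector::insert shift the tail, which stays in cache.
struct SmallQueue {
    std::vector<long> v;                        // kept sorted ascending

    void insert(long x) {
        v.insert(std::lower_bound(v.begin(), v.end(), x), x);
    }
    long delete_min() {
        long x = v.front();
        v.erase(v.begin());                     // copies at most ~50 longs
        return x;
    }
    bool empty() const { return v.empty(); }
};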

Dann Corbit

Oct 7, 1999
[snip]

> Which generates another question:
> What about hashing queues or distribution queues? Has anyone done
> research in this area? I don't see why you could not make one.
A groovy idea has occurred to me:

A linear time hashing queue.

Consider the following (as a 'frinstance'):

The key for the queue is a short unsigned integer that ranges from 0 to
65535.
For each of these values, we have a linked list. So to get the top priority
item, we just get the first item from list[0] unless that is empty. In that
case, we go to list[1], etc. In that way, every single operation has
constant time.

A priority queue with *all* operations O(1)!!

You may object that this method assumes a lot about the data. However, I
have a solution to that problem also. In any case, variants on this theme
could be used to create partly hashing queues, etc. (hash on just a part of
the key to search only a subset).

Please note: I am *NOT* talking about the 50 item queue anymore. This is a
completely separate idea.
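
As a 'frinstance' in code, the idea might look like the sketch below (16-bit
keys assumed, one list per key value; the forward scan for the next
non-empty bucket is where the hidden cost lives).

#include <cassert>
#include <list>

// Bucket ("hashing") queue keyed on a 16-bit priority: one list per key
// value, plus a cursor that only moves forward while extracting.
struct BucketQueue {
    std::list<long> bucket[65536];   // payloads, grouped by key
    unsigned cursor;                 // no non-empty bucket lies below this
    unsigned long count;

    BucketQueue() : cursor(0), count(0) {}

    void insert(unsigned short key, long payload) {
        bucket[key].push_back(payload);
        if (key < cursor) cursor = key;   // allow out-of-order inserts
        ++count;
    }

    long delete_min() {
        assert(count > 0);
        while (bucket[cursor].empty())    // this scan is the non-O(1) part
            ++cursor;
        long payload = bucket[cursor].front();
        bucket[cursor].pop_front();
        --count;
        return payload;
    }
};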

Dann Corbit

Oct 7, 1999
Nick Maclaren <nm...@cus.cam.ac.uk> wrote in message
news:7til9k$fh4$1...@pegasus.csx.cam.ac.uk...
[snip]
> In particular, if you claim that you can locate an item amongst N
> sorted items in unit time, you are using a somewhat unusual
> computational model.

In the case of a priority queue, you are looking at finding only the "top"
item, so it is the first item in the list. The item is not arbitrary.

Nick Maclaren

Oct 7, 1999

In article <yokogeb...@daimi.au.dk>,
Jakob Pagter <pag...@daimi.au.dk> writes:
|>
|> BTW, I am sure that the term might have been taken from probability
|> theory; like many other words used in computer science, it was not
|> invented by computer scientists (nor were "computer", "science", etc.), but
|> it is a rather standard expression that you'll find in basically any
|> algorithms and data structures book.

I first saw it about 15-20 years back, and have more than once tried
to track down a definition. In particular, it is critical when doing
complexity analysis to know the EXACT specification of the primitives
that it is built on and the ones it provides. If it isn't a simple
queue, with the extra primitive of addition at an arbitrary point,
what the heck IS it?

Yes, the term is extensively used, but you would be amazed at how few
algorithmic texts contain a precise definition of any of their terms,
and how many alternative interpretations there are of many of them.
This is the root cause of many of the flame wars about whether Fred's
pet sorting method is O(N/log N) or O(N log^2 N). Oh, and neither
the Web reference that the previous posting mentioned nor most of
the things that it referred to do :-)

In particular, if you claim that you can locate an item amongst N
sorted items in unit time, you are using a somewhat unusual
computational model.

Jakob Pagter

Oct 7, 1999
nm...@cus.cam.ac.uk (Nick Maclaren) writes:

> In article <yokogeb...@daimi.au.dk>,
> Jakob Pagter <pag...@daimi.au.dk> writes:
> |>
> |> BTW, I am sure that the term might have been taken from probability
> |> theory; like many other words used in computer science, it was not
> |> invented by computer scientists (nor were "computer", "science", etc.), but
> |> it is a rather standard expression that you'll find in basically any
> |> algorithms and data structures book.
>
> I first saw it about 15-20 years back, and have more than once tried
> to track down a definition. In particular, it is critical when doing
> complexity analysis to know the EXACT specification of the primitives
> that it is built on and the ones it provides. If it isn't a simple
> queue, with the extra primitive of addition at an arbitrary point,
> what the heck IS it?

> Yes, the term is extensively used, but you would be amazed at how few
> algorithmic texts contain a precise definition of any of their terms,
> and how many alternative interpretations there are of many of them.

From Cormen, Leiserson and Rivest: "Introduction to Algorithms".

==quote==

A *priority queue* is a data structure for maintaining a set S of
elements, each with an associated value called a key. A priority queue
supports the following operations:

o Insert(S,x) inserts x into the set S. This operation could be
written as S<-S 'union' {x}

o Maximum(S) returns the element with the largest key.

o Extract-Max(S) removes and returns the element with the largest
key.

==unquote==

Pretty much the standard definition, although some people like to use
Minimum instead of Maximum.

To discuss the model, let's just assume that you can only access your
keys through comparisons.

Of course, in time (#comparisons) O(\lg n) I can do Insert using a
sorted list (based on the keys), and the other two operations in
constant time. However, if I were to perform n Inserts followed by 1
Maximum operation, I can do this in time (#comparisons) O(n), by just
remembering all the elements in an unsorted list and finding the max
when required. Using a sorted list would require time (#comparisons)
\Omega(n \lg n).
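
A trivial sketch of the "lazy" side of that comparison (n O(1) Inserts, one
O(n) Maximum), just to make the counting concrete:

#include <algorithm>
#include <cassert>
#include <vector>

// n Inserts at zero comparisons each, one Maximum at n-1 comparisons:
// O(n) comparisons in total, versus Omega(n lg n) if every Insert had
// to keep the set sorted.
struct LazySet {
    std::vector<int> s;
    void insert(int x) { s.push_back(x); }
    int maximum() const {
        assert(!s.empty());
        return *std::max_element(s.begin(), s.end());
    }
};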

-- Jakob


Nick Maclaren

Oct 7, 1999
In article <UX4L3.1337$XB5.3524@client>,
Dann Corbit <dco...@solutionsiq.com> wrote:
>Nick Maclaren <nm...@cus.cam.ac.uk> wrote in message
>news:7til9k$fh4$1...@pegasus.csx.cam.ac.uk...
>[snip]
>> In particular, if you claim that you can locate an item amongst N
>> sorted items in unit time, you are using a somewhat unusual
>> computational model.
>
>In the case of a priority queue, you are looking at finding only the "top"
>item, so it is the first item in the list. The item is not arbitrary.

All right, make that finding the largest item in N unsorted items
in unit time. I don't care. EITHER of them is an unusual computational
model!

Incidentally, I don't regard a thousand million insertion/removals
as "enormous" on a Pentium II with 128 MB of RAM - that is the
specification of my home computer. Do it right, and it doesn't
take forever.

Dennis Yelle

Oct 7, 1999
Nick Maclaren wrote:
>
> In article <yokogeb...@daimi.au.dk>,
> Jakob Pagter <pag...@daimi.au.dk> writes:
> |>
> |> BTW, I am sure that the term might have been taken from probability
> |> theory, as well many other words used in computer science were not
> |> invented by computer scientists (like computer, science, etc.), but it
> |> is a rather standard expression that you'll find in basically
> |> algorithms and data structures book.
>
> I first saw it about 15-20 years back, and have more than once tried
> to track down a definition. In particular, it is critical when doing
> complexity analysis to know the EXACT specification of the primitives
> that it is built on and the ones it provides. If it isn't a simple
> queue, with the extra primitive of addition at an arbitrary point,
> what the heck IS it?

Thinking that a "priority queue" must be a kind of "queue" (whatever
that is) is usually a mistake.

The important thing about a "priority queue" is that it holds
items, that those items each have a priority, that those priorities can
be compared (usually with < and/or >), and that the "priority queue"
provides a way to quickly find and/or remove the item that has the
highest priority.
Usually a "priority queue" does not actually keep its items completely
sorted. Instead it maintains what is often called the "heap property".
One way to do this is to keep the highest priority item
(or a pointer to it) in position 1 of an array and
to arrange the other items such that
test_heap() will return true:

int test_heap( Heap h, int size)
{
    // In order to keep the code simple, we do NOT use h[0].
    // Returns true iff h[1..size] has the heap property: no element has
    // a lower priority than either of its children at 2*i and 2*i+1.
    for ( int i = 1; ; i++) {
        int j = i + i;                  // left child of i
        if ( j > size) return true;     // i has no children; all pairs checked
        if ( h[i]->priority < h[j]->priority ) return false;
        j++;                            // right child of i
        if ( j > size) return true;
        if ( h[i]->priority < h[j]->priority ) return false;
    }
}

In this data structure, assuming that the array h never needs
to grow, insertion of an arbitrary item, and removal of the
item with the highest priority, can each be done with O( log2 n)
operations where n is the current number of items in the heap.

It is likely that you can find code to do this in <queue> if
you have a C++ compiler available to you. If not, let me know
and I will send you a simple version that does satisfy O( log2 n)
for each insertion and each removal (assuming that the array h does
NOT need to grow).

The item with the highest priority is always found at h[1]
(unless the heap is empty). I first learned of the "heap property"
when I saw a program called "heap sort", which basically sorted
numbers by inserting them into a heap and then repeatedly removing
the one at h[1] and readjusting the heap. This was in 1972 or 1973.

[...]


> In particular, if you claim that you can locate an item amongst N
> sorted items in unit time, you are using a somewhat unusual
> computational model.

It is easy if you always want the one at h[1]. :)

Bjarne Stroustrup writes about priority_queue and heap in his book
_The C++ Programming Language Third Edition_. See priority_queue
in the index.

Dennis Yelle

--
Want to get paid for using the internet?
If so, go to: http://alladvantage.com/go.asp?refid=BAL536
or: http://www.sendmoreinfo.com/SubMakeCookie.cfm?ExtractId=87656

Dann Corbit

Oct 7, 1999
Nick Maclaren <nm...@cus.cam.ac.uk> wrote in message
news:7thkmd$ebi$1...@pegasus.csx.cam.ac.uk...
> In article <W7RK3.278$hf5.5742@client>,
It's "enormous" if it is only a Pentium II with 128 megs of ram or something
very un-cray like.

> If you think about it, the ability to perform every operation in O(1)
> immediately gives you an O(N) search and an O(N) sort for general
> data, which is well-known to be impossible. So any such claims are
> either nonsense, or based on one of the usual misanalyses that lose
> factors of O(log N).
>
> Consider the following:
>
> Your old data is sorted in a vector A(1...N). Locate the new
> item by binary chop, using at most 8 (sic) tests, and assume that
> it is located after item K.
>
> Copy A(2...K), the new item, and A(K+1...N) to a vector B(1...N)
> and repeat using B as the old vector.
>
> You will be using at most 1K bytes of workspace, which will stay
> well within the primary cache, and should get the code down to
> fewer than 20 conditional branches. It should go like the clappers
> on most modern CPUs, and the time will be dominated by the time
> obtaining the new item and servicing the one that is removed.

I already suggested this technique (exactly) in my previous message. What
you call "binary chop" is what I called "binary insertion". I was hoping
for some kind of data structure that would allow me to avoid the memcpy()
calls. You're probably right about waiting on the queues. I will have a
separate thread servicing each, but the data still has to come from disk at
some point, and memory operations are many orders of magnitude faster than
disk, so (for this application) the bottleneck is probably not the priority
queue. However, I actually had two questions (I obviously did not make this
clear). I use priority queues for lots of things, and knowing and
understanding superior algorithms is very important to me. There will be
other occasions where the operation of the queue itself is crucial and is
the rate-limiting step.

I wrote the post (in general) because I benchmarked about 40 different queue
implementations and found that the measurements did not match up with the
theory. I really want to know which queues are best and when and why. The
50-item queue is just a particular sub-problem. It is kind of special in
many ways (the queue does not grow in size, and there are only two
operations, insert and deletemin, which could be generalized to a single
"update" operation), and I thought there might be some especially clever
sort of solution.

KSG

Oct 8, 1999

Nick Maclaren wrote:

>
>
> Starting with an empty list, adding and removing N items,
> with the list averaging and not exceeding O(K) (in a sense
> that could be made precise), what is the complexity?
>
> I believe that the answer is O(N log K), irrespective of whether
> the list is maintained fully sorted or partially sorted. If
> there is a good mathematical proof that it is different between
> the cases, I haven't seen it.

I'm probably missing something very obvious here (such as the fact that you needn't
use arrays), but how do you do the above in O(N log K) maintaining a fully sorted
list? Assume the case where you had a list of K elements and you are going to
simply insert/extract/insert/extract/insert/etc... for a total of N elements.

Won't that take O(KN) operations if you keep the list fully sorted, because
every element you insert will require moving an average of K/2 elements in
the array?


--
Peace,

KSG
Droppin' Science in REAL AUDIO on KSDT Sundays 8pm-10pm (PST)
ARCHIVED SHOWS: http://scw.ucsd.edu/droppinscience
Personal: http://www.cs.ucsd.edu/~kgatlin

Dennis Yelle

Oct 8, 1999
Nick Maclaren wrote:
[...]
> When I last looked at this area (25 years ago!), I decided that
> the advantages of maintaining a partially sorted list (i.e. what now
> seems to be called a priority queue) are illusory in practice. In
> particular:

>
> Starting with an empty list, adding and removing N items,
> with the list averaging and not exceeding O(K) (in a sense
> that could be made precise), what is the complexity?
>
> I believe that the answer is O(N log K), irrespective of whether
> the list is maintained fully sorted or partially sorted. If
> there is a good mathematical proof that it is different between
> the cases, I haven't seen it.

Clearly the priority queue version will have this behavior.
Each reorganization after insertion requires no more than ceil( log2 K)
key compares and no more than ceil( log2 K) exchanges. Each reorg after
removal requires no more than 2 * ceil( log2 K) key compares and
no more than ceil( log2 K) exchanges. For a total of roughly
3 * N * ceil( log2 K) key compares and 2 * N * ceil( log2 K) exchanges.
If K is large, a simple optimization can reduce the cost
of the exchanges by 1/2: after exchanging a and b we will
next compare (the new) b and c and possibly exchange them,
so keep b in a temporary (register if possible).
If each removal is often followed by an
insertion, one can set a flag after each removal, and not bother
with the reorg until the following insertion or removal.
When the removals and insertions alternate,
this will reduce the exchanges by another factor of 2, and
reduce the compares by 33%.
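
In code, the alternating remove/insert case collapses into a single
sift-down of the replacement item. A rough sketch, using a 1-based max-heap
of plain int priorities (in the spirit of the test_heap() posted earlier)
rather than the pointer-based heap:

// Replace the highest-priority item h[1] with new_item and restore the
// heap property with one sift-down: one reorganization instead of the
// two that a separate removal and insertion would cost.
// h is a 1-based max-heap of 'size' priorities; h[0] is unused.
void replace_top(int h[], int size, int new_item)
{
    int i = 1;
    h[1] = new_item;
    for (;;) {
        int j = 2 * i;                               // left child
        if (j > size) break;
        if (j + 1 <= size && h[j + 1] > h[j]) j++;   // take the larger child
        if (h[i] >= h[j]) break;                     // heap property restored
        int t = h[i]; h[i] = h[j]; h[j] = t;
        i = j;
    }
}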

If you keep your items fully sorted in an array,
as you indicated in an earlier posting, I don't see
how you can get O( N log K). I think each reorg after
insertion is going to cost you, on average, K/2, or
K/4 if you can add items at either end. For a total
of O( N*K/4) == O(N*K) (assuming N >> K ).

> To put it into algorithmic terms, heapsort-style heaps have only
> minor advantages (and usually only a constant factor in time or
> space) over B-trees.

Yes, I agree with this, but the constant factor can be large, in both
time AND space, and the code for the priority queue can be much
smaller and easier to understand, and easier to get right.
One can think of a priority queue as a tree:
each item i has two sons, 2*i and 2*i+1 (when they each exist).
But the pointers don't actually exist! So you don't have to store
them, and you don't have to change them.

Nick Maclaren

Oct 8, 1999
In article <02aL3.1473$XB5.6419@client>,
Dann Corbit <dco...@solutionsiq.com> wrote:
>[snip]
>> Which generates another question:
>> What about hashing queues or distribution queues? Has anyone done
>> research in this area? I don't see why you could not make one.
>A groovy idea has occurred to me:
>
>A linear time hashing queue.
>
>Consider the following (as a 'frinstance'):
>
>The key for the queue is a short unsigned integer that ranges from 0 to
>65535.
>For each of these values, we have a linked list. So to get the top priority
>item, we just get the first item from list[0] unless that is empty. In that
>case, we go to list[1], etc. In that way, every single operation has
>constant time.
>
>A priority queue with *all* operations O(1)!!

Been there. Done that. Yes, it works. No, it isn't uniformly O(1)
if you do the analysis correctly.

Nick Maclaren

Oct 8, 1999

In article <37FD160A...@jps.net>, Dennis Yelle <denn...@jps.net> writes:
|>
|> Thinking that a "priority queue" must be a kind of "queue" (whatever
|> that is) is usually a mistake.

That was, indeed, my mistake! Queuing theory was well established
long before "computer science" was, and is sadly neglected by most
computer scientists. In particular, it is essential to a proper
understanding of how communication and client-server protocols will
behave under load.

|> It is likely that you can find code to do this in <queue> if
|> you have a C++ compiler available to you. If not, let me know
|> and I will send you a simple version that does satisfy O( log2 n)
|> for each insertion and each removal (assuming that the array h does
|> NOT need to grow).

No need - it was a standard technique decades before I first saw
the term. That helped cause the confusion :-)

|> The item with the highest priority is always found at h[1]
|> (unless the heap is empty). I first learned of the "heap property"
|> when I saw a program called "heap sort", which basically sorted
|> numbers by inserting them into a heap and then repeatedly removing
|> the one at h[1] and readjusting the heap. This was in 1972 or 1973.

Yes. Despite the fact that it used to be hyped up as a good general
purpose sort, heap sort is more a proof of concept than anything
else. It is a very good (indeed, almost the only good) way of
finding the Kth largest item in N items, where N >> K >> 1, but that
is a different problem.

|> > In particular, if you claim that you can locate an item amongst N
|> > sorted items in unit time, you are using a somewhat unusual
|> > computational model.
|>
|> It is easy if you always want the one at h[1]. :)

Um. When I last looked at this area (25 years ago!), I decided that
the advantages of maintaining a partially sorted list (i.e. what now
seems to be called a priority queue) are illusory in practice. In
particular:

Starting with an empty list, adding and removing N items,
with the list averaging and not exceeding O(K) (in a sense
that could be made precise), what is the complexity?

I believe that the answer is O(N log K), irrespective of whether
the list is maintained fully sorted or partially sorted. If
there is a good mathematical proof that it is different between
the cases, I haven't seen it.

To put it into algorithmic terms, heapsort-style heaps have only
minor advantages (and usually only a constant factor in time or
space) over B-trees.

Nick Maclaren

Oct 9, 1999
In article <37FE7A0F...@jps.net>, Dennis Yelle <denn...@jps.net> wrote:
>
>If you keep your items fully sorted in an array,
>as you indicated in an earlier posting, I don't see
>how you can get O( N log K). I think each reorg after
>insertion is going to cost you, on average, K/2, or
>K/4 if you can add items at either end. For a total
>of O( N*K/4) == O(N*K) (assuming N >> K ).

Oh, yes. But, as I posted earlier, that was in response to a question
of what to do when K was known to be at most 50. In such cases, the
overheads of any fancy data structure are likely to make it slower
than using simple arrays. 25 years ago, I would have said that was
true for K of less than 20 - but the machine balances have changed.

>> To put it into algorithmic terms, heapsort-style heaps have only
>> minor advantages (and usually only a constant factor in time or
>> space) over B-trees.
>

>Yes, I agree with this, but the constant factor can be large, in both
>time AND space, and the code for the priority queue can be much
>smaller and easier to understand, and easier to get right.

Um. Yes, it is significant, but "quite large"? I did once work it
out precisely, and remember vaguely that it is perhaps a factor of
3-5 in the worst case, and typically negligible or even negative
(in cases where the data handling dominates).

You may find heapsort code easier to understand, but I don't, and
I have seen a lot of broken implementations. The main advantages
of keeping the data fully sorted are:

1) It is possible to run a simple cross-check for consistency,
which is a major advantage when debugging or when validation is
important. I don't know of a cross-check for heapsort structures.

2) If you need to change the code to look at the Nth item, or
the first one after value X, then it is trivial to do. With a
heapsort structure, you have to reengineer the code.

So I would tend to use B-trees (or whatever) unless performance
was critical. You might tend to use heapsort structures unless
you need the extra functionality. Both approaches are justifiable.

Nick Maclaren

Oct 9, 1999
In article <37FE92CF...@cs.ucsd.edu>, KSG <kga...@cs.ucsd.edu> wrote:

>Nick Maclaren wrote:
>
>> Starting with an empty list, adding and removing N items,
>> with the list averaging and not exceeding O(K) (in a sense
>> that could be made precise), what is the complexity?
>>
>> I believe that the answer is O(N log K), irrespective of whether
>> the list is maintained fully sorted or partially sorted. If
>> there is a good mathematical proof that it is different between
>> the cases, I haven't seen it.
>
>I'm probably missing something very obvious here (such as the fact that you needn't
>use arrays), but how do you do the above in O(N log K) maintaining a fully sorted
>list? Assume the case where you had a list of K elements and you are going to
>simply insert/extract/insert/extract/insert/etc... for a total of N elements.
>
>Won't that take O(KN) operations if you keep the list fully sorted, because
>every element you insert will require moving an average of K/2 elements in
>the array?

If you use vectors, yes, it will. My advice to the original poster
to use vectors was based on the fact that K was less than 50. For
arbitrary K, use a data structure like height-balanced trees, where
all relevant operations are O(log K). Hence my remark about B-trees.
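
For what it's worth, the standard library already provides such a
height-balanced tree; a minimal sketch of a priority queue on top of
std::multiset (a red-black tree rather than a B-tree, but with the same
O(log K) operations and the same "fully sorted" side benefits):

#include <cassert>
#include <set>

// Priority queue backed by a balanced search tree: insert and delete-min
// are both O(log K), and because the items stay fully ordered, queries
// like "first item above X" (lower_bound) remain easy.
struct TreeQueue {
    std::multiset<long> s;

    void insert(long x) { s.insert(x); }
    long delete_min() {
        assert(!s.empty());
        std::multiset<long>::iterator it = s.begin();
        long x = *it;
        s.erase(it);
        return x;
    }
    bool empty() const { return s.empty(); }
};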
