How to choose a Data-Structure for a Priority Queue ?

arnuld

unread,

Nov 6, 2009, 2:42:44 AM11/6/09

to

I have been asked to make a Priority Queue (PQ) and after making lots of
threads on comp.lang.c and writing lots of code I am still confused. The
PQ has these requirements:

1) Process approximately 10 million elements, around 1 million at a
time.

2) There is no idea how many elements we will be having in the
beginning or at any point of time. The least amount will be 10,000.

3) It needs to be CPU efficient.

4) 64 MB of RAM is available.

5) It has to be in C.

First I used used a Binary-Search-Tree, then I used array based heap but
arrays don't reallocated. I think i simple single linked list will do
fine for the requirements I mentioned. what do you advise ?

--
www.lispmachine.wordpress.com
my email is @ the above blog.

Ben Bacarisse

unread,

Nov 6, 2009, 8:09:21 AM11/6/09

to

arnuld <sun...@invalid.address> writes:

> I have been asked to make a Priority Queue (PQ) and after making lots of
> threads on comp.lang.c and writing lots of code I am still confused. The
> PQ has these requirements:
>
> 1) Process approximately 10 million elements, around 1 million at a
> time.

What does "at a time mean"? Do you mean that the queue might grow to
about 1M elements? With only 64M available (is that data+code?) you
might have trouble unless the items are small.

> 2) There is no idea how many elements we will be having in the
> beginning or at any point of time. The least amount will be 10,000.
>
> 3) It needs to be CPU efficient.

This is hard to pin down. In many real-time apps you can say,
roughly, how much time you have per operation. For example, if you
expect traffic to be processed at a rate of 1,000,000 queue/dequeue
operations per second, you know you have only 1µs per queue/dequeue.
Do you now anything of that sort? If you do, what processor will be
running the code? With all that information you know how much you
need to worry!

Of course, if this is part of bigger project and you are asked simply
to be a "good citizen" and use the least possible resource, then the
estimates get more complicated, but you can sometimes negotiate a
resource budget from the team.

> 4) 64 MB of RAM is available.
>
> 5) It has to be in C.
>
> First I used used a Binary-Search-Tree, then I used array based heap but
> arrays don't reallocated. I think i simple single linked list will do
> fine for the requirements I mentioned. what do you advise ?

I'd still advise an array, but you point to it rather than declaring
it. In fact, wrap the queue in an opaque struct:

typedef struct p_queue p_queue;

and expose only those operations that you need to expose:

pq_insert(p_queue *q, element_t e);
element_t pr_remove(p_queue *q);

etc. In the module that implements the queue you define:

struct p_queue {
size_t capacity, size;
element_t *heap_array;
};

This is a rough sketch of course -- there are lots of variation: You
may need to point to elements rather that have the in the array
itself. You may want to put the * inside the initial typedef, and so
on.

--
Ben.

Dann Corbit

unread,

Nov 6, 2009, 3:17:46 PM11/6/09

to

In article <pan.2009.11...@invalid.address>,
sun...@invalid.address says...

>
> I have been asked to make a Priority Queue (PQ) and after making lots of
> threads on comp.lang.c and writing lots of code I am still confused. The
> PQ has these requirements:
>
>
> 1) Process approximately 10 million elements, around 1 million at a
> time.

If the items are 6.4 bytes or more in length seems like a big problem to
me.

> 2) There is no idea how many elements we will be having in the
> beginning or at any point of time. The least amount will be 10,000.
>
> 3) It needs to be CPU efficient.
>
> 4) 64 MB of RAM is available.
>
> 5) It has to be in C.

Low ram points to a skiplist (requires about 1.25 - 1.3 pointers per
element. This is considerably less than most other tree type data
structures (which typically require 2 or more pointers per element).

> First I used used a Binary-Search-Tree, then I used array based heap but
> arrays don't reallocated. I think i simple single linked list will do
> fine for the requirements I mentioned. what do you advise ?

Considering the data volume and the low memory, I think you will have to
consider what happens when the data structure spills to disk.

This thing is out of this world for that purpose, but it is C++:
https://sourceforge.net/projects/stxxl/files/

If you have to process 64 M objects but only hold one or million in
memory at a time, then this is a viable alternative:
https://sourceforge.net/projects/skiplist/files/
and it does have a C implementation.

Your project borders on being infeasible. Consider what happens when the
problem size scales up (and it always does).

Gene

unread,

Nov 6, 2009, 3:42:12 PM11/6/09

to

On Nov 6, 3:17 pm, Dann Corbit <dcor...@connx.com> wrote:
> In article <pan.2009.11.06.07.39...@invalid.address>,
> sunr...@invalid.address says...

>
>
>
> > I have been asked to make a Priority Queue (PQ) and after making lots of
> > threads on comp.lang.c and writing lots of code I am still confused. The
> > PQ has these requirements:
>
> > 1) Process approximately 10 million elements, around 1 million at a
> > time.
>
> If the items are 6.4 bytes or more in length seems like a big problem to
> me.
>
> > 2) There is no idea how many elements we will be having in the
> > beginning or at any point of time. The least amount will be 10,000.
>
> > 3) It needs to be CPU efficient.
>
> > 4) 64 MB of RAM is available.
>
> > 5) It has to be in C.
>
> Low ram points to a skiplist (requires about 1.25 - 1.3 pointers per
> element. This is considerably less than most other tree type data
> structures (which typically require 2 or more pointers per element).

Skip lists are cool, but an array-based heap (growing the array in
multiples as needed) requires zero pointers per element and is very
fast unless a merge operation is required.

The OP really needs to specify the operations that are required for
the application.

arnuld

unread,

Nov 7, 2009, 3:05:55 AM11/7/09

to

> Gene <gene.r...@gmail.com>
> wrote on Date: Fri, 6 Nov 2009 12:42:12 -0800 (PST)

> Skip lists are cool, but an array-based heap (growing the array in
> multiples as needed) requires zero pointers per element and is very
> fast unless a merge operation is required.

It will be an array of pointers.

> The OP really needs to specify the operations that are required
> for the application.

- Delete element with Max priority
- Insert an element with specific priority
- No elements with same priority (means priority will be unique for
each element

arnuld

unread,

Nov 7, 2009, 3:12:38 AM11/7/09

to

> On Fri, 06 Nov 2009 13:09:21 +0000, Ben Bacarisse wrote:

> What does "at a time mean"? Do you mean that the queue might grow to
> about 1M elements?

There could be 1 million elements in the queue but that is a rare
condition. Millions of elements will be processed but they will be
deleted quite fast limiting the memory usage. ON an average it will be
around 100,000 at one time occupying the memory, not more than that.

> ...snip...

> This is a rough sketch of course -- there are lots of variation: You may
> need to point to elements rather that have the in the array itself. You
> may want to put the * inside the initial typedef, and so on.

That's a nice sketch. I too don't have much information except what I
have told here. I have already created a singly linked list based queue
but I think Array based Binary-Heap is a good idea.

Jonathan Campbell

unread,

Nov 7, 2009, 8:10:37 AM11/7/09

to

arnuld wrote:
>> Gene <gene.r...@gmail.com>
>> wrote on Date: Fri, 6 Nov 2009 12:42:12 -0800 (PST)
>
>> Skip lists are cool, but an array-based heap (growing the array in
>> multiples as needed) requires zero pointers per element and is very
>> fast unless a merge operation is required.
>
> It will be an array of pointers.
>
>
>
>
>> The OP really needs to specify the operations that are required
>> for the application.
>
> - Delete element with Max priority
> - Insert an element with specific priority
> - No elements with same priority (means priority will be unique for
> each element
>

Does the last statement not mean that any 'queue' requirement is removed?

So, use some sort of binary search tree: delete max., O(1); insert,
O(log N).

Jon C.

--
Jonathan Campbell www.jgcampbell.com BT48, UK.

Ben Bacarisse

unread,

Nov 7, 2009, 7:39:27 PM11/7/09

to

Jonathan Campbell <jg.camp...@gmail.com> writes:

> arnuld wrote:
<snip>

>> - Delete element with Max priority
>> - Insert an element with specific priority
>> - No elements with same priority (means priority will be unique
>> for each element
>>
>
> Does the last statement not mean that any 'queue' requirement is
> removed?

It is certainly add odd requirement (I am wondering how it comes about
in practise) but elements still have to be queued. You are right that
the priority alone now determines the next item to be removed so, as
you say, a tree could be used. But a tree could be used for a proper
queue (one with duplicate priorities) since we can use a counter to
construct a new priority p' from the original priority, p, such that
p' is unique and elements with equal p get removed in the order they
were inserted.

> So, use some sort of binary search tree: delete max., O(1); insert,
> O(log N).

But what BST has O(1) delete (if that is indeed what you are
suggesting above)?

--
Ben.

Jonathan Campbell

unread,

Nov 8, 2009, 7:14:10 AM11/8/09

to

Ben Bacarisse wrote:
> Jonathan Campbell <jg.camp...@gmail.com> writes:
>
>> arnuld wrote:
> <snip>
>>> - Delete element with Max priority
>>> - Insert an element with specific priority
>>> - No elements with same priority (means priority will be unique
>>> for each element
>>>
>> Does the last statement not mean that any 'queue' requirement is
>> removed?
>
> It is certainly add odd requirement (I am wondering how it comes about
> in practise) but elements still have to be queued. You are right that

[...]

>
> But what BST has O(1) delete (if that is indeed what you are
> suggesting above)?
>

Oops, yes. I was still thinking about the linked list solution that I
suggested in response to an earlier post.

Moi

unread,

Nov 8, 2009, 12:53:16 PM11/8/09

to

How fast should it be ?
My naive half-balanced BST did 6M inserts and 6M deletes
in 4.5 seconds on a 3.2 GHz intel machine. given an estimated
node size of 16 bytes it will fit into 64MB.

But as Dann wrote: given tight memory, skiplists should be a first choice.
Judy-trees are probably better, but take a lot more code
(and are difficult to get right).

HTH,
AvK

arnuld

unread,

Nov 10, 2009, 12:09:07 AM11/10/09

to

> On Sun, 08 Nov 2009 12:14:10 +0000, Jonathan Campbell wrote:

>> Ben Bacarisse wrote:
>> But what BST has O(1) delete (if that is indeed what you are suggesting
>> above)?
> Oops, yes. I was still thinking about the linked list solution that I
> suggested in response to an earlier post.

I already implemented a singly linked list solution, it was pretty simple
to create and even seems pretty simple to maintain as compared to a
binary-heap which feels much more complicated. I even wrote a simple unit
test function to test this PQ. I am working on Binary Heap now. Here is
the code for singly linked list based PQ:

/* A Priority Queue (PQ) implemented as singly linked list where elements
are always added according to the priority and the element
* with highest priority is always at the beginning. Removal is done
only by removing the element with highest priority. Priority
* is a /int/.
*
*
* Insertion: O(N) + O(N) in worst case, one for insertion, one for
uniqueness check.
* Find Max: 0(1)
* Deletion: O(1)
*
* VERSION 0.0
*
*/

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stddef.h>

enum BOOL { NO = 0,
YES = 1,
VAL_FAIL = NO,
VAL_SUCC = YES};

enum { SIZE_NAME = 30 };

struct pq_element
{
int priority;
struct pq_element* next;
};

struct pql
{
struct pq_element* head;
struct pq_element* tail;
};

struct pql* PQL_init(void);
struct pq_element* PQL_create_element(int d);

void PQL_insert(struct pql* s, struct pq_element* e);
void PQ_add_element(struct pql* s, struct pq_element* e);
void PQL_remove(struct pql* s);
void PQL_free_single(struct pq_element* e);

void PQL_print(struct pql* s);
void PQL_print_single(struct pq_element* p);
void PQL_unit_test_insertion(struct pql* s);
void PQL_unit_test_removal(struct pql* s);

int main(void)
{
struct pql* pql = PQL_init();

struct pq_element* a1 = PQL_create_element(1);
struct pq_element* a2 = PQL_create_element(2);
struct pq_element* a0 = PQL_create_element(0);
struct pq_element* a6 = PQL_create_element(6);

if(NULL == pql)
{
fprintf(stderr, "IN: %s: malloc() failed\n", __func__);
exit(EXIT_FAILURE);
}
else if(NULL == a1 ||
NULL == a2 ||
NULL == a0 ||
NULL == a6 )
{
fprintf(stderr, "LINE: %d, IN: %s: malloc() failed\n", __LINE__,
__func__);
exit(EXIT_FAILURE);
}

/*
PQL_insert(pql, a1);
PQL_insert(pql, a2);
PQL_insert(pql, a6);
PQL_insert(pql, a0);
PQL_print(pql);

*/
PQL_unit_test_insertion(pql);
PQL_print(pql);

printf("\n-------------- Removing element ----------------\n\n");
PQL_unit_test_removal(pql);
PQL_print(pql);

return 0;
}

struct pq_element* PQL_create_element(int d)
{
struct pq_element* p = malloc(1 * sizeof*p);

if(p)
{
p->priority = d;
}

return p;
}

void PQL_insert(struct pql* s, struct pq_element* e)
{
if(NULL == s)
{
fprintf(stderr, "IN: %s: Can not add element to a NULL list\n",
__func__);
return;
}
else if(NULL == e)
{
fprintf(stderr, "IN: %s: Can not add NULL element to a list\n",
__func__);
return;
}
else if(NULL == s->head) /* adding very first element */
{
printf("IN: %s: Adding very first element\n", __func__);
s->head = s->tail = e;
}
else
{
/* printf("IN: %s: Adding element to PQL\n", __func__); */
PQ_add_element(s,e);
}
}

/* This function works in conjuction with PQ_insert(), hecne there is no
need to check the arguments for NULL */
void PQ_add_element(struct pql* s, struct pq_element* e)
{
struct pq_element* h;
for( h = s->head; h; h = h->next)
{
if(e->priority > h->priority) /* given element has higher priroity
than head of list */
{
struct pq_element* temp = h->next;

if( temp && temp->priority == e->priority)
{
fprintf(stderr, "IN: %s: Element already exists, can not add
\n", __func__);
return;
}
else
{
s->head = e;
e->next = h;
break;
}
}
else if(NULL == h->next) /* we have reached the end of list and
all elements have higher priority then given element */
{
h->next = e;
e->next = NULL;
s->tail = e; /* update tail of list */
break;
}
}
}

void PQL_remove(struct pql* s)
{
if(s && s->head)
{
struct pq_element* h = s->head;
s->head = s->head->next;
PQL_free_single(h);

if(NULL == s->head)
{
s->head = s->tail = NULL;
}
}
}

/* ------------------ printing and small functions ------------------- */
struct pql* PQL_init(void)
{
struct pql* p = malloc(1 * sizeof*p);
if(p) p->head = p->tail = NULL;

return p;
}

void PQL_free_single(struct pq_element* e)
{
free(e);
}

void PQL_print(struct pql* s)
{
struct pq_element* h = NULL;

for( h = s->head; h; h = h->next)
{
PQL_print_single(h);
}
}

void PQL_print_single(struct pq_element* p)
{
if(p)
{
printf("%d\n", p->priority);
}
}

/* ------------------- Unit Testing fucntions ---------------------------
*/
void PQL_unit_test_insertion(struct pql* s)
{
const int limit = 10000000;
int i, r;
struct pq_element* e;

for(i = 0; i != limit; ++i)
{
r = rand();
e = PQL_create_element(r);

if(e)
{
PQL_insert(s, e);
}
}
}

void PQL_unit_test_removal(struct pql* s)
{
const int limit = 6;
int i;

for(i = 0; i != limit; ++i)
{
PQL_remove(s);
}
}

========================== OUTPUT ================================
[arnuld@dune programs]$ gcc -ansi -pedantic -Wall -Wextra single.c
[arnuld@dune programs]$ ./a.out
IN: PQL_insert: Adding very first element
186687031
180494694
82812627
31597025
15019312
8496173
2493077
1183876
74808
16105
13935
2638
95
37

-------------- Removing element ----------------

2493077
1183876
74808
16105
13935
2638
95
37
[arnuld@dune programs]$

arnuld

unread,

Nov 10, 2009, 12:12:52 AM11/10/09

to

> On Sun, 08 Nov 2009 18:53:16 +0100, Moi wrote:

> How fast should it be ?
> My naive half-balanced BST did 6M inserts and 6M deletes in 4.5 seconds
> on a 3.2 GHz intel machine. given an estimated node size of 16 bytes it
> will fit into 64MB.

If by 6M you meant 6 Million then 12 millions of operations in 4.5
seconds is hell faster than what I need. 6 millions can take 3-4 minutes,
fast enough for me.

> But as Dann wrote: given tight memory, skiplists should be a first
> choice. Judy-trees are probably better, but take a lot more code (and
> are difficult to get right).

I am looking into Skiplists link provided by Dann Corbit (his name is so
easy to remember, ever for an Indian :) . Currently I am focusing all of
my efforts on my half finished, complexity based Binary-Heap. Will post
the code when I am done with something to show.

Chris M. Thomasson

unread,

Nov 9, 2009, 1:17:46 PM11/9/09

to

"Moi" <ro...@invalid.address.org> wrote in message
news:eaa4$4af7058c$5350c024$22...@cache100.multikabel.net...
[...]

> But as Dann wrote: given tight memory, skiplists should be a first choice.
> Judy-trees are probably better, but take a lot more code
> (and are difficult to get right).

I thought Judy Trees are patented by HP:

Chris M. Thomasson

unread,

Nov 9, 2009, 1:18:52 PM11/9/09

to

"Chris M. Thomasson" <n...@spam.invalid> wrote in message
news:hdb0hp$3pa$1...@aioe.org...

http://www.google.com/patents?id=0SYPAAAAEBAJ

;^)

Moi

unread,

Nov 10, 2009, 3:29:31 AM11/10/09

to

On Mon, 09 Nov 2009 10:18:52 -0800, Chris M. Thomasson wrote:

> "Chris M. Thomasson" <n...@spam.invalid> wrote in message

>> I thought Judy Trees are patented by HP:
>
> http://www.google.com/patents?id=0SYPAAAAEBAJ
>
> ;^)

Oops. Damn patents.
Anyway they were not invented here ;-)

AvK

Ben Bacarisse

unread,

Nov 10, 2009, 8:19:35 AM11/10/09

to

arnuld <sun...@invalid.address> writes:

>> On Sun, 08 Nov 2009 12:14:10 +0000, Jonathan Campbell wrote:
>
>>> Ben Bacarisse wrote:
>>> But what BST has O(1) delete (if that is indeed what you are suggesting
>>> above)?
>> Oops, yes. I was still thinking about the linked list solution that I
>> suggested in response to an earlier post.
>
> I already implemented a singly linked list solution, it was pretty simple
> to create and even seems pretty simple to maintain as compared to a
> binary-heap which feels much more complicated. I even wrote a simple unit
> test function to test this PQ.

It may well be fast enough. I have little data so it is hard to
say. You say there will typically be about 10,000 elements in the
queue, so you might think you will have to look though 5,000 elements
on average to find the insertion point, but priority queues rarely
work like that. In some applications, the majority of added elements
go at either the front or the end, so you can speed things up by
special casing those. In others, you get an exponential distribution
of insertion points so insertions far down the list are rare.

Your case is odd because you can't have duplicate priorities and that
means I can't even imagine what the application is.

If you use the "opaque type" suggestion you can use your list solution
(with minor changes) now and simply replace it with a faster queue if
it turns out that you need to.

<snip>
--
Ben.

arnuld

unread,

Nov 11, 2009, 7:06:17 AM11/11/09

to

> On Tue, 10 Nov 2009 13:19:35 +0000, Ben Bacarisse wrote:

> It may well be fast enough. I have little data so it is hard to say.
> You say there will typically be about 10,000 elements in the queue, so
> you might think you will have to look though 5,000 elements on average
> to find the insertion point, but priority queues rarely work like that.
> In some applications, the majority of added elements go at either the
> front or the end, so you can speed things up by special casing those.
> In others, you get an exponential distribution of insertion points so
> insertions far down the list are rare.
>
> Your case is odd because you can't have duplicate priorities and that
> means I can't even imagine what the application is.

Now go ahead and kill me for this new requirement that I have gotten from
my boss:

"There will be duplicates, can be 1 or 10. e.g. I am adding persons
into the queue with their name, age, address and phone". Then there will
be lots of persons with one age (say 30). So you need to add all
duplicates but when u add them (duplicates) one added first will be
removed first (FIFO), except this everything will added according to the
priority and person with minimum priority will always be removed first."

So it means, I have to search 2 times now: one for whether there is any
duplicate and 2nd if duplicate is there then add it after that duplicate
(s) else add it according to the priority.

Do you have any better ideas ?

Willem

unread,

Nov 11, 2009, 7:11:29 AM11/11/09

to

arnuld wrote:
) Now go ahead and kill me for this new requirement that I have gotten from
) my boss:
)
)
) "There will be duplicates, can be 1 or 10. e.g. I am adding persons
) into the queue with their name, age, address and phone". Then there will
) be lots of persons with one age (say 30). So you need to add all
) duplicates but when u add them (duplicates) one added first will be
) removed first (FIFO), except this everything will added according to the
) priority and person with minimum priority will always be removed first."
)
)
) So it means, I have to search 2 times now: one for whether there is any
) duplicate and 2nd if duplicate is there then add it after that duplicate
) (s) else add it according to the priority.
)
)
) Do you have any better ideas ?

How many different priorities are there ?
You could simply implement one simple FIFO for each priority.

Also, what programming language are you using ?
There are a number of languages where this is easy peasy and fast too.
(Perl for example).

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Ben Bacarisse

unread,

Nov 11, 2009, 8:39:42 AM11/11/09

to

arnuld <sun...@invalid.address> writes:

>> On Tue, 10 Nov 2009 13:19:35 +0000, Ben Bacarisse wrote:
>
>> It may well be fast enough. I have little data so it is hard to say.
>> You say there will typically be about 10,000 elements in the queue, so
>> you might think you will have to look though 5,000 elements on average
>> to find the insertion point, but priority queues rarely work like that.
>> In some applications, the majority of added elements go at either the
>> front or the end, so you can speed things up by special casing those.
>> In others, you get an exponential distribution of insertion points so
>> insertions far down the list are rare.
>>
>> Your case is odd because you can't have duplicate priorities and that
>> means I can't even imagine what the application is.
>
> Now go ahead and kill me for this new requirement that I have gotten from
> my boss:

Why would we mind? You are the one with the quixotic boss!

> "There will be duplicates, can be 1 or 10. e.g. I am adding persons
> into the queue with their name, age, address and phone". Then there will
> be lots of persons with one age (say 30). So you need to add all
> duplicates but when u add them (duplicates) one added first will be
> removed first (FIFO), except this everything will added according to the
> priority and person with minimum priority will always be removed first."
>
> So it means, I have to search 2 times now: one for whether there is any
> duplicate and 2nd if duplicate is there then add it after that duplicate
> (s) else add it according to the priority.
>
> Do you have any better ideas ?

One trick that might work for you is to use a counter. You increment
it every time you add an item and you store it with the priority p to
form a pair (p, c). This forms the real priority you use for
organising your heap:

(p, c) < (p', c') if p < p' but if p == p' use c < c'

This extra storage may be show-stopper, and you may not be able to
live with the fact that you can't reset the counter without some O(N)
work, but it is worth considering.

If not, use multiple non-priority queues (plain FIFOs) organised into
a heap as has already been suggested by Willem.

--
Ben.

Alan Morgan

unread,

Nov 11, 2009, 1:22:26 PM11/11/09

to

In article <slrnhflafh...@turtle.stack.nl>,

Willem <wil...@stack.nl> wrote:
>arnuld wrote:
>) Now go ahead and kill me for this new requirement that I have gotten from
>) my boss:
>)
>) "There will be duplicates, can be 1 or 10. e.g. I am adding persons
>) into the queue with their name, age, address and phone". Then there will
>) be lots of persons with one age (say 30). So you need to add all
>) duplicates but when u add them (duplicates) one added first will be
>) removed first (FIFO), except this everything will added according to the
>) priority and person with minimum priority will always be removed first."

Is age supposed to be the priority here, because if it isn't I fail to
see why it's a complication.

>How many different priorities are there ?
>You could simply implement one simple FIFO for each priority.

Or make the key be priority+timestamp rather than just priority and
modify the comparison operation slightly.

Arnuld, I'd suggest you write up a design document (if you haven't
already). This is a great way to get the requirements down on paper
and will permit you to defined your design choices. It shouldn't
take more than a couple of days and shouldn't be more than 3 pages.

IMHO, the absolute worst thing a developer can do is work in secret.
Tell people what you are going to do, tell them how you are going to
do it, do it, and then tell them what you did.

Alan
--
Defendit numerus

Dann Corbit

unread,

Nov 11, 2009, 2:48:34 PM11/11/09

to

In article <pan.2009.11...@invalid.address>,
sun...@invalid.address says...
>

Use the SQLite database in memory mapped mode. (It's written in C).
It will handle duplicates, and any sort of strange requirements that pop
up. I guess it will be fast enough for your purposes and it will give
you a permanent store for the processed data for subsequent operations.
This embedded database is small enough to fit on phone applications.

Something is not making sense to me. We have a huge volume of data
being processed, but it seems that you have some sort of pipe or filter
you need to write. If you are discarding the data, then the exercise is
pointless, so I suggest examination of the system as a whole.

What is really wanted here? We have a stream of data to process. Where
is the data to reside, eventually? What sort of processing requirements
exist for the incoming stream, and what sort of processing requirements
exist for the data after it exits the pipe?

The way that this project is developing is very, very, very bad:
"Oh, by the way, -- it also has to stop a herd of stampeding yak..."
is my name for this model. As you are moving along, the requirements
keep changing. That is a formal recipie for disaster, virtually 100% of
the time. So I urge you:
1. Take a step back.
2. Collect the actual requirements for this project. (What are the
inputs, what are the outputs, what are the tasks that need to be done,
have the workers who are going to use this tool been interviewed for how
this thing fits into the work flow, what are the documenation needs,
etc.)
3. After you have requirements (it is very clear that requirements
gathering has never taken place yet) then you can design a solution.

It is literally impossible to design a tool when the specification is
constantly moving.

arnuld

unread,

Nov 12, 2009, 1:57:29 AM11/12/09

to

> On Wed, 11 Nov 2009 12:11:29 +0000, Willem wrote:

> How many different priorities are there ? You could simply implement one
> simple FIFO for each priority.

That seems impossible because there could be a million types of
priorities.

> Also, what programming language are you using ? There are a number of
> languages where this is easy peasy and fast too. (Perl for example).

C

arnuld

unread,

Nov 12, 2009, 2:16:43 AM11/12/09

to

> On Wed, 11 Nov 2009 13:39:42 +0000, Ben Bacarisse wrote:

> One trick that might work for you is to use a counter. You increment it
> every time you add an item and you store it with the priority p to form
> a pair (p, c). This forms the real priority you use for organising your
> heap:
>
> (p, c) < (p', c') if p < p' but if p == p' use c < c'
>
> This extra storage may be show-stopper, and you may not be able to live
> with the fact that you can't reset the counter without some O(N) work,
> but it is worth considering.

That is new one. How is my idea compared to it which takes advantage of
the fact that items are added according to the priority:

when we add an item with duplicate priority we add it just after the
original one, so a block of linked list queue will look like this:

(p0, item1) -> (p1, item2) -> (p1, item3) -> (p4, item4)

Since we always remove the elements from the head, we are always going to
remove that oldest one.

During addition of a duplicate when we come across the original we can
check its next pointer to if next one has same priority, and then next
one, till we get a different priority. That way we will need only 2 extra
pointers but not extra O(N) search.

> If not, use multiple non-priority queues (plain FIFOs) organised into a
> heap as has already been suggested by Willem.

I don't think its good idea because I may or may not have duplicates,
even if I have them, still then there could be million of different
priorities and I don't think making millions of different queues is a
good idea even when they are going to be deleted soon.

arnuld

unread,

Nov 12, 2009, 2:26:37 AM11/12/09

to

> On Wed, 11 Nov 2009 11:48:34 -0800, Dann Corbit wrote:

> Use the SQLite database in memory mapped mode. (It's written in C). It
> will handle duplicates, and any sort of strange requirements that pop
> up. I guess it will be fast enough for your purposes and it will give
> you a permanent store for the processed data for subsequent operations.
> This embedded database is small enough to fit on phone applications.

I will show it to my boss and see if we can use it.

> Something is not making sense to me. We have a huge volume of data
> being processed,

> ...SNIP...

I don't like it too. I would be very happy if I was told what I going to
do exactly, what is the software as a whole and how the different parts
interact with each other and what part I have to design, that way I have
the design and whys of all the questions. Sadly its not the case.

Ben Bacarisse

unread,

Nov 12, 2009, 8:54:49 AM11/12/09

to

arnuld <sun...@invalid.address> writes:

>> On Wed, 11 Nov 2009 13:39:42 +0000, Ben Bacarisse wrote:
>
>
>> One trick that might work for you is to use a counter. You increment it
>> every time you add an item and you store it with the priority p to form
>> a pair (p, c). This forms the real priority you use for organising your
>> heap:
>>
>> (p, c) < (p', c') if p < p' but if p == p' use c < c'
>>
>> This extra storage may be show-stopper, and you may not be able to live
>> with the fact that you can't reset the counter without some O(N) work,
>> but it is worth considering.
>
>
> That is new one. How is my idea compared to it which takes advantage of
> the fact that items are added according to the priority:
>
>
> when we add an item with duplicate priority we add it just after the
> original one, so a block of linked list queue will look like this:
>
> (p0, item1) -> (p1, item2) -> (p1, item3) -> (p4, item4)
>
> Since we always remove the elements from the head, we are always going to
> remove that oldest one.

Sure, but that's not a heap. If a sorted list is good enough, use
that. It is far simpler than any other suggestion.

> During addition of a duplicate when we come across the original we can
> check its next pointer to if next one has same priority, and then next
> one, till we get a different priority. That way we will need only 2 extra
> pointers but not extra O(N) search.

Why do you any extra pointers? And what is the "extra O(N) search"?

If you use a list, the search for where the item goes (in a sorted
list) is O(N). I was offering you a way to insert into a heap.

>> If not, use multiple non-priority queues (plain FIFOs) organised into a
>> heap as has already been suggested by Willem.
>
> I don't think its good idea because I may or may not have duplicates,
> even if I have them, still then there could be million of different
> priorities and I don't think making millions of different queues is a
> good idea even when they are going to be deleted soon.

You could implement it in such a way that you convert a single item to
a queue only when there is a duplicate.

This all sounds like a class exercise, not a work requirement. if it
really is a job, your design process is badly broken (has someone has
already said).

--
Ben.

arnuld

unread,

Nov 13, 2009, 1:08:43 AM11/13/09

to

> On Thu, 12 Nov 2009 13:54:49 +0000, Ben Bacarisse wrote:

> Sure, but that's not a heap. If a sorted list is good enough, use that.
> It is far simpler than any other suggestion.

Even my experience of writing both taught me that list is 10 times
simpler than working with a binary heap. and its fast enough as per my
unit test.

> Why do you any extra pointers? And what is the "extra O(N) search"?

Eh.. I was thinking you were suggesting a solution for list but I see you
made it clear that the solution was for heap actually, hence the above
statement of mine (extra O(N) search) no longer applies.

> ..SNIP...

> This all sounds like a class exercise, not a work requirement. if it
> really is a job, your design process is badly broken (has someone has
> already said).

Yes, but I have no idea of whats going on senior level. Good (or Best ?)
part is I was asked to write an algorithm and I learned Binary Heap and
linked lists :)

arnuld

unread,

Nov 13, 2009, 3:16:36 AM11/13/09

to

> On Thu, 12 Nov 2009 13:54:49 +0000, Ben Bacarisse wrote:

> Sure, but that's not a heap. If a sorted list is good enough, use that.
> It is far simpler than any other suggestion.

I have implemented it and it works fine except that the addition of
duplicates does not work. I know that the problem lies in
PQL_add_duplicate() as I have printed the relevant values which show that
some assignments are not working but I am unable to figure out why:

/* A Priority Queue (PQ) implemented as singly linked list where elements
are always added according to the priority and the element
* with highest priority is always at the beginning. Removal is done
only by removing the element with highest priority. Priority
* is a /int/.
*
*

* Insertion: O(N) in worst case

* Find Max: 0(1)
* Deletion: O(1)
*
* VERSION 0.0
*
*/

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <stddef.h>

#include <limits.h>

enum BOOL { NO = 0,
YES = 1,
VAL_FAIL = NO,
VAL_SUCC = YES};

enum { SIZE_NAME = 30 };

struct pq_element
{
int priority;
struct pq_element* next;
};

struct pql
{
struct pq_element* head;
struct pq_element* tail;
};

struct pql* PQL_init(void);
struct pq_element* PQL_create_element(int d);

void PQL_insert(struct pql* s, struct pq_element* e);
void PQ_add_element(struct pql* s, struct pq_element* e);

void PQL_add_duplicate(struct pq_element* head, struct pq_element* t);

void PQL_remove(struct pql* s);
void PQL_free_single(struct pq_element* e);

void PQL_print(struct pql* s);
void PQL_print_single(struct pq_element* p);

int main(void)

{
struct pql* pql = PQL_init();

struct pq_element* a1 = PQL_create_element(1);
struct pq_element* a2 = PQL_create_element(2);
struct pq_element* a0 = PQL_create_element(0);
struct pq_element* a6 = PQL_create_element(6);

if(NULL == pql)
{
/* fprintf(stderr, "IN: %s: malloc() failed\n", __func__); */

exit(EXIT_FAILURE);
}
else if(NULL == a1 ||
NULL == a2 ||
NULL == a0 ||
NULL == a6 )
{

/* fprintf(stderr, "LINE: %d, IN: %s: malloc() failed\n",
__LINE__, __func__); */
exit(EXIT_FAILURE);

}

PQL_insert(pql, a1);
PQL_insert(pql, a2);
PQL_insert(pql, a6);
PQL_insert(pql, a0);
PQL_print(pql);

printf("\n-------------- Adding 1 Duplicate ----------------\n\n");
PQL_insert(pql, a6);
PQL_print(pql);

return 0;
}

struct pq_element* PQL_create_element(int d)
{
struct pq_element* p = malloc(1 * sizeof*p);

if(p)
{
p->priority = d;
}

return p;
}

void PQL_insert(struct pql* s, struct pq_element* e)
{
if(NULL == s)
{

/* fprintf(stderr, "IN: %s: Can not add element to a NULL list
\n", __func__); */

return;
}
else if(NULL == e)
{

/* fprintf(stderr, "IN: %s: Can not add NULL element to a list
\n", __func__); */

return;
}
else if(NULL == s->head) /* adding very first element */
{

/* printf("IN: %s: Adding very first element\n", __func__); */

s->head = s->tail = e;
}
else
{
/* printf("IN: %s: Adding element to PQL\n", __func__); */
PQ_add_element(s,e);
}
}

/* This function works in conjuction with PQ_insert(), hecne there is no
need to check the arguments for NULL */
void PQ_add_element(struct pql* s, struct pq_element* e)
{
struct pq_element* h;

for( h = s->head; h; h = h->next)
{
if(e->priority > h->priority) /* given element has higher priroity
than head of list */
{
struct pq_element* temp = h->next;

if( temp && temp->priority == e->priority)
{

/* fprintf(stderr, "IN: %s: Element already exists, can not
add\n", __func__); */

return;
}
else
{
s->head = e;
e->next = h;
break;
}
}

else if(e->priority == h->priority) /* We got a duplicate */
{
PQL_add_duplicate(h, e);

}
else if(NULL == h->next) /* we have reached the end of list and
all elements have higher priority then given element */
{
h->next = e;
e->next = NULL;
s->tail = e; /* update tail of list */
break;
}
}
}

/* This function depends on the argumenst provided from PQ_add_element(),
it does not work standalone. Both arguments have already
been checked for NULL values in PQL_insert() & PQL_add_element().
Hence there is no need for a check here
*/
void PQL_add_duplicate(struct pq_element* pt, struct pq_element* t)
{
struct pq_element* p;
struct pq_element* q;

for(p = pt; p; p = p->next)
{
if(p->priority == t->priority)
{
printf("--------- Breaking out of loop --------------------\n");
printf("* p %d\n", p->priority);
printf("* t %d\n", t->priority);
break;
}
}

q = p->next;
p->next = t;
t->next = q;

printf("** q %d\n", q->priority);
printf("** t %d\n", t->priority);
printf("** t->next %d\n", t->next->priority);
printf("** p %d\n", p->priority);
printf("** p->next %d\n", p->next->priority);

printf("\n----------------------------------\n\n");
}

void PQL_remove(struct pql* s)
{
if(s && s->head)
{
struct pq_element* h = s->head;
s->head = s->head->next;
PQL_free_single(h);

if(NULL == s->head)
{
s->head = s->tail = NULL;
}
}
}

/* ------------------ printing and small functions ------------------- */
struct pql* PQL_init(void)
{
struct pql* p = malloc(1 * sizeof*p);
if(p) p->head = p->tail = NULL;

return p;
}

void PQL_free_single(struct pq_element* e)
{
free(e);
}

void PQL_print(struct pql* s)
{
struct pq_element* h = NULL;

for( h = s->head; h; h = h->next)
{
PQL_print_single(h);
}
}

void PQL_print_single(struct pq_element* p)
{
if(p)
{
printf("%d\n", p->priority);
}
}

======================= OUTPUT ==================================
[arnuld@dune programs]$ gcc -ansi -pedantic -Wall -Wextra PQ-list.c
[arnuld@dune programs]$ ./a.out
6
2
1
0

-------------- Adding 1 Duplicate ----------------

--------- Breaking out of loop --------------------
* p 6
* t 6
** q 2
** t 6
** t->next 2
** p 6
** p->next 2

----------------------------------

6
2
1
0
[arnuld@dune programs]$

stan

unread,

Nov 14, 2009, 12:59:44 AM11/14/09

to

I'm pretty sure this doesn't do what you think. Try reversing the
insert operations:

from

> PQL_insert(pql, a1);
> PQL_insert(pql, a2);
> PQL_insert(pql, a6);
> PQL_insert(pql, a0);
> PQL_print(pql);

to
> PQL_insert(pql, a1);
> PQL_insert(pql, a6);
> PQL_insert(pql, a2);
> PQL_insert(pql, a0);
> PQL_print(pql);

I would recommend stepping through this code once if you can't see the
problem. THe logic in PQ_add_element seems a little off.

You have a test that will always fail if the queue is maintained in
order; if e->priority > h->priority then the test temp->priority ==
e->priority doesn't make sense since temp is later in the queue and e
was greater than h is must be greater than temp.

At any rate when the test fails you then insert the node at the head
of the list, even if h is somewhere other than the first node.

By reversing things as I noted above you should see that when you add
a2 instead of going into the list before 1 it goes into the head and
effectively cuts a6 adrift for a memory leak.

I didn't follow this thread really closely but I'm pretty sure this
isn't what you wanted to do.

<didn't really look at the rest of the code snipped>