I'm relatively new to Erlang, and as a mini project to work on as I
learn, I'm writing a simple message queue. I'm having some problems
retrieving data from the mnesia table I am using to store my messages.
My job records are defined as :
-record(job, { id, body, due = 0, priority = 1 }).
and after much playing around, the code to retrieve the job with the
highest priority I have is
Q1 = qlc:q([X || X <- mnesia:table(job)]),
Q2 = qlc:sort(Q1, {order, fun(Job1, Job2) ->
                               Job1#job.priority > Job2#job.priority
                           end}),
C = qlc:cursor(Q2),
R = qlc:next_answers(C, 1),
I know this is not the most efficient way, but I'm struggling to find
a faster one, and this doesn't yet sort by the due date, which I
also need. In SQL, the query would be
something like
SELECT * FROM jobs ORDER BY priority, due_at LIMIT 1
which with the right indexing would return a result very quickly even
with a large amount of rows. The code above takes a few seconds with
100,000 records in the table.
I feel like I'm missing something somewhere, but am running out of ideas.
Thanks,
Barry
________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org
> I was surprised to discover that there is no priority queue
> implementation in OTP.
You will find an implementation of skew heaps in
http://erlang.org/pipermail/erlang-questions/2007-August/028769.html
It was on the first page of results from a Google search for
(priority queues Erlang).
According to that thread as stored at Nabble, the use of gb_trees
for priority queues was proposed by Ulf Wiger.
There is a paper "Optimal Purely Functional Priority Queues" by
Brodal and Okasaki at
http://www.brics.dk/RS/96/37/BRICS-RS-96-37.pdf
which it might be interesting to translate into Erlang; that
paper uses ML, which is strict like Erlang, not lazy like Haskell.
However, I'd draw your attention to a sentence in that paper:
"Our data structure is reasonably efficient in practice;
however, there are several competing data structures that, although
not asymptotically optimal, are somewhat faster than ours in practice."
We had a visitor here for a month or two a couple of years ago who
was interested in priority queue algorithms. He was evaluating a
recent theoretically improved algorithm, and was quite pleased that
his implementation of it was _by actual measurement_ faster than
several other moderately recent contenders. I said "why aren't
you including the old heap-in-an-array data structure in your
benchmarks? I think you really OUGHT to include the old heap-in-
an-array data structure in your benchmarks!" So he did. And it
was about 10 times faster than the _theoretically_ better ones.
If all you need is new(), empty(), insert(), find_min(), and
remove_min(), something like skew heaps or perfectly balanced
heaps will probably be hard to beat. If you need fancier things
like O(1) meld(), then you will need something like the
Brodal/Okasaki data structure.
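To show how little code a skew heap actually needs, here is a minimal
sketch of my own (not the implementation from the linked thread).
merge/2 is the only interesting operation; everything else is defined
in terms of it:

```erlang
%% Skew heap sketch: a heap is 'empty' or {Min, Left, Right}.
merge(empty, H) -> H;
merge(H, empty) -> H;
merge({X, L1, R1}, {Y, _, _} = H2) when X =< Y ->
    {X, merge(R1, H2), L1};       % swap children: keeps the tree balanced
merge(H1, H2) ->                  % on average without any bookkeeping
    merge(H2, H1).

new() -> empty.
insert(X, H) -> merge({X, empty, empty}, H).
find_min({X, _, _}) -> X.
remove_min({_, L, R}) -> merge(L, R).
```

insert and remove_min are O(log n) amortized, and meld is just merge/2,
though not the O(1) meld the Brodal/Okasaki structure guarantees.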
Thanks for the interesting answers, aside from this specific problem,
the more general question I'm trying to answer is how to
select data from an mnesia table ordered by a certain element and
limited without having to go through the entire table when I perform
the sort. I'm not sure if I'm trying to shoehorn my SQL thinking into
this, or if I'm not aware of the solution.
Thanks,
Barry
The easiest way to do this would be to use an ordered set,
using an entry like {{Priority, Timestamp}, Reference},
where Reference is some reference to whatever it is that
you have queued.
If you want to have ascending order on priority and LIFO
on time, you can use the following trick:
timestamp() ->
    {A,B,C} = erlang:now(),
    {-A,-B,-C}.
(FIFO is more common, of course, but sometimes LIFO makes more
sense in practice, along the lines of "serve the ones most
likely to succeed".)
Then you can use mnesia:dirty_first(Tab) to pick the first item
from the queue.
If you want to limit the number of entries, mnesia:table_info(Tab,size)
will give you the number of objects, and you can delete the last item
when you push a new one, once you've hit the limit.
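Putting that together, a sketch of the whole recipe (the table and
record names are my own, and this assumes mnesia is already started
with a schema; untested as written):

```erlang
%% Illustrative sketch of the ordered_set approach.
%% Key = {Priority, Timestamp}, so mnesia keeps the table sorted by it.
-record(queued, {key, ref}).

create_table() ->
    mnesia:create_table(queued,
                        [{type, ordered_set},
                         {attributes, record_info(fields, queued)}]).

timestamp() ->                     % negated, per the LIFO trick above
    {A,B,C} = erlang:now(),
    {-A,-B,-C}.

push(Priority, Ref) ->
    mnesia:dirty_write(#queued{key = {Priority, timestamp()}, ref = Ref}).

pop() ->
    case mnesia:dirty_first(queued) of
        '$end_of_table' -> empty;
        Key ->
            [#queued{ref = Ref}] = mnesia:dirty_read(queued, Key),
            ok = mnesia:dirty_delete(queued, Key),
            {ok, Ref}
    end.
```

dirty_first/1 on an ordered_set is cheap regardless of table size, which
is what the qlc sort over the whole table was missing.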
BR,
Ulf W
--
Ulf Wiger
CTO, Erlang Training & Consulting Ltd
http://www.erlang-consulting.com
This will work, and if the volume is low it'll work great, and it's easy
to get going. When we scaled this up, we discovered that we were doing
one or both of the following a lot: 1) insert at the front or 2) delete
from the back. Since the underlying implementation is trying to be a
balanced tree, this leads to a lot of rebalancing, which slows things down
(doubly so since our ordered set was on disk). We switched to specialized
data structures at that point.
I feel obliged to mention that premature optimization is evil. The
point of this post is just to give you something to measure to see if a
problem is occurring. Please start as Ulf recommends.
-- p