[erlang-questions] Mnesia sorting/limiting

340 views
Skip to first unread message

Barry Mitchelson

unread,
Jun 9, 2009, 4:16:52 PM6/9/09
to erlang-q...@erlang.org
Hi,

I'm relatively new to Erlang, and as a mini project to work on as I
learn, I'm writing a simple message queue. I'm having some problems
retreiving data from the mnesia table I am using to store my messages.

My job records are defined as :

-record(job, { id, body, due = 0, priority = 1 }).

and after much playing around, the code to retreive the job with the
highest priority I have is

Q1 = qlc:q([ X || X <- mnesia:table(job)]),
Q2 = qlc:sort(Q1, {order, fun(Job1, Job2) -> Job1#job.priority >
Job2#job.priority end}),
C = qlc:cursor(Q2),
R = qlc:next_answers(C,1),

I know this is not the most efficient way, but I'm struggling to find
a fast way to do this, and this is without also sorting by the due
date which I also want to be able to do. In SQL, the query would be
something like

SELECT * FROM jobs ORDER BY priority, due_at LIMIT 1

which with the right indexing would return a result very quickly even
with a large amount of rows. The code above takes a few seconds with
100,000 records in the table.

I feel like I'm missing something somewhere, but am running out of ideas.

Thanks,

Barry

________________________________________________________________
erlang-questions mailing list. See http://www.erlang.org/faq.html
erlang-questions (at) erlang.org

Ola Andersson A

unread,
Jun 10, 2009, 4:30:47 AM6/10/09
to erlang-q...@erlang.org
I was surprised to discover that there is no priority queue
implementation in OTP.
This was a few years back when I needed one. I don't know if that has
changed lately.
It seems like an obvious extension to the queue module.
I couldn't find anything using google either so I had to resort to
rolling my own using gb_trees.
That worked ok for my purposes but it could probably be made more
efficient and generalized.
Any volounteers?
/OLA.

Richard O'Keefe

unread,
Jun 10, 2009, 7:18:56 PM6/10/09
to Ola Andersson A, erlang-q...@erlang.org

On 10 Jun 2009, at 8:30 pm, Ola Andersson A wrote:

> I was surprised to discover that there is no priority queue
> implementation in OTP.

You will find an implementation of skew heaps in
http://erlang.org/pipermail/erlang-questions/2007-August/028769.html

It was on the first page of results from a Google search for
(priority queues Erlang).

According to that thread as stored at Nabble, the use of gb_trees
for priority queues was proposed by Ulf Wiger.

There is a paper "Optimal Purely Functional Priority Queues" by
Brodal and Okasaki at
http://www.brics.dk/RS/96/37/BRICS-RS-96-37.pdf
which it might be interesting to translate into Erlang; that
paper uses ML, which is strict like Erlang, not lazy like Haskell.
However, I'd draw your attention to a sentence in that paper:
"Our data structure is reasonably ecient in practice;
however, there are several competing data structures that, although
not asymptotically optimal, are somewhat faster than ours in practice."

We had a visitor here for a month or two a couple of years ago who
was interested in priority queue algorithms. He was evalluating a
recent theoretically improved algorithm, and was quite pleased that
his implementation of it was _by actual measurement_ faster than
several other moderately recent contenders. I said "why aren't
you including the old heap-in-an-array data structure in your
benchmarks? I think you really OUGHT to include the old heap-in-
an-array data structure in your benchmarks!" So he did. And it
was about 10 times faster than the _theoretically_ better ones.

If all you need is new(), empty(), insert(), find_min(), and
remove_min(), something like skew heaps or perfectly balanced
heaps will probably be hard to beat. If you need fancier things
like O(1) meld(), then you will need something like the
Brodal/Okasaki data structure.

Barry Mitchelson

unread,
Jun 11, 2009, 8:52:58 AM6/11/09
to Richard O'Keefe, Ola Andersson A, erlang-q...@erlang.org

Thanks for the interesting answers, aside from this specific problem,
the more general question I'm trying to find an answer is how to
select data from an mnesia table ordered by a certain element and
limited without having to go through the entire table when I perform
the sort. I'm not sure if I'm trying to shoehorn my SQL thinking into
this, or if I'm not aware of the solution.

Thanks,

Barry

Ulf Wiger

unread,
Jun 11, 2009, 9:12:11 AM6/11/09
to Barry Mitchelson, erlang-q...@erlang.org
Barry Mitchelson wrote:
>
> ...aside from this specific problem,

> the more general question I'm trying to find an answer is how to
> select data from an mnesia table ordered by a certain element and
> limited without having to go through the entire table when I perform
> the sort. I'm not sure if I'm trying to shoehorn my SQL thinking into
> this, or if I'm not aware of the solution.

The easiest way to do this would be to use an ordered set,
using an entry like {{Priority, Timestamp}, Reference},
where Reference is some reference to whatever it is that
you have queued.

If you want to have ascending order on priority and LIFO
on time, you can use the following trick:

timestamp() ->
{A,B,C} = erlang:now(),
{-A,-B,-C}.

(FIFO is more common, of course, but sometimes LIFO makes more
sense in practice, along the lines of "serve the ones most
likely to succeed".)

Then you can use mnesia:dirty_first(Tab) to pick the first item
from the queue.

If you want to limit the number of entries, mnesia:table_info(Tab,size)
will give you the number of objects, and you can delete the last item
when you push a new one, once you've hit the limit.

BR,
Ulf W


--
Ulf Wiger
CTO, Erlang Training & Consulting Ltd
http://www.erlang-consulting.com

Paul Mineiro

unread,
Jun 11, 2009, 10:51:16 AM6/11/09
to erlang-q...@erlang.org

This will work, and if the volume is low it'll work great, and it's easy
to get going. When we scaled this up, we discovered that we were doing
one or both of the following alot: 1) insert at the front or 2) delete
from the back. Since the underlying implementation is trying to be a
balanced tree, this leads to alot of rebalancing, which slows things down
(doubly so since our ordered set was on disk). We switched to specialized
data structures at that point.

I feel obliged to mention that premature optimization is evil. The
point of this post is just to give you something to measure to see if a
problem is occurring. Please start as Ulf recommends.

-- p

Reply all
Reply to author
Forward
0 new messages