Contrib proposal - "Priority Map"

63 views
Skip to first unread message

Mark Engelberg

unread,
Jun 30, 2010, 5:59:28 PM6/30/10
to clojure-dev
Is it a map? Is it a priority queue? It's both, thus the name
"priority map". Here's the blurb:

<blurb>
A priority map is very similar to a sorted map, but whereas a sorted
map produces a
sequence of the entries sorted by key, a priority map produces the
entries sorted by value.
In addition to supporting all the functions a sorted map supports, a
priority map
can also be thought of as a queue of [item priority] pairs. To support usage as
a versatile priority queue, priority maps also support conj/peek/pop operations.

The standard way to construct a priority map is with priority-map:
user=> (def p (priority-map :a 2 :b 1 :c 3 :d 5 :e 4 :f 3))
#'user/p
user=> p
{:b 1, :a 2, :c 3, :f 3, :e 4, :d 5}

So :b has priority 1, :a has priority 2, and so on.
Notice how the priority map prints in an order sorted by its
priorities (i.e., the map's values)

We can use assoc to assign a priority to a new item:
user=> (assoc p :g 1)
{:b 1, :g 1, :a 2, :c 3, :f 3, :e 4, :d 5}

or to assign a new priority to an extant item:
user=> (assoc p :c 4)
{:b 1, :a 2, :f 3, :c 4, :e 4, :d 5}

We can remove an item from the priority map:
user=> (dissoc p :e)
{:b 1, :a 2, :c 3, :f 3, :d 5}

An alternative way to add to the priority map is to conj a [item priority] pair:
user=> (conj p [:g 0])
{:g 0, :b 1, :a 2, :c 3, :f 3, :e 4, :d 5}

or use into:
user=> (into p [[:g 0] [:h 1] [:i 2]])
{:g 0, :b 1, :h 1, :a 2, :i 2, :c 3, :f 3, :e 4, :d 5}

Priority maps are countable:
user=> (count p)
6

Like other maps, equivalence is based not on type, but on contents.
In other words, just as a sorted-map can be equal to a hash-map,
so can a priority-map.
user=> (= p {:b 1, :a 2, :c 3, :f 3, :e 4, :d 5})
true

You can test them for emptiness:
user=> (empty? (priority-map))
true
user=> (empty? p)
false

You can test whether an item is in the priority map:
user=> (contains? p :a)
true
user=> (contains? p :g)
false

It is easy to look up the priority of a given item, using any of the
standard map mechanisms:
user=> (get p :a)
2
user=> (get p :g 10)
10
user=> (p :a)
2
user=> (:a p)
2

Priority maps derive much of their utility by providing priority-based seq.
Note that no guarantees are made about the order in which items of the
same priority appear.
user=> (seq p)
([:b 1] [:a 2] [:c 3] [:f 3] [:e 4] [:d 5])
Because no guarantees are made about the order of same-priority items, note that
rseq might not be an exact reverse of the seq. It is only guaranteed to be in
descending order.
user=> (rseq p)
([:d 5] [:e 4] [:c 3] [:f 3] [:a 2] [:b 1])

user=> (subseq p < 3)
([:b 1] [:a 2])
user=> (subseq p >= 3)
([:c 3] [:f 3] [:e 4] [:d 5])

This means first/rest/next/for/map/etc. all operate in priority order.
user=> (first p)
[:b 1]
user=> (rest p)
([:a 2] [:c 3] [:f 3] [:e 4] [:d 5])

Priority maps support metadata:
user=> (meta (with-meta p {:extra :info}))
{:extra :info}

But perhaps most importantly, priority maps can also function as
priority queues.
peek, like first, gives you the first [item priority] pair in the collection.
pop removes the first [item priority] from the collection.
(Note that unlike rest, which returns a seq, pop returns a priority map).

user=> (peek p)
[:b 1]
user=> (pop p)
{:a 2, :c 3, :f 3, :e 4, :d 5}

It is also possible to use a custom comparator:
user=> (priority-map-by (comparator >) :a 1 :b 2 :c 3)
{:c 3, :b 2, :a 1}

All of these operations are efficient. Generally speaking, most operations
are O(log n) where n is the number of distinct priorities. Some operations
(for example, straightforward lookup of an item's priority, or testing
whether a given item is in the priority map) are as efficient
as Clojure's built-in map.

The key to this efficiency is that internally, not only does the
priority map store
an ordinary hash map of items to priority, but it also stores a sorted map that
maps priorities to sets of items with that priority.

A typical textbook priority queue data structure supports at the ability to add
a [item priority] pair to the queue, and to pop/peek the next [item
priority] pair.
But many real-world applications of priority queues require more features, such
as the ability to test whether something is already in the queue, or to reassign
a priority. For example, a standard formulation of Dijkstra's
algorithm requires the
ability to reduce the priority number associated with a given item. Once you
throw persistence into the mix with the desire to adjust priorities,
the traditional
structures just don't work that well.

This particular blend of Clojure's built-in hash sets, hash maps, and
sorted maps
proved to be a great way to implement an especially flexible
persistent priority queue.

Connoisseurs of algorithms will note that this structure's peek
operation is not O(1) as
it would be if based upon a heap data structure, but I feel this is a
small concession for
the blend of persistence, priority reassignment, and priority-sorted
seq, which can be
quite expensive to achieve with a heap (I did actually try this for
comparison). Furthermore,
this peek's logarithmic behavior is quite good (on my computer I can
do a million
peeks at a priority map with a million items in 750ms). Also,
consider that peek and pop
usually follow one another, and even with a heap, pop is logarithmic.
So the net combination
of peek and pop is not much different between this versatile
formulation of a priority map and
a more limited heap-based one. In a nutshell, peek, although not
O(1), is unlikely to be the
bottleneck in your program.

All in all, I hope you will find priority maps to be an easy-to-use
and useful addition
to Clojure's assortment of built-in maps (hash-map and sorted-map).
</blurb>


Now, the blurb above contains one little lie. subseq (and by
extension rsubseq) don't quite work properly. subseq doesn't work
when you ask for the sequence > than a priority which has multiple
items associated with that priority (for rsubseq the problem occurs
with <). The reason is that the implementation of subseq assumes
(understandably) that the things which you're sorting by are unique.
So to get a subseq of all entries strictly greater than 3, it starts
by getting the sequence of all entries greater than or equal to 3. If
the first entry is equal to 3, it assumes this is the only entry equal
to 3, and it just calls next on the sequence. This strategy fails for
this particular structure.

I believe it is impossible to make subseq and rsubseq work for
priority maps without patching the core in some way.

The simplest way to patch the core is to change the calls to next into
calls to drop-while. This is what I did on my own system; it's a
simple fix and it works like a charm. While I was in there patching
subseq/rsubseq, I noticed that contrary to what their names suggest,
subseq and rsubseq don't actually always return a seq. In some cases,
the existing implementation returns () rather than nil. So while
making the patch for my own purposes, I also added a call to seq to
ensure that the result is in fact a seq.

A more elegant solution is to modify Sorted's seqFrom method to take
an additional boolean flag (in addition to ascending) called something
like "includes" which is a flag as to whether it should include the
starting number or not. So seqFrom(3,true,true) produces entries up
from and including 3, seqFrom(3,true,false) produces entries up from
but not including 3, etc. This allows subseq and rsubseq to just call
seqFrom with the appropriate args and let the Sorted implementation do
all the work, allowing for more flexibility in implementation by this
and other sorted data structures. For example, the priority map has
the ability to produce entries with priority strictly greater than 3
more efficiently than producing a sequence of entries greater than or
equal to 3 and using drop-while to eliminate the equal entries, so
putting the priority map in control of implementing these details
makes a lot of sense. Furthermore, if the inclusive/exclusive logic
is pushed into the Sorted implementation, sorted maps and sets could
continue to use the "next" strategy to skip over a single equal
element in their own internal implementation of Sorted, so behavior
and performance would remain identical for those data structures.

Of course, if no patch to subseq/rsubseq is permitted, then I'll just
have to remove them as supported by the priority map data structure,
but I think it would be a shame to lose out on that capability.

Code for priority map is attached, patch to core can be provided upon request.

This is something I've needed on several occasions, and I believe I'm
not the only one. I've seen other people asking about priority queue
implementations from time to time. Since I thought this might be
generally useful, I've tried to figure out how to make it interoperate
with Clojure's core interfaces as seamlessly as possible.

Comments welcome.

--Mark

priority_map.clj

Laurent PETIT

unread,
Jun 30, 2010, 6:10:04 PM6/30/10
to cloju...@googlegroups.com
Hi,

I just have on word: wow !

and one (newbie) question also: If I understand well, priorities are
on vals, and values put on the "queue" are keys.

So this prevents "pushing" the same value twice on the queue, right ?

Is this a theoretical problem, or do you see limitations to the use in
real world problems ?

2010/6/30 Mark Engelberg <mark.en...@gmail.com>:

> --
> You received this message because you are subscribed to the Google Groups "Clojure Dev" group.
> To post to this group, send email to cloju...@googlegroups.com.
> To unsubscribe from this group, send email to clojure-dev...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/clojure-dev?hl=en.
>
>

Mark Engelberg

unread,
Jun 30, 2010, 6:40:21 PM6/30/10
to cloju...@googlegroups.com
On Wed, Jun 30, 2010 at 3:10 PM, Laurent PETIT <lauren...@gmail.com> wrote:
> Hi,
>
> I just have on word: wow !
>
> and one (newbie) question also: If I understand well, priorities are
> on vals, and values put on the "queue" are keys.

Yes, "items put on the queue" are the map keys and priorities are the map vals.

> So this prevents "pushing" the same value twice on the queue, right ?

Correct, putting the same item into the queue with a new priority is
effectively a priority reassignment. This also makes a priority map
behave exactly as a map (with the restriction that the vals must be
comparable to one another), so hopefully the behavior is pretty
intuitive.

>
> Is this a theoretical problem, or do you see limitations to the use in
> real world problems ?

I use priority-based searching a lot in my work. Before switching to
Clojure, I did most of this work in Python, and just used the heap
module that comes with Python. I was constantly wanting to be able to
ask whether an item was already in the queue, and reassign priorities,
and I had to develop elaborate workarounds because I couldn't.

I have never wanted the behavior of being able to put the same item
twice into a priority queue with different priorities. That's because
one of the most common uses of priority queues is "priority-based
search" in which the priority represents the cost/distance it takes to
reach a given node. When you add the node again to the priority queue
with a new cost/distance, it means you've found a better path -- you
really don't want the old outdated information in the priority queue.

I suppose if you really wanted to add items multiple times, you could
always add [[item (gensym)] priority] pairs to your priority queue,
and ignore the gensym when pulling it out. Or use some other scheme
in which your items contain a unique label so they are treated as
unique keys in the priority map.

So I view this behavior as a feature, not a limitation.

One reason I called this a "priority map" is to leave the door open
for creating a more traditional "priority queue" that has a more
limited, traditional set of queue operations but is ultra-optimized
for those particular operations. I imagine that such a "priority
queue" would support only conj, peek, pop, and would not support any
of these other map-like operations including priority reassignment.
It seems likely that if you added the same item twice to such a
structure, it would just get added a second time, mainly because it's
too inefficient to do anything else. seq would probably have to
return the items in a random order, and peek would be the only way to
get the "next item" in the priority queue. It's certainly an option
to create such a thing (I've already done some preliminary work on
this), and I think there's definitely room for both possibilities to
be offered within the contrib library, but the behavior of priority
map is the one I was most excited about building first because it is
the most directly relevant to me and my work. Based on my own
experience, I predict that most people will find this well-rounded
priority map formulation to be more useful than a standard "priority
queue" that supports only a few basic operations.

Mark Engelberg

unread,
Jun 30, 2010, 6:57:14 PM6/30/10
to cloju...@googlegroups.com
On Wed, Jun 30, 2010 at 3:40 PM, Mark Engelberg
<mark.en...@gmail.com> wrote:
> I suppose if you really wanted to add items multiple times, you could
> always add [[item (gensym)] priority] pairs to your priority queue,
> and ignore the gensym when pulling it out.  Or use some other scheme
> in which your items contain a unique label so they are treated as
> unique keys in the priority map.

Or another idea just occurred to me that you could put the items in
atoms before putting them in the priority map, and deref them
immediately upon taking them out. This would be another way that two
identical items would become "unique" as keys in the map. Again, I'm
not sure how often this behavior would be desired, but it's good to
know there are so many ways to adapt a priority map to this usage if
it is needed for some reason.

Laurent PETIT

unread,
Jun 30, 2010, 6:58:25 PM6/30/10
to cloju...@googlegroups.com
2010/7/1 Mark Engelberg <mark.en...@gmail.com>:

yes, thanks for the clarifications !

Jason Wolfe

unread,
Jun 30, 2010, 10:37:56 PM6/30/10
to Clojure Dev
+1. I use queues like this all the time in my work. Currently I'm
using mutable queues from a Java library, but it would much nicer to
have persistent queues in Clojure itself. Thanks for the great work!

My only minor gripe is with (= priority-map hash-map). Presumably
part of the contract of a priority-map is that (seq priority-map)
gives you the k-v pairs in order, and the proposed semantics mean that
this contract may be broken by objects = to the priority-map in
question (namely other kinds of maps, and priority-maps with different
comparators).
But, I guess this is really a gripe with sorted-set, so take it with
many grains of salt...

-Jason
>  priority_map.clj
> 22KViewDownload

Mark Engelberg

unread,
Jul 14, 2010, 2:57:33 PM7/14/10
to cloju...@googlegroups.com
In my initial post about priority maps, I mentioned that it will only
be possible to properly implement subseq on priority maps with a patch
to the core. subseq achieves its polymorphic behavior (over sorted
maps and sorted sets) by delegating much of its behavior to the Java
methods seqFrom, seq, entryKey, and comparator. So in theory, you
should be able to extend subseq to work for any new collection by
customizing those methods. However, the current implementation of
subseq assumes that the values you are sorting on are unique, which is
a reasonable assumption for the built-in sorted maps which sort on
unique keys, but it is then impossible to extend subseq behavior to
other types of collections. To give library designers the power to
hook into subseq, this assumption needs to be removed.

I proposed two possible ways to patch subseq:
1. The simple way is to allow for dropping multiple equal values when
subseq massages the results of seqFrom into a strictly greater than
sequence.
2. The better way is for subseq to delegate more of its logic to
seqFrom (by adding an extra flag to the method call as to whether you
want greater than or equal or strictly greater than). This will allow
for the best efficiency, and provide the greatest possibilities for
future extensions.

I also pointed out that subseq should probably be patched anyway,
because the name implies that it returns a seq, but it sometimes
returns empty rather than nil.

I would like to encourage more discussion about this from the "keepers
of the core", and want to know whether one of the above patches would
be welcome.

Rich Hickey

unread,
Jul 16, 2010, 9:05:16 AM7/16/10
to Clojure Dev
Thanks Mark, this looks very cool and useful!

Yes, please submit as a patch, hopefully we can get it into 1.3

However, let's leave the subseq issues for a second round of patches,
i.e. a first patch should not support subseq. This will give us some
time to think about how best to support subseq.

Rich

Mark Engelberg

unread,
Jul 17, 2010, 1:58:03 AM7/17/10
to cloju...@googlegroups.com
On Fri, Jul 16, 2010 at 6:05 AM, Rich Hickey <richh...@gmail.com> wrote:
> Yes, please submit as a patch, hopefully we can get it into 1.3

I started to make a patch today, and realized that I'm currently
relying on the new ^:static function notation which isn't yet
available in the master branch, in order to route the deftype's method
implementations to type-hinted static functions without a performance
penalty. Should I rewrite it to avoid this feature, or is it likely
enough that this will be added in that it makes more sense to just tag
the patch in some way as requiring static functions?

Mark Engelberg

unread,
Jul 17, 2010, 3:24:54 AM7/17/10
to cloju...@googlegroups.com
Never mind, I went ahead and rewrote it without ^:static by pushing
the logic into the deftype rather than breaking it out to separate
function calls.

Mark Engelberg

unread,
Jul 17, 2010, 6:02:05 AM7/17/10
to cloju...@googlegroups.com
On Fri, Jul 16, 2010 at 6:05 AM, Rich Hickey <richh...@gmail.com> wrote:
> However, let's leave the subseq issues for a second round of patches,
> i.e. a first patch should not support subseq. This will give us some
> time to think about how best to support subseq.

OK, I've edited my priority map so it does not try to support subseq,
and I'll be submitting the patch to include this soon.

Meanwhile, to initiate discussion on how to modify subseq, I've
attached a proposed patch. This patch works by modifying the seqFrom
method of the Sorted interface to take an additional "inclusive"
parameter (i.e., <= and >= are inclusive, < and > are not).

In this patch, I do not address one issue I raised before, which is
whether subseq implies by its name that it should return a seq rather
than a sequence (in other words nil rather than ()). If seq behavior
is desired, it would be necessary to wrap a call to seq around the
calls to take-while. But for now, I'm just making the behavior match
the current behavior.

Although I think this is the cleanest way to address the extensibility
issue with subseq, the change to seqFrom will break anyone who
currently is overriding that method. I didn't see any such classes in
clojure.contrib, so I don't think it's an issue, but if this is a
concern, my other idea is to fix the problem entirely within subseq by
changing the calls to next into calls to drop-while. I have coded
this to confirm that it works, and can provide that alternative patch
if desired.

subseq.diff

Mark Engelberg

unread,
Aug 19, 2010, 2:44:14 PM8/19/10
to cloju...@googlegroups.com
> On Fri, Jul 16, 2010 at 6:05 AM, Rich Hickey <richh...@gmail.com> wrote:
>> However, let's leave the subseq issues for a second round of patches,
>> i.e. a first patch should not support subseq. This will give us some
>> time to think about how best to support subseq.

Last month, I posted a proposed patch to subseq that would make it
possible to hook into subseq for alternative sorted data structures
such as the priority map I put in contrib. I know everyone's been
busy getting 1.2 out the door, but I'm curious, has anyone had a
chance to take a look at the proposed patch?

Thanks,

Mark

Stuart Halloway

unread,
Aug 20, 2010, 11:02:20 AM8/20/10
to cloju...@googlegroups.com
Hi Mark,

Just did. Also made a ticket for it https://www.assembla.com/spaces/clojure/tickets/428-subseq--rsubseq-enhancements-to-support-priority-maps-.

It seems reasonable, but I am concerned that we might need more of an abstraction than just adding a boolean flag to seqFrom. The function takes a "key" argument, but isn't it now acting as a value? Will discuss with Rich at next team meeting.

Stu

Reply all
Reply to author
Forward
0 new messages