data structures for efficient range queries

2,680 views
Skip to first unread message

nchubrich

unread,
Oct 19, 2009, 6:54:19 PM10/19/09
to Clojure
I need to make a data structure for a query such as "find everything
that is priced $3.27 - $6.12" (and perhaps sum up the total revenue
for all items in that price range). The naive way would be to make an
array with one slot for each increment in the entire range, and have
each slot pointing to a bucket with all items at that increment (here,
price). But this would not be terribly efficient or feasible over
very large ranges (imagine you wanted to query both large and small
ranges: think of distances on a galactic, planetary, continental,
city, human, etc. scale. Do you make multiple indices for coarse-
grained and small-grained queries? Do you have coarse-grained slots
pointing to smaller ranges, and so on ad infinitum? But that would
seem to make evaluating large ranges very inefficient). Is there a
cleverer/Clojurisher way to do it? The price example is what I'm
doing, so I don't really need a galactiscalable data structure....

Thanks-----

Nick

Richard Newman

unread,
Oct 19, 2009, 7:38:44 PM10/19/09
to clo...@googlegroups.com
Two simple approaches:

1. Use a sorted, random-access data structure. A vector will suffice.
Store your data in this. This is good if your data doesn't change much.

Find the extremes of a range by binary search and linear walking. If
that's too slow, build a metaindex, which is simply a precalculated
binary search to a certain depth.

A lookup is then the computation of two indices. Your values lie
between those indices.


2. Store your values in a tree. Each node stores the range it
contains. Navigate the tree appropriately to find matching values.
This will take more space than the array.

Daniel Renfer

unread,
Oct 19, 2009, 7:38:59 PM10/19/09
to clo...@googlegroups.com

Would it be possible to use a hash-map with the prices as the keys and vectors of items as your values? That way you get efficient access to your values if you know the price and aren't paying for the empty space.

Alex Osborne

unread,
Oct 19, 2009, 9:13:41 PM10/19/09
to clo...@googlegroups.com
nchubrich wrote:
> I need to make a data structure for a query such as "find everything
> that is priced $3.27 - $6.12" (and perhaps sum up the total revenue
> for all items in that price range).


That's one of the things sorted maps are for:

(let [objects-by-price (sorted-map 0.50 :cookie, 5.0 :lunch,

6.10 :movie, 200.0 :tv)]

(take-while #(<= (key %) 6.12) (subseq objects-by-price >= 3.27)))

=> ([5.0 :lunch] [6.1 :movie])


Alex Osborne

unread,
Oct 19, 2009, 9:30:08 PM10/19/09
to clo...@googlegroups.com

>> The naive way would be to make an
>> array with one slot for each increment in the entire range, and have
>> each slot pointing to a bucket with all items at that increment (here,
>> price). But this would not be terribly efficient or feasible over
>> very large ranges


Also, if you're wondering how sorted-map makes that fast, see:

http://en.wikipedia.org/wiki/Binary_search_tree

It just does a search and then traverses the tree from that point, so it
takes O((log n) + m) time where n is the size of the tree and m is the
number of items your query returns.

Richard's suggestion of binary search in a sorted array is also a good
option but is a bit of a pain if you need to add new items to the array.

http://en.wikipedia.org/wiki/Binary_search

Timothy Pratley

unread,
Oct 19, 2009, 10:08:10 PM10/19/09
to Clojure
If the thing you want to index by is not unique, you could do...
something like:

(def m (atom (sorted-map)))
(def rm (atom (sorted-map)))

(defn add
[k v]
(swap! m assoc k v)
(swap! rm assoc v (conj (get @rm v []) k)))

(add 1 "bbb")
(add 2 "ccc")
(add 3 "aaa")
(add 4 "aaa")

; @rm is now {"aaa" [3 4], "bbb" [1], "ccc" [2]}

This seems simpler to code than a sorted vector index, but would
require more memory. However you can quickly lookup by key or by index
or by range.

Volkan YAZICI

unread,
Oct 20, 2009, 3:15:09 AM10/20/09
to clo...@googlegroups.com

I think what you are after is an interval tree[1] data structure. You
might find a suitable Java library or implement yours easily.


Regards.

[1] http://en.wikipedia.org/wiki/Interval_tree

nchubrich

unread,
Oct 19, 2009, 7:55:32 PM10/19/09
to Clojure
Sounds like the tree is what I need (since the data will be changing
quite a bit). As for using a hash-map: wouldn't you need a sorted map
to be able to pull out all the keys in a range? And then are the seq
functions efficient for this? (I.E. if you drop a number of them to
get to the beginning of the range, is the drop O(1) or O(n)?)
Thanks for your help!

nchubrich

unread,
Oct 19, 2009, 11:19:04 PM10/19/09
to Clojure
Thanks for the advice, everyone! Timothy, I guess if you had non-
uniqueness but didn't care to have it indexed in more than one way you
could just take Alex's example and have his maps point to vectors,
right?


Timothy Pratley

unread,
Oct 20, 2009, 8:43:57 AM10/20/09
to Clojure
Yup!

Stuart Sierra

unread,
Oct 20, 2009, 11:16:11 AM10/20/09
to Clojure
On Oct 20, 3:15 am, Volkan YAZICI <volkan.yaz...@gmail.com> wrote:
> I think what you are after is an interval tree[1] data structure. You
> might find a suitable Java library or implement yours easily.
>
> Regards.
>
> [1]http://en.wikipedia.org/wiki/Interval_tree

Yes, I think an interval tree is what you are looking for here.

-SS

Sean Devlin

unread,
Oct 20, 2009, 3:34:22 PM10/20/09
to Clojure
This is standard w/ Java 6.

http://java.sun.com/javase/6/docs/api/java/util/NavigableMap.html

On Oct 20, 11:16 am, Stuart Sierra <the.stuart.sie...@gmail.com>
wrote:
Reply all
Reply to author
Forward
0 new messages