Implementing R-trees in Hypertable


Doug Judd

unread,
May 19, 2008, 1:30:55 PM5/19/08
to hyperta...@googlegroups.com
There have been a number of potential Hypertable applications that have cropped up in the area of GIS.  Most of these applications require a "geospatial" index.  The R-tree appears to be one of the more popular techniques for implementing this kind of index.  Here I describe how to implement an R-tree in Hypertable.

There are really two main problems involved in implementing R-trees (or any tree) in Hypertable:

1. How is the tree represented?
2. How do you ensure that the tree is always consistent during modifications?

Representation

Trees in Hypertable can be implemented using a table where each node is represented as a row.  Each node in the tree is assigned a unique ID which is used as the row key.  Each internal node will have a "Children" column family that contains a list of the following tuples:

(MBR, NodeID)

MBR is the minimum bounding rectangle of the child subtree and NodeID is the node ID for the child which is the root of the subtree.

Leaf nodes will have an "Objects" column family that contains a list of the following tuples:

(MBR, ObjectKey)

MBR is the minimum bounding rectangle of the object.  The second part of the tuple, ObjectKey, is a row key reference to the object in some other table.  Also, all nodes will have a "Parent" column which contains the Node ID of its parent.
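As a concrete sketch of this layout, the snippet below models the table as a plain Python dict keyed by node ID, with the "Children", "Objects", and "Parent" columns as dict entries.  The node IDs, object keys, and `mbr_union` helper are illustrative assumptions; a real implementation would use Hypertable mutators and scanners.

```python
# Illustrative model of the node-as-row layout described above.
# The dict stands in for the Hypertable table; names are made up.

def mbr_union(a, b):
    """Smallest rectangle enclosing rectangles a and b, each given
    as (min_x, min_y, max_x, max_y)."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))

# "Table" keyed by node ID; each value mimics a row's column families.
tree = {
    "node-1": {                                    # internal (root) node
        "Children": [((0, 0, 5, 5), "node-2")],    # (MBR, NodeID) tuples
        "Parent": None,
    },
    "node-2": {                                    # leaf node
        "Objects": [((0, 0, 2, 2), "obj-row-17"),  # (MBR, ObjectKey) tuples
                    ((3, 3, 5, 5), "obj-row-42")],
        "Parent": "node-1",
    },
}
```

Note that the root's child MBR is exactly the union of the leaf's object MBRs, which is the invariant the updater has to maintain.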

Maintaining Consistency

To maintain consistency during tree modifications, a special "tree updater" process is introduced which is responsible for all tree modifications.  By having only one of these processes in the system, we avoid concurrent modification problems.

== Inserts ==

The way R-trees handle inserts is similar to B-tree insertion: the object is pushed down into a leaf node, and splits propagate up the tree.  The following is a rough description of how the insert and split propagation would work.  I've left out the MBR recomputation logic and split algorithms because they are no different than those described in the R-tree and R*-tree papers.

Let's call the node that the object is being inserted into N and its parent P.  If inserting the object causes N to split, the split will generate two new nodes, N1 and N2, created by invoking the splitting algorithm and allocating new node IDs.  At this point, the entry for N in P's Children column gets replaced with two new entries, one for N1 and the other for N2.  If this causes P to split, the process continues recursively up the tree.  Once a node is reached that does not need splitting, it has its Children column modified to replace the entry for its split child with entries for the two replacement nodes.

The important thing to note here is that the recursive splitting process builds up two new subtrees, N1 and N2, entirely outside of the live tree.  Any new nodes that are created get new node IDs and may reference existing nodes in the tree, but the existing tree is not modified until the very end, when the one parent node gets updated to include the new N1 and N2.  Since row operations are atomic, the live tree is modified atomically and should never be inconsistent.

Also, the nodes that got replaced by N1 and N2 will have to get garbage collected once there are no pending lookups that could be referencing them.
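The split propagation above can be sketched as follows, with a dict again standing in for the Hypertable table.  The naive halving split and the `mbr_of` helper are placeholders for the real R-tree / R*-tree split and MBR-recomputation algorithms, and all names are assumptions for the sketch:

```python
import itertools

_ids = itertools.count(100)  # allocator for new node IDs (assumption)

def mbr_of(entries):
    """Bounding rectangle of a list of (MBR, ref) entries."""
    return (min(e[0][0] for e in entries), min(e[0][1] for e in entries),
            max(e[0][2] for e in entries), max(e[0][3] for e in entries))

def update(tree, node_id, entries, capacity):
    """Write entries into node_id's row.  On overflow, build N1 and N2
    as brand-new rows outside the live tree, then replace node_id's
    entry in the parent row, recursing if the parent overflows too.
    Each non-overflowing write is a single (atomic) row operation."""
    if len(entries) <= capacity:
        tree[node_id]["entries"] = entries          # the one atomic write
        return
    parent = tree[node_id]["parent"]
    mid = len(entries) // 2                         # placeholder split rule
    new_entries = []
    for half in (entries[:mid], entries[mid:]):
        nid = f"node-{next(_ids)}"                  # N1, then N2
        tree[nid] = {"entries": half, "parent": parent}
        new_entries.append((mbr_of(half), nid))
    if parent is None:                              # the root itself split
        root = f"node-{next(_ids)}"
        tree[root] = {"entries": new_entries, "parent": None}
        for _, nid in new_entries:
            tree[nid]["parent"] = root
        return
    kept = [e for e in tree[parent]["entries"] if e[1] != node_id]
    update(tree, parent, kept + new_entries, capacity)
    # node_id's old row is now unreferenced, awaiting garbage collection.

def insert(tree, node_id, entry, capacity=4):
    """Insert one (MBR, ref) entry into the node with ID node_id."""
    update(tree, node_id, tree[node_id]["entries"] + [entry], capacity)
```

Note how the parent's Children list is replaced in one `update` call per level, mirroring the single atomic row operation in the scheme above.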

== Deletes ==

Deletes can happen in the same way described in the r-tree paper (CondenseTree algorithm).  According to the paper, when a leaf node becomes "under-full" it will be removed from the tree and all of its objects will get re-inserted into the tree using the normal insertion algorithm.  This assumes that some sort of write lock is acquired which prevents lookups from happening concurrently.  Otherwise there would be a period where some of the objects disappear.

Here's a solution that allows lookups to happen concurrently with deletes.  When a node is to be deleted and all of its descendant objects are to be re-inserted, it gets marked "phantom".  A node marked "phantom" is ignored by subsequent insert operations, but is still considered during lookups.  This creates a window during a delete in which objects appear in the tree twice: once under the "phantom" node and again in their new location.  The lookup operation therefore has to account for this and return only one instance of each object.
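A duplicate-suppressing lookup of that kind might look like the sketch below.  It visits "phantom" nodes like any others (the phantom mark, stored however the updater chooses, only matters to inserts) and uses a seen-set so each ObjectKey is reported once.  The dict-as-table model and the `intersects` helper are assumptions:

```python
# Sketch of a lookup that tolerates "phantom" duplication.

def intersects(a, b):
    """Whether rectangles a and b (min_x, min_y, max_x, max_y) overlap."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def lookup(tree, root_id, query):
    """Return ObjectKeys whose MBRs overlap query, deduplicated so an
    object that appears twice during a delete is reported only once."""
    seen, results, stack = set(), [], [root_id]
    while stack:
        node = tree[stack.pop()]
        for mbr, child_id in node.get("Children", []):
            if intersects(mbr, query):
                stack.append(child_id)
        for mbr, object_key in node.get("Objects", []):
            if intersects(mbr, query) and object_key not in seen:
                seen.add(object_key)
                results.append(object_key)
    return results
```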

- Doug

Luke

unread,
May 19, 2008, 2:29:00 PM5/19/08
to Hypertable Development
Well, would you implement a B-tree explicitly on Hypertable?  I suspect that for most applications a simpler approach would be: all nodes are flattened to key/value pairs whose row keys start with <padded_lat><padded_long>..., MBR lookups are just range scans, and maintenance ops are just regular insert/delete/update operations.
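A sketch of that flattened scheme: fixed-width, offset-shifted coordinates make lexicographic key order match numeric (lat, long) order, and an MBR lookup becomes a latitude-band range scan plus a longitude filter.  The 1e-4-degree resolution and 7-digit padding are arbitrary choices for the sketch:

```python
# Flat <padded_lat><padded_long> row keys; offsets avoid negatives.

def point_key(lat, lon):
    """Row key for a point; padding keeps lexicographic == numeric order."""
    return f"{round((lat + 90) * 1e4):07d}{round((lon + 180) * 1e4):07d}"

def scan_mbr(table, min_lat, max_lat, min_lon, max_lon):
    """MBR lookup as a lat-band range scan plus a longitude filter;
    the filtered-out rows are the extra scanning cost of this layout."""
    lat_lo = f"{round((min_lat + 90) * 1e4):07d}"
    lat_hi = f"{round((max_lat + 90) * 1e4):07d}"
    lon_lo = f"{round((min_lon + 180) * 1e4):07d}"
    lon_hi = f"{round((max_lon + 180) * 1e4):07d}"
    hits = []
    for key in sorted(table):      # stands in for a Hypertable range scan
        if lat_lo <= key[:7] <= lat_hi and lon_lo <= key[7:] <= lon_hi:
            hits.append(table[key])
    return hits
```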

Doug Judd

unread,
May 19, 2008, 2:38:08 PM5/19/08
to hyperta...@googlegroups.com
The problem with this approach is that with an R-tree there is no total "order" of objects.  It's not possible to flatten to key/value pairs such that objects close in proximity always sort next to each other.  You could end up scanning the entire list to find all overlapping objects.

- Doug

Luke

unread,
May 19, 2008, 3:51:32 PM5/19/08
to Hypertable Development
Yeah, the simple mapping is definitely not optimal (you need to scan more entries than necessary because the table is ordered by (lat, long)), but it's much simpler to implement.  You can guesstimate the scan size based on the size of the scan's bounding rectangle; if it's not "too large", just use the simple scan.  It's also possible to use a single additional auxiliary index table, keyed by (long, lat), to facilitate large bounding-rectangle scans: from it you can construct parallel scans on the main table.  The performance will probably be worse than a properly implemented R-tree, because the latter takes better advantage of storage locality.  Would love to see someone play with it and come up with some benchmarks :)

Doug Judd

unread,
May 19, 2008, 4:17:52 PM5/19/08
to hyperta...@googlegroups.com
With 100-way fan-out, an R-tree holding 10 billion objects will have a height of five.  If there is any locality of reference, most of the internal nodes would be cached.
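As a sanity check on that arithmetic (fan-out and object count as stated above):

```python
# Each added level multiplies capacity by the fan-out, so five levels
# cover 100**5 = 10 billion objects.
fan_out = 100
objects = 10_000_000_000
height, capacity = 0, 1
while capacity < objects:
    capacity *= fan_out
    height += 1
# height is now 5, and 100**5 == 10**10 exactly.
```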

Luke

unread,
May 19, 2008, 4:18:12 PM5/19/08
to Hypertable Development
In order to take advantage of the R-tree's storage locality, object keys should be prefixed with their parent node's ID, so they're stored together.
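One hypothetical way to realize this: make the parent node's ID the row-key prefix, so a node's objects occupy one contiguous key range.  The node and object names below are made up for the example:

```python
# Parent-prefixed object keys cluster siblings in key order, and thus
# (in a range-partitioned store) in storage.
keys = sorted(f"{parent}:{obj}" for parent, obj in [
    ("node-07", "obj-b"), ("node-03", "obj-a"),
    ("node-07", "obj-a"), ("node-03", "obj-c"),
])
# All of node-03's objects now precede node-07's, so scanning one
# node's objects touches a single contiguous key range.
```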


Doug Judd

unread,
May 19, 2008, 4:27:16 PM5/19/08
to hyperta...@googlegroups.com
Good point.  I guess the only disadvantage of doing it that way is that by concentrating locality, you create hot spots.  You can imagine all of the leaves for a particular region (e.g. Manhattan) landing on a single node.  It might be better to forgo locality and have the leaves scattered randomly throughout the system to share the query burden (using the query cache to minimize I/O).

Luke

unread,
May 19, 2008, 4:39:09 PM5/19/08
to Hypertable Development
Well, hot spots are handled by the various caches anyway.  Spatial storage locality will help the throughput of analytic scans over areas/regions.

Doug Judd

unread,
May 19, 2008, 4:48:12 PM5/19/08
to hyperta...@googlegroups.com
True enough.  Plus it makes block cache useful.