How to implement Segment Tree in key/value datastore to use Google S2 geometry?

308 views
Skip to first unread message

Constantine Vassilev

unread,
Jan 13, 2018, 11:48:11 PM1/13/18
to golang-nuts
I need to implement the following:

1) I have a database of ~100 mil records.

Each record is of type key/value.

The key is an internal ID of type: "123456789".
The value is a kml with geographic boundary in form of lat/lon coordinates:

                 <?xml version=\"1.0\" encoding=\"UTF-8\"?>
                 <kml xmlns=\"http://www.opengis.net/kml/2.2\">
                   <Placemark>
                       <Polygon>
                             <outerBoundaryIs>
                                     <LinearRing>
                                               <coordinates>
                                              -117.587174583104,33.6437620515745 
                                              -117.587083176312,33.6437891755388 
                                              -117.587016626677,33.6438098355405 
                                              -117.586923617992,33.6435949397585 
                                              -117.587051757569,33.643539681552 
                                              -117.587079610599,33.6435348310862 
                                              -117.587174583104,33.6437620515745
                                               </coordinates>
                                     </LinearRing>      
                              </outerBoundaryIs>    
                        </Polygon>  
                    </Placemark>
                  </kml>


2) Input values are of type lat/lon like -117.11111,33.2222

The usage is having an input lat/lon to find the boundary in which it is located and return the key (the internal ID).

The closest solution I found is using Google S2 geometry library:
and this library:
https://github.com/akhenakh/regionagogo

This implementation is using an in memory index with Segment Tree datastructure:
and BoltDB.

I need to create persistent  key/value database on the disk (it would be big like ~100GB) using
RocksDB key/value store with key Segment Tree datastructure.

It should get CellID from lat/lon and find the S2 interval where it resides.

How is the best way to implement it?







Tamás Gulácsi

unread,
Jan 14, 2018, 2:21:48 AM1/14/18
to golang-nuts
You want poligon bounded inclusion, don't you?
With segmented tree, you'll need to implement some other data structure that holds the slices (say, you slice the globe vertically and each slice holds a segment tree of the vertically sliced poligons),
I think

Constantine Vassilev

unread,
Jan 14, 2018, 1:50:16 PM1/14/18
to golang-nuts
Thanks for the suggestion and the link.

The question is how it will be searched. Also I liked the Google S2's Golang implementation.

From the documentation if the link I see:
--

Queries

Bounding-box and k-nearest-neighbors queries are supported.

Bounding-box queries require a search *Rect. It returns all objects that touch the search rectangle.

--

What I need is having location's lat/lon to find the polygon around it.

Also how the index would be created and stored as key/value.

Jesper Louis Andersen

unread,
Jan 15, 2018, 10:13:17 AM1/15/18
to Constantine Vassilev, golang-nuts
My intuition is that this screams S2 at the top of its lungs, provided that you can hook into the structure.

It is simply storing a radix tree (PATRICIA) of S2 cells and attaching information about box bounding to all of these. This yields a compressed trie-like structure that you can probably maintain in memory even for fairly large sets of bounding boxes. The path compression means that you need considerably fewer than the 30-cell lookup depth that would normally happen. If they haven't this built-in I don't know why, as it seems rather obvious to me, givien that you know the internal structure of S2[0].

The alternative ideas are just quad-tree'ing the whole world, but then you are getting close to an R-tree solution anyway. Another solution is to just brute-force everything on two hash-table maps. Uber did this in a blog post, but I wouldn't recommend that solution since it won't really scale. Finally, you can just stuff everything into PostGIS in Postgres, which contains many of these algorithms already. If you need to link to other relational data, you shouldn't rule out such a solution I think.

[0] If you've never looked into this, do it. It is awfully brilliantly designed. Hilbert curves and all.

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply all
Reply to author
Forward
0 new messages