Splitting cell low and hi ranges into sub-ranges?

55 views
Skip to first unread message

Ryan Pendergast

unread,
Jul 23, 2019, 12:08:37 PM7/23/19
to s2geometry-io
I'm trying to look up closest points of interest stored in AWS DynamoDB, given a lat/lng and a radius (in meters).  The points of interest are stored in DynamoDB, with a prefix of the int64 cellID hash, for ease of lookup (see hashKeyLength below).  I don't think my q is DynamoDB specific, but using it as an ex helps give context and clarity to my question (I hope).

I've implemented the algorithm to compute the covering cells of the bounding box made from pt&radius, however I'm a bit confused on how to compute the range of cell IDs needed to query DynamoDB.

A Java lib was created by AWS to do this query which has been ported to JS.  I want to do this in go, but will ref the JS lib for ease of understanding my q.

When generating the queries to dispatch to DynamoDB, this library goes through all the cells in a covering area, and splits the low and hi range of each cell into sub-ranges based on the hash key length.

I've gone through the logic quite a few times and I can't figure why trySplit() (splitting of cell range into multiple) is needed.

Can't I just take the lowest minimum range of all the cells, and the highest maximum range of all the cells, and construct a query that gets hash prefixes that are between this range?

For example, if my covering had 3 cells and I have a hashKeyLength (length of hash key prefix stored in DB) of 3:

cell 1:
 
- rangeMin:    123456789
 
- rangeMax    123999999


cell
2:
 
- rangeMin:    124000000
 
- rangeMax    124999999


cell
3:
 
- rangeMin:    125000000
 
- rangeMax    125678912


Can't I just query cell IDs between `123456789` and `125678912` (actually `123` and `125` since in the example I chose to store hash key length prefix of 3)?  

Why do I have to `trySplit()` `cell1`s range (`123456789-123999999`) into 3 separate ranges?

Is there a better/easier way to accomplish this? I'd like to use S2ClosestPointQuery but it has not been ported to the go s2 lib.

FWIW here is the Java impl of trySplit (JS lib is based on this).

Eric Engle

unread,
Aug 6, 2019, 5:38:30 PM8/6/19
to Ryan Pendergast, s2geometry-io
Hi Ryan,

The trySplit method isn't part of this mailing list's library, but if I understand correctly, the usual reason to do something like that is that given a bunch of query ranges, they may or may not be all near each other in database order.

If they're all near each other, you can describe them nicely as one range/prefix. But if that covers too much data (because one of the ranges is really far from the others) then it's better to make more requests that more narrowly focus on the interesting bits.

Kind regards,
Eric

--
You received this message because you are subscribed to the Google Groups "s2geometry-io" group.
To unsubscribe from this group and stop receiving emails from it, send an email to s2geometry-i...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/s2geometry-io/2eb056db-9b3c-4a72-a8b4-74ed7a908655%40googlegroups.com.

Ryan Pendergast

unread,
Aug 7, 2019, 10:10:49 AM8/7/19
to s2geometry-io
Thanks Eric.  Understand `trySplit` is not part of S2, so appreciate your response.

Just to clarify:  from an S2 geometry perspective, if I get the covering cells (S2RegionCoverer) of a bounding rectangle (S2 Rect from point and radius) aren't all the covering cells "near/touching" each other by definition?

Said another way, when you get the covering of a rectangle, all the cell's in said covering are "near" each other correct?

Or are you saying a given cell ID in the covering could cover a huge area (for ex green on here), therefore breaking up the large range into multiple requests is necessary for performance?
To unsubscribe from this group and stop receiving emails from it, send an email to s2geom...@googlegroups.com.

Eric Engle

unread,
Aug 7, 2019, 2:34:11 PM8/7/19
to Ryan Pendergast, s2geometry-io
On Wed, Aug 7, 2019 at 7:10 AM Ryan Pendergast <ryan.pe...@gmail.com> wrote:
Thanks Eric.  Understand `trySplit` is not part of S2, so appreciate your response.

Just to clarify:  from an S2 geometry perspective, if I get the covering cells (S2RegionCoverer) of a bounding rectangle (S2 Rect from point and radius) aren't all the covering cells "near/touching" each other by definition?

Nope :-]

The cells are near each other, in a 2D spatial sense. But the sense of 'near' we must use for queries is the database's. Which is like a file: unavoidably 1D.

So, when you map a spatial position to a 1D position via the hilbert curve, there's a property you want from that mapping: if two points are spatially near, they should be 1D near.

The Hilbert curve we use places two spatially near points near in 1D space, but with some probability they'll be a little further than normal. A smaller chance, and they're further. And so on.

The most extreme case in the Hilbert curve we use is at the ends. The curve's continuous path starts and ends with adjacent cells, but the IDs of those cells are 2^60 apart.

The reason all this works well is that the Hilbert curve provides a good 1D ordering of 2D features, but because it's imperfect, the query system should cope.

Said another way, when you get the covering of a rectangle, all the cell's in said covering are "near" each other correct?

Consider the extreme case given above. Two very narrow range requests is better than querying the entire database.

Or are you saying a given cell ID in the covering could cover a huge area (for ex green on here), therefore breaking up the large range into multiple requests is necessary for performance?

The covering cells won't be larger than they need to be, given how you generated the covering. But of course a simple covering of an ocean would have large cells.

Or maybe the database is just incredibly detailed relative to the size of the query feature.

Regardless, if a query cell would retrieve too much data, you can refine the covering by substituting cells (or ranges) for subcells (or more but tighter ranges).

To unsubscribe from this group and stop receiving emails from it, send an email to s2geometry-i...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/s2geometry-io/a7facd38-8115-4f49-8233-bd5072709a9c%40googlegroups.com.

Ryan Pendergast

unread,
Aug 7, 2019, 3:07:16 PM8/7/19
to s2geometry-io
:) I'm obv a geo noob, so thanks for bearing with me.  I'm interested in this field, and think I'm just a small "ah-ha" moment away from fully grasping the concept in question here.

I'm with you until:
Regardless, if a query cell would retrieve too much data, you can refine the covering by substituting cells (or ranges) for subcells (or more but tighter ranges).

Specifically the "you can refine" piece.

I get that a cell at a zoom level, could contain a ton of my points of interest (stored in my DB).  However if I need to "find all points of interest within roughly X meters", and my covering returns 8 cells, don't I need to query for all points of interest that hash into the 8 cell ranges?  Since I don't know the points before I make the query, I'm not sure how I can refine to subcells (tighter ranges) before making said query.

Essentially I'm trying to create a `S2ClosestPointQuery` backed by a DB connected by network (not in memory/local disk).  I've tried looking for how `FindClosestPoints()` does the algorithm to refine, but unsuccessful as my c++ is not what it once was.  Not looking for a solution from you here, just giving you some context.

I want to be respectful of your time, so a "I've given you enough info in my prev. response to answer your question - go read more" is a perfectly fine reply.  

Thanks again.

Eric Engle

unread,
Sep 20, 2019, 12:26:42 PM9/20/19
to Ryan Pendergast, s2geometry-io
Well, this isn't a complete or correct approach but maybe it'll give you the intuition. Forget about cells, just think about the numeric hilbert range from 0 to 2^63-1. When you want to get all rows that intersect a range, start with that one range and check how many chunks of the database intersect the range. If it's too many, split the range in two and recursively try each half. At some point, it's not too many and you query the chunks. This approach ensures that each query you send has a reasonable result size, even if the total set of results is huge. That's what I mean by refining; you need some approach for figuring out how detailed a coverings to construct, based on what's in the database.

To unsubscribe from this group and stop receiving emails from it, send an email to s2geometry-i...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/s2geometry-io/5db799ac-d270-4ad8-adac-cf4811ec6b76%40googlegroups.com.
Reply all
Reply to author
Forward
0 new messages