How to efficiently compute bounds


Milton

Jan 12, 2012, 12:16:46 PM
to geodb
Hello again people

Not sure if this fits here (or at the Hatbox level), but I was
wondering how to ask GeoDB to efficiently compute the bounds of a
given set of objects. Something like this:
SELECT GetBounds(myGeomAttribute) FROM myTable WHERE [..condition..]

This is possible with PostGIS, Oracle Spatial, SQL Server 2008, etc.,
but I couldn't find a way to do this efficiently with GeoDB/Hatbox.

I did some looking around, and saw that:

- GeoTools' H2 DataStore implementation seems to compute the envelope
of each entry and merge them one by one. This sounds really slow:
a look at ST_Envelope suggests that every entry's geometry is read
into memory and decoded just to compute its envelope. Is that right?
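For illustration, the one-by-one merge described above boils down to something like the following sketch (the `Envelope` class here is a made-up stand-in mirroring the shape of JTS's Envelope; in the real GeoTools code path each envelope would come from a geometry deserialized from its WKB, which is the expensive part):

```java
import java.util.List;

// Minimal stand-in for a JTS-style envelope (illustrative only).
class Envelope {
    double minX, minY, maxX, maxY;

    // A "null" envelope that any real envelope will expand.
    Envelope() {
        minX = Double.POSITIVE_INFINITY;
        minY = Double.POSITIVE_INFINITY;
        maxX = Double.NEGATIVE_INFINITY;
        maxY = Double.NEGATIVE_INFINITY;
    }

    Envelope(double minX, double minY, double maxX, double maxY) {
        this.minX = minX; this.minY = minY;
        this.maxX = maxX; this.maxY = maxY;
    }

    // Grow this envelope so it also covers 'other'.
    void expandToInclude(Envelope other) {
        minX = Math.min(minX, other.minX);
        minY = Math.min(minY, other.minY);
        maxX = Math.max(maxX, other.maxX);
        maxY = Math.max(maxY, other.maxY);
    }
}

class BoundsMerge {
    // O(n) merge: every entry's envelope must be computed and folded in,
    // which is why this approach is slow when each envelope requires
    // decoding the full geometry first.
    static Envelope merge(List<Envelope> envelopes) {
        Envelope result = new Envelope();
        for (Envelope e : envelopes) {
            result.expandToInclude(e);
        }
        return result;
    }
}
```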

- Looking at Hatbox's internal structure, it seems that every entry's
envelope is already computed and stored in its RTree Node structures.
So I was wondering whether one could implement an AggregateFunction
(http://www.h2database.com/html/grammar.html#create_aggregate) to merge
all of the entries' envelopes. Unfortunately, I suspect some
implementation changes would be needed in Hatbox to be able to
efficiently retrieve an Entry given its ID.
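Assuming Hatbox could hand back a row's stored envelope by ID, the aggregate described above might look roughly like this sketch. The class name and the `double[4]` envelope encoding are made up for illustration; a real version would implement `org.h2.api.AggregateFunction` and be registered via `CREATE AGGREGATE`, and the H2 plumbing is omitted here so the accumulation logic stands alone:

```java
// Sketch of an envelope-union aggregate in the shape H2 expects.
// A real implementation would implement org.h2.api.AggregateFunction
// (init(Connection), getType(int[]), add(Object), getResult()) and be
// registered with something like:
//   CREATE AGGREGATE GETBOUNDS FOR "com.example.ExtentAggregate";
class ExtentAggregate {
    private double minX = Double.POSITIVE_INFINITY;
    private double minY = Double.POSITIVE_INFINITY;
    private double maxX = Double.NEGATIVE_INFINITY;
    private double maxY = Double.NEGATIVE_INFINITY;
    private boolean empty = true;

    // H2 calls add(..) once per row. Here the per-row value is assumed
    // to be the row's stored envelope as {minX, minY, maxX, maxY}.
    void add(double[] env) {
        empty = false;
        minX = Math.min(minX, env[0]);
        minY = Math.min(minY, env[1]);
        maxX = Math.max(maxX, env[2]);
        maxY = Math.max(maxY, env[3]);
    }

    // H2 calls getResult() after the last row; null for an empty set.
    double[] getResult() {
        return empty ? null : new double[] {minX, minY, maxX, maxY};
    }
}
```

The point of the aggregate is that it would only touch the pre-computed envelopes, never the serialized geometries themselves.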

Well, I really hope there's an easier solution... It seems really
strange to me that a spatial extension can't easily compute bounds! :P

Thanks,
Milton

PS: as a side note, the latest Hatbox version (1.0.b9) includes a new
method, Proc.getDatasetBounds, which unfortunately only computes
bounds for the entire dataset (it simply returns the RTree root Node's
bounds directly).

Justin Deoliveira

Jan 12, 2012, 3:15:05 PM
to ge...@googlegroups.com
Hi Milton,

As I understand it, the bounds on the root node of the tree are only approximate, not exact. So I have thought of adding an ST_EstimatedExtent function, similar to what PostGIS has for computing estimated bounds, but have yet to do so. That would only work for the bounds of the entire dataset, though.

Another improvement would indeed be to create an aggregate function as you mention, although I am not sure how much time that would actually save: I would think the overhead of calling a function versus the overhead of deserializing a geometry to compute its bounds, or looking it up in the Hatbox tree, is negligible. I could be wrong, though.

I did experiment with a custom WKB format that stored the envelope in the first 32 bytes of the representation, so that when accessing the bounding box one could just strip off those bytes directly instead of parsing the entire WKB. Surprisingly, I didn't find much of a gain in most cases. It only really showed up with very large datasets containing many huge geometries (10,000+ vertices).
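The envelope-prefix idea can be sketched as follows. This is illustrative only: the thread doesn't specify the actual layout or byte order of that experimental format, so big-endian doubles in the order minX, minY, maxX, maxY are an assumption here:

```java
import java.nio.ByteBuffer;

// Sketch of an envelope-prefixed geometry encoding: the first 32 bytes
// (four doubles, big-endian by ByteBuffer default -- an assumption, the
// real format isn't specified in the thread) hold minX, minY, maxX, maxY,
// so the bounding box can be read without parsing the WKB that follows.
class EnvelopePrefix {
    static byte[] prepend(double minX, double minY, double maxX,
                          double maxY, byte[] wkb) {
        ByteBuffer buf = ByteBuffer.allocate(32 + wkb.length);
        buf.putDouble(minX).putDouble(minY).putDouble(maxX).putDouble(maxY);
        buf.put(wkb);
        return buf.array();
    }

    // Strip off just the envelope; the geometry bytes are never touched.
    static double[] readEnvelope(byte[] stored) {
        ByteBuffer buf = ByteBuffer.wrap(stored, 0, 32);
        return new double[] {buf.getDouble(), buf.getDouble(),
                             buf.getDouble(), buf.getDouble()};
    }
}
```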

Anyway, patches and improvements are very much welcome; if you experiment with something that works, please pass it along.

-Justin
--
Justin Deoliveira
Enterprise support for open source geospatial.
OpenGeo - http://opengeo.org

Milton

Jan 12, 2012, 3:47:31 PM
to geodb
Hi Justin

Hmm, I was assuming that deserializing a geometry was really costly...
I wouldn't have thought it would be on the same level as calling a
function, but if it is, then forget it! :)

Actually, I would ideally like to do what some really simple spatially
enabled DBs used to do, which was simply to keep the bounds (min_x,
max_x, min_y, max_y) of each geometry in a table and then run
something like:
SELECT MIN(min_x), MAX(max_x), MIN(min_y), MAX(max_y)
FROM [geom_bounds_table]

I assume *that* would be quite fast. But I don't see how to do that
with Hatbox. Do you?

Anyway, thanks a lot for the inputs!
Milton

PS: for the time being, I guess I'll stick to the dumb slow method :P

