Qi4j Spatial Queries

12 views

Skip to first unread message

Jiri Jetmar

unread,

Jan 17, 2015, 5:04:16 PM1/17/15

to qi4j...@googlegroups.com, jil...@jillesvangurp.com

Hi Gang,

we talked a while ago about adding spatial queries to Qi4j. I hacked on this topic a bit and will trigger a github pull request asap.

Let me summarize in the next lines the most important aspects and design decisions.

I.) General

The main goals was from my side :

- No 3th party dependencies in the Qi4j Core

- Full integration in to the Query DSL

- Practical and easy to understand solution (e.g. no 3D+ support)

II.) Architecture/Design

The implementation has three main blocks :

1.) Qi4j Core (Geometrical Types and Query DSL)

2.) Indexer & Finder implementation (for now ElasticSearch is supported)

3.) A spatial library for converting data types (GeoJSON) and transformation of spatial projections (Grid Coordinate Systems).

I oriented my self on the DE-9IM Model (http://en.wikipedia.org/wiki/DE-9IM) that describes the

spatial relation of two geometries to each other. This relations can be seen as geometrical functions -

like within, equals, distjoint, touches, contains, .. The hole DE-9IM Model specifies in total a matrix

of about 512 geometrical functions. I implemented a indexer/finder just for a very small subset :

- Within

- Disjoint

- Intersects

From my experience this are the most common geometrical functions that are used. They are definitely

domains where much more functions are needed, like CAD, but then most likely a kind of expert system is needed.

The geometries are added to the Qi4j Core are derived from the GeoJSON specification - TPoint, TMultiPoint, TLineString, TMulti-

LineString, TPolygon, TMultiPolygon, TFeature and TFeatureCollection. The "T" prefix was added to have a kind

indication that this is a Qi4j "T"ype as e.g. ElasticSearch defines a number of geometrical types, Spatial4j as well, ..

IV.) Indexer & Finder

For now I added the spatial functions to ElasticSearch. ES is using the so-called quad trees or geohash for spatial indexing. I;m using

the quad tree Indexer. Here a good description : http://blog.notdot.net/2009/11/Damn-Cool-Algorithms-Spatial-indexing-with-Quadtrees-and-Hilbert-Curves.

Quad tree is in general a very simple indexing method. Each level of the tree has exactly 4 nodes. Within one quadratic node all geometries

(points, ...) has the same spatial position). The required precision/resolution is expressed by the used level. Fortunately one does not need

to define the level directly but it is possible to say "1m" and ES is calculating the required number of levels.

On the implementation itself - well, ElasticSearch hates me, really ! :-)

There are a number of difficulties to be solved to convince ES to work as required. This are in short :

- Spatial mappings in general. Before ANY geometrical type can be indexed it is required to map it accordingly. Means one has to define - this is a

point, this is a line, using indexing method X, with precision Y.

- Spatial mappings for points. ES offers two different indexing method for points - "geoshape" or "gepoint". When a point is indexed as a "geoshape"

then sortings by e.g. distance is supported. When "geoshape" is used then NO sorting filter is available. This has to be considered for the "orderBy"

segments. Further when "gepoint" indexing for points is used then NO disjoint or intersects functions are supported.

- Spatial mapping type names. It is required to use "deep" names, like "city.location", so it was needed to add a way to figure out that "deep" name of a property

Ok, in general this mapping thing took me a while to make it running. Because of the possible supported/unsupported combinations I wrote a kind of matrix

that describes the supported/unsupported combinations. I;m expecting in future some movement on the ES side in terms of supported spatial functions, so a

kind of general "overview"make sense. One can implement the toString() method to get a kind of pretty-printed version of this matrix. This matrix looks for now like :

// ST_Within

supports(enable(ST_WithinSpecification.class), propertyOf(AnyGeometry), filterOf(TPoint.class, TPolygon.class), enable(OrderBy), SpatialConfiguration.INDEXING_METHOD.GEO_POINT);

supports(enable(ST_WithinSpecification.class), propertyOf(AnyGeometry), filterOf(TPoint.class, TPolygon.class), disable(OrderBy), SpatialConfiguration.INDEXING_METHOD.GEO_SHAPE);

// ST_Disjoint

supports(disable(ST_DisjointSpecification.class), propertyOf(AnyGeometry), filterOf(TPoint.class, TPolygon.class), enable(OrderBy), SpatialConfiguration.INDEXING_METHOD.GEO_POINT);

supports(enable(ST_DisjointSpecification.class), propertyOf(AnyGeometry), filterOf(TPoint.class, TPolygon.class), disable(OrderBy), SpatialConfiguration.INDEXING_METHOD.GEO_SHAPE);

// ST_Intersects

supports(disable(ST_IntersectsSpecification.class), propertyOf(AnyGeometry), filterOf(TPoint.class, TPolygon.class), enable(OrderBy), SpatialConfiguration.INDEXING_METHOD.GEO_POINT);

supports(enable(ST_IntersectsSpecification.class), propertyOf(AnyGeometry), filterOf(TPoint.class, TPolygon.class), disable(OrderBy), SpatialConfiguration.INDEXING_METHOD.GEO_SHAPE);

On the ST_Within case : Enable ST_With Spec on arbitrary geometries (point, line, polygon,..) and allow filter geometries of point and polygon, enable orderBy sortings when indexing method is GEOPOINT.

Note that this orderBy constraints are only valid for geometries in the orderBy statements, like e.g. sorting by distance. OrderBy expression without spatial types are not considered by this matrix (means supported).

Another thing is the fact that ES is just supporting the "GPS" aka EPSG 4326 projection. Means when one is using e.g. the Swiss Grid then ES is not able to index it. Therefore I added the possibility

to let convert the projection automatically while be able to control the conversion precision for the Indexer and Finder. This feature is optional and can be switched on/off.

The spatial support is implemented in general as a "add-on/extension" that can be switched on/off and the whole implementation is done as modular as possible.

V.) Hands-on

To create a point in the default projection :

TPoint point1 = TPoint(module).lat(48.13905780942574).lon(11.57958984375).geometry();

To create a point in a arbitrary (supported) projection :

TPoint point2 = TPoint(module).y(1286436.5975464052).x(2389280.7514562616).geometry("EPSG:27572")

To express a query that is searching for all points within a given area (point and radius, unit, default projection) :

Query<A> query = unitOfWork.newQuery(

.where(

ST_Within

(

templateFor(A.class).point(),

TPoint(module).y(48.13905780941111).x(11.57958981111).geometry(),

10, TUnit.METER

)

));

ATTENTION : Depending on the point indexing method (geopoint, geoshape) different filters are used. In case of geopoint the geo_distance_filter

(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-geo-distance-filter.html) is used and sorting filters are supported.

When geoshape indexing is used then a "internal" geometry of type circle is created and then "polygonized" to a polygon with n-points (360 for now)

and the geo_polygon_filter (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-geo-polygon-filter.html) is used.

Means that this is just an approximation !

That expression here is cool :

Query<A> query = unitOfWork.newQuery(

.where(and(

ST_Disjoint

(

templateFor(A.class).point(),

TPoint(module).y(2389280.7514562616).x(1286436.5975464052).geometry("EPSG:27572"),

10, TUnit.METER

ST_Within

(

templateFor(A.class).point(),

TPoint(module).y(48.13905780942574).x(11.57958984375).geometry(),

100,TUnit.KILOMETER

)

));

The above expression combines within and disjoint expressions and uses two different projections. The "native" EPSG:4326 and EPSG:27572. The latter

is automatically converted to EPSG:4326 while making sure that the conversion error does not exceed the defined error. In this case this are about 2 meters.

VI.) Status & Limitations

Well, it works.. Is this valid, stable, robust ? Honestly I do not know.. I;m going to eat my own dog food in the next weeks, means I;m going to use the

spatial implementation for my productive system. However it looks promising.

Limitations for now are :

- Just a very limited spatial functions (Within, Intersects, Disjoint)

- Indexing of polygons - orientation of vertex order of coordinates has to be added. Required for polygons that cross the dateline.

- Potentially unsupported Query DSL expressions can be defined. I tried to make it as strict as possible, but who knows..

- Exception handling - I think this topic needs some more love..

- Some for documentation/javadoc..

- There are bugs in the code. Lets try to find & fix them..

VII.) Benchmarks/Performance

Benchmarking is hard, really hard. Lot of things has to be considered. Therefore my benchmark setup reflects a scenario that is close to my

production needs. This is a large number of relative simple geometrical objects. Further I;m doing two different benchmarks :

1.) Generation of 100M spatial points that are normally distributed on the world sphere

2.) Importing the whole Open Street Map "Planet.osm" and performing a number of different queries

My overall setup is :

- 3x CoreOS boxes (Xeon, 32 GB RAM, 9 TB disk each)

- Mesos/Marathon/Docker deployment infrastructure

- EntityIndex == ElasticSearch, EntityStore == RIAK

For the 1.) benchmark I can already say that :

- Indexing speed is about 5k/second

- Spatial queries returns back within 20-30ms on 100M records. No real difference between geopoint or geoshape indexing

- Limiting factor is definitely RIAK, need to add some more boxes. ElasticSearch does not care about this load

Interesting is here to play with indexing precision (number of quadtree levels).. Will try it in the next days.

For 2.)

This tests requires some pre-processing. The OSM format has to be first converted to GeoJSON.

I;m using the https://github.com/jillesvangurp/osm2geojson converter (Jilles thanks !) to create the OSM geojson files.

For the whole "planet.osm" the transformation takes about 10 hours in my box.

The result is a GeoJSON file that follows the GeoJSON specification. A line is e.g. defined as :

LineString lineString = geoJsonMapper.readValue("{\"type\":\"LineString\",\"coordinates\":[[100.0,0.0],[101.0,1.0]]}",

LineString.class);

The GeoJSON line object can then be converted to a TLineString and directly stored in domain model.

TLineString lineString = TConversions.Convert(module).from(lineString).toTGeometry();

I;m planing to perform some detailed JMH based (http://openjdk.java.net/projects/code-tools/jmh/) benchmarks to answer

questions like how many concurrent threads are optimal for a given setup, what is the average spatial query

response time, memory usage, and, and, and.. Will keep you posted about that guys.

VIII.) Next Steps

There are some ugly corners, e.g.

.where(

ST_Within

(

templateFor(A.class).point(), // <-- Here, how to figure out the composite type of point() ?

TPoint(module).y(1286436.5975464052).x(2389280.7514562616).geometry("EPSG:27572"),

10, TUnit.METER

)

));

.. I;m using something like

Type typeOfProperty = GenericPropertyInfo.toPropertyType(Classes.typeOf(propertyFunction.accessor()));

Class classOfProperty = Class.forName(typeOfProperty.getTypeName());

Further it would be interesting to add more spatial features backed by e.g. Postgis for those cases where

more spatial functions are required.

Guys lets take a look on the code and let me know where it can be improved. Further any cool ideas are highly welcome !

Thank you.

Cheers,

Jiri

Niclas Hedhman

unread,

Jan 18, 2015, 12:23:14 AM1/18/15

to Jiri Jetmar, qi4j...@googlegroups.com, jil...@jillesvangurp.com

This is a massive PR, and instead of pulling it into 'develop', I will create a feature branch first and see if I can provide useful feedback.

There are "internal" packages in api/ that stands out like "should not be there", and I will try to assist in either breaking that out to runtime or figure out whether it is api after all.

Thanks for the contribution.

--
You received this message because you are subscribed to the Google Groups "qi4j-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to qi4j-dev+u...@googlegroups.com.
To post to this group, send email to qi4j...@googlegroups.com.
Visit this group at http://groups.google.com/group/qi4j-dev.
For more options, visit https://groups.google.com/d/optout.