Qi4j + Spatial Queries + Elastic Search


Jiri Jetmar

Nov 17, 2013, 9:21:16 AM
to qi4j...@googlegroups.com
Hello guys, 

we talked a while ago about spatial queries using Qi4j and ElasticSearch. I was busy with other things in the meantime, but now I'm back on that topic and have already started
hacking.

@Paul, you made a proposal regarding the best approach, and basically the following three options are available:


1) Add support for GeoQueries to the Qi4j Query API ("native" Qi4j support)

2) Use native queries, i.e. native ElasticSearch queries through the Qi4j Query API

The Qi4j Query API supports native queries through the QuerySpecification type in the Query DSL.
I'm not sure the ElasticSearch extension supports using them, but it should not be too hard to do.
  

3) Use the ElasticSearch Client API directly

You can get a handle to the ElasticSearch Client from the ElasticSearchSupport service type that has a `Client client()` method to access the Client instance used by ES Index/Query services.


My goal is to stay as close as possible to the current Qi4j Query DSL, so option 1.) would be the best. On the other hand, I fear the effort needed to make the approach generic enough that it works
for all the underlying repositories.

Option 3.) is, at least as I understand it, too far away from what we call the Query DSL. Therefore I would like to continue with approach 2.), where the following query definition
would be possible:

 Query<City> query = this.module.currentUnitOfWork().newQuery( module.newQueryBuilder( City.class ).where( SpatialElasticSearchExpressions.point("41.12,-71.34") , "where to search - e.g. City.location()??" ));

This does not look that bad and can be extended easily, e.g. "where (point, radius, unit)", or any other shape definition can be used. Aspects like the geo reference
system, e.g. WGS84, can also be specified.
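The string argument of point("41.12,-71.34") has to be parsed and validated somewhere. A minimal sketch, assuming a hypothetical GeoPoint value type (not a Qi4j or ElasticSearch class) that accepts the "lat,lon" order ElasticSearch uses for geo_point strings and enforces the WGS84 latitude/longitude ranges:

```java
// Hypothetical WGS84 point value type; not a Qi4j or ElasticSearch class.
final class GeoPoint {
    final double lat;
    final double lon;

    GeoPoint(double lat, double lon) {
        // WGS84 bounds; the negated form also rejects NaN.
        if (!(lat >= -90.0 && lat <= 90.0)) {
            throw new IllegalArgumentException("latitude out of range: " + lat);
        }
        if (!(lon >= -180.0 && lon <= 180.0)) {
            throw new IllegalArgumentException("longitude out of range: " + lon);
        }
        this.lat = lat;
        this.lon = lon;
    }

    // Parses the "lat,lon" form used in point("41.12,-71.34").
    // NumberFormatException is an IllegalArgumentException as well.
    static GeoPoint parse(String latLon) {
        String[] parts = latLon.split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected \"lat,lon\" but got: " + latLon);
        }
        return new GeoPoint(Double.parseDouble(parts[0].trim()),
                            Double.parseDouble(parts[1].trim()));
    }

    // Renders the same "lat,lon" string form again.
    String toLatLon() {
        return lat + "," + lon;
    }
}
```

Validating at construction time means a malformed point fails when the query is built rather than inside the index engine.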

In general, adding spatial queries using ElasticSearch means two things:

i.) add the "data" to the index
ii.) generate the corresponding JSON query "strings" that are used to express the location-related aspects
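For item ii.), the JSON can be produced with plain string building before any ElasticSearch client is involved. A minimal sketch that renders the geo_distance example shown in the Querying part below; the class and method names are hypothetical, and the returned object is what would sit under the "query" element of a search request body:

```java
// Hypothetical helper that renders a "filtered" query with a geo_distance
// filter as a JSON string, using plain string building (no ElasticSearch classes).
// Assumes the field name and unit need no JSON escaping.
final class GeoDistanceQuery {

    static String filtered(String field, double distance, String unit, double lat, double lon) {
        return "{\"filtered\":{"
             + "\"query\":{\"match_all\":{}},"
             + "\"filter\":{\"geo_distance\":{"
             + "\"distance\":\"" + num(distance) + unit + "\","
             + "\"" + field + "\":\"" + num(lat) + "," + num(lon) + "\""
             + "}}}}";
    }

    // Prints whole numbers without a trailing ".0", e.g. 12.0 -> "12".
    private static String num(double d) {
        return d == Math.rint(d) ? Long.toString((long) d) : Double.toString(d);
    }
}
```

For example, filtered("location", 12, "km", 40, -70) yields the same structure as the example filter, in compact form.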

In the next lines I will try to describe my current status and the issues I'm facing, so that we can discuss the best approach.

i.) Indexing 

ElasticSearch is capable of working schema-less, but unfortunately this is not true for geo types: it has to be told explicitly that the
property "City.location()" is a geo_point. At least I was not able to find any "auto-indexing" of geo types, so the definition of what-is-a-geo-type is pretty static.
For testing I added this to AbstractElasticSearchSupport.java, but this is definitely the wrong place. It looks like this:



            String mapping = XContentFactory.jsonBuilder().startObject().startObject("spatial_mapping")
                    .startObject("properties").startObject("location").field("type", "geo_point").field("lat_lon", true).endObject().endObject()
                    .endObject().endObject().string();


            client.admin().indices().prepareCreate( index ).
                    setIndex( index ).
                    setSettings( indexSettings ).
                    addMapping("spatial_mapping", mapping).
                    execute().
                    actionGet();
 
This means that the property location() is declared as a geo_point type when the index is created. This does not make a lot of sense to me. It would be smarter
to update the index mappings for the (geo) data types when data is added. This only has to be done the first time, e.g. in ElasticSearchIndexer.java:

                   switch( changedState.status() )
                    {
                        case REMOVED:
                            LOGGER.trace( "Removing Entity State from Index: {}", changedState );
                            remove( bulkBuilder, changedState.identity().identity() );
                            break;
                        case UPDATED:
                            LOGGER.trace( "Updating Entity State in Index: {}", changedState );
                            remove( bulkBuilder, changedState.identity().identity() );
                            String updatedJson = toJSON( changedState, newStates, uow );
                            LOGGER.trace( "Will index: {}", updatedJson );
                            index( bulkBuilder, changedState.identity().identity(), updatedJson );
                            break;
                        case NEW:
                            LOGGER.trace( "Creating Entity State in Index: {}", changedState );
                            String newJson = toJSON( changedState, newStates, uow );
                            LOGGER.trace( "Will index: {}", newJson );
                            // TODO Before adding data, modify the index - if spatial properties exist e.g. "if @Spatial annotation then.."
                            index(bulkBuilder, changedState.identity().identity(), newJson);
                            break;
                        case LOADED:

where the index mappings would be updated for each NEW spatial type.

Would this be a useful approach? Any other ideas or advice?

ii.) Querying

Geo-related queries are expressed as JSON, e.g.:

    "filtered" : {
        "query" : {
            "match_all" : {}
        },
        "filter" : {
            "geo_distance" : {
                "distance" : "12km",
                "location" : "40,-70"
            }
        }
    }

This means that when a Query uses the property City.location(), it must be known that this is a spatial type. Therefore I'm thinking about a construct like this:

public interface City
    extends Nameable
{
    @Optional
    Property<String> country();

    @Optional
    Property<String> county();


    @Optional
    @Spatial
    Property<String> location();
}
 
That is, a simple annotation saying "this is a spatial type". It could also be extended with attributes such as the reference system, format, etc. The @Spatial annotation
is also used during indexing to modify the index (to add the mappings for spatial types whenever @Spatial properties are used).
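A minimal sketch of how an indexer could discover @Spatial properties, assuming the annotation is declared with RUNTIME retention. It uses plain Java reflection on a simplified, hypothetical stand-in interface (CityLike, with String instead of Property<String>); inside Qi4j, as Niclas notes later in the thread, the annotation would be read via PropertyDescriptor.metaInfo() instead. The referenceSystem attribute is likewise hypothetical:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical marker annotation; RUNTIME retention makes it visible to reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Spatial {
    String referenceSystem() default "WGS84"; // hypothetical attribute
}

final class SpatialProperties {
    // Returns the names of accessor methods annotated with @Spatial, sorted
    // so the result does not depend on reflection's method order.
    static List<String> of(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Method m : type.getMethods()) {
            if (m.isAnnotationPresent(Spatial.class)) {
                names.add(m.getName());
            }
        }
        Collections.sort(names);
        return names;
    }
}

// Simplified stand-in for the City composite above.
interface CityLike {
    String country();

    @Spatial
    String location();
}
```

The indexer would run such a scan once per entity type and feed the result into the mapping step, rather than inspecting every entity it indexes.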

This property name (City.location()) must then also be used by the ElasticSearchFinder. I'm not sure yet how this property name is propagated to the ElasticSearchFinder. This is usually done using the Qi4j
grammar API, so I guess in this case it should be done in SpatialElasticSearchExpressions?


It would be nice to get some feedback, advice and/or ideas about the best approach.

Thank you guys!

Cheers, 
Jiri

Jiri Jetmar

Nov 17, 2013, 2:46:47 PM
to Niclas Hedhman, qi4j...@googlegroups.com
Hi Niclas, 

thank you for your feedback. 

Actually, I have a first and very fragile version running that passes the following test:

       final Query<City> spatial = this.module.currentUnitOfWork().newQuery( module.newQueryBuilder( City.class )
                                                                 .where( SpatialElasticSearchExpressions.search("location", 3.1475, 101.693333, 1000) ));

        City city = spatial.find();
        assertTrue(city.name().get().equals("Kuala Lumpur"));

This version automatically adds the corresponding ElasticSearch index mappings when a property is annotated with @Spatial. Having
to manage the index mappings in ElasticSearch is a bit ugly, and a smart solution must be found. Currently the index mappings are applied whenever
a property annotated as spatial is encountered, again and again.


Cheers, 
Jiri



2013/11/17 Niclas Hedhman <nic...@hedhman.org>

A lot to assimilate... It will take a while.

Entity changes are propagated to the Indexer via StateChangeListeners, which are fed by EntityStoreUnitOfWork. StateChangeListeners are normally regular Service instances.

The @Spatial (good idea) can be picked up from the Qi4j API via PropertyDescriptor.metaInfo() method.
But I think you should also consider making a distinct type of Location, instead of using a flimsy String.

The rest is pretty much above my head as I have very little experience with GeoSpatial data.

But I think I can help in general design to fit the Qi4j patterns and approaches....


Cheers
Niclas



--
You received this message because you are subscribed to the Google Groups "qi4j-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to qi4j-dev+u...@googlegroups.com.
To post to this group, send email to qi4j...@googlegroups.com.
Visit this group at http://groups.google.com/group/qi4j-dev.
For more options, visit https://groups.google.com/groups/opt_out.



--
Niclas Hedhman, Software Developer
河南南路555弄15号1901室。
http://www.qi4j.org - New Energy for Java

I live here; http://tinyurl.com/3xugrbk
I work here; http://tinyurl.com/6a2pl4j
I relax here; http://tinyurl.com/2cgsug

Niclas Hedhman

Nov 18, 2013, 10:34:35 AM
to Jiri Jetmar, qi4j...@googlegroups.com
This is ugly;

 SpatialElasticSearchExpressions.search("location", 

May I suggest a completely different approach;

Use the type safety pattern that is present in QueryExpressions already, and build in a proper GeoLocation property type, and extend the Specifications of query (sub)expressions available.

So, 

public interface SpatialQueryExpressions {

    WithInRadiusSpecification isWithInRadius( Property<GeoLocation> location, double miles );

    // Other useful query sub-expressions
}

Then integrate that into the Query generation engine.

We will need to introduce a "supports( Class<? extends QuerySpecification> type );" on indexing engine, as more special cases are likely to surface.

This is probably a much steeper learning curve for you, but it is an order of magnitude better experience for the user.

And I think it would be possible to support GeoSpatial queries on databases that don't support them, by the "in-memory" scan that is possible in Qi4j's query engine.
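The type-safe expression and the in-memory fallback can be sketched together: the expression is a predicate over entities, and the scan evaluates it against every candidate. This assumes hypothetical GeoLocation and City classes, uses java.util.function.Predicate in place of a Qi4j query specification, and computes great-circle distances with the haversine formula on a spherical Earth (an approximation of WGS84). The radius is in kilometres rather than the miles of the signature above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical value type standing in for Property<GeoLocation>.
final class GeoLocation {
    final double lat, lon;
    GeoLocation(double lat, double lon) { this.lat = lat; this.lon = lon; }
}

// Hypothetical entity with a location accessor, mirroring City.location().
final class City {
    private final String name;
    private final GeoLocation location;
    City(String name, GeoLocation location) { this.name = name; this.location = location; }
    String name() { return name; }
    GeoLocation location() { return location; }
}

final class SpatialQueryExpressions {
    static final double EARTH_RADIUS_KM = 6371.0088; // mean Earth radius

    // Haversine great-circle distance on a sphere.
    static double distanceKm(GeoLocation a, GeoLocation b) {
        double dLat = Math.toRadians(b.lat - a.lat);
        double dLon = Math.toRadians(b.lon - a.lon);
        double h = Math.pow(Math.sin(dLat / 2), 2)
                 + Math.cos(Math.toRadians(a.lat)) * Math.cos(Math.toRadians(b.lat))
                 * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(h)));
    }

    // Type-safe sub-expression: the accessor selects which location to test.
    static <T> Predicate<T> isWithInRadius(Function<T, GeoLocation> location,
                                           GeoLocation center, double radiusKm) {
        return entity -> distanceKm(location.apply(entity), center) <= radiusKm;
    }

    // The in-memory scan: keep every candidate that satisfies the expression.
    static <T> List<T> scan(Iterable<T> candidates, Predicate<T> expression) {
        List<T> hits = new ArrayList<>();
        for (T candidate : candidates) {
            if (expression.test(candidate)) {
                hits.add(candidate);
            }
        }
        return hits;
    }
}
```

An engine with native support (such as ElasticSearch) would translate the same expression into a geo_distance filter instead of scanning.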


I think Paul has a lot of experience in indexing/querying and may have a better opinion in this space.


Niclas

Jiri Jetmar

Nov 18, 2013, 11:12:11 AM
to qi4j...@googlegroups.com
Hi Niclas, 

Thanks for your feedback. Please find my comments below:


2013/11/18 Niclas Hedhman <nic...@hedhman.org>

This is ugly;

 SpatialElasticSearchExpressions.search("location", 

Agree, this is also not the "final" solution, just for "testing". The main issue is the handling of geo types in ElasticSearch, as it is required to declare "City.location() is a geo_type" before any kind of spatial query can be done on this property.
 

May I suggest a completely different approach;

Use the type safety pattern that is present in QueryExpressions already, and build in a proper GeoLocation property type, and extend the Specifications of query (sub)expressions available.

So, 

public interface SpatialQueryExpressions {

    WithInRadiusSpecification isWithInRadius( Property<GeoLocation> location, double miles );

    // Other useful query sub-expressions
}

Then integrate that into the Query generation engine.

We will need to introduce a "supports( Class<? extends QuerySpecification> type );" on indexing engine, as more special cases are likely to surface.

Aha, so you are arguing to make the solution more generic, not only for ElasticSearch? This would be option "1.)".


This is probably a much steeper learning curve for you, but it is an order of magnitude better experience for the user.

OK, I see the point. Maybe Paul can give me some support so that I can work on that.
 
And I think it would be possible to support GeoSpatial queries on databases that don't support them, by the "in-memory" scan that is possible in Qi4j's query engine.

 
ElasticSearch does not implement the spatial features in a "native" way either; it uses
the Spatial4J and JTS libraries. Maybe we can do something similar in Qi4j.


I think Paul has a lot of experience in indexing/querying and may have a better opinion in this space.


Niclas

Thank you. 

Cheers, 
Jiri 

Jiri Jetmar

Nov 18, 2013, 12:18:12 PM
to qi4j...@googlegroups.com
Hello Niclas,

thank you for the valuable input.

I think I got the point you are trying to explain regarding the Query API. Please let me check the code to get an idea of how this can be done.

Cheers, 
Jiri


2013/11/18 Niclas Hedhman <nic...@hedhman.org>



On Tue, Nov 19, 2013 at 12:12 AM, Jiri Jetmar <juergen...@gmail.com> wrote:
> agree, this is also not the "final" solution, just for "testing". The main issue is the handling of geo_types on elastic search as it is required to say "City.location() is a geo_type" before any kind of spatial query can be done on this property. 

Yes, but that can be on Property<GeoLocation>, where GeoLocation is a specific type, just like String, List and DateTime, that the Qi4j Indexer is expected to understand. The @Spatial is not really needed.

I would do something like this;

public interface House
{
    Property<Address> address();
    Property<GeoLocation> location();
    Property<BigDecimal> price();
}

QueryBuilder qb = ...;
GeoLocation location = geoLocator.find( "Shanghai,China");
GeoRadius radius = new GeoRadius( 1000, GeoDistance.km);
GeoArea area = new GeoRadius( location, radius );
House template = templateFor( House.class );
qb = qb.where( isWithInRadius( template.location(), area ) );
Query<House> q = qb.newQuery();
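The GeoRadius/GeoDistance pair above could carry its unit explicitly, so that one value can become a kilometre figure for an in-memory check or the "distance" string of an ElasticSearch geo_distance filter. A sketch with hypothetical types; the constant names mirror GeoDistance.km above and happen to match the m/km/mi unit suffixes ElasticSearch accepts:

```java
// Hypothetical distance units; constant names double as ElasticSearch suffixes.
enum GeoDistance {
    m(0.001), km(1.0), mi(1.609344);

    final double kilometres; // length of one unit in kilometres

    GeoDistance(double kilometres) {
        this.kilometres = kilometres;
    }
}

// Hypothetical radius value: a non-negative magnitude plus its unit.
final class GeoRadius {
    final double value;
    final GeoDistance unit;

    GeoRadius(double value, GeoDistance unit) {
        if (!(value >= 0)) { // also rejects NaN
            throw new IllegalArgumentException("radius must be >= 0: " + value);
        }
        this.value = value;
        this.unit = unit;
    }

    // For an in-memory check, e.g. against a haversine distance in km.
    double toKm() {
        return value * unit.kilometres;
    }

    // For the "distance" field of an ElasticSearch geo_distance filter.
    String toElasticSearchDistance() {
        String v = value == Math.rint(value) ? Long.toString((long) value) : Double.toString(value);
        return v + unit.name();
    }
}
```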

>> And I think it would be possible to support GeoSpatial queries on databases that doesn't support them, by the "in-memory" scan that is possible in Qi4j's query engine.

> ElasticSearch also does not add the spatial feature in a "native" way but they are using
> the Spatial4J and JTS libraries. Maybe we can do something similar in Qi4j.  

As I said, I know too little on the subject itself. But remember that the Indexer is in charge of the index store and should be able to set up each property to whatever the requirement of the store is.


Cheers

Paul Merlin

Nov 20, 2013, 5:28:58 AM
to qi4j...@googlegroups.com
Guys,

Sorry about the delay, I was quite busy these days.

Niclas, I agree with your approach regarding the API.

Eventually the question will be: do we support spatial types
in core? This would allow us to add the SpatialQueryExpressions
in org.qi4j.api.query. Moreover, if spatial types are supported
(like JodaTime types are), we could go further and support
GeoJSON for (de)serialization. Ideally core should not depend on
Spatial4J or JTS.

In the meantime, this could be added to the ElasticSearch
Index/Query only so we can contemplate it without impacting core
for now.

@Jiri: if you have more questions, don't hesitate!

Cheers

/Paul

Paul Merlin

Nov 23, 2013, 9:55:05 AM
to qi4j...@googlegroups.com
Niclas Hedhman wrote:
> As I have mentioned several times; I have very little Geo experience,
> and can't provide a reasonable "balanced view" whether it makes sense
> or not.
Same thing here, very little geo experience.
Maybe one of the Polymap3 authors could chime in?

Jiri Jetmar

Nov 25, 2013, 6:46:30 AM
to Paul Merlin, qi4j...@googlegroups.com
Hi Guys, 

Thank you for the feedback. In general, I see the following aspects here:

1.) The Query API (DSL) and the Query SPI
    
  As Niclas suggested, e.g. a SpatialQueryExpressions.java that contains the spatial DSL definitions such as isWithInRadius and others.

"Then integrate that into the Query generation engine.

We will need to introduce a "supports( Class<? extends QuerySpecification> type );" on indexing engine, as more special cases are likely to surface.
"
Here I'm not sure what you mean. Are you talking about org.qi4j.api.query.grammar.* or something else?
In general when this approach is used the @Spatial annotation is not required. 


2.) The implementation of spatial features for the concrete repository (Indexer and Finder)

Here the Qi4j Indexer and Finder have to translate to the native formats and consider repository-specific aspects. For ElasticSearch it is required to "map" spatial properties, like City.location() == GeoType, as ElasticSearch is not schema-less for spatial types. I'm thinking of using the following approach:

i.)  During bootstrap, read out all spatial mappings and store them in a transient HashMap.
ii.) Whenever a spatial type has to be indexed, check whether its mapping is known, e.g. HashMap.containsKey(property). If not, add the spatial type mapping to ElasticSearch and to the transient HashMap. This spatial mapping has to be done only once per property, e.g. for City.location(). I'm not sure what should happen with the spatial mapping when e.g. City.location() is removed from the domain model. Who removes the spatial mapping from ElasticSearch?
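Steps i.) and ii.) can be sketched with a concurrent map, so that the mapping request runs at most once per property even when several unit-of-work completions are indexed concurrently. SpatialMappingCache and its methods are hypothetical; the Consumer stands in for whatever code issues the actual mapping request to ElasticSearch:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical cache of spatial mappings already present in the index.
final class SpatialMappingCache {
    private final ConcurrentHashMap<String, Boolean> known = new ConcurrentHashMap<>();
    private final Consumer<String> putMapping; // issues the real mapping request

    SpatialMappingCache(Consumer<String> putMapping) {
        this.putMapping = putMapping;
    }

    // Step i.): register mappings read from the index at bootstrap.
    void preload(Iterable<String> existingKeys) {
        for (String key : existingKeys) {
            known.put(key, Boolean.TRUE);
        }
    }

    // Step ii.): called before indexing a spatial property. Returns true only
    // for the call that actually issued the mapping request.
    boolean ensureMapped(String type, String property) {
        String key = type + "." + property;
        boolean[] issued = {false};
        known.computeIfAbsent(key, k -> {
            putMapping.accept(k); // if this throws, nothing is recorded
            issued[0] = true;
            return Boolean.TRUE;
        });
        return issued[0];
    }
}
```

computeIfAbsent applies the function at most once per key and records nothing if it throws, so a failed request is simply retried on the next call. Since other updates to the map may block while the function runs, a slow mapping request can briefly delay concurrent indexing.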

Further, what should happen when a spatial query is executed against a repository that does not support spatial types/search? Throw an exception, e.g. SpatialQueryNotSupportedException?
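One way to combine Niclas's supports(...) idea with the exception proposed here: the finder asks the engine before translating a specification and fails fast otherwise. All types below are hypothetical stand-ins, not existing Qi4j API:

```java
import java.util.Set;

// Hypothetical stand-ins; none of these are existing Qi4j types.
interface QuerySpecification {}

final class WithInRadiusSpecification implements QuerySpecification {}

class SpatialQueryNotSupportedException extends RuntimeException {
    SpatialQueryNotSupportedException(String message) {
        super(message);
    }
}

interface IndexingEngine {
    String name();
    boolean supports(Class<? extends QuerySpecification> type);
}

// An engine that declares up front which specification types it can translate.
final class DeclaredEngine implements IndexingEngine {
    private final String name;
    private final Set<Class<? extends QuerySpecification>> supported;

    DeclaredEngine(String name, Set<Class<? extends QuerySpecification>> supported) {
        this.name = name;
        this.supported = supported;
    }

    public String name() { return name; }

    public boolean supports(Class<? extends QuerySpecification> type) {
        return supported.contains(type);
    }
}

final class Finder {
    // Fails fast, before any translation, when the engine lacks support.
    static void checkSupported(IndexingEngine engine, QuerySpecification spec) {
        if (!engine.supports(spec.getClass())) {
            throw new SpatialQueryNotSupportedException(
                engine.name() + " cannot evaluate " + spec.getClass().getSimpleName());
        }
    }
}
```

Instead of throwing, the finder could also fall back to the in-memory scan mentioned earlier in the thread for engines that answer false.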

Thank you. 

Cheers, 
Jiri



  


2013/11/20 Paul Merlin <pa...@nosphere.org>

Jiri Jetmar

Nov 25, 2013, 8:40:45 AM
to Paul Merlin, qi4j...@googlegroups.com
Hi Paul, 


You have to declare:

{
    "location" : {           // mapping (document) type
        "properties" : {
            "location" : {   // City.location()
                "type" : "geo_point"
            }
        }
    }
}

BEFORE any spatial query on templateFor(City.class).location() can be done. This means that during indexing one has to declare that a property abc() has to be indexed as a geo_point. There is also a geo_shape type - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-geo-shape-type.html. That is the reason why I was talking about geo (or spatial) types.

Without the mapping no spatial query can be executed; an exception like "City.location() is not a geo type" is thrown.

I could not find any way to do "auto" spatial mapping. Therefore I added some code to the ElasticSearch Indexer that updates the index mappings whenever a spatial type (e.g. geo_point) is added. Because it does not make sense to update the index each time e.g. City.location() is added, I implemented HashMap-based caching for the mappings.
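The mapping body can be generated per property. A minimal sketch with hypothetical names: in ElasticSearch's put-mapping format the top-level key is the mapping (document) type and the key nested under "properties" is the field, whose type is either geo_point or geo_shape:

```java
// Hypothetical choice between the two ElasticSearch geo field types.
enum GeoFieldType {
    GEO_POINT("geo_point"), GEO_SHAPE("geo_shape");

    final String esName;

    GeoFieldType(String esName) {
        this.esName = esName;
    }
}

final class GeoMapping {
    // Builds {"<type>":{"properties":{"<property>":{"type":"<geo type>"}}}}.
    // Assumes type and property names need no JSON escaping.
    static String of(String mappingType, String property, GeoFieldType fieldType) {
        return "{\"" + mappingType + "\":{\"properties\":{\"" + property
             + "\":{\"type\":\"" + fieldType.esName + "\"}}}}";
    }
}
```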

Thank you. 

Cheers, 
Jiri




2013/11/25 Paul Merlin <pa...@nosphere.org>
Jiri,

I have a pretty simple question.

What would the spatial types be?
Coming from an existing library? which one?

Cheers

/Paul


Paul Merlin

Nov 25, 2013, 8:59:42 AM
to qi4j...@googlegroups.com
I am wondering about the Java spatial types to use as Properties not how
they could be indexed in ElasticSearch.

Like in Property<ASpatialType> location();

Are you thinking of Spatial4J, JTS? Something else?

JTS is licensed as LGPL, Spatial4J as ASL.

I don't know at all whether they are on par wrt their features, but licensing
is important to us, and in this case I guess we'd prefer ASL.

Cheers

/Paul

Jiri Jetmar

Nov 25, 2013, 9:34:15 AM
to Paul Merlin, qi4j...@googlegroups.com
Just for orientation:


So we can argue there are 1d, 2d and 3d spatial types:

I would define the basic spatial types as:

- Point (1d)
- MultiPoint  (2d and 3d), e.g. for the definition of a circle or a sphere 
- Polygon (2d and 3d), for everything else. 

I think all other types, like a rectangle, can be derived from these.
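A sketch of the three proposed basic types as plain value classes (2D only), serialized to GeoJSON as Paul suggested earlier. All class names are hypothetical; note that GeoJSON orders coordinates as [longitude, latitude], the reverse of the "lat,lon" string form, and that polygon rings must be closed:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical 2D value types for the proposed basic shapes, with GeoJSON output.
interface Shape {
    String toGeoJson();
}

final class Point implements Shape {
    final double lon;
    final double lat;

    Point(double lon, double lat) { // GeoJSON order: longitude first
        this.lon = lon;
        this.lat = lat;
    }

    boolean samePosition(Point o) { return lon == o.lon && lat == o.lat; }

    String position() { return "[" + GeoJson.num(lon) + "," + GeoJson.num(lat) + "]"; }

    public String toGeoJson() {
        return "{\"type\":\"Point\",\"coordinates\":" + position() + "}";
    }
}

final class MultiPoint implements Shape {
    final List<Point> points;

    MultiPoint(List<Point> points) { this.points = new ArrayList<>(points); }

    public String toGeoJson() {
        return "{\"type\":\"MultiPoint\",\"coordinates\":" + GeoJson.positions(points) + "}";
    }
}

final class Polygon implements Shape {
    final List<Point> exteriorRing; // holes are omitted in this sketch

    Polygon(List<Point> ring) {
        List<Point> closed = new ArrayList<>(ring);
        // GeoJSON linear rings are closed: first and last positions are equal.
        if (!closed.isEmpty() && !closed.get(0).samePosition(closed.get(closed.size() - 1))) {
            closed.add(closed.get(0));
        }
        if (closed.size() < 4) {
            throw new IllegalArgumentException("a ring needs at least 4 positions including the closing one");
        }
        this.exteriorRing = closed;
    }

    public String toGeoJson() {
        return "{\"type\":\"Polygon\",\"coordinates\":[" + GeoJson.positions(exteriorRing) + "]}";
    }
}

final class GeoJson {
    // Prints whole numbers without a trailing ".0", e.g. 1.0 -> "1".
    static String num(double d) {
        return d == Math.rint(d) ? Long.toString((long) d) : Double.toString(d);
    }

    // Renders a list of points as a JSON array of positions.
    static String positions(List<Point> points) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < points.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(points.get(i).position());
        }
        return sb.append(']').toString();
    }
}
```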

Cheers, 
Jiri 



2013/11/25 Jiri Jetmar <juergen...@gmail.com>
Aha, well for now I called it  

Property<String> location();

.. and used the @Spatial annotation to say "this is a spatial type". 
This was a kind of Proof-of-Concept. For the final implementation I agree with you and Niclas that a dedicated Qi4j spatial type should be used.

I do not think that a third-party library is required. In fact, not a lot of functionality is required: the semantic definition of a shape, a point, some definitions of reference systems and a few other things, but that's it.

Those libraries were mentioned because Niclas was thinking about adding spatial support even for repositories that do not support spatial queries natively. But this is a different discussion. As I understand it, ElasticSearch uses these two libraries to add spatial support.

Cheers, 
jj


 


2013/11/25 Paul Merlin <pa...@nosphere.org>


Cheers

/Paul

Paul Merlin

Jan 27, 2014, 10:20:27 AM
to Jiri Jetmar, qi4j...@googlegroups.com
Jiri,

If you are still interested in contributing support for Spatial state and queries,
I'll be happy to review commits and help you achieve this.

With recent changes I've made to the way serialization is extensible, we can
now think about adding such support.

Here are some things to start with:
- define basic spatial types that could be used in Properties
- define how they should be serialized
- define query grammar for spatial queries
- implement spatial types serialization
- add support to entity stores not using JSON serialization
- implement support in index/query engines

You can take a look at how I added Money/BigMoney support recently to see
which part of the code is of interest.

Work on this should be done either in a fork or in a feature branch.

Let me know.

/Paul

Jiri Jetmar

Jan 27, 2014, 2:01:23 PM
to Paul Merlin, qi4j...@googlegroups.com
Hi Paul, 

thanks for offering support. OK, I will take a look at how you implemented the Money/BigMoney feature.
I will start working on this topic at the beginning of next week.

Cheers,
jj


2014-01-27 Paul Merlin <pa...@nosphere.org>

Paul Merlin

Jan 31, 2014, 12:13:20 PM
to qi4j...@googlegroups.com
Forwarding to the list:

Niclas Hedhman wrote:
> My 2 jiao;
>
> Very good; http://en.wikipedia.org/wiki/Spatial_query, although
> "equals" is probably a bit weird, and should have a 'radius' component
> to it.
>
> Also, I found; http://en.wikipedia.org/wiki/GeoJSON
>
>
> Cheers
> Niclas