Hello guys,
we talked a while ago about spatial queries using Qi4j and ElasticSearch. In the meantime I was busy with some other things, but now I'm back on that topic and have already started
hacking.
@Paul - you made a proposal regarding the best approach, and basically the following three options are available:
1) Add support for GeoQueries to the Qi4j Query API ("native" Qi4j support)
2) Use native queries, ie. native ElasticSearch queries through the Qi4j Query API
The Qi4j Query API supports native queries through the QuerySpecification type in the Query DSL.
I'm not sure the ElasticSearch extension supports using them, but it should not be too hard to do.
3) Use the ElasticSearch Client API directly
You can get a handle to the ElasticSearch Client from the ElasticSearchSupport service type that has a `Client client()` method to access the Client instance used by ES Index/Query services.
My goal is to stay as close as possible to the current Qi4j Query DSL, so option 1.) would be the best. On the other hand, I fear the effort needed to make the approach generic enough so that it works
for all the underlying repositories.
Option 3.) is, at least from my understanding, too far away from what we call Query DSL. Therefore I would like to continue with approach 2.), where the following query definition
would be possible:
Query<City> query = this.module.currentUnitOfWork().newQuery(
    module.newQueryBuilder( City.class ).where(
        SpatialElasticSearchExpressions.point( "41.12,-71.34" ),
        "where to search - e.g. City.location()??" ) );
This does not look that bad and can be extended easily: "where (point, radius, unit)" or any other shape definition could be used. Aspects like the corresponding geo-reference
system, e.g. WGS84, could be specified as well.
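To make the "where (point, radius, unit)" idea a bit more concrete, here is a rough sketch of the value such an expression could carry. All names (SpatialSpec, point, within, distance) are made up for illustration and are not part of Qi4j or ElasticSearch; only the "lat,lon" string format is taken from the example above, and WGS84 as the default reference system is an assumption.

```java
// Hypothetical value object, not part of Qi4j or ElasticSearch.
final class SpatialSpec
{
    final double lat;
    final double lon;
    final double radius;          // 0 means a plain point without radius
    final String unit;            // e.g. "km" or "m"
    final String referenceSystem; // assumed default: WGS84

    private SpatialSpec( double lat, double lon, double radius, String unit, String referenceSystem )
    {
        this.lat = lat;
        this.lon = lon;
        this.radius = radius;
        this.unit = unit;
        this.referenceSystem = referenceSystem;
    }

    // Parses a "lat,lon" string such as "41.12,-71.34".
    static SpatialSpec point( String latLon )
    {
        String[] parts = latLon.split( "," );
        if( parts.length != 2 )
        {
            throw new IllegalArgumentException( "Expected \"lat,lon\" but got: " + latLon );
        }
        double lat = Double.parseDouble( parts[ 0 ].trim() );
        double lon = Double.parseDouble( parts[ 1 ].trim() );
        if( lat < -90 || lat > 90 || lon < -180 || lon > 180 )
        {
            throw new IllegalArgumentException( "Coordinates out of range: " + latLon );
        }
        return new SpatialSpec( lat, lon, 0, "km", "WGS84" );
    }

    // Returns a copy of this point with a search radius attached.
    SpatialSpec within( double radius, String unit )
    {
        return new SpatialSpec( lat, lon, radius, unit, referenceSystem );
    }

    // Distance as a compact string, e.g. "12km" or "1.5km".
    String distance()
    {
        boolean whole = radius == Math.rint( radius );
        return ( whole ? String.valueOf( (long) radius ) : String.valueOf( radius ) ) + unit;
    }
}
```

With something like this, SpatialSpec.point( "41.12,-71.34" ).within( 12, "km" ) would carry everything a geo distance query needs, independent of the underlying index.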
In general, adding spatial queries using ElasticSearch means two things:
i.) add the "data" to the index
ii.) generate the corresponding JSON query "strings" that are used to express the location related aspects
In the next lines I will try to describe my current status and the issues I'm facing, so that we can discuss the best approach.
i.) Indexing
ElasticSearch is capable of working schema-less, but unfortunately this is not true for geo types. The mapping has to state explicitly that the
property "City.location()" is a geo_point. At least I was not able to find any "auto-indexing" for geo types. Therefore the definition of what-is-a-geo-type is pretty static.
For testing I added this to AbstractElasticSearchSupport.java, but this is definitely the wrong place. It looks like this:
String mapping = XContentFactory.jsonBuilder()
    .startObject()
        .startObject( "spatial_mapping" )
            .startObject( "properties" )
                .startObject( "location" ).field( "type", "geo_point" ).field( "lat_lon", true ).endObject()
            .endObject()
        .endObject()
    .endObject()
    .string();
client.admin().indices().prepareCreate( index )
    .setIndex( index )
    .setSettings( indexSettings )
    .addMapping( "spatial_mapping", mapping )
    .execute()
    .actionGet();
This means that the property location() is declared as a geo type when the index is created. This does not make a lot of sense to me. It would be smarter
to update the index, including the (geo) type mappings, while data is being added. This has to be done only the first time, e.g. in ElasticSearchIndexer.java:
switch( changedState.status() )
{
    case REMOVED:
        LOGGER.trace( "Removing Entity State from Index: {}", changedState );
        remove( bulkBuilder, changedState.identity().identity() );
        break;
    case UPDATED:
        LOGGER.trace( "Updating Entity State in Index: {}", changedState );
        remove( bulkBuilder, changedState.identity().identity() );
        String updatedJson = toJSON( changedState, newStates, uow );
        LOGGER.trace( "Will index: {}", updatedJson );
        index( bulkBuilder, changedState.identity().identity(), updatedJson );
        break;
    case NEW:
        LOGGER.trace( "Creating Entity State in Index: {}", changedState );
        String newJson = toJSON( changedState, newStates, uow );
        LOGGER.trace( "Will index: {}", newJson );
        // TODO Before adding data, modify the index - if spatial properties exist e.g. "if @Spatial annotation then.."
        index( bulkBuilder, changedState.identity().identity(), newJson );
        break;
    case LOADED:
The idea is to update the index mappings for each NEW spatial type.
Would this be a useful approach? Any other ideas or advice?
ii.) Querying
The generated JSON query has to contain geo related expressions, e.g.
"filtered" : {
    "query" : {
        "match_all" : {}
    },
    "filter" : {
        "geo_distance" : {
            "distance" : "12km",
            "location" : "40,-70"
        }
    }
}
This means that when one executes a Query that uses the property City.location(), it must be known that this is a spatial type. Therefore I'm thinking about such a construct:
public interface City
    extends Nameable
{
    @Optional
    Property<String> country();

    @Optional
    Property<String> county();

    @Optional
    @Spatial
    Property<String> location();
}
This is a simple annotation that says: this is a spatial type. It can also be extended, e.g. by setting the reference system, format, etc. The @Spatial annotation
would also be used during indexing to modify the index (to add mappings for spatial types when @Spatial types are used).
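As a rough sketch of how the annotation could drive the mapping update: the code below scans a type for @Spatial properties and produces the mapping JSON only the first time that type is seen. Everything here is an assumption for illustration. The @Spatial annotation does not exist in Qi4j yet, SpatialMappings and DemoCity are made-up names, and DemoCity uses plain String instead of Property<String> so that the sketch compiles without Qi4j. The JSON is built by hand only to keep the example self-contained.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical annotation, not part of Qi4j.
@Retention( RetentionPolicy.RUNTIME )
@Target( ElementType.METHOD )
@interface Spatial
{
    String type() default "geo_point"; // ElasticSearch field type to map to
}

// Stand-in for the City entity, without Qi4j types.
interface DemoCity
{
    String country();

    @Spatial
    String location();
}

final class SpatialMappings
{
    // Types whose spatial mapping has already been generated.
    private final Set<Class<?>> mapped = ConcurrentHashMap.newKeySet();

    // Returns the mapping JSON the first time a type with @Spatial
    // properties is seen, and null in every other case.
    String mappingIfNeeded( Class<?> entityType )
    {
        Map<String, String> fields = new TreeMap<>();
        for( Method method : entityType.getMethods() )
        {
            Spatial spatial = method.getAnnotation( Spatial.class );
            if( spatial != null )
            {
                fields.put( method.getName(), spatial.type() );
            }
        }
        if( fields.isEmpty() || !mapped.add( entityType ) )
        {
            return null;
        }
        StringBuilder json = new StringBuilder( "{\"properties\":{" );
        String separator = "";
        for( Map.Entry<String, String> field : fields.entrySet() )
        {
            json.append( separator )
                .append( '"' ).append( field.getKey() ).append( "\":{\"type\":\"" )
                .append( field.getValue() ).append( "\"}" );
            separator = ",";
        }
        return json.append( "}}" ).toString();
    }
}
```

In the indexer, the NEW case could call mappingIfNeeded() before indexing and, when the result is not null, send it to ElasticSearch as a mapping update. Attributes like the reference system could be added to the annotation in the same way as type().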
This property name (City.location()) must then also be used by the ElasticSearchFinder. I'm not sure yet how this property name is propagated to the ElasticSearchFinder. This is usually done using the Qi4j
grammar API, so I guess in this case it should be done in SpatialElasticSearchExpressions?
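Once the property name is known, generating the geo_distance filter from ii.) is mostly string assembly. Here is a minimal sketch, assuming the property name, the "lat,lon" point and the distance arrive as plain strings. GeoDistanceFilterJson and its method names are hypothetical, and the hand-written JSON is only there to keep the example free of dependencies.

```java
// Hypothetical helper, not part of Qi4j or ElasticSearch.
final class GeoDistanceFilterJson
{
    // Builds a "filtered" query with match_all and a geo_distance filter
    // on the given property.
    static String filtered( String propertyName, String latLon, String distance )
    {
        return "{\"filtered\":{"
               + "\"query\":{\"match_all\":{}},"
               + "\"filter\":{\"geo_distance\":{"
               + "\"distance\":\"" + escape( distance ) + "\","
               + "\"" + escape( propertyName ) + "\":\"" + escape( latLon ) + "\""
               + "}}}}";
    }

    // Escapes backslashes and double quotes so the values stay valid JSON strings.
    private static String escape( String value )
    {
        return value.replace( "\\", "\\\\" ).replace( "\"", "\\\"" );
    }
}
```

Calling GeoDistanceFilterJson.filtered( "location", "40,-70", "12km" ) yields the same structure as the example query in ii.), just without whitespace.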
Would be nice to get some feedback, advice and/or ideas about the best approach.
Thank you guys!
Cheers,
Jiri