(Geo) JSON handling in Qi4j


Jiri Jetmar

Feb 8, 2014, 5:58:09 AM
to qi4j...@googlegroups.com
Hi Gang, 

I have several questions regarding the (de-)serialization of JSON in Qi4j.

1.) 
Given is the following GeoJSON


{ "type": "FeatureCollection",
    "features": [
      { "type": "Feature",
        "geometry": {
        "type": "Point", 
        "coordinates": [102.0, 0.5]},
        "properties": {"prop0": "value0"}
        },
      { "type": "Feature",
        "geometry": {
          "type": "LineString",
          "coordinates": [
            [102.0, 0.0], [103.0, 1.0], [104.0, 0.0], [105.0, 1.0]
            ]
          },
        "properties": {
          "prop0": "value0",
          "prop1": 0.0
          }
        },
      { "type": "Feature",
         "geometry": {
           "type": "Polygon",
           "coordinates": [
             [ [100.0, 0.0], [101.0, 0.0], [101.0, 1.0],
               [100.0, 1.0], [100.0, 0.0] ]
             ]
         },
         "properties": {
           "prop0": "value0",
           "prop1": {"this": "that"}
           }
         }
       ]
     }



It is copied from the GeoJSON spec at http://geojson.org/geojson-spec.html.

The FeatureCollection is an array of Features, each of which contains one of the defined geometries:
Point, LineString, Polygon, etc. Parsing those simple types is not a problem, but the geometry of a
Feature can be a Point, a LineString, etc.

I defined the Feature like this:

public interface TGeomFeature extends TGeom {

    Property<TGeometry> geometry();

    Property<Map<String, String>> properties();

}

TGeometry is defined as 

public interface TGeometry extends TGeom {

    @Optional
    Property<String> type();
}

and is common to TGeomPoint, TGeomPolygon, etc. The problem here is that Qi4j does not know what
kind of object Property<TGeometry> geometry() is. The "_type" field is not available, as this GeoJSON
is defined/created "outside" of Qi4j.

2.)  

These GeoJSON files tend to be really huge. Huge means about 300 GB for a GeoJSON export of Europe
from OpenStreetMap. I'm not exactly sure how the Qi4j deserializer works, but it looks like
the whole thing first has to be loaded into memory and only then are all the objects generated. A kind
of iterating approach would help a lot here.

Thank you. 

Cheers, 
jj

Paul Merlin

Feb 8, 2014, 8:37:43 AM
to Jiri Jetmar, qi4j...@googlegroups.com
Hey Jiri,

Jiri Jetmar wrote:
The only way to infer the actual ValueType of a TGeometry is to use the "type" field from the GeoJSON payload. That is something a ComplexDeserializer for TGeometry, registered in ValueDeserializerAdapter, could do: create the explicit ValueType from the "type" field value, then delegate to the method that deserializes ValueComposites.

For now, registering custom (De)Serializers can only be done in core/spi or by subclassing a concrete (De)Serializer; there's no API for that yet. See https://ops4j1.jira.com/browse/QI-410
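
To make the "type" dispatch concrete, here is a minimal sketch in plain Java. It is not Qi4j API and does not show how a ComplexDeserializer is registered; it only illustrates the lookup step such a deserializer would need. TGeometry, TGeomPoint and TGeomPolygon are the names from the original post, reduced to empty stand-ins here; TGeomLineString and the GeoJsonTypes class are hypothetical names introduced for the example.

```java
import java.util.HashMap;
import java.util.Map;

final class GeoJsonTypes {

    // Empty stand-ins for the value types from the original post. In Qi4j these
    // would be the real interfaces with their Property<...> methods.
    interface TGeometry {}
    interface TGeomPoint extends TGeometry {}
    interface TGeomLineString extends TGeometry {} // hypothetical name
    interface TGeomPolygon extends TGeometry {}

    // GeoJSON "type" value -> concrete value type to instantiate.
    private static final Map<String, Class<? extends TGeometry>> BY_TYPE = new HashMap<>();
    static {
        BY_TYPE.put("Point", TGeomPoint.class);
        BY_TYPE.put("LineString", TGeomLineString.class);
        BY_TYPE.put("Polygon", TGeomPolygon.class);
    }

    // Resolves the concrete type from the "type" member of a geometry object.
    // Unknown names fail loudly instead of silently falling back to TGeometry.
    static Class<? extends TGeometry> resolve(String geoJsonType) {
        Class<? extends TGeometry> type = BY_TYPE.get(geoJsonType);
        if (type == null) {
            throw new IllegalArgumentException("Unsupported GeoJSON geometry type: " + geoJsonType);
        }
        return type;
    }

    public static void main(String[] args) {
        System.out.println(resolve("Polygon").getSimpleName());
    }
}
```

A custom deserializer would read the "type" member of the "geometry" node, call something like resolve(...), and then deserialize that node as the returned type instead of as the declared TGeometry.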



2.)  

These GeoJSON files tend to be really huge. Huge means about 300 GB for a GeoJSON export of Europe
from OpenStreetMap. I'm not exactly sure how the Qi4j deserializer works, but it looks like
the whole thing first has to be loaded into memory and only then are all the objects generated. A kind
of iterating approach would help a lot here.

ValueDeserializerAdapter mixes pull-parsing and tree-parsing. It always starts with pull-parsing, then locally switches to tree-parsing when it encounters a ValueComposite or an unknown type. This lets it get hold of the "_type" field, if present, and use it to override the CompositeValueType used for parsing and for the ValueBuilder. Note that "_type" could be the last member of the JSON object holding a ValueComposite's state, which is why pull-parsing alone would not work there.

In other words, if your JSON payload contains a gigantic array of ValueComposites (or a gigantic object that contains ValueComposites), it will pull-parse the array (or object), reading ValueComposites JSON nodes for tree-parsing in an iterating manner. Therefore, it should handle big JSON payloads with collections of ValueComposites well.

This also means that if your JSON payload represents a *single* ValueComposite then, yes, the JSON payload is entirely loaded into memory for tree-parsing.

If you see a different behaviour it's probably a bug.
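
As an illustration of the iterating idea (stream over the outer array, buffer only one element at a time), here is a small standalone sketch using only the JDK. It is not the ValueDeserializerAdapter code and not a JSON parser: it only tracks nesting depth and string state so it can cut each Feature out of the stream. FeatureStream and forEachFeature are hypothetical names, and the sketch assumes the input is a FeatureCollection, so that the only objects opening at nesting depth 2 are the elements of "features".

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

final class FeatureStream {

    // Calls sink once for every JSON object that opens at nesting depth 2.
    // In a FeatureCollection that is each element of the "features" array.
    // Memory use is bounded by the largest single feature, not by the file.
    static void forEachFeature(Reader in, Consumer<String> sink) {
        StringBuilder current = null; // non-null only while inside a feature
        int depth = 0;                // number of currently open '{' and '['
        boolean inString = false;
        boolean escaped = false;
        try {
            int c;
            while ((c = in.read()) != -1) {
                char ch = (char) c;
                if (current != null) {
                    current.append(ch);
                }
                if (inString) {
                    // Braces and brackets inside string literals do not count.
                    if (escaped) {
                        escaped = false;
                    } else if (ch == '\\') {
                        escaped = true;
                    } else if (ch == '"') {
                        inString = false;
                    }
                } else if (ch == '"') {
                    inString = true;
                } else if (ch == '{' || ch == '[') {
                    if (ch == '{' && depth == 2 && current == null) {
                        current = new StringBuilder().append(ch); // feature starts
                    }
                    depth++;
                } else if (ch == '}' || ch == ']') {
                    depth--;
                    if (depth == 2 && current != null) {
                        sink.accept(current.toString()); // feature complete
                        current = null;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String json = "{\"type\":\"FeatureCollection\",\"features\":["
                + "{\"type\":\"Feature\",\"properties\":{\"prop0\":\"value0\"}},"
                + "{\"type\":\"Feature\",\"properties\":{}}]}";
        forEachFeature(new StringReader(json), feature -> System.out.println(feature));
    }
}
```

For a real export you would pass a BufferedReader over the file instead of a StringReader, and hand each captured Feature string to the value deserializer, so only one Feature is tree-parsed at a time.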

There's room for improvement in all this. We could add an option to ValueDeserializerAdapter so that it simply does not look for the "_type" field and keeps pull-parsing the whole payload. See https://ops4j1.jira.com/browse/QI-409 and comment if you need that.

/Paul

