spatialDimensions specification for TSV file using lat/lon columns


Terry Senovich

Aug 18, 2016, 6:21:04 PM
to Druid User

Assume an example TSV file with the following (pipe-delimited) format:

columnA|columnB|latitude|longitude

And the columns and dimensions are specified as:

      "columns" : [ "columnA", "columnB", "latitude", "longitude" ],
      "delimiter" : "|",
      "dimensionsSpec" : {
        "dimensions" : [
          "columnA",
          "columnB",
          "latitude",
          "longitude"
        ],



The documentation gives the following example, which I read as assuming a JSON field that holds an array of two values:

      "spatialDimensions" : [
        {
          "dimName": "coorindates",
          "dims": ["latitude", "longitude"]
        }
      ]



The documentation says dimName is required and is "The name of the spatial dimension. A spatial dimension may be constructed from multiple other dimensions or it may already exist as part of an event. If a spatial dimension already exists, it must be an array of coordinate values."

So the logical JSON structure for building spatial dimensions from multiple other dimensions seemed to be the following, which does parse the data and start the MapReduce tasks:

      "spatialDimensions" : [
        {
          "dimName" : "latitude",
          "dims" : []
        },
        {
          "dimName" : "longitude",
          "dims" : []
        }
      ]


But when processing the data, this throws a java.lang.IllegalArgumentException when inserting into the RTree, as shown below. This is where the insert checks that the incoming array of floats (coords) contains the proper number of dimensions, which is fixed when the RTree is created:

 /** @param coords - the coordinates of the entry
   * @param entry  - the integer to insert
   */
 public void insert(float[] coords, int entry) {
   Preconditions.checkArgument(coords.length == numDims);
   insertInner(new Point(coords, entry, bitmapFactory));
 }


I'm guessing at the proper JSON structure; am I specifying this incorrectly? It looks like the number of dimensions the RTree was created with does not match the number of values in the incoming float array.

 
2016-08-18T21:54:57,091 INFO [main] org.apache.hadoop.mapred.JobClient - Task Id : attempt_201608160124_0063_r_000007_0, Status : FAILED on node b141-18
java.lang.IllegalArgumentException
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:76)
        at com.metamx.collections.spatial.RTree.insert(RTree.java:89)
        at io.druid.segment.IndexMerger.makeIndexFiles(IndexMerger.java:974)
        at io.druid.segment.IndexMerger.merge(IndexMerger.java:423)
        at io.druid.segment.IndexMerger.persist(IndexMerger.java:195)
        at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.persist(IndexGeneratorJob.java:501)
        at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:672)
        at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:469)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:620)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:458)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:278)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
        at org.apache.hadoop.mapred.Child.main(Child.java:267)
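A minimal Java sketch of the mismatch behind this exception, assuming the tree was created for two-dimensional (lat, lon) points and that each spatial value reaches the index as a comma-joined string of floats. The class and method names (`RTreeDimCheck`, `decode`, `insert`) are hypothetical stand-ins, not Druid's API; `insert` only reproduces the `coords.length == numDims` precondition quoted above.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified model of the RTree.insert() precondition:
 * the tree has a fixed number of dimensions, and every inserted point
 * must supply exactly that many coordinates.
 */
class RTreeDimCheck {

    // Assumption: the tree was built for 2-D (latitude, longitude) points.
    static final int NUM_DIMS = 2;

    /** Hypothetical decoder: splits a comma-joined spatial value into floats. */
    static float[] decode(String spatialValue) {
        String[] parts = spatialValue.split(",");
        float[] coords = new float[parts.length];
        for (int i = 0; i < parts.length; i++) {
            coords[i] = Float.parseFloat(parts[i].trim());
        }
        return coords;
    }

    /** Same check as Preconditions.checkArgument(coords.length == numDims). */
    static void insert(float[] coords) {
        if (coords.length != NUM_DIMS) {
            throw new IllegalArgumentException(
                "expected " + NUM_DIMS + " coordinates, got " + coords.length);
        }
    }

    public static void main(String[] args) {
        // "dimName": "latitude", "dims": []  -> one value per row
        float[] latOnly = decode("37.77");
        // "dims": ["latitude", "longitude"]  -> two values per row
        float[] latLon = decode("37.77,-122.42");

        insert(latLon);
        System.out.println("lat+lon " + Arrays.toString(latLon) + " inserted");
        try {
            insert(latOnly);
        } catch (IllegalArgumentException e) {
            System.out.println("latitude only rejected: " + e.getMessage());
        }
    }
}
```

Under these assumptions, declaring `latitude` and `longitude` as two separate spatial dimensions with empty `dims` gives each of them a one-element coordinate array, which fails the length check.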


Jonathan Wei

Aug 19, 2016, 3:31:07 PM
to druid...@googlegroups.com
Hello,

If you're composing the spatial dimension from two other dimensions, you'll want to use this syntax:

"spatialDimensions" : [
  {
    "dimName": "coorindates",
    "dims": ["latitude", "longitude"]
  }
]

There, "dims" lists the individual component dimensions that are used to construct the new spatial "coordinates" dimension.
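A rough Java sketch of the two cases the documentation describes, assuming the composed value is the component values joined with commas in the order given by "dims", and that with empty "dims" the row must already hold the full coordinate value under "dimName". `SpatialComposer` and `compose` are hypothetical names for illustration only, not Druid's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of building a spatial value from "dimName"/"dims". */
class SpatialComposer {

    /**
     * Returns the comma-joined coordinate string for one row.
     * If dims is empty, the row is assumed to already contain the whole
     * coordinate value under dimName, and it is used unchanged.
     */
    static String compose(Map<String, String> row, String dimName, List<String> dims) {
        if (dims.isEmpty()) {
            String existing = row.get(dimName);
            if (existing == null) {
                throw new IllegalArgumentException("row has no value for " + dimName);
            }
            return existing;
        }
        List<String> parts = new ArrayList<>();
        for (String dim : dims) {
            String value = row.get(dim);
            if (value == null) {
                throw new IllegalArgumentException("row has no value for " + dim);
            }
            parts.add(value);
        }
        return String.join(",", parts);
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("columnA", "a");
        row.put("columnB", "b");
        row.put("latitude", "37.77");
        row.put("longitude", "-122.42");

        // Composed from two components: one 2-D point per row.
        System.out.println(compose(row, "coordinates", Arrays.asList("latitude", "longitude")));
        // Empty dims on a scalar column: only a single coordinate.
        System.out.println(compose(row, "latitude", Collections.<String>emptyList()));
    }
}
```

The design point is that "dims" names the inputs and "dimName" names the output, so a single entry with two component dims is what produces two-coordinate points.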




Terry Senovich

Aug 22, 2016, 4:22:51 PM
to Druid User
Hi Jonathan -

That was the first thing I tried. But when I include "latitude" and "longitude" in the columns and dimensions lists (as shown in the original post) and use your recommended snippet, the Hadoop indexing job fails during parsing with an error indicating that it is looking for a column named "coordinates":

2016-08-22T17:00:26,794 ERROR [main] io.druid.cli.CliHadoopIndexer - failure!!!!
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.7.0_101]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[?:1.7.0_101]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.7.0_101]
        at java.lang.reflect.Method.invoke(Method.java:606) ~[?:1.7.0_101]
        at io.druid.cli.CliHadoopIndexer.run(CliHadoopIndexer.java:115) [druid-services-0.9.1.1.jar:0.9.1.1]
        at io.druid.cli.Main.main(Main.java:105) [druid-services-0.9.1.1.jar:0.9.1.1]
Caused by: java.lang.IllegalArgumentException: Instantiation of [simple type, class io.druid.data.input.impl.DelimitedParseSpec] value failed: column[coordinates] not in columns.
        at com.fasterxml.jackson.databind.ObjectMapper._convert(ObjectMapper.java:2774) ~[jackson-databind-2.4.6.jar:2.4.6]
       
Adding the column name "coordinates" to the end of the columns list fixes that. This isn't very well documented, but it seems to work!
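Why the workaround helps can be sketched in Java, assuming (as the error message `column[coordinates] not in columns.` suggests) that the delimited parse spec checks every dimension name, including the spatial "dimName", against the "columns" list. `ParseSpecCheck` and `verify` are hypothetical names; this only mimics the check implied by the error, not Druid's source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the columns-vs-dimensions validation implied by the error. */
class ParseSpecCheck {

    /** Throws if any dimension name is missing from the columns list. */
    static void verify(List<String> columns, List<String> dimensionNames) {
        for (String name : dimensionNames) {
            if (!columns.contains(name)) {
                throw new IllegalArgumentException("column[" + name + "] not in columns.");
            }
        }
    }

    public static void main(String[] args) {
        List<String> columns = new ArrayList<>(
            Arrays.asList("columnA", "columnB", "latitude", "longitude"));
        // Assumption: the spatial dimName is validated like any other dimension name.
        List<String> dimensionNames = Arrays.asList(
            "columnA", "columnB", "latitude", "longitude", "coordinates");

        try {
            verify(columns, dimensionNames);
        } catch (IllegalArgumentException e) {
            System.out.println("before: " + e.getMessage());
        }

        // Workaround from this thread: append "coordinates" to the columns list.
        columns.add("coordinates");
        verify(columns, dimensionNames);
        System.out.println("after: validation passes");
    }
}
```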

Fangjin Yang

Aug 25, 2016, 5:41:03 PM
to Druid User
You spelt "coordinates" incorrectly in the spec.