Deep Storage: Local


Nick

Jul 13, 2016, 4:36:28 AM
to Druid User
I am exploring Druid and am using the imply.io package for it.

I am able to ingest the wikiticker data that comes with imply.io and can see the data in Pivot as expected.


After that, I created another spec file, placed it in the quickstart folder, and ran the indexing job to load the data.

I am getting the following error message:
                Bytes Written=8
2016-07-13T08:34:11,107 ERROR [task-runner-0-priority-0] io.druid.indexer.IndexGeneratorJob - [File var/druid/hadoop-tmp/sales/2016-07-13T083357.479Z/ce6b738c2f6749098763ac533b80abed/segmentDescriptorInfo does not exist] SegmentDescriptorInfo is not found usually when indexing process did not produce any segments meaning either there was no input data to process or all the input events were discarded due to some error
2016-07-13T08:34:11,110 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_sales_2016-07-13T08:33:53.220Z, type=index_hadoop, dataSource=sales}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
        at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
        at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:204) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
        at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:208) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
        at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.9.1.1.jar:0.9.1.1]
        at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.9.1.1.jar:0.9.1.1]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_92]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_92]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_92]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_92]
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_92]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_92]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_92]
        at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_92]
        at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:201) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
        ... 7 more
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: File var/druid/hadoop-tmp/sales/2016-07-13T083357.479Z/ce6b738c2f6749098763ac533b80abed/segmentDescriptorInfo does not exist



The indexing task was submitted with:

bin/post-index-task --file quickstart/sales-index.json


The ingestion spec for the file is:
{
  "type" : "index_hadoop",
  "spec" : {
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "paths" : "quickstart/sales-2016-06-27-sampled.json"
      }
    },
    "dataSchema" : {
      "dataSource" : "sales",
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "week",
        "queryGranularity" : "none",
        "intervals" : ["2016-06-27/2016-06-28"]
      },
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "json",
          "timestampSpec" : {
            "column" : "Week",
            "format" : "YYYYMMDD"
          },
          "dimensionsSpec" : {
            "dimensions" : ["Week", "SKU", "Str", "Type", "Qty"],
            "dimensionExclusions" : [],
            "spatialDimensions" : []
          }
        }
      },
      "metricsSpec" : [
        {
          "type" : "count",
          "name" : "count"
        },
        {
          "type" : "doubleSum",
          "name" : "QtySum",
          "fieldName" : "Qty"
        }
      ],
      "tuningConfig" : {
        "type" : "hadoop",
        "partitionsSpec" : {
          "type" : "hashed",
          "targetPartitionSize" : 5000000
        },
        "jobProperties" : {}
      }
    }
  }
}



A sample of the sales-2016-06-27-sampled.json data set is as follows:


{"Week": 20131014,"SKU": "0001780","Str": "0011100","Type": "abc","Qty": 22}
{"Week": 20131223,"SKU": "0001780","Str": "0001100","Type": "abc","Qty": 2}








David Lim

Jul 13, 2016, 12:23:23 PM
to Druid User
Hey Nick,

Your interval ("2016-06-27/2016-06-28") needs to cover the timestamps in your data (20131014, 20131223); otherwise no segments will be created. Also, I don't think your timestampSpec format is what you intended; you probably want 'YYYYMMdd' (see: http://www.joda.org/joda-time/apidocs/org/joda/time/format/DateTimeFormat.html)

Nick

Jul 25, 2016, 2:02:11 AM
to Druid User
Thanks David!

It worked after correcting the intervals in the ingestion spec.

--
Nick