HTTP ERROR: 500 on running index task


vikas srivastava

Jun 29, 2016, 12:13:35 PM
to Druid User
Hi,

I am trying to run Druid-0.9.0 for a POC.

My issue is that I get an HTTP 500 error when I try to run an index task.

My ingestion spec file is:

{
  "type" : "index",
  "spec" : {
    "ioConfig" : {
      "type" : "index",
      "firehose" : {
        "type" : "local",
        "baseDir" : "/home/centos/"
        "filter" : "abc.csv"
      }
    },
    "dataSchema" : {
      "dataSource" : "abc",
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "day",
        "queryGranularity" : "none",
        "intervals" : ["2000-01-01/2000-01-02"]
      },
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "csv",
                  "columns" : [
                   "timestamp",
                   "household_id",
           "network_group_id",
           "quarter_hour_of_the_day_offset",
           "broadcast_month_id",
           "ad_zone",
           "region",
           "region_all",
           "party",
           "party_all",
           "duration",
           "ethnic_group",
           "age_range",
           "income_range",
           "gender"
                  ]
          "dimensionsSpec" : {
            "dimensions" : [
              "household_id",
              "network_group_id",
              "quarter_hour_of_the_day_offset",
              "broadcast_month_id",
              "ad_zone",
              "region",
              "region_all",
              "party",
              "party_all",
              "duration",
              "ethnic_group",
              "age_range",
              "income_range",
              "gender"
            ]
          },
          "timestampSpec" : {
            "format" : "auto",
            "column" : "time"
          }
        }
      },
      "metricsSpec" : [
        {
          "name" : "household_id",
          "type" : "count"
        },
        {
          "name" : "duration",
          "type" : "longSum",
          "fieldName" : "duration"
        }

      ]
    },
    "tuningConfig" : {
      "type" : "index",
      "targetPartitionSize" : 0
      "rowFlushBoundary": 0
      }
    }
  }

The error I am getting:

Warning: Couldn't read data from file "abc.json", this makes an empty
Warning: POST.
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
<title>Error 500 </title>
</head>
<body>
<h2>HTTP ERROR: 500</h2>
<p>Problem accessing /druid/indexer/v1/task. Reason:
<pre>    java.lang.NullPointerException: task</pre></p>
<hr /><i><small>Powered by Jetty://</small></i>
</body>
</html>

I am stuck at this and would really appreciate some help.

Thanks.

David Lim

Jun 29, 2016, 12:44:53 PM
to Druid User
Hi Vikas,

Something is wrong in your HTTP request to the overlord. If you are using curl, your '-d @{fileName}' path is likely incorrect.

vikas srivastava

Jun 29, 2016, 1:12:19 PM
to Druid User
Hi David,

Thanks for the reply.

I am using curl to POST. My curl command is:

curl -X 'POST' -H 'Content-Type:application/json' -d @abc.json ***.***.***.***:8090/druid/indexer/v1/task

Is there something wrong with this command?

Thanks,
Vikas

David Lim

Jun 29, 2016, 1:30:19 PM
to Druid User
That command looks good to me. Are you running it from the same directory where abc.json lives? Another reason you may be unable to read the file is insufficient permissions.
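Both conditions can be checked from the shell before retrying curl; a quick sketch (assuming abc.json is the spec file path):

```shell
# Verify the spec file exists and is readable from the current directory;
# if this check fails, curl's '-d @abc.json' sends an empty POST body,
# which matches the "Couldn't read data from file" warning above.
if [ -r abc.json ]; then
  echo "abc.json is readable"
else
  echo "abc.json is missing or unreadable" >&2
fi
```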

vikas srivastava

Jun 29, 2016, 1:33:44 PM
to Druid User

No, I wasn't. When I ran the command from the same directory, I got this error message:

{"error":"Instantiation of [simple type, class io.druid.indexing.common.task.IndexTask] value failed: null"}

David Lim

Jun 29, 2016, 1:48:40 PM
to Druid User
Okay, that's better. That error means your JSON couldn't be deserialized into an IndexTask, in this case because it's not valid JSON. You're missing a bunch of commas in there:

After: "/home/centos/" in:


      "firehose" : {
        "type" : "local",
        "baseDir" : "/home/centos/"
        "filter" : "abc.csv"
      }

After: "]" in:


           "age_range",
           "income_range",
           "gender"
                  ]
          "dimensionsSpec" : {
            "dimensions" : [

After '"targetPartitionSize" : 0' in:


    "tuningConfig" : {
      "type" : "index",
      "targetPartitionSize" : 0
      "rowFlushBoundary": 0
      }

Also your tuningConfig settings don't look valid to me. Take a look at the documentation here: http://druid.io/docs/latest/ingestion/tasks.html for ideas about what might be more reasonable values.
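As a general tip (not specific to Druid), you can catch this kind of missing-comma problem before submitting by running the spec through any JSON parser, for example:

```shell
# python3's json.tool exits non-zero and prints the parse error
# if abc.json is not valid JSON; only a valid spec reaches the echo.
python3 -m json.tool abc.json > /dev/null && echo "spec is valid JSON"
```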

vikas srivastava

Jun 29, 2016, 2:53:21 PM
to Druid User
Thanks a lot, man. That solved the issue.

Now I am getting "com.metamx.common.parsers.ParseException: Unparseable timestamp found!"

I have tagged all rows with a fixed timestamp as stated on the tutorial page, since my data does not come with timestamps of its own.

A line in my data looks like this:

2000-01-01T00:00:00.000Z ,123,123,123,123,aa,aa,aa,aa,aa,123,aa,123+,123-456,aa
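(For reference, I did the tagging with a simple sed prepend along these lines; raw.csv and tagged.csv stand in for my actual file names:)

```shell
# Prepend a fixed ISO-8601 timestamp column to every row of a CSV
# that has no timestamp of its own.
sed 's/^/2000-01-01T00:00:00.000Z,/' raw.csv > tagged.csv
```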

The Logs are as follows:

n] com.sun.jersey.server.impl.application.WebApplicationImpl - Initiating Jersey application, version 'Jersey: 1.19 02/11/2015 03:25 AM'
2016-06-29T18:37:19,025 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.server.initialization.jetty.CustomExceptionMapper to GuiceManagedComponentProvider with the scope "Singleton"
2016-06-29T18:37:19,028 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding com.fasterxml.jackson.jaxrs.json.JacksonJsonProvider to GuiceManagedComponentProvider with the scope "Singleton"
2016-06-29T18:37:19,109 INFO [task-runner-0-priority-0] io.druid.segment.realtime.firehose.LocalFirehoseFactory - Found files: [/druid-0.9.0/abc_1.csv]
2016-06-29T18:37:19,117 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[IndexTask{id=index_abc_2016-06-29T18:37:14.940Z, type=index, dataSource=abc}]
com.metamx.common.parsers.ParseException: Unparseable timestamp found!
	at io.druid.data.input.impl.MapInputRowParser.parse(MapInputRowParser.java:72) ~[druid-api-0.3.16.jar:0.3.16]
	at io.druid.data.input.impl.StringInputRowParser.parseMap(StringInputRowParser.java:136) ~[druid-api-0.3.16.jar:0.3.16]
	at io.druid.data.input.impl.StringInputRowParser.parse(StringInputRowParser.java:131) ~[druid-api-0.3.16.jar:0.3.16]
	at io.druid.data.input.impl.FileIteratingFirehose.nextRow(FileIteratingFirehose.java:72) ~[druid-api-0.3.16.jar:0.3.16]
	at io.druid.indexing.common.task.IndexTask.getDataIntervals(IndexTask.java:244) ~[druid-indexing-service-0.9.0.jar:0.9.0]
	at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:200) ~[druid-indexing-service-0.9.0.jar:0.9.0]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:338) [druid-indexing-service-0.9.0.jar:0.9.0]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:318) [druid-indexing-service-0.9.0.jar:0.9.0]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_91]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_91]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_91]
Caused by: java.lang.NullPointerException: Null timestamp in input: {timestamp=2000-01-01T00:00:00.000Z , abc_id=2114451, xyz_id=482, quarter_hour_of_th...
	at io.druid.data.input.impl.MapInputRowParser.parse(MapInputRowParser.java:63) ~[druid-api-0.3.16.jar:0.3.16]
	... 11 more
2016-06-29T18:37:19,128 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_abc_2016-06-29T18:37:14.940Z",
  "status" : "FAILED",
  "duration" : 421

vikas srivastava

Jun 29, 2016, 3:32:56 PM
to Druid User
Hold on, I found the issue: the column name and the timestampSpec column were different. I fixed it, and now the task shows "running" status on the coordinator console.

I will update once I get a result.
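For anyone hitting the same error: the column named in timestampSpec must match a column listed in the parseSpec columns. With the columns in my spec above, the fix was along these lines:

```json
"timestampSpec" : {
  "format" : "auto",
  "column" : "timestamp"
}
```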

vikas srivastava

Jun 29, 2016, 3:59:58 PM
to Druid User
OK, so the task has now been running for around 30 minutes.

The data I am ingesting is a 10 GB file.

What is the average time taken by Druid to ingest this amount of data?

 


Fangjin Yang

Jun 30, 2016, 2:30:10 PM
to Druid User
Hi Vikas, it appears you are running the local index task, which is not recommended for files larger than 1G because its performance can be slow.

You can use a remote Hadoop cluster such as EMR to do the ingestion. That should significantly improve ingestion times.