Reindexing task failed.


기준

Sep 26, 2017, 6:46:31 AM
to Druid User
Hi team!

I'm having trouble with a reindexing task.

Here is my reindexing JSON:
```
{
        "type": "index_hadoop",
        "spec": {
                "dataSchema": {
                        "dataSource": "myDataSource"
                },
                "ioConfig": {
                        "type": "hadoop",
                        "inputSpec": {
                                "type": "dataSource",
                                "ingestionSpec": {
                                        "dataSource": "myDataSource",
                                        "intervals": ["2017-09-26T17:00:00Z/PT1H"]
                                }
                        }
                },
                "tuningConfig": {
                        "type": "hadoop",
                        "jobProperties": {
                                "mapreduce.job.queuename": "root.druid.batch",
                                "mapreduce.job.classloader": "true",
                                "mapreduce.job.classloader.system.classes": "-javax.validation.,java.,javax.,org.apache.commons.logging.,org.apache.log4j.,org.apache.hadoop.",
                                "mapreduce.map.memory.mb": 4096,
                                "mapreduce.map.java.opts": "-server -Xmx4096m -Duser.timezone=UTC -Dfile.encoding=UTF-8",
                                "mapreduce.reduce.memory.mb": 8192,
                                "mapreduce.reduce.java.opts": "-server -Xmx8g -Duser.timezone=UTC -Dfile.encoding=UTF-8"
                        },
                        "partitionsSpec": {
                                "type": "dimension",
                                "partitionDimension": "pd",
                                "targetPartitionSize": 8333333
                        },
                        "buildV9Directly": "true"
                }
        },
        "hadoopDependencyCoordinates": ["org.apache.hadoop:hadoop-client:2.6.0"]
}
```

This is the indexer log from the Overlord web console UI:

```
2017-09-26T10:32:50,958 INFO [task-runner-0-priority-0] io.druid.indexing.common.task.HadoopIndexTask - Starting a hadoop determine configuration job...
2017-09-26T10:32:50,988 INFO [task-runner-0-priority-0] io.druid.indexer.path.DatasourcePathSpec - Found total [36] segments for [myDataSource]  in interval [[2017-09-26T17:00:00.000Z/2017-09-26T18:00:00.000Z]]
2017-09-26T10:32:50,988 WARN [task-runner-0-priority-0] io.druid.segment.indexing.DataSchema - No parser has been specified
2017-09-26T10:32:50,989 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_myDataSource_2017-09-26T10:32:43.389Z, type=index_hadoop, dataSource=myDataSource}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
        at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
        at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:211) ~[druid-indexing-service-0.10.0.jar:0.10.0]
        at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:176) ~[druid-indexing-service-0.10.0.jar:0.10.0]
        at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.10.0.jar:0.10.0]
        at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.10.0.jar:0.10.0]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_60]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_60]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_60]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_60]
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_60]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_60]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_60]
        at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_60]
        at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:208) ~[druid-indexing-service-0.10.0.jar:0.10.0]
        ... 7 more
Caused by: java.lang.NullPointerException
        at io.druid.indexer.path.DatasourcePathSpec.addInputPaths(DatasourcePathSpec.java:117) ~[druid-indexing-hadoop-0.10.0.jar:0.10.0]
        at io.druid.indexer.HadoopDruidIndexerConfig.addInputPaths(HadoopDruidIndexerConfig.java:389) ~[druid-indexing-hadoop-0.10.0.jar:0.10.0]
        at io.druid.indexer.JobHelper.ensurePaths(JobHelper.java:337) ~[druid-indexing-hadoop-0.10.0.jar:0.10.0]
        at io.druid.indexer.HadoopDruidDetermineConfigurationJob.run(HadoopDruidDetermineConfigurationJob.java:55) ~[druid-indexing-hadoop-0.10.0.jar:0.10.0]
        at io.druid.indexing.common.task.HadoopIndexTask$HadoopDetermineConfigInnerProcessing.runTask(HadoopIndexTask.java:306) ~[druid-indexing-service-0.10.0.jar:0.10.0]
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_60]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_60]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_60]
        at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_60]
        at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:208) ~[druid-indexing-service-0.10.0.jar:0.10.0]
        ... 7 more
```


This task was submitted via the following command:

```
curl -v -L -XPOST -H'Content-Type: application/json' -d @$jsonFile http://$overlord:$overlordPort/druid/indexer/v1/task
```


Can you please tell me where the spec is invalid?
I have read the documentation several times but could not find the flaw.


Also, could you give a full sample JSON?
It would be very helpful.


Thanks, have a nice day.








Jonathan Wei

Sep 26, 2017, 3:18:29 PM
to druid...@googlegroups.com
Hello,

You're missing a parser definition in that ingestion spec; you can see an example and more documentation on that at:
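
For reference, here's a rough sketch of a dataSchema carrying a parser for that reindexing spec (this is not a complete spec; hadoopyString is the usual parser type for Hadoop-based tasks, and the timestamp column, dimensions, metrics, and granularities below are placeholders you'd replace with your own):

```
"dataSchema": {
  "dataSource": "myDataSource",
  "parser": {
    "type": "hadoopyString",
    "parseSpec": {
      "format": "json",
      "timestampSpec": {"column": "timestamp", "format": "auto"},
      "dimensionsSpec": {"dimensions": ["pd"]}
    }
  },
  "metricsSpec": [
    {"type": "longSum", "name": "count", "fieldName": "count"}
  ],
  "granularitySpec": {
    "type": "uniform",
    "segmentGranularity": "HOUR",
    "queryGranularity": "NONE",
    "intervals": ["2017-09-26T17:00:00Z/PT1H"]
  }
}
```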



기준

Sep 27, 2017, 1:41:25 AM
to Druid User
Oh, thanks!

But I wonder: why does a reindexing task need to know about the parser spec?

The datasource and interval are already described in the JSON, and the segment files are located in HDFS.

Then what is the difference between an indexing task and re-indexing?
If re-indexing has to know the parser spec details, then re-indexing is essentially the same as batch indexing after all.
I thought of the reindexing task as turning existing segments (probably many, since real-time Kafka indexing creates a lot of segments per partition and granularity) into more fine-grained segments.

Can I achieve this via a merge task?

Thanks~!

Have a nice day!


On Tuesday, September 26, 2017 at 7:46:31 PM UTC+9, 기준 wrote:

laris...@saucelabs.com

Jan 19, 2018, 5:49:18 PM
to Druid User
Hi, 

I'm getting the same error, but I do have a parser definition.

2018-01-19T22:30:53,858 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_sauce_jobs_ds_2018-01-19T22:30:36.715Z, type=index_hadoop, dataSource=sauce_jobs_ds}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:218) ~[druid-indexing-service-0.11.0.jar:0.11.0]
	at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:226) ~[druid-indexing-service-0.11.0.jar:0.11.0]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.11.0.jar:0.11.0]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.11.0.jar:0.11.0]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_151]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_151]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_151]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_151]
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_151]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_151]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_151]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_151]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:215) ~[druid-indexing-service-0.11.0.jar:0.11.0]
	... 7 more
Caused by: io.druid.java.util.common.ISE: Job[class io.druid.indexer.IndexGeneratorJob] failed!
	at io.druid.indexer.JobHelper.runJobs(JobHelper.java:390) ~[druid-indexing-hadoop-0.11.0.jar:0.11.0]
	at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:95) ~[druid-indexing-hadoop-0.11.0.jar:0.11.0]
	at io.druid.indexing.common.task.HadoopIndexTask$HadoopIndexGeneratorInnerProcessing.runTask(HadoopIndexTask.java:279) ~[druid-indexing-service-0.11.0.jar:0.11.0]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_151]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_151]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_151]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_151]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:215) ~[druid-indexing-service-0.11.0.jar:0.11.0]
	... 7 more
2018-01-19T22:30:53,870 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_hadoop_sauce_jobs_ds_2018-01-19T22:30:36.715Z] status changed to [FAILED].
2018-01-19T22:30:53,878 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_hadoop_sauce_jobs_ds_2018-01-19T22:30:36.715Z",
  "status" : "FAILED",
  "duration" : 12139
}

larissa

Jan 19, 2018, 6:00:36 PM
to Druid User
(attachment: console.log)

Jonathan Wei

Jan 19, 2018, 6:25:35 PM
to druid...@googlegroups.com
> Then what is the difference between an indexing task and re-indexing?

Re-indexing is a type of indexing task where the input data comes from existing Druid segments, they're conceptually the same type of task.

> I thought of the reindexing task as turning existing segments (probably many, since real-time Kafka indexing creates a lot of segments per partition and granularity) into more fine-grained segments. Can I achieve this via a merge task?

I don't think you can reindex existing segments into a granularity finer than what they were originally ingested with (as that "finer-grained" information would have been lost).

The merge task also doesn't work with segments generated by the Kafka indexing service; the reasoning is explained in the last paragraph of the doc page (http://druid.io/docs/latest/development/extensions-core/kafka-ingestion.html).
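
If the goal is to compact many small Kafka-generated segments into fewer, larger ones, reindexing the same interval with the same (or coarser) segmentGranularity is one way to do it. A rough sketch under those assumptions (the dataSource name, interval, columns, and target size below are placeholders; the dimensions and metrics should match what is already in the existing segments):

```
{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": {
      "dataSource": "myDataSource",
      "parser": {
        "type": "hadoopyString",
        "parseSpec": {
          "format": "json",
          "timestampSpec": {"column": "timestamp", "format": "auto"},
          "dimensionsSpec": {"dimensions": ["pd"]}
        }
      },
      "metricsSpec": [
        {"type": "longSum", "name": "count", "fieldName": "count"}
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "NONE",
        "intervals": ["2017-09-26/2017-09-27"]
      }
    },
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "dataSource",
        "ingestionSpec": {
          "dataSource": "myDataSource",
          "intervals": ["2017-09-26/2017-09-27"]
        }
      }
    },
    "tuningConfig": {
      "type": "hadoop",
      "partitionsSpec": {"type": "hashed", "targetPartitionSize": 5000000}
    }
  }
}
```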




Jonathan Wei

Jan 19, 2018, 6:27:56 PM
to druid...@googlegroups.com
> I'm getting the same error, but I do have a parser definition.


The actual error is shown further up in the log you uploaded:

```
java.lang.Exception: java.io.IOException: No FileSystem for scheme: s3n

at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.7.3.jar:?]
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) [hadoop-mapreduce-client-common-2.7.3.jar:?]
Caused by: java.io.IOException: No FileSystem for scheme: s3n
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660) ~[hadoop-common-2.7.3.jar:?]
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667) ~[hadoop-common-2.7.3.jar:?]
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94) ~[hadoop-common-2.7.3.jar:?]
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703) ~[hadoop-common-2.7.3.jar:?]
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685) ~[hadoop-common-2.7.3.jar:?]
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373) ~[hadoop-common-2.7.3.jar:?]
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295) ~[hadoop-common-2.7.3.jar:?]
at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:709) ~[druid-indexing-hadoop-0.11.0.jar:0.11.0]
at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:489) ~[druid-indexing-hadoop-0.11.0.jar:0.11.0]
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) ~[hadoop-mapreduce-client-common-2.7.3.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_151]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_151]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_151]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_151]

```


I would check that you have the Druid S3 deep storage extension loaded, and try setting your jobProperties in the indexing spec like the following:


```
"jobProperties" : {
   "fs.s3.awsAccessKeyId" : "YOUR_ACCESS_KEY",
   "fs.s3.awsSecretAccessKey" : "YOUR_SECRET_KEY",
   "fs.s3.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
   "fs.s3n.awsAccessKeyId" : "YOUR_ACCESS_KEY",
   "fs.s3n.awsSecretAccessKey" : "YOUR_SECRET_KEY",
   "fs.s3n.impl" : "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
   "io.compression.codecs" : "org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec"

```




larissa

Jan 22, 2018, 1:57:29 PM
to Druid User
 Hi Jonathan,

Thanks for the quick response. After adding the "jobProperties" config to the index JSON as you suggested, and adding hadoop-aws-2.7.3.jar to lib/ as mentioned in https://groups.google.com/forum/#!topic/druid-user/HhcMkkbKRXI, everything works.

Thanks so much again. :)

eke...@accedian.com

Feb 26, 2018, 2:42:42 PM
to Druid User
I'm trying to reindex segments stored in S3 as well, but I'm getting errors with respect to the S3 credentials. I could really use some help diagnosing this.
I'm supplying the credentials in the jobProperties. This works for me if the inputSpec is 'static' but fails when using 'dataSource'.
I'm using Druid version 0.11.0. The credentials are also configured in the common config (druid.s3.accessKey, etc.).
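
For context, the static inputSpec variant that works points directly at files, roughly like the sketch below (the bucket and path are placeholders); the dataSource variant that fails is the one in the task request further down.

```
"inputSpec": {
  "type": "static",
  "paths": "s3n://my-bucket/some/path/data.json"
}
```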
Here's the error log:

2018-02-26T19:22:31,729 WARN [task-runner-0-priority-0] org.apache.hadoop.fs.FileSystem - Cannot load filesystem
java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.fs.s3a.S3AFileSystem could not be instantiated
	at java.util.ServiceLoader.fail(ServiceLoader.java:232) ~[?:1.8.0_66-internal]
	at java.util.ServiceLoader.access$100(ServiceLoader.java:185) ~[?:1.8.0_66-internal]
	at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384) ~[?:1.8.0_66-internal]
	at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404) ~[?:1.8.0_66-internal]
	at java.util.ServiceLoader$1.next(ServiceLoader.java:480) ~[?:1.8.0_66-internal]
	at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2631) [hadoop-common-2.7.3.jar:?]
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2650) [hadoop-common-2.7.3.jar:?]
...
2018-02-26T19:22:32,056 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_npav-ts-metrics_2018-02-26T19:22:19.303Z, type=index_hadoop, dataSource=npav-ts-metrics}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
	at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:218) ~[druid-indexing-service-0.11.0.jar:0.11.0]
	at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:226) ~[druid-indexing-service-0.11.0.jar:0.11.0]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.11.0.jar:0.11.0]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.11.0.jar:0.11.0]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_66-internal]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_66-internal]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_66-internal]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_66-internal]
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_66-internal]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_66-internal]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_66-internal]
	at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_66-internal]
	at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:215) ~[druid-indexing-service-0.11.0.jar:0.11.0]
	... 7 more
Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey properties (respectively).


Here's my task request:
{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": {
      "dataSource": "npav-ts-metrics",
      "parser": {
        "type": "hadoopyString",
        "parseSpec": {
          "format": "json",
          "timestampSpec": {
            "column": "timestamp",
            "format": "auto"
          },
          "dimensionsSpec": {
            "dimensions": [
              "tenantId"
            ],
            "dimensionExclusions": [
              "timestamp"
            ]
          }
        }
      },
      "metricsSpec": [
        {
          "name": "rowCount",
          "type": "count"
        },
        {
          "fieldName": "delayMin",
          "name": "delayMin",
          "type": "longMin"
        },
        {
          "fieldName": "delayMax",
          "name": "delayMax",
          "type": "longMax"
        }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "NONE",
        "intervals": [
          "2018-02-25T18:00:00.000Z/2018-02-25T19:00:00.000Z"
        ]
      }
    },
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "dataSource",
        "ingestionSpec": {
          "dataSource": "npav-ts-metrics",
          "intervals": [
            "2018-02-25T18:00:00.000Z/2018-02-25T19:00:00.000Z"
          ]
        }
      }
    },
    "tuningConfig": {
      "type": "hadoop",
      "jobProperties": {
        "fs.s3.awsAccessKeyId": "MY_ACCESS_KEY",
        "fs.s3n.awsAccessKeyId": "MY_ACCESS_KEY",
        "fs.s3a.awsAccessKeyId": "MY_ACCESS_KEY",
        "fs.s3.awsSecretAccessKey": "MY_SECRET_KEY",
        "fs.s3n.awsSecretAccessKey": "MY_SECRET_KEY",
        "fs.s3a.awsSecretAccessKey": "MY_SECRET_KEY",
        "fs.s3.impl": "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
        "fs.s3n.impl": "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
        "io.compression.codecs": "org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec"
      }
    }
  }
}


Thanks,
Liz

Dario Penas

Mar 15, 2018, 11:02:14 AM
to Druid User
We have the same issue: there is no problem while ingesting the data, but when it comes to reindexing it, we get the same error as you. We are sending the following task:

{
  "type": "index_hadoop",
  "spec": {
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "dataSource",
        "ingestionSpec": {
          "dataSource": "reporting-cli-clean",
          "intervals": ["2018-03-03T00:00:00Z/2018-03-03T23:59:59Z"]
        }
      }
    },
    "dataSchema": {
      "dataSource": "reporting-cli-clean",
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "day",
        "queryGranularity": "day",
        "intervals": [
          "2018-03-03T00:00:00Z/2018-03-03T23:59:59Z"
        ]
      },
      "parser": {
        "type": "hadoopyString",
        "parseSpec": {
          "format": "json",
          "timestampSpec": {
            "column": "request_datetime",
            "format": "YYYY-MM-dd HH:mm:ss"
          },
          "dimensionsSpec": {
            "dimensions": [
              "whatever"
            ],
            "dimensionsExclusions": [],
            "spatialDimensions": []
          }
        }
      },
      "metricsSpec": [
        {
          "name": "count",
          "type": "count"
        }
      ]
    },
    "tuningConfig": {
      "type": "hadoop",
      "jobProperties": {
        "fs.s3n.awsAccessKeyId": "xxx",
        "fs.s3n.awsSecretAccessKey": "xxx",
        "fs.s3n.impl": "org.apache.hadoop.fs.s3native.NativeS3FileSystem",
        "fs.s3n.awsAccessKeyId": "xxx",
        "fs.s3n.awsSecretAccessKey": "xxx",
        "fs.s3n.impl": "org.apache.hadoop.fs.s3native.NativeS3FileSystem"
      }
    }
  }
}

Do you have any idea why this is failing?

Also, why is it necessary to specify the S3 credentials when we are not getting any of the files from S3? Druid shouldn't need them, right?

Thank you!
Darío.

Wang Yi

Mar 29, 2018, 11:05:26 AM
to Druid User
Yes, exactly the same issue here. Can any experts help us?

Jonathan Wei

Mar 29, 2018, 4:21:54 PM
to druid...@googlegroups.com
If you're seeing AWS credentials issues on reindexing tasks, you may be hitting the following bug (issue has a link to the fix): https://github.com/druid-io/druid/issues/5135

> Also, why is it necessary to specify the S3 credentials when we are not getting any of the files from S3? Druid shouldn't need them, right?

The reindexing task would pull its input segments from S3 if that's the configured deep storage, so the credentials are needed.

