Hadoop batch ingestion failed

Qi Wang

Jun 25, 2015, 5:16:52 PM
to druid...@googlegroups.com
Hi,

I am trying to switch batch ingestion from local to Hadoop, but I can't get it to run. The log says "Job[class io.druid.indexer.LegacyIndexGeneratorJob] failed!" as shown below, but I don't have a clue what it means or how to fix it. Can someone take a look at this? Thanks!

Error: class com.fasterxml.jackson.datatype.guava.deser.HostAndPortDeserializer overrides final method deserialize.(Lcom/fasterxml/jackson/core/JsonParser;Lcom/fasterxml/jackson/databind/DeserializationContext;)Ljava/lang/Object;
2015-06-24T23:32:56,599 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job -  map 100% reduce 100%
2015-06-24T23:32:57,611 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - Job job_1432969244945_2036 failed with state FAILED due to: Task failed task_1432969244945_2036_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

2015-06-24T23:32:57,691 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - Counters: 12
        Job Counters
                Failed map tasks=4
                Launched map tasks=4
                Other local map tasks=3
                Rack-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=12143
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=12143
                Total vcore-seconds taken by all map tasks=12143
                Total megabyte-seconds taken by all map tasks=49737728
        Map-Reduce Framework
                CPU time spent (ms)=0
                Physical memory (bytes) snapshot=0
                Virtual memory (bytes) snapshot=0
2015-06-24T23:32:57,699 INFO [task-runner-0] io.druid.indexer.JobHelper - Deleting path[/tmp/druid-indexing/hadoop_test/2015-06-24T233208.070Z]
2015-06-24T23:32:57,717 ERROR [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_hadoop_test_2015-06-24T23:32:08.069Z, type=index_hadoop, dataSource=hadoop_test}]
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_25]
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_25]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_25]
        at java.lang.reflect.Method.invoke(Method.java:483) ~[?:1.8.0_25]
        at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:256) ~[druid-indexing-service-0.7.3.jar:0.7.3]
        at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:235) [druid-indexing-service-0.7.3.jar:0.7.3]
        at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:214) [druid-indexing-service-0.7.3.jar:0.7.3]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_25]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_25]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_25]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_25]
Caused by: com.metamx.common.ISE: Job[class io.druid.indexer.LegacyIndexGeneratorJob] failed!
        at io.druid.indexer.JobHelper.runJobs(JobHelper.java:155) ~[druid-indexing-hadoop-0.7.3.jar:0.7.3]
        at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:96) ~[druid-indexing-hadoop-0.7.3.jar:0.7.3]
        at io.druid.indexing.common.task.HadoopIndexTask$HadoopIndexGeneratorInnerProcessing.runTask(HadoopIndexTask.java:304) ~[druid-indexing-service-0.7.3.jar:0.7.3]
        ... 11 more
2015-06-24T23:32:57,725 INFO [task-runner-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_hadoop_hadoop_test_2015-06-24T23:32:08.069Z",
  "status" : "FAILED",
  "duration" : 44106
}

Michael Schiff

Jun 25, 2015, 5:54:00 PM
to druid...@googlegroups.com
I have run into this issue before. It is caused by Hadoop and Druid relying on conflicting versions of the FasterXML (Jackson) libraries. The solution for me was to build a custom jar that manually excludes the conflicting Jackson dependencies, and then to put this jar on the classpath of my Hadoop indexing task.

See Benjamin Schaff's comment in this thread: https://groups.google.com/forum/#!msg/druid-development/jNxhMZpp-rc/XwAFP2xYe60J


Here is the build file I used:

libraryDependencies ++= Seq(
  "com.amazonaws" % "aws-java-sdk" % "1.9.23" exclude("common-logging", "common-logging"),
  "org.joda" % "joda-convert" % "1.7",
  "joda-time" % "joda-time" % "2.7",
  "io.druid" % "druid" % "0.7.1.1" excludeAll (
    ExclusionRule("org.ow2.asm"),
    ExclusionRule("com.fasterxml.jackson.core"),
    ExclusionRule("com.fasterxml.jackson.datatype"),
    ExclusionRule("com.fasterxml.jackson.dataformat"),
    ExclusionRule("com.fasterxml.jackson.jaxrs"),
    ExclusionRule("com.fasterxml.jackson.module")
  ),
  "io.druid" % "druid-services" % "0.7.1.1" excludeAll (
    ExclusionRule("org.ow2.asm"),
    ExclusionRule("com.fasterxml.jackson.core"),
    ExclusionRule("com.fasterxml.jackson.datatype"),
    ExclusionRule("com.fasterxml.jackson.dataformat"),
    ExclusionRule("com.fasterxml.jackson.jaxrs"),
    ExclusionRule("com.fasterxml.jackson.module")
  ),
  "io.druid" % "druid-indexing-service" % "0.7.1.1" excludeAll (
    ExclusionRule("org.ow2.asm"),
    ExclusionRule("com.fasterxml.jackson.core"),
    ExclusionRule("com.fasterxml.jackson.datatype"),
    ExclusionRule("com.fasterxml.jackson.dataformat"),
    ExclusionRule("com.fasterxml.jackson.jaxrs"),
    ExclusionRule("com.fasterxml.jackson.module")
  ),
  "io.druid" % "druid-indexing-hadoop" % "0.7.1.1" excludeAll (
    ExclusionRule("org.ow2.asm"),
    ExclusionRule("com.fasterxml.jackson.core"),
    ExclusionRule("com.fasterxml.jackson.datatype"),
    ExclusionRule("com.fasterxml.jackson.dataformat"),
    ExclusionRule("com.fasterxml.jackson.jaxrs"),
    ExclusionRule("com.fasterxml.jackson.module")
  ),
  "io.druid.extensions" % "mysql-metadata-storage" % "0.7.1.1" excludeAll (
    ExclusionRule("org.ow2.asm"),
    ExclusionRule("com.fasterxml.jackson.core"),
    ExclusionRule("com.fasterxml.jackson.datatype"),
    ExclusionRule("com.fasterxml.jackson.dataformat"),
    ExclusionRule("com.fasterxml.jackson.jaxrs"),
    ExclusionRule("com.fasterxml.jackson.module")
  ),
  "io.druid.extensions" % "druid-s3-extensions" % "0.7.1.1" excludeAll (
    ExclusionRule("org.ow2.asm"),
    ExclusionRule("com.fasterxml.jackson.core"),
    ExclusionRule("com.fasterxml.jackson.datatype"),
    ExclusionRule("com.fasterxml.jackson.dataformat"),
    ExclusionRule("com.fasterxml.jackson.jaxrs"),
    ExclusionRule("com.fasterxml.jackson.module")
  ),
  "io.druid.extensions" % "druid-histogram" % "0.7.1.1" excludeAll (
    ExclusionRule("org.ow2.asm"),
    ExclusionRule("com.fasterxml.jackson.core"),
    ExclusionRule("com.fasterxml.jackson.datatype"),
    ExclusionRule("com.fasterxml.jackson.dataformat"),
    ExclusionRule("com.fasterxml.jackson.jaxrs"),
    ExclusionRule("com.fasterxml.jackson.module")
  ),
  "com.fasterxml.jackson.core" % "jackson-annotations" % "2.3.0",
  "com.fasterxml.jackson.core" % "jackson-core" % "2.3.0",
  "com.fasterxml.jackson.core" % "jackson-databind" % "2.3.0",
  "com.fasterxml.jackson.datatype" % "jackson-datatype-guava" % "2.3.0",
  "com.fasterxml.jackson.datatype" % "jackson-datatype-joda" % "2.3.0",
  "com.fasterxml.jackson.jaxrs" % "jackson-jaxrs-base" % "2.3.0",
  "com.fasterxml.jackson.jaxrs" % "jackson-jaxrs-json-provider" % "2.3.0",
  "com.fasterxml.jackson.jaxrs" % "jackson-jaxrs-smile-provider" % "2.3.0",
  "com.fasterxml.jackson.module" % "jackson-module-jaxb-annotations" % "2.3.0",
  "com.sun.jersey" % "jersey-servlet" % "1.17.1",
  "mysql" % "mysql-connector-java" % "5.1.34",
  "org.scalatest" %% "scalatest" % "2.2.3" % "test",
  "org.mockito" % "mockito-core" % "1.10.19" % "test"
)

assemblyMergeStrategy in assembly := {
  case path if path contains "pom." => MergeStrategy.first
  case path if path contains "javax.inject.Named" => MergeStrategy.first
  case path if path contains "mime.types" => MergeStrategy.first
  case path if path contains "org/apache/commons/logging/impl/SimpleLog.class" => MergeStrategy.first
  case path if path contains "org/apache/commons/logging/impl/SimpleLog$1.class" => MergeStrategy.first
  case path if path contains "org/apache/commons/logging/impl/NoOpLog.class" => MergeStrategy.first
  case path if path contains "org/apache/commons/logging/LogFactory.class" => MergeStrategy.first
  case path if path contains "org/apache/commons/logging/LogConfigurationException.class" => MergeStrategy.first
  case path if path contains "org/apache/commons/logging/Log.class" => MergeStrategy.first
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}


It would be nice not to have to package your own build, but for now this seems to be the workaround.

Gian Merlino

Jun 29, 2015, 4:09:01 AM
to druid...@googlegroups.com
Hi Qi,

Just wondering, did you end up getting this to work? And either way, I'm wondering which version of Hadoop you are using. I think Michael is right that it's a dependency/packaging problem, and it would be good for us to know which Hadoop versions are causing problems with the current Druid builds.


Qi Wang

Jun 29, 2015, 5:24:11 PM
to druid...@googlegroups.com
Hi Gian,

I'm still trying to figure out how to recompile Druid, since I used the tarball before. The Hadoop version I'm using is Hadoop 2.5.0-cdh5.3.3. I saw that a new version of Druid, 0.8.0, was released recently, and I'm wondering if that version is compatible with our Hadoop system?

Thanks,
Qi

Qi Wang

Jun 30, 2015, 1:52:45 PM
to druid...@googlegroups.com
Hi Michael,

Thanks for the help! I'm trying to build the custom jar with sbt as you described, but I'm getting this error. Have you run into it before?
Thanks!

assemblyMergeStrategy in assembly := {
^
[error] Type error in expression

Michael Schiff

Jun 30, 2015, 2:40:09 PM
to druid...@googlegroups.com
Qi,
I have not seen this issue before. What version of sbt are you using? I did this with 0.13.8.


Gian,
I ran into this issue with Druid 0.7.1.1 (which depends on Jackson 2.4.0) and Hadoop 2.4 (which depends on Jackson 2.3.0).
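
For what it's worth, you can usually see which Jackson version a given Hadoop distribution bundles by listing the jars it ships. The path below is the stock Apache layout, so treat it as a sketch (vendor distributions keep the libs elsewhere):

ls $HADOOP_HOME/share/hadoop/common/lib | grep -i jackson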

Qi Wang

Jun 30, 2015, 3:01:07 PM
to druid...@googlegroups.com
Hi Michael, 

I'm doing it with 0.13.8 as well. I'm a newbie to sbt. This is what I did:

a) Install sbt
b) Download & unpack the source code of Druid (version 0.7.3)
c) Create a build.sbt file in the base directory of the Druid source code with the content you provided, and then modify the version numbers from "0.7.1.1" to "0.7.3"
d) Create an assemble.sbt file in the same directory and put "addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.12.0")" in it.
e) Run sbt

Does it look good?

Thanks,
Qi

Michael Schiff

Jun 30, 2015, 3:23:00 PM
to druid...@googlegroups.com
I did things slightly differently:

1) install sbt
2) create new empty directory 'druid_assembly_build'
3) cd to this new directory
4) create a build.sbt file with the contents described above
5) create a directory 'druid_assembly_build/project'
6) create a file 'druid_assembly_build/project/assembly.sbt' with contents "addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")"
7) from inside druid_assembly_build run 'sbt assembly'
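
So the project ends up looking like this (the directory name is just what I happened to use, and 'sbt assembly' writes the fat jar under target/ by default):

druid_assembly_build/
  build.sbt
  project/
    assembly.sbt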

Qi Wang

Jun 30, 2015, 7:12:00 PM
to druid...@googlegroups.com
It worked. Finally the map phase stopped complaining. Thanks Michael!
But now the reduce phase is hitting the same issue. I have no clue why this happens. It seems Benjamin Schaff's thread mentioned the same issue, but it didn't really say how to solve it. So I'm wondering if you ran into this as well.

2015-06-30T21:23:29,085 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - Task Id : attempt_1432969244945_2041_r_000006_0, Status : FAILED
Error: class com.fasterxml.jackson.datatype.guava.deser.HostAndPortDeserializer overrides final method deserialize.(Lcom/fasterxml/jackson/core/JsonParser;Lcom/fasterxml/jackson/databind/DeserializationContext;)Ljava/lang/Object;

Qi Wang

Jun 30, 2015, 8:49:45 PM
to druid...@googlegroups.com
So the error message is the same; the only difference is that now the map phase finishes smoothly but the reduce phase hits the same issue. I'm using Druid 0.7.3. Maybe there are other dependencies I need to resolve manually?

Error: class com.fasterxml.jackson.datatype.guava.deser.HostAndPortDeserializer overrides final method deserialize.(Lcom/fasterxml/jackson/core/JsonParser;Lcom/fasterxml/jackson/databind/DeserializationContext;)Ljava/lang/Object;
2015-06-30T23:57:02,959 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job -  map 100% reduce 75%
2015-06-30T23:57:03,966 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - Task Id : attempt_1432969244945_2044_r_000020_2, Status : FAILED
Error: java.lang.IllegalArgumentException: Wrong FS: file://hdfs:/gold-ha-nameservice/user/qi_wang/deepStorage/hadoop_test/hadoop_test/2015-05-21T00:00:00.000Z_2015-05-22T00:00:00.000Z/2015-06-30T23:55:56.232Z/0, expected: file:///
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
        at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:80)
        at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:414)
        at org.apache.hadoop.fs.ChecksumFileSystem.mkdirs(ChecksumFileSystem.java:590)
        at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.serializeOutIndex(IndexGeneratorJob.java:470)
        at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:446)
        at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:292)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

2015-06-30T23:57:03,967 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - Task Id : attempt_1432969244945_2044_r_000021_2, Status : FAILED
Error: java.lang.IllegalArgumentException: Wrong FS: file://hdfs:/gold-ha-nameservice/user/qi_wang/deepStorage/hadoop_test/hadoop_test/2015-05-22T00:00:00.000Z_2015-05-23T00:00:00.000Z/2015-06-30T23:55:56.232Z/0, expected: file:///
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
        at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:80)
        at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:414)
        at org.apache.hadoop.fs.ChecksumFileSystem.mkdirs(ChecksumFileSystem.java:590)
        at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.serializeOutIndex(IndexGeneratorJob.java:470)
        at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:446)
        at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:292)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

2015-06-30T23:57:04,971 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job -  map 100% reduce 83%
2015-06-30T23:57:09,992 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job -  map 100% reduce 100%
2015-06-30T23:57:09,999 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - Job job_1432969244945_2044 failed with state FAILED due to: Task failed task_1432969244945_2044_r_000019
Job failed as tasks failed. failedMaps:0 failedReduces:1

Qi Wang

Jul 1, 2015, 8:11:52 PM
to druid...@googlegroups.com
Tried Druid 0.7.1.1 and got the same error. The interesting thing is that the segments actually were stored to the deep storage folder, but the MySQL metadata for those segments was not updated.

Gian Merlino

Jul 1, 2015, 8:20:21 PM
to druid...@googlegroups.com
Hi Qi, are you actually getting both of these errors?

1) class com.fasterxml.jackson.datatype.guava.deser.HostAndPortDeserializer overrides final method deserialize.(Lcom/fasterxml/jackson/core/JsonParser;Lcom/fasterxml/jackson/databind/DeserializationContext;)Ljava/lang/Object;

2) java.lang.IllegalArgumentException: Wrong FS: file://hdfs:/gold-ha-nameservice/user/qi_wang/deepStorage/hadoop_test/hadoop_test/2015-05-22T00:00:00.000Z_2015-05-23T00:00:00.000Z/2015-06-30T23:55:56.232Z/0, expected: file:///

It's strange that you'd get the first one on some machines but not others. If you've rebuilt specific Druid versions, you could try wiping out the Druid jars on HDFS to make sure that your new ones are actually getting uploaded. I think they're in /tmp/druid-indexing/classpath by default.
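
For example, assuming the default working path, something along these lines should clear them out so your rebuilt jars get re-uploaded on the next run:

hadoop fs -rm -r /tmp/druid-indexing/classpath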

For the second one, that seems like something wrong with your Druid Hadoop indexing JSON spec or with your Hadoop config XMLs. Do you mind posting those?


Qi Wang

Jul 2, 2015, 1:11:19 PM
to druid...@googlegroups.com
Hi Gian,

I managed to solve the first problem. It turns out that I hadn't deleted the old libraries in /tmp/druid-indexing/classpath/ and had forgotten to remove lib/* from the classpath. Thanks, you are awesome!

For the second problem, my Hadoop ingestion spec looks like this. It's just a small test file.

{
  "type" : "index_hadoop",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "hadoop_test",
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "json",
          "timestampSpec" : {
            "column" : "ds",
            "format" : "auto"
          },
          "dimensionsSpec" : {
            "dimensions": [
                "dim_app_family",
                "dim_browser_family",
                "dim_destination_country",
                "dim_destination_market",
                "dim_device_type_best_guess",
                "dim_language",
                "dim_origin_country",
                "dim_origin_market",
                "dim_os_family",
                "ds",
                "subject_id",
                "treatment_name"
            ],
            "dimensionExclusions" : [],
            "spatialDimensions" : []
          }
        }
      },
      "metricsSpec" : [
        {
          "type" : "count",
          "name" : "count"
        }
      ],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "DAY",
        "queryGranularity" : "NONE",
        "intervals" : [ "2015-05-01/2015-05-25" ]
      }
    },
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "paths" : "/user/qi_wang/hadoop_data.json"
      }
    },
    "tuningConfig" : {
      "type": "hadoop"
    }
  }
}

Qi Wang

Jul 2, 2015, 1:45:43 PM
to druid...@googlegroups.com
Hi Gian,

I managed to solve the second problem as well! The build.sbt Michael provided does not include the HDFS extension. Here is the new build.sbt file I used, in case someone else needs it in the future.

libraryDependencies ++= Seq(
  "com.amazonaws" % "aws-java-sdk" % "1.9.23" exclude("common-logging", "common-logging"),
  "org.joda" % "joda-convert" % "1.7",
  "joda-time" % "joda-time" % "2.7",
  "io.druid" % "druid" % "0.7.3" excludeAll (
    ExclusionRule("org.ow2.asm"),
    ExclusionRule("com.fasterxml.jackson.core"),
    ExclusionRule("com.fasterxml.jackson.datatype"),
    ExclusionRule("com.fasterxml.jackson.dataformat"),
    ExclusionRule("com.fasterxml.jackson.jaxrs"),
    ExclusionRule("com.fasterxml.jackson.module")
  ),
  "io.druid" % "druid-services" % "0.7.3" excludeAll (
    ExclusionRule("org.ow2.asm"),
    ExclusionRule("com.fasterxml.jackson.core"),
    ExclusionRule("com.fasterxml.jackson.datatype"),
    ExclusionRule("com.fasterxml.jackson.dataformat"),
    ExclusionRule("com.fasterxml.jackson.jaxrs"),
    ExclusionRule("com.fasterxml.jackson.module")
  ),
  "io.druid" % "druid-indexing-service" % "0.7.3" excludeAll (
    ExclusionRule("org.ow2.asm"),
    ExclusionRule("com.fasterxml.jackson.core"),
    ExclusionRule("com.fasterxml.jackson.datatype"),
    ExclusionRule("com.fasterxml.jackson.dataformat"),
    ExclusionRule("com.fasterxml.jackson.jaxrs"),
    ExclusionRule("com.fasterxml.jackson.module")
  ),
  "io.druid" % "druid-indexing-hadoop" % "0.7.3" excludeAll (
    ExclusionRule("org.ow2.asm"),
    ExclusionRule("com.fasterxml.jackson.core"),
    ExclusionRule("com.fasterxml.jackson.datatype"),
    ExclusionRule("com.fasterxml.jackson.dataformat"),
    ExclusionRule("com.fasterxml.jackson.jaxrs"),
    ExclusionRule("com.fasterxml.jackson.module")
  ),
  "io.druid.extensions" % "mysql-metadata-storage" % "0.7.3" excludeAll (
    ExclusionRule("org.ow2.asm"),
    ExclusionRule("com.fasterxml.jackson.core"),
    ExclusionRule("com.fasterxml.jackson.datatype"),
    ExclusionRule("com.fasterxml.jackson.dataformat"),
    ExclusionRule("com.fasterxml.jackson.jaxrs"),
    ExclusionRule("com.fasterxml.jackson.module")
  ),
  "io.druid.extensions" % "druid-s3-extensions" % "0.7.3" excludeAll (
    ExclusionRule("org.ow2.asm"),
    ExclusionRule("com.fasterxml.jackson.core"),
    ExclusionRule("com.fasterxml.jackson.datatype"),
    ExclusionRule("com.fasterxml.jackson.dataformat"),
    ExclusionRule("com.fasterxml.jackson.jaxrs"),
    ExclusionRule("com.fasterxml.jackson.module")
  ),
  "io.druid.extensions" % "druid-histogram" % "0.7.3" excludeAll (
    ExclusionRule("org.ow2.asm"),
    ExclusionRule("com.fasterxml.jackson.core"),
    ExclusionRule("com.fasterxml.jackson.datatype"),
    ExclusionRule("com.fasterxml.jackson.dataformat"),
    ExclusionRule("com.fasterxml.jackson.jaxrs"),
    ExclusionRule("com.fasterxml.jackson.module")
  ),
  "io.druid.extensions" % "druid-hdfs-storage" % "0.7.3" excludeAll (
  case path if path contains "META-INF/jersey-module-version" => MergeStrategy.first
  case path if path contains ".properties" => MergeStrategy.first
  case path if path contains ".class" => MergeStrategy.first
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}


Fangjin Yang

Jul 5, 2015, 2:18:17 PM
to druid...@googlegroups.com, qi....@airbnb.com
Hi Qi, would you like to contribute your findings to the Druid documentation? It should help others who face the same problems.

Qi Wang

Jul 6, 2015, 5:35:27 PM
to druid...@googlegroups.com, qi....@airbnb.com
Yeah sure. How do I do that?

Fangjin Yang

Jul 7, 2015, 12:29:56 AM
to druid...@googlegroups.com, qi....@airbnb.com
All of the Druid documentation is hosted in the Druid GitHub repository.

This would be a good document to add your findings to:

jh...@kochava.com

Jul 7, 2015, 1:40:13 PM
to druid...@googlegroups.com
Hi Qi,
I tried building a standalone assembly using the build.sbt file that you recommended. I was able to build the jar successfully; however, I have been unable to run it.
Can you share with me the java command line that you use to run the index job?
I'm finding that when I include the following in common.runtime.properties:

# Extensions

druid.extensions.coordinates=["io.druid.extensions:mysql-metadata-storage","io.druid.extensions:druid-hdfs-storage","io.druid.extensions:druid-indexing-hadoop"]


then the Druid extensions cause this failure:

2015-07-07T17:18:59,273 ERROR [main] io.druid.initialization.Initialization - Unable to resolve artifacts for [io.druid.extensions:druid-indexing-hadoop:jar:0.7.1.1 (runtime) -> [] < [ (https://repo1.maven.org/maven2/, releases+snapshots),  (https://metamx.artifactoryonline.com/metamx/pub-libs-releases-local, releases+snapshots)]].

org.eclipse.aether.resolution.DependencyResolutionException: Could not find artifact io.druid.extensions:druid-indexing-hadoop:jar:0.7.1.1 in  (https://repo1.maven.org/maven2/)



When I remove the Druid extensions, it gives this error instead:

2015-07-07T17:32:09,063 INFO [main] org.skife.config.ConfigurationObjectFactory - Using method itself for [${base_path}.columnCache.sizeBytes] on [io.druid.query.DruidProcessingConfig#columnCacheSizeBytes()]

2015-07-07T17:32:09,064 INFO [main] org.skife.config.ConfigurationObjectFactory - Assigning default value [processing-%s] for [${base_path}.formatString] on [com.metamx.common.concurrent.ExecutorServiceConfig#getFormatString()]

2015-07-07T17:32:09,130 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[interface io.druid.segment.data.BitmapSerdeFactory] from props[druid.processing.bitmap.] as [ConciseBitmapSerdeFactory{}]

2015-07-07T17:32:09,232 ERROR [main] io.druid.cli.CliHadoopIndexer - failure!!!!

java.lang.reflect.InvocationTargetException

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.7.0_79]

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) ~[?:1.7.0_79]

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.7.0_79]

at java.lang.reflect.Method.invoke(Method.java:606) ~[?:1.7.0_79]

at io.druid.cli.CliHadoopIndexer.run(CliHadoopIndexer.java:120) [DruidAssembly-SBT-assembly-1.0.jar:1.0]

at io.druid.cli.Main.main(Main.java:88) [DruidAssembly-SBT-assembly-1.0.jar:1.0]

Caused by: com.google.inject.CreationException: Guice creation errors:


1) Binding to null instances is not allowed. Use toProvider(Providers.of(null)) if this is your intended behaviour.

  at io.druid.cli.CliInternalHadoopIndexer$1.configure(CliInternalHadoopIndexer.java:83)



I'm totally confused as to how you got this Hadoop Indexer to finally work.
Any help would be greatly appreciated.

Johnny Hom

Qi Wang

Jul 7, 2015, 2:52:57 PM
to druid...@googlegroups.com, qi....@airbnb.com
OK! Will do that later!

Qi Wang

Jul 7, 2015, 2:57:38 PM
to druid...@googlegroups.com
Hi Johnny,

Try this:
1) In the overlord's runtime configuration file, remove all the extensions, because you have already included them in the fat jar you compiled with sbt.
2) In the command line, remove lib/* from the classpath and include the path to your fat jar instead.
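
For example, the overlord launch would then look roughly like this (the config directories and jar path are placeholders for your own layout):

java -Xmx2g -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
  -classpath config/_common:config/overlord:/path/to/your-druid-assembly.jar \
  io.druid.cli.Main server overlord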

Hope that helps.
Qi

jh...@kochava.com

Jul 7, 2015, 8:30:06 PM
to druid...@googlegroups.com
Hi Qi,
That didn't work :(
Just to be clear, you are doing the HadoopDruidIndexer, right? Not the indexing service.
I just want to confirm.

jh...@kochava.com

Jul 7, 2015, 9:06:14 PM
to druid...@googlegroups.com
Hi Qi,
I just realized that I was using the wikipedia_index_hadoop_task.json example, which is actually an indexing service job and not a HadoopDruidIndexer job.
Now that I'm using the right one, it actually works :)
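
In case it trips up anyone else, the two are submitted quite differently; roughly (the overlord host/port, classpath, and file names below are placeholders for your own setup):

# indexing service: POST the task JSON to the overlord
curl -X POST -H 'Content-Type: application/json' -d @wikipedia_index_hadoop_task.json http://OVERLORD_HOST:8090/druid/indexer/v1/task

# standalone HadoopDruidIndexer: run the CLI directly against a spec file
java -classpath <your classpath> io.druid.cli.Main index hadoop my_spec.json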

Qi Wang

Jul 8, 2015, 12:42:02 PM
to druid...@googlegroups.com
Awesome.

DashV

Dec 23, 2015, 4:48:18 PM
to Druid User
Hit the same error with Hadoop 2.6.0 on Qubole.

Looks like they are running Jackson 2.1.1.

jh...@kochava.com

Dec 23, 2015, 7:27:50 PM
to Druid User

DashV

Dec 23, 2015, 8:14:06 PM
to Druid User
Oh man, this is great, thanks!

I just tried to build 0.8.3 on my own with the exact version of Jackson I needed, and it was giving me compile errors. :) The method below looks much less invasive! :)

Thanks!

bwh...@uber.com

Feb 11, 2016, 11:28:01 PM
to Druid User
Hello there!

I am running a large Hadoop cluster on version 2.6.0, so none of my Hadoop indexing jobs work; I run into the Jackson dependency issue. The issue has been documented several times:

In order to fix the Jackson dependency issue, I am forced to build a fat Druid jar with sbt, excluding the Jackson dependencies. I followed the solution documented here:

However, with this sbt-built fat jar, I run into a really weird and random issue:
com.google.inject.ProvisionException: Guice provision errors:

1) Error in custom provider, com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
  at io.druid.storage.s3.S3StorageDruidModule.getRestS3Service(S3StorageDruidModule.java:107)
  at io.druid.storage.s3.S3StorageDruidModule.getRestS3Service(S3StorageDruidModule.java:107)
  while locating org.jets3t.service.impl.rest.httpclient.RestS3Service
    for parameter 0 at io.druid.storage.s3.S3DataSegmentKiller.<init>(S3DataSegmentKiller.java:43)
  while locating io.druid.storage.s3.S3DataSegmentKiller
  at io.druid.storage.s3.S3StorageDruidModule.configure(S3StorageDruidModule.java:84)
  while locating io.druid.segment.loading.DataSegmentKiller annotated with @com.google.inject.multibindings.Element(setName=,uniqueId=124, type=MAPBINDER)
  at io.druid.guice.Binders.dataSegmentKillerBinder(Binders.java:24)
  while locating java.util.Map<java.lang.String, io.druid.segment.loading.DataSegmentKiller>
    for parameter 0 at io.druid.segment.loading.OmniDataSegmentKiller.<init>(OmniDataSegmentKiller.java:36)
  while locating io.druid.segment.loading.OmniDataSegmentKiller
  at io.druid.cli.CliPeon$1.configure(CliPeon.java:142)
  while locating io.druid.segment.loading.DataSegmentKiller
    for parameter 4 at io.druid.indexing.common.TaskToolboxFactory.<init>(TaskToolboxFactory.java:76)
  at io.druid.cli.CliPeon$1.configure(CliPeon.java:131)
  while locating io.druid.indexing.common.TaskToolboxFactory
    for parameter 0 at io.druid.indexing.overlord.ThreadPoolTaskRunner.<init>(ThreadPoolTaskRunner.java:71)
  at io.druid.cli.CliPeon$1.configure(CliPeon.java:157)
  while locating io.druid.indexing.overlord.ThreadPoolTaskRunner
  while locating io.druid.query.QuerySegmentWalker
    for parameter 3 at io.druid.server.QueryResource.<init>(QueryResource.java:90)
  while locating io.druid.server.QueryResource
Caused by: com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
at io.druid.storage.s3.S3StorageDruidModule.getRestS3Service(S3StorageDruidModule.java:107)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.google.inject.internal.ProviderMethod.get(ProviderMethod.java:105)
at com.google.inject.internal.ProviderInternalFactory.provision(ProviderInternalFactory.java:86)
at com.google.inject.internal.InternalFactoryToInitializableAdapter.provision(InternalFactoryToInitializableAdapter.java:55)
at com.google.inject.internal.ProviderInternalFactory.circularGet(ProviderInternalFactory.java:66)
at com.google.inject.internal.InternalFactoryToInitializableAdapter.get(InternalFactoryToInitializableAdapter.java:47)
at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1058)
at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
at com.google.inject.Scopes$1$1.get(Scopes.java:65)
at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:107)
at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:88)
at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:269)
at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:56)
at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1058)
at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
at com.google.inject.Scopes$1$1.get(Scopes.java:65)
at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
at com.google.inject.internal.InjectorImpl$3$1.call(InjectorImpl.java:1005)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1058)
at com.google.inject.internal.InjectorImpl$3.get(InjectorImpl.java:1001)
at com.google.inject.spi.ProviderLookup$1.get(ProviderLookup.java:90)
at com.google.inject.spi.ProviderLookup$1.get(ProviderLookup.java:90)
at com.google.inject.multibindings.MapBinder$RealMapBinder$2.get(MapBinder.java:389)
at com.google.inject.multibindings.MapBinder$RealMapBinder$2.get(MapBinder.java:385)
at com.google.inject.internal.ProviderInternalFactory.provision(ProviderInternalFactory.java:86)
at com.google.inject.internal.InternalFactoryToInitializableAdapter.provision(InternalFactoryToInitializableAdapter.java:55)
at com.google.inject.internal.ProviderInternalFactory.circularGet(ProviderInternalFactory.java:66)
at com.google.inject.internal.InternalFactoryToInitializableAdapter.get(InternalFactoryToInitializableAdapter.java:47)
at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:107)
at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:88)
at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:269)
at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:56)
at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1058)
at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
at com.google.inject.Scopes$1$1.get(Scopes.java:65)
at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:107)
at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:88)
at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:269)
at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1058)
at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
at com.google.inject.Scopes$1$1.get(Scopes.java:65)
at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:107)
at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:88)
at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:269)
at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1058)
at com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
at io.druid.guice.LifecycleScope$1.get(LifecycleScope.java:49)
at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
at com.google.inject.internal.FactoryProxy.get(FactoryProxy.java:56)
at com.google.inject.internal.SingleParameterInjector.inject(SingleParameterInjector.java:38)
at com.google.inject.internal.SingleParameterInjector.getAll(SingleParameterInjector.java:62)
at com.google.inject.internal.ConstructorInjector.provision(ConstructorInjector.java:107)
at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:88)
at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:269)
at com.google.inject.internal.InjectorImpl$3$1.call(InjectorImpl.java:1005)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1051)
at com.google.inject.internal.InjectorImpl$3.get(InjectorImpl.java:1001)
at com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1036)
at io.druid.guice.LifecycleModule$2.start(LifecycleModule.java:134)
at io.druid.cli.GuiceRunnable.initLifecycle(GuiceRunnable.java:71)
at io.druid.cli.CliPeon.run(CliPeon.java:211)
at io.druid.cli.Main.main(Main.java:91)

I don't use AWS at all, so I am not sure why I am getting this issue. It must be an issue with the build. Here is my spec:
{
  "type" : "index_hadoop",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "api_created_trips_02",
      "parser" : {
        "type" : "string",
        "parseSpec" : {
          "format" : "json",
          "timestampSpec" : {
            "column" : "SecondsSinceEpoch",
            "format" : "posix"
          },
          "dimensionsSpec": {
            "dimensions": [
              "uuid",
              "country_id",
              "city_id",
              "request_at",
              "status"
            ],
            "dimensionExclusions": [],
            "spatialDimensions": []
          }
        }
      },
      "metricsSpec": [
        {
          "type": "doubleSum",
          "name": "duration",
          "fieldName": "duration"
        }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "HOUR",
        "intervals" : ["2015-12-05/2015-12-06"]
      }
    },
    "ioConfig" : {
      "type" : "hadoop",
      "inputSpec" : {
        "type" : "static",
        "paths" : "hdfs://nameservice1/user/bwhite/api_created_trips_1day_json"
      }
    }
  }
}

Thanks!

Fangjin Yang

Feb 17, 2016, 3:56:36 PM
to Druid User
Bwhite, are you still having this issue? There have been several posts with different problems and we're not sure which one to follow.