Ingesting HDFS data with Druid


Chadin Anuwattanaporn

Feb 18, 2014, 5:19:18 AM
to druid-de...@googlegroups.com
Hi all, 

I'm getting started with Druid. I have successfully completed the Druid Cluster tutorial and finished part 1 of the Loading Your Data tutorial. I would now like to configure Druid to ingest data from HDFS. I tried to look around, but it seems the Firehose doesn't support ingesting from HDFS. I looked at batch ingestion, but there were no specific instructions on setting it up with HDFS.

Could you point me to the resources I need, or advise me on how to go about doing this?

If any clarifications are needed, feel free to let me know, too.

Thank you!

Best, 
Chadin

Nishant Bangarwa

Feb 18, 2014, 7:01:56 AM
to druid-de...@googlegroups.com
Hi Chadin, 

To ingest data from HDFS, you can set the pathSpec for your batch ingestion to the location where your input files are present in HDFS:

"pathSpec": {
      "type": "static",
      "paths": "hdfs://<path-to-input-files>"
}

Additionally, if you want to store your segments in HDFS, you will need to add these configs to your nodes:

druid.storage.type=hdfs
druid.extensions.coordinates=["io.druid.extensions:druid-hdfs-storage:0.6.<version>"]
druid.storage.storageDirectory=<hdfs-dir>

For more info you can have a look at - 



Chadin Anuwattanaporn

Feb 18, 2014, 10:45:02 PM
to druid-de...@googlegroups.com
Hi Nishant, 

Thank you very much for your reply.

I started with the config file provided on the page and changed it according to what you stated. The current config file is as follows:

{
  "dataSource": "chadin",
  "timestampSpec" : {
    "column": "ts",
    "format": "auto"
  },
  "dataSpec": {
    "format": "tsv",
    "columns": [
      "ts",
      "tag",
      "data"
    ],
    "dimensions": [
      "column_1",
      "column_2",
      "column_3"
    ]
  },
  "granularitySpec": {
    "type": "uniform",
    "intervals": [
      "<ISO8601 interval:http:\/\/en.wikipedia.org\/wiki\/ISO_8601#Time_intervals>"
    ],
    "gran": "day"
  },
  "pathSpec": {
    "type": "static",
    "inputPath": "hdfs:\/\/user\/ubuntu"
  },
  "rollupSpec": {
    "aggs": [
      {
        "type": "count",
        "name": "event_count"
      },
      {
        "type": "doubleSum",
        "fieldName": "column_4",
        "name": "revenue"
      },
      {
        "type": "longSum",
        "fieldName": "column_5",
        "name": "clicks"
      }
    ],
    "rollupGranularity": "minute"
  },
  "workingPath": "\/tmp\/path\/on\/hdfs",
  "segmentOutputPath": "hdfs:\/\/user\/ubuntu",
  "leaveIntermediate": "false",
  "partitionsSpec": {
    "targetPartitionSize": 5000000
  },
  "updaterJobSpec": {
    "type": "db",
    "connectURI": "jdbc:mysql:\/\/localhost:7980\/test_db",
    "user": "username",
    "password": "passmeup",
    "segmentTable": "segments"
  }
}

I ran the following command: 

curl -X 'POST' -H 'Content-Type:application/json' -d @examples/indexing/chadin_index_task.json localhost:8087/druid/indexer/v1/task

And the server returned a "server error", and the Overlord node produced the following error: 

Feb 19, 2014 3:38:49 AM com.sun.jersey.spi.container.ContainerResponse mapMappableContainerException
SEVERE: The exception contained within MappableContainerException could not be mapped to a response, re-throwing to the HTTP container
com.fasterxml.jackson.databind.JsonMappingException: Unexpected token (END_OBJECT), expected FIELD_NAME: missing property 'type' that is to contain type id  (for class io.druid.indexing.common.task.Task)
 at [Source: org.eclipse.jetty.server.HttpInput@1584da34; line: 1, column: 1208]
        at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:164)
        at com.fasterxml.jackson.databind.DeserializationContext.wrongTokenException(DeserializationContext.java:668)
        at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedUsingDefaultImpl(AsPropertyTypeDeserializer.java:141)
        at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:90)
        at com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:106)
        at com.fasterxml.jackson.databind.deser.impl.TypeWrappedDeserializer.deserialize(TypeWrappedDeserializer.java:36)
        at com.fasterxml.jackson.databind.ObjectReader._bind(ObjectReader.java:1179)
        at com.fasterxml.jackson.databind.ObjectReader.readValue(ObjectReader.java:635)
        at com.fasterxml.jackson.jaxrs.base.ProviderBase.readFrom(ProviderBase.java:587)
        at com.sun.jersey.spi.container.ContainerRequest.getEntity(ContainerRequest.java:488)
        at com.sun.jersey.server.impl.model.method.dispatch.EntityParamDispatchProvider$EntityInjectable.getValue(EntityParamDispatchProvider.java:123)
        at com.sun.jersey.server.impl.inject.InjectableValuesProvider.getInjectableValues(InjectableValuesProvider.java:46)
        at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$EntityParamInInvoker.getParams(AbstractResourceMethodDispatchProvider.java:153)
        at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:203)
        at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
        at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
        at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
        at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1511)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1442)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1391)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1381)
        at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:538)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:716)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
        at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:278)
        at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:268)
        at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:180)
        at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:93)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85)
        at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:120)
        at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:132)
        at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:129)
        at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:206)
        at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:129)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
        at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:82)
        at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:256)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
        at io.druid.server.http.RedirectFilter.doFilter(RedirectFilter.java:71)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
        at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
        at org.eclipse.jetty.server.Server.handle(Server.java:370)
        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
        at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:949)
        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1011)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:651)
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
        at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:668)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Thread.java:744)

2014-02-19 03:38:49,533 WARN [qtp2088428874-25] org.eclipse.jetty.servlet.ServletHandler -
javax.servlet.ServletException: com.fasterxml.jackson.databind.JsonMappingException: Unexpected token (END_OBJECT), expected FIELD_NAME: missing property 'type' that is to contain type id  (for class io.druid.indexing.common.task.Task)
 at [Source: org.eclipse.jetty.server.HttpInput@1584da34; line: 1, column: 1208]
        at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:420)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:538)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:716)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
        at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:278)
        at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:268)
        at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:180)
        at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:93)
        at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:120)
        at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:132)
        at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:129)
        at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:206)
        at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:129)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
        at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:82)
        at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:256)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
        at io.druid.server.http.RedirectFilter.doFilter(RedirectFilter.java:71)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
        at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
        at org.eclipse.jetty.server.Server.handle(Server.java:370)
        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
        at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:949)
        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1011)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:651)
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
        at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:668)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Thread.java:744)
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Unexpected token (END_OBJECT), expected FIELD_NAME: missing property 'type' that is to contain type id  (for class io.druid.indexing.common.task.Task)
 at [Source: org.eclipse.jetty.server.HttpInput@1584da34; line: 1, column: 1208]
        at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:164)
        at com.fasterxml.jackson.databind.DeserializationContext.wrongTokenException(DeserializationContext.java:668)
        at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedUsingDefaultImpl(AsPropertyTypeDeserializer.java:141)
        at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:90)
        at com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:106)
        at com.fasterxml.jackson.databind.deser.impl.TypeWrappedDeserializer.deserialize(TypeWrappedDeserializer.java:36)
        at com.fasterxml.jackson.databind.ObjectReader._bind(ObjectReader.java:1179)
        at com.fasterxml.jackson.databind.ObjectReader.readValue(ObjectReader.java:635)
        at com.fasterxml.jackson.jaxrs.base.ProviderBase.readFrom(ProviderBase.java:587)
        at com.sun.jersey.spi.container.ContainerRequest.getEntity(ContainerRequest.java:488)
        at com.sun.jersey.server.impl.model.method.dispatch.EntityParamDispatchProvider$EntityInjectable.getValue(EntityParamDispatchProvider.java:123)
        at com.sun.jersey.server.impl.inject.InjectableValuesProvider.getInjectableValues(InjectableValuesProvider.java:46)
        at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$EntityParamInInvoker.getParams(AbstractResourceMethodDispatchProvider.java:153)
        at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:203)
        at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
        at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
        at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
        at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1511)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1442)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1391)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1381)
        at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
        ... 39 more

Any idea on what might be causing the error? I read the error message but it didn't give me a clue... sorry if it's a really newbie question!

Thank you in advance for the help!

Nishant Bangarwa

Feb 19, 2014, 2:49:40 AM
to druid-de...@googlegroups.com
Hi Chadin, 

The exception above means that the JSON is missing the 'type' attribute for the task. 
For a HadoopIndexTask, you can specify the type as follows:
{
  "type" : "index_hadoop",
  "config" : <hadoop-index-config (the config you sent in mail)>
}

Also, there are some configurations which don't make sense in the context of the indexing service, like workingPath, segmentOutputPath and updaterJobSpec, since the indexing service internally determines these based on the middlemanager config. You will need to remove those from your config file as well. 

More Info is present here  






Chadin Anuwattanaporn

Feb 19, 2014, 4:57:51 AM
to druid-de...@googlegroups.com
Hi Nishant, 

Thank you so much for your advice. It is really helping!

So I have modified the config file, removed the optional parts and also a few more as prompted by the server, and the final configuration file that went through looked like this: 

{
  "type": "index_hadoop",
  "config": {
  "dataSource": "the_data_source",
  "timestampSpec" : {
    "column": "ts",
    "format": "auto"
  },
  "dataSpec": {
    "format": "tsv",
    "columns": [
      "ts",
      "tag",
      "data"
    ],
    "dimensions": [
      "column_1",
      "column_2",
      "column_3"
    ]
  },
  "granularitySpec": {
    "type": "uniform",
    "intervals": [
      "2010/2020"
    ],
    "gran": "day"
  },
  "pathSpec": {
    "type": "static",
    "paths": "hdfs:\/\/localhost:8020\/user\/ubuntu"
  },
  "rollupSpec": {
    "aggs": [
      {
        "type": "count",
        "name": "event_count"
      },
      {
        "type": "doubleSum",
        "fieldName": "column_4",
        "name": "revenue"
      },
      {
        "type": "longSum",
        "fieldName": "column_5",
        "name": "clicks"
      }
    ],
    "rollupGranularity": "minute"
  }
  }
}

I successfully submitted the task to the overlord node, but the task failed to execute. Below is the error from the overlord node: 

2014-02-19 09:51:22,148 INFO [pool-6-thread-1] io.druid.indexing.overlord.ForkingTaskRunner - Logging task index_hadoop_the_data_source_2014-02-19T09:51:22.131Z output to: /tmp/persistent/index_hadoop_the_data_source_2014-02-19T09:51:22.131Z/086f7754-22a4-41eb-bd44-e35f69baacf5/log
2014-02-19 09:51:41,165 INFO [qtp1751358750-26] io.druid.indexing.common.actions.LocalTaskActionClient - Performing action for task[index_hadoop_the_data_source_2014-02-19T09:51:22.131Z]: LockTryAcquireAction{interval=2010-01-01T00:00:00.000Z/2020-01-01T00:00:00.000Z}
2014-02-19 09:51:41,165 INFO [qtp1751358750-26] io.druid.indexing.overlord.TaskLockbox - Task[index_hadoop_the_data_source_2014-02-19T09:51:22.131Z] already present in TaskLock[index_hadoop_the_data_source_2014-02-19T09:51:22.131Z]
2014-02-19 09:51:44,129 INFO [qtp1751358750-28] io.druid.indexing.common.actions.LocalTaskActionClient - Performing action for task[index_hadoop_the_data_source_2014-02-19T09:51:22.131Z]: LockListAction{}
2014-02-19 09:51:53,394 INFO [pool-6-thread-1] io.druid.indexing.overlord.ForkingTaskRunner - Process exited with status[0] for task: index_hadoop_the_data_source_2014-02-19T09:51:22.131Z
2014-02-19 09:51:53,395 INFO [pool-6-thread-1] io.druid.indexing.common.tasklogs.FileTaskLogs - Wrote task log to: log/index_hadoop_the_data_source_2014-02-19T09:51:22.131Z.log
2014-02-19 09:51:53,395 INFO [pool-6-thread-1] io.druid.indexing.overlord.ForkingTaskRunner - Removing temporary directory: /tmp/persistent/index_hadoop_the_data_source_2014-02-19T09:51:22.131Z/086f7754-22a4-41eb-bd44-e35f69baacf5
2014-02-19 09:51:53,396 INFO [pool-6-thread-1] io.druid.indexing.overlord.TaskQueue - Received FAILED status for task: index_hadoop_the_data_source_2014-02-19T09:51:22.131Z
2014-02-19 09:51:53,396 INFO [pool-6-thread-1] io.druid.indexing.overlord.ForkingTaskRunner - Ignoring request to cancel unknown task: index_hadoop_the_data_source_2014-02-19T09:51:22.131Z
2014-02-19 09:51:53,397 INFO [pool-6-thread-1] io.druid.indexing.overlord.HeapMemoryTaskStorage - Updating task index_hadoop_the_data_source_2014-02-19T09:51:22.131Z to status: TaskStatus{id=index_hadoop_the_data_source_2014-02-19T09:51:22.131Z, status=FAILED, duration=11516}
2014-02-19 09:51:53,397 INFO [pool-6-thread-1] io.druid.indexing.overlord.TaskLockbox - Removing task[index_hadoop_the_data_source_2014-02-19T09:51:22.131Z] from TaskLock[index_hadoop_the_data_source_2014-02-19T09:51:22.131Z]
2014-02-19 09:51:53,397 INFO [pool-6-thread-1] io.druid.indexing.overlord.TaskLockbox - TaskLock is now empty: TaskLock{groupId=index_hadoop_the_data_source_2014-02-19T09:51:22.131Z, dataSource=the_data_source, interval=2010-01-01T00:00:00.000Z/2020-01-01T00:00:00.000Z, version=2014-02-19T09:51:22.134Z}
2014-02-19 09:51:53,397 INFO [pool-6-thread-1] io.druid.indexing.overlord.TaskQueue - Task done: HadoopIndexTask{id=index_hadoop_the_data_source_2014-02-19T09:51:22.131Z, type=index_hadoop, dataSource=the_data_source}
2014-02-19 09:51:53,398 INFO [pool-6-thread-1] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2014-02-19T09:51:53.397Z","service":"overlord","host":"localhost:8087","metric":"indexer/time/run/millis","value":11516,"user2":"the_data_source","user3":"FAILED","user4":"index_hadoop"}]
2014-02-19 09:51:53,398 INFO [pool-6-thread-1] io.druid.indexing.overlord.TaskQueue - Task FAILED: HadoopIndexTask{id=index_hadoop_the_data_source_2014-02-19T09:51:22.131Z, type=index_hadoop, dataSource=the_data_source} (11516 run duration)

Below is the log file (I skipped the first half of the log since it didn't seem to be indicative of any error, just loading of classes and setting of default parameters):

2014-02-19 09:51:50,536 INFO [task-runner-0] io.druid.indexer.HadoopDruidIndexerConfig - Running with config:
{
  "dataSource" : "the_data_source",
  "timestampSpec" : {
    "column" : "ts",
    "format" : "auto"
  },
  "dataSpec" : {
    "format" : "tsv",
    "delimiter" : "\t",
    "columns" : [ "ts", "tag", "data" ],
    "dimensions" : [ "column_1", "column_2", "column_3" ],
    "spatialDimensions" : [ ]
  },
  "granularitySpec" : {
    "type" : "uniform",
    "gran" : "DAY",
    "intervals" : [ "2010-01-01T00:00:00.000Z/2020-01-01T00:00:00.000Z" ]
  },
  "pathSpec" : {
    "type" : "static",
    "paths" : "hdfs://localhost:8020/user/ubuntu"
  },
  "workingPath" : "/tmp/druid-indexing",
  "segmentOutputPath" : "file:///user/ubuntu/the_data_source",
  "version" : "2014-02-19T09:51:22.134Z",
  "partitionsSpec" : {
    "partitionDimension" : null,
    "targetPartitionSize" : -1,
    "maxPartitionSize" : -1,
    "assumeGrouped" : false
  },
  "leaveIntermediate" : false,
  "cleanupOnFailure" : true,
  "shardSpecs" : { },
  "overwriteFiles" : false,
  "rollupSpec" : {
    "aggs" : [ {
      "type" : "count",
      "name" : "event_count"
    }, {
      "type" : "doubleSum",
      "name" : "revenue",
      "fieldName" : "column_4"
    }, {
      "type" : "longSum",
      "name" : "clicks",
      "fieldName" : "column_5"
    } ],
    "rollupGranularity" : {
      "type" : "duration",
      "duration" : 60000,
      "origin" : "1970-01-01T00:00:00.000Z"
    },
    "rowFlushBoundary" : 500000
  },
  "updaterJobSpec" : null,
  "ignoreInvalidRows" : false
}
2014-02-19 09:51:50,559 INFO [task-runner-0] io.druid.server.initialization.PropertiesModule - Loading properties from runtime.properties
2014-02-19 09:51:50,676 INFO [task-runner-0] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.server.initialization.ExtensionsConfig] from props[druid.extensions.] as [ExtensionsConfig{searchCurrentClassloader=true, coordinates=[], localRepository='/home/ubuntu/.m2/repository', remoteRepositories=[http://repo1.maven.org/maven2/, https://metamx.artifactoryonline.com/metamx/pub-libs-releases-local]}]
2014-02-19 09:51:50,676 INFO [task-runner-0] io.druid.indexing.common.task.HadoopIndexTask - Starting a hadoop index generator job...
2014-02-19 09:51:51,744 INFO [task-runner-0] io.druid.indexer.path.StaticPathSpec - Adding paths[hdfs://localhost:8020/user/ubuntu]
2014-02-19 09:51:52,802 ERROR [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_the_data_source_2014-02-19T09:51:22.131Z, type=index_hadoop, dataSource=the_data_source}]
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:185)
        at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:216)
        at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:195)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 4
        at com.google.common.base.Throwables.propagate(Throwables.java:160)
        at io.druid.indexer.HadoopDruidIndexerJob.ensurePaths(HadoopDruidIndexerJob.java:152)
        at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:73)
        at io.druid.indexing.common.task.HadoopIndexTask$HadoopIndexTaskInnerProcessing.runTask(HadoopIndexTask.java:228)
        ... 11 more
Caused by: org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 4
        at org.apache.hadoop.ipc.Client.call(Client.java:1070)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
        at com.sun.proxy.$Proxy156.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPath(FileInputFormat.java:372)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPaths(FileInputFormat.java:337)
        at io.druid.indexer.path.StaticPathSpec.addInputPaths(StaticPathSpec.java:58)
        at io.druid.indexer.HadoopDruidIndexerConfig.addInputPaths(HadoopDruidIndexerConfig.java:444)
        at io.druid.indexer.HadoopDruidIndexerJob.ensurePaths(HadoopDruidIndexerJob.java:149)
        ... 13 more
2014-02-19 09:51:52,817 INFO [task-runner-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_hadoop_the_data_source_2014-02-19T09:51:22.131Z",
  "status" : "FAILED",
  "duration" : 11516
}
2014-02-19 09:51:52,820 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.server.coordination.AbstractDataSegmentAnnouncer.stop()] on object[io.druid.server.coordination.BatchDataSegmentAnnouncer@37efa9d4].
2014-02-19 09:51:52,820 INFO [main] io.druid.server.coordination.AbstractDataSegmentAnnouncer - Stopping class io.druid.server.coordination.BatchDataSegmentAnnouncer with config[io.druid.server.initialization.ZkPathsConfig$$EnhancerByCGLIB$$2d89c3f1@24436a28]
2014-02-19 09:51:52,821 INFO [main] io.druid.curator.announcement.Announcer - unannouncing [/druid/announcements/localhost:8088]
2014-02-19 09:51:52,833 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.server.coordination.AbstractDataSegmentAnnouncer.stop()] on object[io.druid.server.coordination.SingleDataSegmentAnnouncer@331e37ed].
2014-02-19 09:51:52,833 INFO [main] io.druid.server.coordination.AbstractDataSegmentAnnouncer - Stopping class io.druid.server.coordination.SingleDataSegmentAnnouncer with config[io.druid.server.initialization.ZkPathsConfig$$EnhancerByCGLIB$$2d89c3f1@24436a28]
2014-02-19 09:51:52,834 INFO [main] io.druid.curator.announcement.Announcer - unannouncing [/druid/announcements/localhost:8088]
2014-02-19 09:51:52,834 ERROR [main] io.druid.curator.announcement.Announcer - Path[/druid/announcements/localhost:8088] not announced, cannot unannounce.
2014-02-19 09:51:52,834 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.indexing.worker.executor.ExecutorLifecycle.stop()] on object[io.druid.indexing.worker.executor.ExecutorLifecycle@33ac7c40].
2014-02-19 09:51:52,835 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - Closing inventory cache for localhost:8088. Also removing listeners.
2014-02-19 09:51:52,835 INFO [ServerInventoryView-0] io.druid.client.SingleServerInventoryView - Server Disappeared[DruidServerMetadata{name='localhost:8088', host='localhost:8088', maxSize=0, tier='_default_tier', type='indexer-executor'}]
2014-02-19 09:51:52,940 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - stopped o.e.j.s.ServletContextHandler{/,null}
2014-02-19 09:51:52,940 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - stopped o.e.j.s.ServletContextHandler{/,file:/}
2014-02-19 09:51:53,029 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.indexing.overlord.ThreadPoolTaskRunner.stop()] on object[io.druid.indexing.overlord.ThreadPoolTaskRunner@3df86ed1].
2014-02-19 09:51:53,030 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.client.ServerInventoryView.stop() throws java.io.IOException] on object[io.druid.client.SingleServerInventoryView@35915368].
2014-02-19 09:51:53,030 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.curator.announcement.Announcer.stop()] on object[io.druid.curator.announcement.Announcer@2fed0cb8].
2014-02-19 09:51:53,030 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.curator.discovery.ServerDiscoverySelector.stop() throws java.io.IOException] on object[io.druid.curator.discovery.ServerDiscoverySelector@1520a43d].
2014-02-19 09:51:53,031 INFO [main] io.druid.curator.CuratorModule - Stopping Curator
2014-02-19 09:51:53,043 INFO [main] org.apache.zookeeper.ZooKeeper - Session: 0x1442481192d005d closed
2014-02-19 09:51:53,043 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.http.client.HttpClient.stop()] on object[com.metamx.http.client.HttpClient@682cd837].
2014-02-19 09:51:53,043 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn - EventThread shut down
2014-02-19 09:51:53,057 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.metrics.MonitorScheduler.stop()] on object[com.metamx.metrics.MonitorScheduler@777387a6].
2014-02-19 09:51:53,060 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.emitter.service.ServiceEmitter.close() throws java.io.IOException] on object[com.metamx.emitter.service.ServiceEmitter@11fcf9cb].
2014-02-19 09:51:53,060 INFO [main] com.metamx.emitter.core.LoggingEmitter - Close: started [false]
2014-02-19 09:51:53,060 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.emitter.core.LoggingEmitter.close() throws java.io.IOException] on object[com.metamx.emitter.core.LoggingEmitter@74ec5dce].

Could you help me figure out what the error is, or is there a way to get more specific error messages? I was hoping for something more specific so I could rectify the problem, but it seems like the log just told me the task failed.

Thank you so much!

Nishant Bangarwa

Feb 19, 2014, 5:36:58 AM
to druid-de...@googlegroups.com
Hi Chadin, 

The error you are getting (org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 4) usually comes from a Hadoop version mismatch. The default Hadoop version Druid is compiled with is 1.0.3. 
Which version of Hadoop are you using?
To fix it, you can either change your Hadoop installation to version 1.0.3, or modify the Hadoop version in Druid's pom.xml and recompile the Druid source with the modified version.
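For reference, here is a minimal sketch of the kind of dependency entry this refers to, assuming the Hadoop dependency is declared roughly like this somewhere in Druid's pom.xml (the exact module and placement are assumptions, not shown in this thread):

<!-- hypothetical excerpt from Druid's pom.xml: the Hadoop dependency Druid is built against -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <!-- 1.0.3 is the default; change this to match your cluster, then rebuild Druid with Maven -->
  <version>1.0.3</version>
</dependency>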

 




Chadin Anuwattanaporn

Feb 20, 2014, 12:31:34 AM
to druid-de...@googlegroups.com
Hi Nishant, 

Thank you. I decided to recompile after changing the Hadoop version to 2.0.0, which is my Hadoop version, but the compile failed. Below are some excerpts: 

2014-02-20 05:26:28,742 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T23:00:00.000Z_2012-01-02T00:00:00.000Z_2014-02-20T05:26:28.738Z}
2014-02-20 05:26:28,743 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T22:00:00.000Z_2012-01-01T23:00:00.000Z_2014-02-20T05:26:28.738Z}
2014-02-20 05:26:28,743 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T21:00:00.000Z_2012-01-01T22:00:00.000Z_2014-02-20T05:26:28.738Z}
2014-02-20 05:26:28,743 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T20:00:00.000Z_2012-01-01T21:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,744 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T19:00:00.000Z_2012-01-01T20:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,744 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T18:00:00.000Z_2012-01-01T19:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,744 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T17:00:00.000Z_2012-01-01T18:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,744 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T16:00:00.000Z_2012-01-01T17:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,745 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T15:00:00.000Z_2012-01-01T16:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,745 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T14:00:00.000Z_2012-01-01T15:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,745 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T13:00:00.000Z_2012-01-01T14:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,745 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T12:00:00.000Z_2012-01-01T13:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,746 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T11:00:00.000Z_2012-01-01T12:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,746 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T10:00:00.000Z_2012-01-01T11:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,746 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T09:00:00.000Z_2012-01-01T10:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,747 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T08:00:00.000Z_2012-01-01T09:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,747 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T07:00:00.000Z_2012-01-01T08:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,747 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T06:00:00.000Z_2012-01-01T07:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,747 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T05:00:00.000Z_2012-01-01T06:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,748 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T04:00:00.000Z_2012-01-01T05:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,748 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T03:00:00.000Z_2012-01-01T04:00:00.000Z_2014-02-20T05:26:28.737Z}

...

2014-02-20 05:26:32,453 INFO [main] io.druid.initialization.Initialization - Adding local module[class io.druid.initialization.InitializationTest$TestDruidModule]
2014-02-20 05:26:33,098 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.server.initialization.ExtensionsConfig] from props[druid.extensions.] as [ExtensionsConfig{searchCurrentClassloader=true, coordinates=[], localRepository='/home/ubuntu/.m2/repository', remoteRepositories=[http://repo1.maven.org/maven2/, https://metamx.artifactoryonline.com/metamx/pub-libs-releases-local]}]
2014-02-20 05:26:33,100 INFO [main] io.druid.initialization.Initialization - Adding local module[class io.druid.initialization.InitializationTest$TestDruidModule]
2014-02-20 05:26:34,416 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class com.metamx.emitter.core.LoggingEmitterConfig] from props[druid.emitter.logging.] as [LoggingEmitterConfig{loggerClass='com.metamx.emitter.core.LoggingEmitter', logLevel='info'}]
2014-02-20 05:26:34,432 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.server.metrics.DruidMonitorSchedulerConfig] from props[druid.monitoring.] as [io.druid.server.metrics.DruidMonitorSchedulerConfig@1169d3a]
2014-02-20 05:26:34,448 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.server.metrics.MonitorsConfig] from props[druid.monitoring.] as [MonitorsConfig{monitors=[]}]
2014-02-20 05:26:34,652 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.server.DruidNode] from props[druid.] as [DruidNode{serviceName='test-service', host='test-host:8080', port=8080}]
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.152 sec

Results :

Tests in error:
  testRun(io.druid.client.client.BatchServerInventoryViewTest): position (0) must be less than the number of elements that remained (0)

Tests run: 101, Failures: 0, Errors: 1, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] druid ............................................. SUCCESS [0.002s]
[INFO] druid-common ...................................... SUCCESS [33.568s]
[INFO] druid-processing .................................. SUCCESS [1:46.779s]
[INFO] druid-server ...................................... FAILURE [1:15.628s]
[INFO] druid-examples .................................... SKIPPED
[INFO] druid-indexing-hadoop ............................. SKIPPED
[INFO] druid-indexing-service ............................ SKIPPED
[INFO] druid-services .................................... SKIPPED
[INFO] druid-cassandra-storage ........................... SKIPPED
[INFO] druid-hdfs-storage ................................ SKIPPED
[INFO] druid-s3-extensions ............................... SKIPPED
[INFO] druid-kafka-seven ................................. SKIPPED
[INFO] druid-kafka-eight ................................. SKIPPED
[INFO] druid-rabbitmq .................................... SKIPPED
[INFO] druid-hll ......................................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3:40.860s
[INFO] Finished at: Thu Feb 20 05:26:35 UTC 2014
[INFO] Final Memory: 36M/106M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12.2:test (default-test) on project druid-server: There are test failures.
[ERROR]
[ERROR] Please refer to /home/ubuntu/git/druid/server/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :druid-server

From the look of it, either I misconfigured something in pom.xml, or the test code is trying to reach the Hadoop server, in which case I need to configure the Hadoop server. Am I right, or is it something else?

Let me know if you need the full compile log.

Thank you so much!

Fangjin Yang

Feb 20, 2014, 12:39:36 AM
to druid-de...@googlegroups.com
Hi Chadin,

What version are you running? Are those logs about missing rules from your coordinator?
If so, can you paste the output of <coordinator_ip>:<port>/info/rules?

As for the failed batch inventory view test, can you try and run things again? I wonder if there is a strange non-deterministic case there. I haven't seen that test fail before, though.

-- FJ


Chadin Anuwattanaporn

Feb 20, 2014, 2:41:23 AM
to druid-de...@googlegroups.com
Hi FJ, 

Here it is: 

{"_default":[{"period":"P5000Y","replicants":2,"tier":"_default_tier","type":"loadByPeriod"}]}

I ran the compilation again and attached is the log file.

Note that when I changed the version of Hadoop back to 1.0.3, the compilation and tests went just fine.
log.log

Nishant Bangarwa

Feb 20, 2014, 3:20:17 AM
to druid-de...@googlegroups.com
Hi Chadin, 
In the attached logs, the error I can see is "Could not find artifact org.apache.hadoop:hadoop-core:jar:2.0.0". 
This is failing since there is no hadoop-core Maven artifact for version 2.0.0. 

For Hadoop 2.0, you can switch the dependency to hadoop-client with the version of your Hadoop installation.
Attaching a patch that I used to compile Druid with Hadoop 2.2.0. Hope it helps.




hadoop-2.2.0.patch
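For reference, a minimal sketch of the kind of change the attached patch presumably makes, swapping the hadoop-core dependency for hadoop-client (the exact modules and surrounding pom.xml structure are assumptions):

<!-- hypothetical pom.xml change: replace the hadoop-core dependency with hadoop-client -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <!-- match the version of your Hadoop installation, e.g. 2.2.0 as used with this patch -->
  <version>2.2.0</version>
</dependency>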

Chadin Anuwattanaporn

Feb 20, 2014, 3:55:36 AM
to druid-de...@googlegroups.com
Hi Nishant, 

I thought the same too! While I was waiting, I saw the error in the log file and proceeded to rectify it. I am now able to successfully compile Druid with hadoop-client 2.2.0.

Now, when I ran the examples ("run_example_server.sh") they worked fine, but when I tried to start the overlord node: 

java -Xmx2g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath lib/*:config/overlord io.druid.cli.Main server overlord

When I ran it from the main directory (the root of the git repo), the class couldn't be found. I saw "Main.class" in services/target/classes/io/druid/cli, so I "cd"ed down until I arrived there, but still couldn't run the command to start the overlord node.

Where am I supposed to go to start the overlord node from the compiled repo?

Sorry if this is a really newbie question...

Nishant Bangarwa

Feb 20, 2014, 4:59:54 AM
to druid-de...@googlegroups.com
Hi Chadin, 

After compiling Druid, you should be able to see a jar named "druid-services-<VERSION>-selfcontained.jar" in the services/target directory under the root of the git repo. 
Try adding that jar to the classpath. 




Chadin Anuwattanaporn

Feb 20, 2014, 11:13:07 PM
to druid-de...@googlegroups.com
Hi Nishant, 

Yup, I was able to locate the jar file. After including it in the classpath, I was able to start up the overlord node using the following command: 

java -Xmx2g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath services/target/*:examples/config/overlord io.druid.cli.Main server overlord

So I successfully submitted the same task with the same config as above (except adding the correct Hadoop coordinates). However, the task failed to execute with the following error: 

2014-02-21 03:30:56,923 WARN [task-runner-0] org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-02-21 03:30:56,965 INFO [task-runner-0] io.druid.indexer.path.StaticPathSpec - Adding paths[hdfs://localhost:8020/user/ubuntu]
2014-02-21 03:30:58,805 ERROR [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_the_data_source_2014-02-21T03:29:46.059Z, type=index_hadoop, dataSource=the_data_source}]
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:188)
        at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:216)
        at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:195)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.VerifyError: class org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$CreateSnapshotRequestProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.getDeclaredMethods0(Native Method)
        at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
        at java.lang.Class.privateGetPublicMethods(Class.java:2651)
        at java.lang.Class.privateGetPublicMethods(Class.java:2661)
        at java.lang.Class.getMethods(Class.java:1467)
        at sun.misc.ProxyGenerator.generateClassFile(ProxyGenerator.java:426)
        at sun.misc.ProxyGenerator.generateProxyClass(ProxyGenerator.java:323)
        at java.lang.reflect.Proxy.getProxyClass0(Proxy.java:636)
        at java.lang.reflect.Proxy.newProxyInstance(Proxy.java:722)
        at org.apache.hadoop.ipc.ProtobufRpcEngine.getProxy(ProtobufRpcEngine.java:92)
        at org.apache.hadoop.ipc.RPC.getProtocolProxy(RPC.java:537)
        at org.apache.hadoop.hdfs.NameNodeProxies.createNNProxyWithClientProtocol(NameNodeProxies.java:328)
        at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:235)
        at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:139)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:510)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:453)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:136)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2433)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:287)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPath(FileInputFormat.java:466)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPaths(FileInputFormat.java:431)
        at io.druid.indexer.path.StaticPathSpec.addInputPaths(StaticPathSpec.java:58)
        at io.druid.indexer.HadoopDruidIndexerConfig.addInputPaths(HadoopDruidIndexerConfig.java:446)
        at io.druid.indexer.JobHelper.ensurePaths(JobHelper.java:123)
        at io.druid.indexer.HadoopDruidDetermineConfigurationJob.run(HadoopDruidDetermineConfigurationJob.java:54)
        at io.druid.indexing.common.task.HadoopIndexTask$HadoopDetermineConfigInnerProcessing.runTask(HadoopIndexTask.java:285)
        ... 11 more
2014-02-21 03:30:58,812 INFO [task-runner-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_hadoop_the_data_source_2014-02-21T03:29:46.059Z",
  "status" : "FAILED",
  "duration" : 54169
}
2014-02-21 03:30:58,816 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.server.coordination.AbstractDataSegmentAnnouncer.stop()] on object[io.druid.server.coordination.BatchDataSegmentAnnouncer@6c5c4442].
2014-02-21 03:30:58,817 INFO [main] io.druid.server.coordination.AbstractDataSegmentAnnouncer - Stopping class io.druid.server.coordination.BatchDataSegmentAnnouncer with config[io.druid.server.initialization.ZkPathsConfig$$EnhancerByCGLIB$$63fccae@2b28e3f]
2014-02-21 03:30:58,817 INFO [main] io.druid.curator.announcement.Announcer - unannouncing [/druid/announcements/localhost:8088]
2014-02-21 03:30:58,823 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - Closing inventory cache for localhost:8088. Also removing listeners.
2014-02-21 03:30:58,823 INFO [ServerInventoryView-0] io.druid.client.SingleServerInventoryView - Server Disappeared[DruidServerMetadata{name='localhost:8088', host='localhost:8088', maxSize=0, tier='_default_tier', type='indexer-executor', priority='0'}]
2014-02-21 03:30:58,824 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.server.coordination.AbstractDataSegmentAnnouncer.stop()] on object[io.druid.server.coordination.SingleDataSegmentAnnouncer@846b70e].
2014-02-21 03:30:58,824 INFO [main] io.druid.server.coordination.AbstractDataSegmentAnnouncer - Stopping class io.druid.server.coordination.SingleDataSegmentAnnouncer with config[io.druid.server.initialization.ZkPathsConfig$$EnhancerByCGLIB$$63fccae@2b28e3f]
2014-02-21 03:30:58,824 INFO [main] io.druid.curator.announcement.Announcer - unannouncing [/druid/announcements/localhost:8088]
2014-02-21 03:30:58,824 ERROR [main] io.druid.curator.announcement.Announcer - Path[/druid/announcements/localhost:8088] not announced, cannot unannounce.
2014-02-21 03:30:58,825 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.indexing.worker.executor.ExecutorLifecycle.stop()] on object[io.druid.indexing.worker.executor.ExecutorLifecycle@27c01ad2].
2014-02-21 03:30:58,837 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - stopped o.e.j.s.ServletContextHandler{/,null}
2014-02-21 03:30:58,838 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - stopped o.e.j.s.ServletContextHandler{/,file:/}
2014-02-21 03:30:58,890 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.indexing.overlord.ThreadPoolTaskRunner.stop()] on object[io.druid.indexing.overlord.ThreadPoolTaskRunner@616a6b9b].
2014-02-21 03:30:58,890 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.client.ServerInventoryView.stop() throws java.io.IOException] on object[io.druid.client.SingleServerInventoryView@122b7d0a].
2014-02-21 03:30:58,891 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.curator.announcement.Announcer.stop()] on object[io.druid.curator.announcement.Announcer@2b0f1190].
2014-02-21 03:30:58,891 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.curator.discovery.ServerDiscoverySelector.stop() throws java.io.IOException] on object[io.druid.curator.discovery.ServerDiscoverySelector@f6e9d5d].
2014-02-21 03:30:58,892 INFO [main] io.druid.curator.CuratorModule - Stopping Curator
2014-02-21 03:30:58,897 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn - EventThread shut down
2014-02-21 03:30:58,897 INFO [main] org.apache.zookeeper.ZooKeeper - Session: 0x1442481192d0066 closed
2014-02-21 03:30:58,897 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.http.client.HttpClient.stop()] on object[com.metamx.http.client.HttpClient@1763fb5b].
2014-02-21 03:30:58,907 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.metrics.MonitorScheduler.stop()] on object[com.metamx.metrics.MonitorScheduler@4cf99f26].
2014-02-21 03:30:58,907 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.emitter.service.ServiceEmitter.close() throws java.io.IOException] on object[com.metamx.emitter.service.ServiceEmitter@6ed2bb19].
2014-02-21 03:30:58,907 INFO [main] com.metamx.emitter.core.LoggingEmitter - Close: started [false]
2014-02-21 03:30:58,907 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.emitter.core.LoggingEmitter.close() throws java.io.IOException] on object[com.metamx.emitter.core.LoggingEmitter@253e41fe].

Could you help me with the above?

Thank you!

Chadin Anuwattanaporn

unread,
Feb 20, 2014, 11:45:36 PM2/20/14
to druid-de...@googlegroups.com
Just for your info, in case it helps, this is what our Hadoop core-site.xml file looks like: 

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
</configuration>
ubuntu@domU-12-31-39-0B-60-6C:~/git/druid$ cat /usr/lib/hadoop-0.20-mapreduce/example-confs/conf.pseudo/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>

  <property>
     <name>hadoop.tmp.dir</name>
     <value>/var/lib/hadoop-0.20/cache/${user.name}</value>
  </property>

  <!-- OOZIE proxy user setting -->
  <property>
    <name>hadoop.proxyuser.oozie.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.oozie.groups</name>
    <value>*</value>
  </property>

</configuration>
...

Nishant Bangarwa

unread,
Feb 21, 2014, 12:16:10 AM2/21/14
to druid-de...@googlegroups.com
Hi Chadin, 
It seems to be caused by a version conflict in the Google protobuf library: 
Hadoop 2 upgraded its protobuf dependency to 2.5 in HADOOP-9845.
Can you try recompiling Druid with protobuf 2.5? 
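Something along these lines, assuming protobuf is pulled in as a plain Maven dependency (the exact layout of the Druid pom.xml may differ, so treat this as a sketch), followed by a rebuild with mvn clean install:

  <!-- hypothetical pom.xml fragment: bump protobuf to the version Hadoop 2.2.0 uses -->
  <dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>2.5.0</version>
  </dependency>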



Chadin Anuwattanaporn

unread,
Feb 21, 2014, 12:58:22 AM2/21/14
to druid-de...@googlegroups.com
Hi Nishant, 

I made the required changes, and the compilation failed. Log as attached.
        at io.druid.server.http.RedirectF
...
log2.log
Message has been deleted

Chadin Anuwattanaporn

unread,
Feb 21, 2014, 3:45:22 AM2/21/14
to druid-de...@googlegroups.com
Hi Nishant, 

Some extra info, in case it helps. Only one test failed as the log file indicated, and I have attached the full report of that failed test below. Seems to indicate that the new protobuf does not work with Druid somehow…?
...
TEST-io.druid.data.input.ProtoBufInputRowParserTest.xml

Nishant Bangarwa

unread,
Feb 21, 2014, 8:41:07 AM2/21/14
to druid-de...@googlegroups.com
Hi Chadin, 
The test is failing due to an incompatible change in the protobuf library.

Druid uses protobuf in ProtoBufInputRowParser; at a quick look, this change does not seem to affect the functionality of the parser, and the parser appears to be compatible with protobuf 2.5.

The test can be fixed by overriding getUnknownFields in the ProtoTestEventWrapper class, as in the patch below:

diff --git a/processing/src/test/java/io/druid/data/input/ProtoTestEventWrapper.java b/processing/src/test/java/io/druid/data/input/ProtoTestEventWrapper.java
index 965859f..88d86a3 100644
--- a/processing/src/test/java/io/druid/data/input/ProtoTestEventWrapper.java
+++ b/processing/src/test/java/io/druid/data/input/ProtoTestEventWrapper.java
@@ -23,6 +23,9 @@
 package io.druid.data.input;

 import com.google.protobuf.AbstractMessage;
+import com.google.protobuf.UnknownFieldSet;
+
+import java.util.Collections;

 public final class ProtoTestEventWrapper {
   private ProtoTestEventWrapper() {}
@@ -414,7 +417,13 @@ public final class ProtoTestEventWrapper {
       memoizedSerializedSize = size;
       return size;
     }
-
+
+    @Override
+    public UnknownFieldSet getUnknownFields()
+    {
+      return UnknownFieldSet.getDefaultInstance();
+    }
+
     private static final long serialVersionUID = 0L;
     @java.lang.Override
     protected java.lang.Object writeReplace()




Chadin Anuwattanaporn

unread,
Feb 23, 2014, 11:44:29 PM2/23/14
to druid-de...@googlegroups.com
Hi Nishant, 

I made the modifications above, successfully compiled and submitted the task. I'm happy to report that the previous error has disappeared!

The task ran, but once again failed. The log is as attached. I'm not sure if it's a problem with the task JSON or connection to HDFS, or something else. Could you help me?

Thank you so much for the help all these days!
...
index_hadoop_the_data_source_2014-02-24T03:57:37.157Z.log

Nishant Bangarwa

unread,
Feb 24, 2014, 1:24:17 AM2/24/14
to druid-de...@googlegroups.com
Hi Chadin, 
This task failed due to this exception: 

Caused by: java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).; Host Details : local host is: "domU-12-31-39-0B-60-6C/10.214.103.154"; destination host is: "localhost":8020; 
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
at org.apache.hadoop.ipc.Client.call(Client.java:1351)

It looks like a version mismatch between your Hadoop installation and the version you set Druid up with. 
The task seems to be using Hadoop 2.2.0 and protobuf 2.5.0. 

Can you verify that your installation matches these versions, and that you are adding the correct Hadoop config files to the Druid classpath?
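For example, the Hadoop configuration directory can be put on the classpath of the node that runs the indexing task; the paths below are only placeholders for wherever your Hadoop conf and Druid config actually live:

  # hypothetical overlord launch with the Hadoop conf dir appended to the classpath
  java -cp "config/overlord:lib/*:/etc/hadoop/conf" io.druid.cli.Main server overlord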




Nishant Bangarwa

unread,
Feb 24, 2014, 1:38:25 AM2/24/14
to druid-de...@googlegroups.com
Hi Chadin, 

Druid has built-in support for local storage, S3, HDFS, Cassandra and Riak-CS. 
Druid is designed for extensibility, and it's possible to add a new module for any custom deep storage. 

More info on extending Druid is here - 



On Fri, Feb 21, 2014 at 11:37 AM, Chadin Anuwattanaporn <cha...@sogamo.com> wrote:
Hi Nishant,

Not related but I have a question: what are other possible data sources that Druid can ingest, other than local files, S3 and HDFS?
 
...


Chadin Anuwattanaporn

unread,
Feb 24, 2014, 3:45:27 AM2/24/14
to druid-de...@googlegroups.com
Hi Nishant, 

Thank you for your response.

I did assume (perhaps incorrectly) that the Hadoop 2.2.0 client would work with Hadoop 2.0.0; since the hadoop-client package for 2.0.0 carried an alpha tag, I went with 2.2.0 instead. I didn't think it would cause problems.

I changed the hadoop-client version to 2.0.0-alpha and Druid compiled successfully. It then ran the task, which ended with the error in the attached log file. It looks similar to the IPC error we got a while back due to a version mismatch, though.

Could you help me out?

Thank you!
...
index_hadoop_the_data_source_2014-02-24T08:41:06.080Z.log

Nishant Bangarwa

unread,
Feb 24, 2014, 4:22:01 AM2/24/14
to druid-de...@googlegroups.com
OK, so it was failing due to a version mismatch between your Hadoop installation and the Druid config. 

Now it's failing due to a protobuf version mismatch:
Hadoop 2.0.0-alpha is compatible with protobuf 2.4.0a, as shown here - 
you need to change the protobuf version back to 2.4.0a. 




Chadin Anuwattanaporn

unread,
Feb 24, 2014, 5:04:04 AM2/24/14
to druid-de...@googlegroups.com
Hi Nishant, 

I changed the protobuf version, compiled successfully, and ran the task.

The next error that I got is as attached in the log.

It seems to be a data formatting issue, unless you'd suggest otherwise, so let me show you the data format in the file and the config file.

This is what each line of our data looks like: 

2014-02-13T12:28:21Z    track.0d26fd6e62fe4bb0ae92e2da9e834191.41192    {"action":"Sign in","api_key":"0d26fd6e62fe4bb0ae92e2da9e834191","player_id":"41192","timestamp":1392294501700}

The file is a tab-separated file (TSV) with three columns: ts (timestamp), tag, and data (a JSON object).

The timestamp format is as above, and currently the config file uses "auto". It might be a good idea to change it to a specified format.
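For example, something like the following is what I might switch to, assuming the format field accepts a Joda-style pattern (the pattern below is my guess for our timestamps, not something I've tested yet):

  "timestampSpec": {
    "column": "ts",
    "format": "yyyy-MM-dd'T'HH:mm:ss'Z'"
  }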

The config file (which has remained unchanged as far as I remember) is reproduced below: 

,
  "hadoopCoordinates" : "org.apache.hadoop:hadoop-client:2.0.0-alpha"
}

The JSON object does not have a fixed structure, meaning the keys can vary from record to record.

My primary use case is to be able to filter and aggregate on arbitrary keys in the JSON. For example, I might want to find the number of records whose "action" key has the value "Sign in", aggregated by the "timestamp" key in the JSON object. (Actually the "ts" and "tag" columns have no practical use for now, and the timestamp we use is the one inside the JSON object, which is a UNIX timestamp.)

As such, how should I configure the configuration file? Or is Druid appropriate for my use case?

Or does the problem have nothing to do with the data format and/or config file in the first place?
...
index_hadoop_the_data_source_2014-02-24T09:51:24.322Z.log

Nishant Bangarwa

unread,
Feb 24, 2014, 5:35:50 AM2/24/14
to druid-de...@googlegroups.com
From the task logs, the exception causing the failure is: 
Caused by: java.lang.ClassNotFoundException: org.apache.avro.io.DatumReader

You seem to be hitting HADOOP-8466, which was fixed in 2.0.2-alpha. 
You can work around this by adding the avro jar to the classpath or declaring it as an explicit dependency, for example as sketched below. 
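A hypothetical Maven dependency for this (the version here is only a guess; use whatever your Hadoop distribution ships):

  <!-- hypothetical explicit avro dependency; pick the version your distro ships -->
  <dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro</artifactId>
    <version>1.7.4</version>
  </dependency>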

BTW, is there any specific reason for working with 2.0.0-alpha instead of a stable release?
If you can change your Hadoop installation version, I would recommend working with a stable Hadoop release.

You can also discuss this over IRC: #druid-dev on irc.freenode.net.



Chadin Anuwattanaporn

unread,
Feb 24, 2014, 10:34:42 PM2/24/14
to druid-de...@googlegroups.com
Hi Nishant, 

I'm working with Cloudera Hadoop, which is version 2.0.0. Running "hadoop version" gives me the following: 

Hadoop 2.0.0-cdh4.5.0
Subversion git://ubuntu64-12-04-mk1/var/lib/jenkins/workspace/generic-package-ubuntu64-12-04/CDH4.5.0-Packaging-Hadoop-2013-11-20_14-31-53/hadoop-2.0.0+1518-1.cdh4.5.0.p0.24~precise/src/hadoop-common-project/hadoop-common -r 8e266e052e423af592871e2dfe09d54c03f6a0e8
Compiled by jenkins on Wed Nov 20 15:10:35 PST 2013
From source with checksum 9848b0f85b461913ed63fa19c2b79ccc
This command was run using /usr/lib/hadoop/hadoop-common-2.0.0-cdh4.5.0.jar

I would love to switch to a more stable release of Hadoop, but I don't know of a way to upgrade at the moment.

So if I understand correctly, that means I'd have to add the avro jar to the Hadoop classpath and recompile Hadoop? That sounds like a long way around. Is it possible to just switch the Hadoop client to 2.0.2-alpha? Would that cause a version mismatch?

And I'm on IRC. Nickname "chadin_anuwattan".
...

Nishant Bangarwa

unread,
Feb 25, 2014, 4:57:59 AM2/25/14
to druid-de...@googlegroups.com
Hi Chadin, 
You need to use the same client version as your installation to avoid version conflicts: 
switch your hadoop-client version to 2.0.0-cdh4.5.0 instead of 2.0.0-alpha.

You might also need to add the CDH repository to be able to resolve the CDH artifacts; a sketch follows below. 
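Something along these lines in the pom.xml (the URL is my recollection of the usual Cloudera artifact repository, so double-check it against Cloudera's docs), with hadoopCoordinates in the task spec pointing at org.apache.hadoop:hadoop-client:2.0.0-cdh4.5.0:

  <!-- assumed Cloudera repository; verify the URL before relying on it -->
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>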




Chadin Anuwattanaporn

unread,
Feb 26, 2014, 1:49:35 AM2/26/14
to druid-de...@googlegroups.com
Hi Nishant, 

Please ignore the direct message to you. I clicked the button accidentally, and the information in it is out of date, so please refer to this post instead.

I made the changes that you recommended, compiled successfully, and submitted the task. I ran into an error executing the "mkdir" command, but that was solved by creating the "/user" directory in the file system and giving its ownership to the correct user.

The latest error is in the attached log file. Could you help me work out what the problem might be?

I've cut the non-error parts of the file down greatly for succinctness.

Thank you!
...

Nishant Bangarwa

unread,
Feb 26, 2014, 2:27:23 AM2/26/14
to druid-de...@googlegroups.com
Did you forget to attach the log file? 
Can you show it to me on IRC or attach it in an email?




Chadin Anuwattanaporn

unread,
Feb 27, 2014, 12:45:33 AM2/27/14
to druid-de...@googlegroups.com
Hi Nishant, 

I added it to the classpath, and the task now runs successfully!

To verify that the task processed the data correctly, I ran the following simple query: 

{
  "queryType": "groupBy",
  "dataSource": "the_data_source",
  "granularity": "all",
  "dimensions": [],
  "aggregations": [
    { "type": "count", "name": "rows" }
  ],
  "postAggregations": [],
  "intervals": ["2010-01-01/2020-01-01"]
}

The result, however, was an empty array.

Is the query above correct?
...

Nishant Bangarwa

unread,
Feb 27, 2014, 2:57:18 AM2/27/14
to druid-de...@googlegroups.com
Hi Chadin, 

Your query seems fine; Druid can return empty results if the data segments are not loaded yet. 
When a realtime task completes, it creates an immutable segment and adds a new entry to the Druid segment metadata table in MySQL. It also uploads the segment to deep storage. Druid coordinator nodes watch MySQL for new segment metadata entries and assign those segments to be downloaded by historical nodes.
Can you verify from the coordinator and historical nodes that the segments were properly handed over to the historical nodes? One quick check is sketched below. 
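Assuming your metadata segment table is named "segments" (adjust to whatever segmentTable you configured; the column names here are from memory, so treat this as a sketch), you can look for used entries for your datasource in MySQL:

  -- assumed table/column names; adjust to your metadata store schema
  SELECT id, used FROM segments WHERE dataSource = 'the_data_source';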





Chadin Anuwattanaporn

unread,
Feb 27, 2014, 4:39:47 AM2/27/14
to druid-de...@googlegroups.com
Hi Nishant, 

I shut down all the nodes, restarted them, resubmitted the task to load the data, confirmed that the historical node looked fine, then submitted the query, and it works! Thank you.

Now I'm trying a search query. Basically I'm trying to find entries that match a specific pattern in the JSON, so I submitted a search query like this: 

{
  "queryType": "search",
  "dataSource": "the_data_source",
  "granularity": "all",
  "query": {
    "type": "insensitive_contains",
    "value": "action"
  },
  "sort" : {
    "type": "lexicographic"
  },
  "intervals": [
    "2010-01-01/2020-01-01"
  ]
}

I assume this means searching across all dimension columns in the data and returning any value that contains the string "action".

The query turned up empty. Is my understanding correct? Or is there something else I need to do?
...

Nishant Bangarwa

unread,
Feb 27, 2014, 5:44:58 AM2/27/14
to druid-de...@googlegroups.com
Hi Chadin, 
Your understanding of the query is correct; the above query will check all dimensions for values that contain "action".
If it's not returning the results you expected, check for any exceptions in the logs. 




Chadin Anuwattanaporn

unread,
Feb 27, 2014, 6:04:52 AM2/27/14
to druid-de...@googlegroups.com
Hi Nishant, 

I re-loaded the data, correcting the dimensions parameter in the config, and I can now see results from the search query.

Thank you! I'm trying out different queries in Druid now. I'll start a separate topic if I have query-related questions.
...

Nishant Bangarwa

unread,
Feb 27, 2014, 6:18:10 AM2/27/14
to druid-de...@googlegroups.com
Great to know that things are working now :)

