Ingesting HDFS data with Druid


Chadin Anuwattanaporn

Feb 18, 2014, 5:19:18 AM
to druid-de...@googlegroups.com
Hi all, 

I'm getting started with Druid. I have successfully completed the Druid Cluster tutorial and finished part 1 of the Loading Your Data tutorial. I would now like to configure Druid to ingest data from HDFS. I tried to look around, but it seems the Firehose doesn't support ingesting from HDFS. I looked at batch ingestion, but there were no specific instructions on setting it up with HDFS.

Could you point me to the resources I need, or advise me on how to go about doing this?

If any clarifications are needed, feel free to let me know, too.

Thank you!

Best, 
Chadin

Nishant Bangarwa

Feb 18, 2014, 7:01:56 AM
to druid-de...@googlegroups.com
Hi Chadin, 

To ingest data from HDFS, you can set the pathSpec for your batch ingestion to the location where your input files are present in HDFS:

"pathSpec": {
      "type": "static",
      "paths": "hdfs://<path-to-input-files>"
}

Additionally, if you want to store your segments in HDFS, you will need to add these configs to your nodes:

druid.storage.type=hdfs
druid.extensions.coordinates=["io.druid.extensions:druid-hdfs-storage:0.6.<version>"]
druid.storage.storageDirectory=<hdfs-dir>

For more info you can have a look at - 



Chadin Anuwattanaporn

Feb 18, 2014, 10:45:02 PM
to druid-de...@googlegroups.com
Hi Nishant, 

Thank you very much for your reply.

I started with the config file provided on the page and changed it according to what you stated. The current config file is as follows:

{
  "dataSource": "chadin",
  "timestampSpec" : {
    "column": "ts",
    "format": "auto"
  },
  "dataSpec": {
    "format": "tsv",
    "columns": [
      "ts",
      "tag",
      "data"
    ],
    "dimensions": [
      "column_1",
      "column_2",
      "column_3"
    ]
  },
  "granularitySpec": {
    "type": "uniform",
    "intervals": [
      "<ISO8601 interval:http:\/\/en.wikipedia.org\/wiki\/ISO_8601#Time_intervals>"
    ],
    "gran": "day"
  },
  "pathSpec": {
    "type": "static",
    "inputPath": "hdfs:\/\/user\/ubuntu"
  },
  "rollupSpec": {
    "aggs": [
      {
        "type": "count",
        "name": "event_count"
      },
      {
        "type": "doubleSum",
        "fieldName": "column_4",
        "name": "revenue"
      },
      {
        "type": "longSum",
        "fieldName": "column_5",
        "name": "clicks"
      }
    ],
    "rollupGranularity": "minute"
  },
  "workingPath": "\/tmp\/path\/on\/hdfs",
  "segmentOutputPath": "hdfs:\/\/user\/ubuntu",
  "leaveIntermediate": "false",
  "partitionsSpec": {
    "targetPartitionSize": 5000000
  },
  "updaterJobSpec": {
    "type": "db",
    "connectURI": "jdbc:mysql:\/\/localhost:7980\/test_db",
    "user": "username",
    "password": "passmeup",
    "segmentTable": "segments"
  }
}

I ran the following command: 

curl -X 'POST' -H 'Content-Type:application/json' -d @examples/indexing/chadin_index_task.json localhost:8087/druid/indexer/v1/task

And the server returned a "server error", and the Overlord node produced the following error: 

Feb 19, 2014 3:38:49 AM com.sun.jersey.spi.container.ContainerResponse mapMappableContainerException
SEVERE: The exception contained within MappableContainerException could not be mapped to a response, re-throwing to the HTTP container
com.fasterxml.jackson.databind.JsonMappingException: Unexpected token (END_OBJECT), expected FIELD_NAME: missing property 'type' that is to contain type id  (for class io.druid.indexing.common.task.Task)
 at [Source: org.eclipse.jetty.server.HttpInput@1584da34; line: 1, column: 1208]
        at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:164)
        at com.fasterxml.jackson.databind.DeserializationContext.wrongTokenException(DeserializationContext.java:668)
        at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedUsingDefaultImpl(AsPropertyTypeDeserializer.java:141)
        at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:90)
        at com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:106)
        at com.fasterxml.jackson.databind.deser.impl.TypeWrappedDeserializer.deserialize(TypeWrappedDeserializer.java:36)
        at com.fasterxml.jackson.databind.ObjectReader._bind(ObjectReader.java:1179)
        at com.fasterxml.jackson.databind.ObjectReader.readValue(ObjectReader.java:635)
        at com.fasterxml.jackson.jaxrs.base.ProviderBase.readFrom(ProviderBase.java:587)
        at com.sun.jersey.spi.container.ContainerRequest.getEntity(ContainerRequest.java:488)
        at com.sun.jersey.server.impl.model.method.dispatch.EntityParamDispatchProvider$EntityInjectable.getValue(EntityParamDispatchProvider.java:123)
        at com.sun.jersey.server.impl.inject.InjectableValuesProvider.getInjectableValues(InjectableValuesProvider.java:46)
        at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$EntityParamInInvoker.getParams(AbstractResourceMethodDispatchProvider.java:153)
        at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:203)
        at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
        at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
        at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
        at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1511)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1442)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1391)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1381)
        at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:538)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:716)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
        at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:278)
        at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:268)
        at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:180)
        at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:93)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85)
        at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:120)
        at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:132)
        at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:129)
        at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:206)
        at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:129)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
        at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:82)
        at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:256)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
        at io.druid.server.http.RedirectFilter.doFilter(RedirectFilter.java:71)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
        at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
        at org.eclipse.jetty.server.Server.handle(Server.java:370)
        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
        at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:949)
        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1011)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:651)
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
        at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:668)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Thread.java:744)

2014-02-19 03:38:49,533 WARN [qtp2088428874-25] org.eclipse.jetty.servlet.ServletHandler -
javax.servlet.ServletException: com.fasterxml.jackson.databind.JsonMappingException: Unexpected token (END_OBJECT), expected FIELD_NAME: missing property 'type' that is to contain type id  (for class io.druid.indexing.common.task.Task)
 at [Source: org.eclipse.jetty.server.HttpInput@1584da34; line: 1, column: 1208]
        at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:420)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:538)
        at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:716)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
        at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:278)
        at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:268)
        at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:180)
        at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:93)
        at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:120)
        at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:132)
        at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:129)
        at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:206)
        at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:129)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
        at org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:82)
        at org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:256)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
        at io.druid.server.http.RedirectFilter.doFilter(RedirectFilter.java:71)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
        at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
        at org.eclipse.jetty.server.Server.handle(Server.java:370)
        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
        at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:949)
        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1011)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:651)
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
        at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:668)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Thread.java:744)
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Unexpected token (END_OBJECT), expected FIELD_NAME: missing property 'type' that is to contain type id  (for class io.druid.indexing.common.task.Task)
 at [Source: org.eclipse.jetty.server.HttpInput@1584da34; line: 1, column: 1208]
        at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:164)
        at com.fasterxml.jackson.databind.DeserializationContext.wrongTokenException(DeserializationContext.java:668)
        at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer._deserializeTypedUsingDefaultImpl(AsPropertyTypeDeserializer.java:141)
        at com.fasterxml.jackson.databind.jsontype.impl.AsPropertyTypeDeserializer.deserializeTypedFromObject(AsPropertyTypeDeserializer.java:90)
        at com.fasterxml.jackson.databind.deser.AbstractDeserializer.deserializeWithType(AbstractDeserializer.java:106)
        at com.fasterxml.jackson.databind.deser.impl.TypeWrappedDeserializer.deserialize(TypeWrappedDeserializer.java:36)
        at com.fasterxml.jackson.databind.ObjectReader._bind(ObjectReader.java:1179)
        at com.fasterxml.jackson.databind.ObjectReader.readValue(ObjectReader.java:635)
        at com.fasterxml.jackson.jaxrs.base.ProviderBase.readFrom(ProviderBase.java:587)
        at com.sun.jersey.spi.container.ContainerRequest.getEntity(ContainerRequest.java:488)
        at com.sun.jersey.server.impl.model.method.dispatch.EntityParamDispatchProvider$EntityInjectable.getValue(EntityParamDispatchProvider.java:123)
        at com.sun.jersey.server.impl.inject.InjectableValuesProvider.getInjectableValues(InjectableValuesProvider.java:46)
        at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$EntityParamInInvoker.getParams(AbstractResourceMethodDispatchProvider.java:153)
        at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:203)
        at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
        at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
        at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
        at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
        at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1511)
        at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1442)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1391)
        at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1381)
        at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
        ... 39 more

Any idea on what might be causing the error? I read the error message but it didn't give me a clue... sorry if it's a really newbie question!

Thank you in advance for the help!

Nishant Bangarwa

Feb 19, 2014, 2:49:40 AM
to druid-de...@googlegroups.com
Hi Chadin, 

The exception above means that the JSON is missing the 'type' attribute for the task. 
For a HadoopIndexTask, you can specify the type as follows:
{
  "type" : "index_hadoop",
  "config" : <hadoop-index-config (the config you sent in mail)>
}

Also, there are some configurations which don't make sense in the context of the indexing service, like workingPath, segmentOutputPath and updaterJobSpec, since the indexing service internally determines these based on the middlemanager config. You will need to remove those from your config file as well. 

More Info is present here  






Chadin Anuwattanaporn

Feb 19, 2014, 4:57:51 AM
to druid-de...@googlegroups.com
Hi Nishant, 

Thank you so much for your advice. It is really helping!

So I have modified the config file, removed the optional parts and also a few more as prompted by the server, and the final configuration file that went through looked like this: 

{
  "type": "index_hadoop",
  "config": {
  "dataSource": "the_data_source",
  "timestampSpec" : {
    "column": "ts",
    "format": "auto"
  },
  "dataSpec": {
    "format": "tsv",
    "columns": [
      "ts",
      "tag",
      "data"
    ],
    "dimensions": [
      "column_1",
      "column_2",
      "column_3"
    ]
  },
  "granularitySpec": {
    "type": "uniform",
    "intervals": [
      "2010/2020"
    ],
    "gran": "day"
  },
  "pathSpec": {
    "type": "static",
    "paths": "hdfs:\/\/localhost:8020\/user\/ubuntu"
  },
  "rollupSpec": {
    "aggs": [
      {
        "type": "count",
        "name": "event_count"
      },
      {
        "type": "doubleSum",
        "fieldName": "column_4",
        "name": "revenue"
      },
      {
        "type": "longSum",
        "fieldName": "column_5",
        "name": "clicks"
      }
    ],
    "rollupGranularity": "minute"
  }
  }
}

I successfully submitted the task to the overlord node, but the task failed to execute. Below is the error from the overlord node: 

2014-02-19 09:51:22,148 INFO [pool-6-thread-1] io.druid.indexing.overlord.ForkingTaskRunner - Logging task index_hadoop_the_data_source_2014-02-19T09:51:22.131Z output to: /tmp/persistent/index_hadoop_the_data_source_2014-02-19T09:51:22.131Z/086f7754-22a4-41eb-bd44-e35f69baacf5/log
2014-02-19 09:51:41,165 INFO [qtp1751358750-26] io.druid.indexing.common.actions.LocalTaskActionClient - Performing action for task[index_hadoop_the_data_source_2014-02-19T09:51:22.131Z]: LockTryAcquireAction{interval=2010-01-01T00:00:00.000Z/2020-01-01T00:00:00.000Z}
2014-02-19 09:51:41,165 INFO [qtp1751358750-26] io.druid.indexing.overlord.TaskLockbox - Task[index_hadoop_the_data_source_2014-02-19T09:51:22.131Z] already present in TaskLock[index_hadoop_the_data_source_2014-02-19T09:51:22.131Z]
2014-02-19 09:51:44,129 INFO [qtp1751358750-28] io.druid.indexing.common.actions.LocalTaskActionClient - Performing action for task[index_hadoop_the_data_source_2014-02-19T09:51:22.131Z]: LockListAction{}
2014-02-19 09:51:53,394 INFO [pool-6-thread-1] io.druid.indexing.overlord.ForkingTaskRunner - Process exited with status[0] for task: index_hadoop_the_data_source_2014-02-19T09:51:22.131Z
2014-02-19 09:51:53,395 INFO [pool-6-thread-1] io.druid.indexing.common.tasklogs.FileTaskLogs - Wrote task log to: log/index_hadoop_the_data_source_2014-02-19T09:51:22.131Z.log
2014-02-19 09:51:53,395 INFO [pool-6-thread-1] io.druid.indexing.overlord.ForkingTaskRunner - Removing temporary directory: /tmp/persistent/index_hadoop_the_data_source_2014-02-19T09:51:22.131Z/086f7754-22a4-41eb-bd44-e35f69baacf5
2014-02-19 09:51:53,396 INFO [pool-6-thread-1] io.druid.indexing.overlord.TaskQueue - Received FAILED status for task: index_hadoop_the_data_source_2014-02-19T09:51:22.131Z
2014-02-19 09:51:53,396 INFO [pool-6-thread-1] io.druid.indexing.overlord.ForkingTaskRunner - Ignoring request to cancel unknown task: index_hadoop_the_data_source_2014-02-19T09:51:22.131Z
2014-02-19 09:51:53,397 INFO [pool-6-thread-1] io.druid.indexing.overlord.HeapMemoryTaskStorage - Updating task index_hadoop_the_data_source_2014-02-19T09:51:22.131Z to status: TaskStatus{id=index_hadoop_the_data_source_2014-02-19T09:51:22.131Z, status=FAILED, duration=11516}
2014-02-19 09:51:53,397 INFO [pool-6-thread-1] io.druid.indexing.overlord.TaskLockbox - Removing task[index_hadoop_the_data_source_2014-02-19T09:51:22.131Z] from TaskLock[index_hadoop_the_data_source_2014-02-19T09:51:22.131Z]
2014-02-19 09:51:53,397 INFO [pool-6-thread-1] io.druid.indexing.overlord.TaskLockbox - TaskLock is now empty: TaskLock{groupId=index_hadoop_the_data_source_2014-02-19T09:51:22.131Z, dataSource=the_data_source, interval=2010-01-01T00:00:00.000Z/2020-01-01T00:00:00.000Z, version=2014-02-19T09:51:22.134Z}
2014-02-19 09:51:53,397 INFO [pool-6-thread-1] io.druid.indexing.overlord.TaskQueue - Task done: HadoopIndexTask{id=index_hadoop_the_data_source_2014-02-19T09:51:22.131Z, type=index_hadoop, dataSource=the_data_source}
2014-02-19 09:51:53,398 INFO [pool-6-thread-1] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2014-02-19T09:51:53.397Z","service":"overlord","host":"localhost:8087","metric":"indexer/time/run/millis","value":11516,"user2":"the_data_source","user3":"FAILED","user4":"index_hadoop"}]
2014-02-19 09:51:53,398 INFO [pool-6-thread-1] io.druid.indexing.overlord.TaskQueue - Task FAILED: HadoopIndexTask{id=index_hadoop_the_data_source_2014-02-19T09:51:22.131Z, type=index_hadoop, dataSource=the_data_source} (11516 run duration)

Below is the log file (I skipped the first half of the log since it didn't seem to be indicative of any error, just loading of classes and setting of default parameters):

2014-02-19 09:51:50,536 INFO [task-runner-0] io.druid.indexer.HadoopDruidIndexerConfig - Running with config:
{
  "dataSource" : "the_data_source",
  "timestampSpec" : {
    "column" : "ts",
    "format" : "auto"
  },
  "dataSpec" : {
    "format" : "tsv",
    "delimiter" : "\t",
    "columns" : [ "ts", "tag", "data" ],
    "dimensions" : [ "column_1", "column_2", "column_3" ],
    "spatialDimensions" : [ ]
  },
  "granularitySpec" : {
    "type" : "uniform",
    "gran" : "DAY",
    "intervals" : [ "2010-01-01T00:00:00.000Z/2020-01-01T00:00:00.000Z" ]
  },
  "pathSpec" : {
    "type" : "static",
    "paths" : "hdfs://localhost:8020/user/ubuntu"
  },
  "workingPath" : "/tmp/druid-indexing",
  "segmentOutputPath" : "file:///user/ubuntu/the_data_source",
  "version" : "2014-02-19T09:51:22.134Z",
  "partitionsSpec" : {
    "partitionDimension" : null,
    "targetPartitionSize" : -1,
    "maxPartitionSize" : -1,
    "assumeGrouped" : false
  },
  "leaveIntermediate" : false,
  "cleanupOnFailure" : true,
  "shardSpecs" : { },
  "overwriteFiles" : false,
  "rollupSpec" : {
    "aggs" : [ {
      "type" : "count",
      "name" : "event_count"
    }, {
      "type" : "doubleSum",
      "name" : "revenue",
      "fieldName" : "column_4"
    }, {
      "type" : "longSum",
      "name" : "clicks",
      "fieldName" : "column_5"
    } ],
    "rollupGranularity" : {
      "type" : "duration",
      "duration" : 60000,
      "origin" : "1970-01-01T00:00:00.000Z"
    },
    "rowFlushBoundary" : 500000
  },
  "updaterJobSpec" : null,
  "ignoreInvalidRows" : false
}
2014-02-19 09:51:50,559 INFO [task-runner-0] io.druid.server.initialization.PropertiesModule - Loading properties from runtime.properties
2014-02-19 09:51:50,676 INFO [task-runner-0] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.server.initialization.ExtensionsConfig] from props[druid.extensions.] as [ExtensionsConfig{searchCurrentClassloader=true, coordinates=[], localRepository='/home/ubuntu/.m2/repository', remoteRepositories=[http://repo1.maven.org/maven2/, https://metamx.artifactoryonline.com/metamx/pub-libs-releases-local]}]
2014-02-19 09:51:50,676 INFO [task-runner-0] io.druid.indexing.common.task.HadoopIndexTask - Starting a hadoop index generator job...
2014-02-19 09:51:51,744 INFO [task-runner-0] io.druid.indexer.path.StaticPathSpec - Adding paths[hdfs://localhost:8020/user/ubuntu]
2014-02-19 09:51:52,802 ERROR [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_the_data_source_2014-02-19T09:51:22.131Z, type=index_hadoop, dataSource=the_data_source}]
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:185)
        at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:216)
        at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:195)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 4
        at com.google.common.base.Throwables.propagate(Throwables.java:160)
        at io.druid.indexer.HadoopDruidIndexerJob.ensurePaths(HadoopDruidIndexerJob.java:152)
        at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:73)
        at io.druid.indexing.common.task.HadoopIndexTask$HadoopIndexTaskInnerProcessing.runTask(HadoopIndexTask.java:228)
        ... 11 more
Caused by: org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 4
        at org.apache.hadoop.ipc.Client.call(Client.java:1070)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
        at com.sun.proxy.$Proxy156.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPath(FileInputFormat.java:372)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPaths(FileInputFormat.java:337)
        at io.druid.indexer.path.StaticPathSpec.addInputPaths(StaticPathSpec.java:58)
        at io.druid.indexer.HadoopDruidIndexerConfig.addInputPaths(HadoopDruidIndexerConfig.java:444)
        at io.druid.indexer.HadoopDruidIndexerJob.ensurePaths(HadoopDruidIndexerJob.java:149)
        ... 13 more
2014-02-19 09:51:52,817 INFO [task-runner-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_hadoop_the_data_source_2014-02-19T09:51:22.131Z",
  "status" : "FAILED",
  "duration" : 11516
}
2014-02-19 09:51:52,820 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.server.coordination.AbstractDataSegmentAnnouncer.stop()] on object[io.druid.server.coordination.BatchDataSegmentAnnouncer@37efa9d4].
2014-02-19 09:51:52,820 INFO [main] io.druid.server.coordination.AbstractDataSegmentAnnouncer - Stopping class io.druid.server.coordination.BatchDataSegmentAnnouncer with config[io.druid.server.initialization.ZkPathsConfig$$EnhancerByCGLIB$$2d89c3f1@24436a28]
2014-02-19 09:51:52,821 INFO [main] io.druid.curator.announcement.Announcer - unannouncing [/druid/announcements/localhost:8088]
2014-02-19 09:51:52,833 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.server.coordination.AbstractDataSegmentAnnouncer.stop()] on object[io.druid.server.coordination.SingleDataSegmentAnnouncer@331e37ed].
2014-02-19 09:51:52,833 INFO [main] io.druid.server.coordination.AbstractDataSegmentAnnouncer - Stopping class io.druid.server.coordination.SingleDataSegmentAnnouncer with config[io.druid.server.initialization.ZkPathsConfig$$EnhancerByCGLIB$$2d89c3f1@24436a28]
2014-02-19 09:51:52,834 INFO [main] io.druid.curator.announcement.Announcer - unannouncing [/druid/announcements/localhost:8088]
2014-02-19 09:51:52,834 ERROR [main] io.druid.curator.announcement.Announcer - Path[/druid/announcements/localhost:8088] not announced, cannot unannounce.
2014-02-19 09:51:52,834 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.indexing.worker.executor.ExecutorLifecycle.stop()] on object[io.druid.indexing.worker.executor.ExecutorLifecycle@33ac7c40].
2014-02-19 09:51:52,835 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - Closing inventory cache for localhost:8088. Also removing listeners.
2014-02-19 09:51:52,835 INFO [ServerInventoryView-0] io.druid.client.SingleServerInventoryView - Server Disappeared[DruidServerMetadata{name='localhost:8088', host='localhost:8088', maxSize=0, tier='_default_tier', type='indexer-executor'}]
2014-02-19 09:51:52,940 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - stopped o.e.j.s.ServletContextHandler{/,null}
2014-02-19 09:51:52,940 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - stopped o.e.j.s.ServletContextHandler{/,file:/}
2014-02-19 09:51:53,029 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.indexing.overlord.ThreadPoolTaskRunner.stop()] on object[io.druid.indexing.overlord.ThreadPoolTaskRunner@3df86ed1].
2014-02-19 09:51:53,030 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.client.ServerInventoryView.stop() throws java.io.IOException] on object[io.druid.client.SingleServerInventoryView@35915368].
2014-02-19 09:51:53,030 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.curator.announcement.Announcer.stop()] on object[io.druid.curator.announcement.Announcer@2fed0cb8].
2014-02-19 09:51:53,030 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.curator.discovery.ServerDiscoverySelector.stop() throws java.io.IOException] on object[io.druid.curator.discovery.ServerDiscoverySelector@1520a43d].
2014-02-19 09:51:53,031 INFO [main] io.druid.curator.CuratorModule - Stopping Curator
2014-02-19 09:51:53,043 INFO [main] org.apache.zookeeper.ZooKeeper - Session: 0x1442481192d005d closed
2014-02-19 09:51:53,043 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.http.client.HttpClient.stop()] on object[com.metamx.http.client.HttpClient@682cd837].
2014-02-19 09:51:53,043 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn - EventThread shut down
2014-02-19 09:51:53,057 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.metrics.MonitorScheduler.stop()] on object[com.metamx.metrics.MonitorScheduler@777387a6].
2014-02-19 09:51:53,060 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.emitter.service.ServiceEmitter.close() throws java.io.IOException] on object[com.metamx.emitter.service.ServiceEmitter@11fcf9cb].
2014-02-19 09:51:53,060 INFO [main] com.metamx.emitter.core.LoggingEmitter - Close: started [false]
2014-02-19 09:51:53,060 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.emitter.core.LoggingEmitter.close() throws java.io.IOException] on object[com.metamx.emitter.core.LoggingEmitter@74ec5dce].

Could you help me figure out what the error is, or is there a way to get more specific error messages? I was hoping for something more specific so I could rectify the problem, but it seems like the log just told me the task failed.

Thank you so much!

Nishant Bangarwa

Feb 19, 2014, 5:36:58 AM
to druid-de...@googlegroups.com
Hi Chadin, 

The error you are getting (org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 4) usually comes from a Hadoop version mismatch. The default Hadoop version Druid is compiled with is 1.0.3. 
Which version of Hadoop are you using?
To fix it, you can either change your Hadoop installation to version 1.0.3, or modify the Hadoop version in Druid's pom.xml and recompile the Druid source with the modified version.
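For reference, here is a minimal sketch of the kind of dependency entry this refers to, assuming the Hadoop dependency is declared roughly like this somewhere in Druid's pom.xml (the exact module and placement are assumptions, not shown in this thread):

<!-- hypothetical excerpt from Druid's pom.xml: the Hadoop dependency Druid is built against -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-core</artifactId>
  <!-- 1.0.3 is the default; change this to match your cluster, then rebuild Druid with Maven -->
  <version>1.0.3</version>
</dependency>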

 




Chadin Anuwattanaporn

Feb 20, 2014, 12:31:34 AM
to druid-de...@googlegroups.com
Hi Nishant, 

Thank you. I decided to recompile after changing the Hadoop version to 2.0.0, which is my Hadoop version, but the compile failed. Below are some excerpts: 

2014-02-20 05:26:28,742 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T23:00:00.000Z_2012-01-02T00:00:00.000Z_2014-02-20T05:26:28.738Z}
2014-02-20 05:26:28,743 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T22:00:00.000Z_2012-01-01T23:00:00.000Z_2014-02-20T05:26:28.738Z}
2014-02-20 05:26:28,743 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T21:00:00.000Z_2012-01-01T22:00:00.000Z_2014-02-20T05:26:28.738Z}
2014-02-20 05:26:28,743 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T20:00:00.000Z_2012-01-01T21:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,744 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T19:00:00.000Z_2012-01-01T20:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,744 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T18:00:00.000Z_2012-01-01T19:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,744 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T17:00:00.000Z_2012-01-01T18:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,744 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T16:00:00.000Z_2012-01-01T17:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,745 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T15:00:00.000Z_2012-01-01T16:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,745 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T14:00:00.000Z_2012-01-01T15:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,745 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T13:00:00.000Z_2012-01-01T14:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,745 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T12:00:00.000Z_2012-01-01T13:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,746 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T11:00:00.000Z_2012-01-01T12:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,746 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T10:00:00.000Z_2012-01-01T11:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,746 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T09:00:00.000Z_2012-01-01T10:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,747 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T08:00:00.000Z_2012-01-01T09:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,747 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T07:00:00.000Z_2012-01-01T08:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,747 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T06:00:00.000Z_2012-01-01T07:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,747 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T05:00:00.000Z_2012-01-01T06:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,748 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T04:00:00.000Z_2012-01-01T05:00:00.000Z_2014-02-20T05:26:28.737Z}
2014-02-20 05:26:28,748 ERROR [main] io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner - Unable to find a matching rule for dataSource[test]: {class=io.druid.server.coordinator.helper.DruidCoordinatorRuleRunner, segment=test_2012-01-01T03:00:00.000Z_2012-01-01T04:00:00.000Z_2014-02-20T05:26:28.737Z}

...

2014-02-20 05:26:32,453 INFO [main] io.druid.initialization.Initialization - Adding local module[class io.druid.initialization.InitializationTest$TestDruidModule]
2014-02-20 05:26:33,098 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.server.initialization.ExtensionsConfig] from props[druid.extensions.] as [ExtensionsConfig{searchCurrentClassloader=true, coordinates=[], localRepository='/home/ubuntu/.m2/repository', remoteRepositories=[http://repo1.maven.org/maven2/, https://metamx.artifactoryonline.com/metamx/pub-libs-releases-local]}]
2014-02-20 05:26:33,100 INFO [main] io.druid.initialization.Initialization - Adding local module[class io.druid.initialization.InitializationTest$TestDruidModule]
2014-02-20 05:26:34,416 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class com.metamx.emitter.core.LoggingEmitterConfig] from props[druid.emitter.logging.] as [LoggingEmitterConfig{loggerClass='com.metamx.emitter.core.LoggingEmitter', logLevel='info'}]
2014-02-20 05:26:34,432 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.server.metrics.DruidMonitorSchedulerConfig] from props[druid.monitoring.] as [io.druid.server.metrics.DruidMonitorSchedulerConfig@1169d3a]
2014-02-20 05:26:34,448 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.server.metrics.MonitorsConfig] from props[druid.monitoring.] as [MonitorsConfig{monitors=[]}]
2014-02-20 05:26:34,652 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.server.DruidNode] from props[druid.] as [DruidNode{serviceName='test-service', host='test-host:8080', port=8080}]
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.152 sec

Results :

Tests in error:
  testRun(io.druid.client.client.BatchServerInventoryViewTest): position (0) must be less than the number of elements that remained (0)

Tests run: 101, Failures: 0, Errors: 1, Skipped: 0

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] druid ............................................. SUCCESS [0.002s]
[INFO] druid-common ...................................... SUCCESS [33.568s]
[INFO] druid-processing .................................. SUCCESS [1:46.779s]
[INFO] druid-server ...................................... FAILURE [1:15.628s]
[INFO] druid-examples .................................... SKIPPED
[INFO] druid-indexing-hadoop ............................. SKIPPED
[INFO] druid-indexing-service ............................ SKIPPED
[INFO] druid-services .................................... SKIPPED
[INFO] druid-cassandra-storage ........................... SKIPPED
[INFO] druid-hdfs-storage ................................ SKIPPED
[INFO] druid-s3-extensions ............................... SKIPPED
[INFO] druid-kafka-seven ................................. SKIPPED
[INFO] druid-kafka-eight ................................. SKIPPED
[INFO] druid-rabbitmq .................................... SKIPPED
[INFO] druid-hll ......................................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3:40.860s
[INFO] Finished at: Thu Feb 20 05:26:35 UTC 2014
[INFO] Final Memory: 36M/106M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12.2:test (default-test) on project druid-server: There are test failures.
[ERROR]
[ERROR] Please refer to /home/ubuntu/git/druid/server/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :druid-server

From the look of it, either I misconfigured something in pom.xml, or the test code is trying to reach the Hadoop server, in which case I need to configure the Hadoop server. Am I right, or is it something else?

Let me know if you need the full compile log.

Thank you so much!

Fangjin Yang

Feb 20, 2014, 12:39:36 AM
to druid-de...@googlegroups.com
Hi Chadin,

What version are you running? Are those logs about missing rules from your coordinator?
If so, can you paste the output of <coordinator_ip>:<port>/info/rules?

As for the failed batch inventory view test, can you try and run things again? I wonder if there is a strange non-deterministic case there. I haven't seen that test fail before, though.

-- FJ


Chadin Anuwattanaporn

Feb 20, 2014, 2:41:23 AM
to druid-de...@googlegroups.com
Hi FJ, 

Here it is: 

{"_default":[{"period":"P5000Y","replicants":2,"tier":"_default_tier","type":"loadByPeriod"}]}

I ran the compilation again and attached is the log file.

Note that when I changed the version of Hadoop back to 1.0.3, the compilation and tests went just fine.
log.log

Nishant Bangarwa

Feb 20, 2014, 3:20:17 AM
to druid-de...@googlegroups.com
Hi Chadin, 
In the attached logs, the error I can see is "Could not find artifact org.apache.hadoop:hadoop-core:jar:2.0.0". 
This is failing since there is no hadoop-core Maven artifact for version 2.0.0. 

For Hadoop 2.0, you can switch the dependency to hadoop-client with the version of your Hadoop installation.
Attaching a patch that I used to compile Druid with Hadoop 2.2.0. Hope it helps.




hadoop-2.2.0.patch
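For reference, a minimal sketch of the kind of change the attached patch presumably makes, swapping the hadoop-core dependency for hadoop-client (the exact modules and surrounding pom.xml structure are assumptions):

<!-- hypothetical pom.xml change: replace the hadoop-core dependency with hadoop-client -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <!-- match the version of your Hadoop installation, e.g. 2.2.0 as used with this patch -->
  <version>2.2.0</version>
</dependency>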

Chadin Anuwattanaporn

Feb 20, 2014, 3:55:36 AM
to druid-de...@googlegroups.com
Hi Nishant, 

I thought the same too! While I was waiting, I saw the error in the log file and proceeded to rectify it. I am now able to successfully compile Druid with hadoop-client 2.2.0.

Now, when I ran the examples ("run_example_server.sh") they worked fine, but when I tried to start the overlord node: 

java -Xmx2g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath lib/*:config/overlord io.druid.cli.Main server overlord

When I ran it from the main directory (the root of the git repo), the class couldn't be found. I saw "Main.class" in services/target/classes/io/druid/cli, so I "cd"ed down until I arrived there, but still couldn't run the command to start the overlord node.

Where am I supposed to go to start the overlord node from the compiled repo?

Sorry if this is a really newbie question...

Nishant Bangarwa

Feb 20, 2014, 4:59:54 AM
to druid-de...@googlegroups.com
Hi Chadin, 

After compiling Druid, you should be able to see a jar named "druid-services-<VERSION>-selfcontained.jar" in the services/target directory under the root of the git repo. 
Try adding that jar to the classpath. 




Chadin Anuwattanaporn

Feb 20, 2014, 11:13:07 PM
to druid-de...@googlegroups.com
Hi Nishant, 

Yup, I was able to locate the jar file. After including it in the classpath, I was able to start up the overlord node using the following command: 

java -Xmx2g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath services/target/*:examples/config/overlord io.druid.cli.Main server overlord

So I successfully submitted the same task with the same config as above (except adding the correct Hadoop coordinates). However, the task failed to execute with the following error: 

2014-02-21 03:30:56,923 WARN [task-runner-0] org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-02-21 03:30:56,965 INFO [task-runner-0] io.druid.indexer.path.StaticPathSpec - Adding paths[hdfs://localhost:8020/user/ubuntu]
2014-02-21 03:30:58,805 ERROR [task-runner-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_the_data_source_2014-02-21T03:29:46.059Z, type=index_hadoop, dataSource=the_data_source}]
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:188)
        at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:216)
        at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:195)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.VerifyError: class org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$CreateSnapshotRequestProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.getDeclaredMethods0(Native Method)
        at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
        at java.lang.Class.privateGetPublicMethods(Class.java:2651)
        at java.lang.Class.privateGetPublicMethods(Class.java:2661)
        at java.lang.Class.getMethods(Class.java:1467)
        at sun.misc.ProxyGenerator.generateClassFile(ProxyGenerator.java:426)
        at sun.misc.ProxyGenerator.generateProxyClass(ProxyGenerator.java:323)
        at java.lang.reflect.Proxy.getProxyClass0(Proxy.java:636)
        at java.lang.reflect.Proxy.newProxyInstance(Proxy.java:722)
        at org.apache.hadoop.ipc.ProtobufRpcEngine.getProxy(ProtobufRpcEngine.java:92)
        at org.apache.hadoop.ipc.RPC.getProtocolProxy(RPC.java:537)
        at org.apache.hadoop.hdfs.NameNodeProxies.createNNProxyWithClientProtocol(NameNodeProxies.java:328)
        at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:235)
        at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:139)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:510)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:453)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:136)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2433)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:287)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPath(FileInputFormat.java:466)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPaths(FileInputFormat.java:431)
        at io.druid.indexer.path.StaticPathSpec.addInputPaths(StaticPathSpec.java:58)
        at io.druid.indexer.HadoopDruidIndexerConfig.addInputPaths(HadoopDruidIndexerConfig.java:446)
        at io.druid.indexer.JobHelper.ensurePaths(JobHelper.java:123)
        at io.druid.indexer.HadoopDruidDetermineConfigurationJob.run(HadoopDruidDetermineConfigurationJob.java:54)
        at io.druid.indexing.common.task.HadoopIndexTask$HadoopDetermineConfigInnerProcessing.runTask(HadoopIndexTask.java:285)
        ... 11 more
2014-02-21 03:30:58,812 INFO [task-runner-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_hadoop_the_data_source_2014-02-21T03:29:46.059Z",
  "status" : "FAILED",
  "duration" : 54169
}
2014-02-21 03:30:58,816 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.server.coordination.AbstractDataSegmentAnnouncer.stop()] on object[io.druid.server.coordination.BatchDataSegmentAnnouncer@6c5c4442].
2014-02-21 03:30:58,817 INFO [main] io.druid.server.coordination.AbstractDataSegmentAnnouncer - Stopping class io.druid.server.coordination.BatchDataSegmentAnnouncer with config[io.druid.server.initialization.ZkPathsConfig$$EnhancerByCGLIB$$63fccae@2b28e3f]
2014-02-21 03:30:58,817 INFO [main] io.druid.curator.announcement.Announcer - unannouncing [/druid/announcements/localhost:8088]
2014-02-21 03:30:58,823 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - Closing inventory cache for localhost:8088. Also removing listeners.
2014-02-21 03:30:58,823 INFO [ServerInventoryView-0] io.druid.client.SingleServerInventoryView - Server Disappeared[DruidServerMetadata{name='localhost:8088', host='localhost:8088', maxSize=0, tier='_default_tier', type='indexer-executor', priority='0'}]
2014-02-21 03:30:58,824 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.server.coordination.AbstractDataSegmentAnnouncer.stop()] on object[io.druid.server.coordination.SingleDataSegmentAnnouncer@846b70e].
2014-02-21 03:30:58,824 INFO [main] io.druid.server.coordination.AbstractDataSegmentAnnouncer - Stopping class io.druid.server.coordination.SingleDataSegmentAnnouncer with config[io.druid.server.initialization.ZkPathsConfig$$EnhancerByCGLIB$$63fccae@2b28e3f]
2014-02-21 03:30:58,824 INFO [main] io.druid.curator.announcement.Announcer - unannouncing [/druid/announcements/localhost:8088]
2014-02-21 03:30:58,824 ERROR [main] io.druid.curator.announcement.Announcer - Path[/druid/announcements/localhost:8088] not announced, cannot unannounce.
2014-02-21 03:30:58,825 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.indexing.worker.executor.ExecutorLifecycle.stop()] on object[io.druid.indexing.worker.executor.ExecutorLifecycle@27c01ad2].
2014-02-21 03:30:58,837 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - stopped o.e.j.s.ServletContextHandler{/,null}
2014-02-21 03:30:58,838 INFO [main] org.eclipse.jetty.server.handler.ContextHandler - stopped o.e.j.s.ServletContextHandler{/,file:/}
2014-02-21 03:30:58,890 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.indexing.overlord.ThreadPoolTaskRunner.stop()] on object[io.druid.indexing.overlord.ThreadPoolTaskRunner@616a6b9b].
2014-02-21 03:30:58,890 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.client.ServerInventoryView.stop() throws java.io.IOException] on object[io.druid.client.SingleServerInventoryView@122b7d0a].
2014-02-21 03:30:58,891 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.curator.announcement.Announcer.stop()] on object[io.druid.curator.announcement.Announcer@2b0f1190].
2014-02-21 03:30:58,891 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.curator.discovery.ServerDiscoverySelector.stop() throws java.io.IOException] on object[io.druid.curator.discovery.ServerDiscoverySelector@f6e9d5d].
2014-02-21 03:30:58,892 INFO [main] io.druid.curator.CuratorModule - Stopping Curator
2014-02-21 03:30:58,897 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn - EventThread shut down
2014-02-21 03:30:58,897 INFO [main] org.apache.zookeeper.ZooKeeper - Session: 0x1442481192d0066 closed
2014-02-21 03:30:58,897 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.http.client.HttpClient.stop()] on object[com.metamx.http.client.HttpClient@1763fb5b].
2014-02-21 03:30:58,907 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.metrics.MonitorScheduler.stop()] on object[com.metamx.metrics.MonitorScheduler@4cf99f26].
2014-02-21 03:30:58,907 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.emitter.service.ServiceEmitter.close() throws java.io.IOException] on object[com.metamx.emitter.service.ServiceEmitter@6ed2bb19].
2014-02-21 03:30:58,907 INFO [main] com.metamx.emitter.core.LoggingEmitter - Close: started [false]
2014-02-21 03:30:58,907 INFO [main] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void com.metamx.emitter.core.LoggingEmitter.close() throws java.io.IOException] on object[com.metamx.emitter.core.LoggingEmitter@253e41fe].

Could you help me with the above?

Thank you!

Chadin Anuwattanaporn

unread,
Feb 20, 2014, 11:45:36 PM2/20/14
to druid-de...@googlegroups.com
Just for your info, in case it helps, this is what our Hadoop core-site.xml file looks like: 

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
</configuration>
ubuntu@domU-12-31-39-0B-60-6C:~/git/druid$ cat /usr/lib/hadoop-0.20-mapreduce/example-confs/conf.pseudo/core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>

  <property>
     <name>hadoop.tmp.dir</name>
     <value>/var/lib/hadoop-0.20/cache/${user.name}</value>
  </property>

  <!-- OOZIE proxy user setting -->
  <property>
    <name>hadoop.proxyuser.oozie.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.oozie.groups</name>
    <value>*</value>
  </property>

</configuration>
...

Nishant Bangarwa

unread,
Feb 21, 2014, 12:16:10 AM2/21/14
to druid-de...@googlegroups.com
Hi Chadin, 
It seems to be caused by a version conflict in the Google protobuf library: 
Hadoop 2 upgraded its protobuf dependency to 2.5 in HADOOP-9845.
Can you try recompiling Druid with protobuf 2.5? 
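Something along these lines, assuming protobuf is pulled in as a plain Maven dependency (the exact layout of the Druid pom.xml may differ, so treat this as a sketch), followed by a rebuild with mvn clean install:

  <!-- hypothetical pom.xml fragment: bump protobuf to the version Hadoop 2.2.0 uses -->
  <dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>2.5.0</version>
  </dependency>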



Chadin Anuwattanaporn

unread,
Feb 21, 2014, 12:58:22 AM2/21/14
to druid-de...@googlegroups.com
Hi Nishant, 

I made the required changes, and the compilation failed. Log as attached.
        at io.druid.server.http.RedirectF
...
log2.log
Message has been deleted

Chadin Anuwattanaporn

unread,
Feb 21, 2014, 3:45:22 AM2/21/14
to druid-de...@googlegroups.com
Hi Nishant, 

Some extra info, in case it helps. Only one test failed as the log file indicated, and I have attached the full report of that failed test below. Seems to indicate that the new protobuf does not work with Druid somehow…?
...
TEST-io.druid.data.input.ProtoBufInputRowParserTest.xml

Nishant Bangarwa

unread,
Feb 21, 2014, 8:41:07 AM2/21/14
to druid-de...@googlegroups.com
Hi Chadin, 
The test is failing due to an incompatible change in the protobuf library.

Druid uses protobuf in ProtoBufInputRowParser; at a quick look, this change does not seem to affect the functionality of the parser, and the parser appears to be compatible with protobuf 2.5.

The test can be fixed by overriding getUnknownFields in the ProtoTestEventWrapper class, as in the patch below:

diff --git a/processing/src/test/java/io/druid/data/input/ProtoTestEventWrapper.java b/processing/src/test/java/io/druid/data/input/ProtoTestEventWrapper.java
index 965859f..88d86a3 100644
--- a/processing/src/test/java/io/druid/data/input/ProtoTestEventWrapper.java
+++ b/processing/src/test/java/io/druid/data/input/ProtoTestEventWrapper.java
@@ -23,6 +23,9 @@
 package io.druid.data.input;

 import com.google.protobuf.AbstractMessage;
+import com.google.protobuf.UnknownFieldSet;
+
+import java.util.Collections;

 public final class ProtoTestEventWrapper {
   private ProtoTestEventWrapper() {}
@@ -414,7 +417,13 @@ public final class ProtoTestEventWrapper {
       memoizedSerializedSize = size;
       return size;
     }
-
+
+    @Override
+    public UnknownFieldSet getUnknownFields()
+    {
+      return UnknownFieldSet.getDefaultInstance();
+    }
+
     private static final long serialVersionUID = 0L;
     @java.lang.Override
     protected java.lang.Object writeReplace()




Chadin Anuwattanaporn

unread,
Feb 23, 2014, 11:44:29 PM2/23/14
to druid-de...@googlegroups.com
Hi Nishant, 

I made the modifications above, successfully compiled and submitted the task. I'm happy to report that the previous error has disappeared!

The task ran, but once again failed. The log is as attached. I'm not sure if it's a problem with the task JSON or connection to HDFS, or something else. Could you help me?

Thank you so much for the help all these days!
...
index_hadoop_the_data_source_2014-02-24T03:57:37.157Z.log

Nishant Bangarwa

unread,
Feb 24, 2014, 1:24:17 AM2/24/14
to druid-de...@googlegroups.com
Hi Chadin, 
This task failed due to this exception: 

Caused by: java.io.IOException: Failed on local exception: com.google.protobuf.InvalidProtocolBufferException: Protocol message contained an invalid tag (zero).; Host Details : local host is: "domU-12-31-39-0B-60-6C/10.214.103.154"; destination host is: "localhost":8020; 
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
at org.apache.hadoop.ipc.Client.call(Client.java:1351)

It looks like a version mismatch between your Hadoop installation and the version you set Druid up with. 
The task seems to be using Hadoop 2.2.0 and protobuf 2.5.0. 

Can you verify that your installation matches these versions, and that you are adding the correct Hadoop config files to the Druid classpath?
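For example, the Hadoop configuration directory can be put on the classpath of the node that runs the indexing task; the paths below are only placeholders for wherever your Hadoop conf and Druid config actually live:

  # hypothetical overlord launch with the Hadoop conf dir appended to the classpath
  java -cp "config/overlord:lib/*:/etc/hadoop/conf" io.druid.cli.Main server overlord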




Nishant Bangarwa

unread,
Feb 24, 2014, 1:38:25 AM2/24/14
to druid-de...@googlegroups.com
Hi Chadin, 

Druid has built-in support for local storage, S3, HDFS, Cassandra and Riak-CS. 
Druid is designed for extensibility, and it's possible to add a new module for any custom deep storage. 

More info on extending Druid is here - 



On Fri, Feb 21, 2014 at 11:37 AM, Chadin Anuwattanaporn <cha...@sogamo.com> wrote:
Hi Nishant,

Not related but I have a question: what are other possible data sources that Druid can ingest, other than local files, S3 and HDFS?
 
...


Chadin Anuwattanaporn

unread,
Feb 24, 2014, 3:45:27 AM2/24/14
to druid-de...@googlegroups.com
Hi Nishant, 

Thank you for your response.

I did assume (perhaps incorrectly) that the Hadoop 2.2.0 client would work with Hadoop 2.0.0; since the hadoop-client package for 2.0.0 carried an alpha tag, I went with 2.2.0 instead. I didn't think it would cause problems.

I changed the hadoop-client version to 2.0.0-alpha and Druid compiled successfully. It then ran the task, which ended with the error in the attached log file. It looks similar to the IPC error we got a while back due to a version mismatch, though.

Could you help me out?

Thank you!
...
index_hadoop_the_data_source_2014-02-24T08:41:06.080Z.log

Nishant Bangarwa

unread,
Feb 24, 2014, 4:22:01 AM2/24/14
to druid-de...@googlegroups.com
OK, so it was failing due to a version mismatch between your Hadoop installation and the Druid config. 

Now it's failing due to a protobuf version mismatch:
Hadoop 2.0.0-alpha is compatible with protobuf 2.4.0a, as shown here - 
you need to change the protobuf version back to 2.4.0a. 




Chadin Anuwattanaporn

unread,
Feb 24, 2014, 5:04:04 AM2/24/14
to druid-de...@googlegroups.com
Hi Nishant, 

I changed the protobuf version, compiled successfully, and ran the task.

The next error that I got is as attached in the log.

It seems to be a data formatting issue, unless you'd suggest otherwise, so let me show you the data format in the file and the config file.

This is what each line of our data looks like: 

2014-02-13T12:28:21Z    track.0d26fd6e62fe4bb0ae92e2da9e834191.41192    {"action":"Sign in","api_key":"0d26fd6e62fe4bb0ae92e2da9e834191","player_id":"41192","timestamp":1392294501700}

The file is a tab-separated file (TSV) with three columns: ts (timestamp), tag, and data (a JSON object).

The timestamp format is as above, and currently the config file uses "auto". It might be a good idea to change it to a specified format.
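For example, something like the following is what I might switch to, assuming the format field accepts a Joda-style pattern (the pattern below is my guess for our timestamps, not something I've tested yet):

  "timestampSpec": {
    "column": "ts",
    "format": "yyyy-MM-dd'T'HH:mm:ss'Z'"
  }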

The config file (which has remained unchanged as far as I remember) is reproduced below: 

,
  "hadoopCoordinates" : "org.apache.hadoop:hadoop-client:2.0.0-alpha"
}

The JSON object does not have a fixed structure, meaning the keys can vary from record to record.

My primary use case is to be able to filter and aggregate on arbitrary keys in the JSON. For example, I might want to find the number of records whose "action" key has the value "Sign in", aggregated by the "timestamp" key in the JSON object. (Actually the "ts" and "tag" columns have no practical use for now, and the timestamp we use is the one inside the JSON object, which is a UNIX timestamp.)

As such, how should I configure the configuration file? Or is Druid appropriate for my use case?

Or does the problem have nothing to do with the data format and/or config file in the first place?
...
index_hadoop_the_data_source_2014-02-24T09:51:24.322Z.log

Nishant Bangarwa

unread,
Feb 24, 2014, 5:35:50 AM2/24/14
to druid-de...@googlegroups.com
From the task logs, the exception causing the failure is: 
Caused by: java.lang.ClassNotFoundException: org.apache.avro.io.DatumReader

You seem to be hitting HADOOP-8466, which was fixed in 2.0.2-alpha. 
You can work around this by adding the avro jar to the classpath or declaring it as an explicit dependency, for example as sketched below. 
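A hypothetical Maven dependency for this (the version here is only a guess; use whatever your Hadoop distribution ships):

  <!-- hypothetical explicit avro dependency; pick the version your distro ships -->
  <dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro</artifactId>
    <version>1.7.4</version>
  </dependency>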

BTW, is there any specific reason for working with 2.0.0-alpha instead of a stable release?
If you can change your Hadoop installation version, I would recommend working with a stable Hadoop release.

You can also discuss this over IRC: #druid-dev on irc.freenode.net.



Chadin Anuwattanaporn

unread,
Feb 24, 2014, 10:34:42 PM2/24/14
to druid-de...@googlegroups.com
Hi Nishant, 

I'm working with Cloudera Hadoop, which is version 2.0.0. Running "hadoop version" gives me the following: 

Hadoop 2.0.0-cdh4.5.0
Subversion git://ubuntu64-12-04-mk1/var/lib/jenkins/workspace/generic-package-ubuntu64-12-04/CDH4.5.0-Packaging-Hadoop-2013-11-20_14-31-53/hadoop-2.0.0+1518-1.cdh4.5.0.p0.24~precise/src/hadoop-common-project/hadoop-common -r 8e266e052e423af592871e2dfe09d54c03f6a0e8
Compiled by jenkins on Wed Nov 20 15:10:35 PST 2013
From source with checksum 9848b0f85b461913ed63fa19c2b79ccc
This command was run using /usr/lib/hadoop/hadoop-common-2.0.0-cdh4.5.0.jar

I would love to switch to a more stable release of Hadoop, but I don't know of a way to upgrade at the moment.

So if I understand correctly, that means I'd have to add the avro jar to the Hadoop classpath and recompile Hadoop? That sounds like a long way around. Is it possible to just switch the Hadoop client to 2.0.2-alpha? Would that cause a version mismatch?

And I'm on IRC. Nickname "chadin_anuwattan".
...

Nishant Bangarwa

unread,
Feb 25, 2014, 4:57:59 AM2/25/14
to druid-de...@googlegroups.com
Hi Chadin, 
You need to use the same client version as your installation to avoid version conflicts: 
switch your hadoop-client version to 2.0.0-cdh4.5.0 instead of 2.0.0-alpha.

You might also need to add the CDH repository to be able to resolve the CDH artifacts; a sketch follows below. 
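Something along these lines in the pom.xml (the URL is my recollection of the usual Cloudera artifact repository, so double-check it against Cloudera's docs), with hadoopCoordinates in the task spec pointing at org.apache.hadoop:hadoop-client:2.0.0-cdh4.5.0:

  <!-- assumed Cloudera repository; verify the URL before relying on it -->
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>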




Chadin Anuwattanaporn

unread,
Feb 26, 2014, 1:49:35 AM2/26/14
to druid-de...@googlegroups.com
Hi Nishant, 

Please ignore the direct message to you. I clicked the button accidentally, and the information in it is out of date, so please refer to this post instead.

I made the changes that you recommended, compiled successfully, and submitted the task. I ran into an error executing the "mkdir" command, but that was solved by creating the "/user" directory in the file system and giving its ownership to the correct user.

The latest error is in the attached log file. Could you help me work out what the problem might be?

I've cut the non-error parts of the file down greatly for succinctness.

Thank you!
...

Nishant Bangarwa

unread,
Feb 26, 2014, 2:27:23 AM2/26/14
to druid-de...@googlegroups.com
Did you forget to attach the log file? 
Can you show it to me on IRC or attach it in an email?




Chadin Anuwattanaporn

unread,
Feb 27, 2014, 12:45:33 AM2/27/14
to druid-de...@googlegroups.com
Hi Nishant, 

I added it to the classpath, and the task now runs successfully!

To verify that the task processed the data correctly, I ran the following simple query: 

{
  "queryType": "groupBy",
  "dataSource": "the_data_source",
  "granularity": "all",
  "dimensions": [],
  "aggregations": [
    { "type": "count", "name": "rows" }
  ],
  "postAggregations": [],
  "intervals": ["2010-01-01/2020-01-01"]
}

The result, however, was an empty array.

Is the query above correct?
...

Nishant Bangarwa

unread,
Feb 27, 2014, 2:57:18 AM2/27/14
to druid-de...@googlegroups.com
Hi Chadin, 

Your query seems fine; Druid can return empty results if the data segments are not loaded yet. 
When a realtime task completes, it creates an immutable segment and adds a new entry to the Druid segment metadata table in MySQL. It also uploads the segment to deep storage. Druid coordinator nodes watch MySQL for new segment metadata entries and assign those segments to be downloaded by historical nodes.
Can you verify from the coordinator and historical nodes that the segments were properly handed over to the historical nodes? One quick check is sketched below. 
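Assuming your metadata segment table is named "segments" (adjust to whatever segmentTable you configured; the column names here are from memory, so treat this as a sketch), you can look for used entries for your datasource in MySQL:

  -- assumed table/column names; adjust to your metadata store schema
  SELECT id, used FROM segments WHERE dataSource = 'the_data_source';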





Chadin Anuwattanaporn

unread,
Feb 27, 2014, 4:39:47 AM2/27/14
to druid-de...@googlegroups.com
Hi Nishant, 

I shut down all the nodes, restarted them, resubmitted the task to load the data, confirmed that the historical node looked fine, then submitted the query, and it works! Thank you.

Now I'm trying a search query. Basically I'm trying to find entries that match a specific pattern in the JSON, so I submitted a search query like this: 

{
  "queryType": "search",
  "dataSource": "the_data_source",
  "granularity": "all",
  "query": {
    "type": "insensitive_contains",
    "value": "action"
  },
  "sort" : {
    "type": "lexicographic"
  },
  "intervals": [
    "2010-01-01/2020-01-01"
  ]
}

I assume this means searching across all dimension columns in the data and returning any value that contains the string "action".

The query turned up empty. Is my understanding correct? Or is there something else I need to do?
...

Nishant Bangarwa

unread,
Feb 27, 2014, 5:44:58 AM2/27/14
to druid-de...@googlegroups.com
Hi Chadin, 
Your understanding of the query is correct; the above query will check all dimensions for values that contain "action".
If it's not returning the results you expected, check for any exceptions in the logs. 




Chadin Anuwattanaporn

unread,
Feb 27, 2014, 6:04:52 AM2/27/14
to druid-de...@googlegroups.com
Hi Nishant, 

I re-loaded the data, correcting the dimensions parameter in the config, and I can now see results from the search query.

Thank you! I'm trying out different queries in Druid now. I'll start a separate topic if I have query-related questions.
...

Nishant Bangarwa

unread,
Feb 27, 2014, 6:18:10 AM2/27/14
to druid-de...@googlegroups.com
Great to know that things are working now :)

