kafka indexing: Protobuf with extension


Suhas Anand

Jan 19, 2017, 5:31:52 PM
to Druid User, Fangjin Yang
Hey Guys, 

We are quite new to Druid and are currently evaluating it against our old InfluxDB cluster.

For Druid, we are ingesting data from Kafka using Imply's Kafka indexing service (https://imply.io/docs/latest/tutorial-kafka-indexing-service.html). This works fine for simple JSON data. However, most of our data in Kafka is in binary protobuf format, and these protobuf messages are quite complex, with extensions (https://developers.google.com/protocol-buffers/docs/proto#extensions).

I looked around for a way to make the Kafka indexing service support this kind of data, without success. We wanted to quickly reach out to the community to check whether anybody has tried doing this and, if so, whether they could share their approach to configuring Druid to index this kind of data.

-Suhas

Gian Merlino

Jan 20, 2017, 6:09:04 PM
to druid...@googlegroups.com, Fangjin Yang
Hey Suhas,

Druid does support protobuf data through the "protobuf" parser. It appears to be undocumented (not sure why) and to only support flat formats. This is the code that can parse protobuf messages: https://github.com/druid-io/druid/blob/master/processing/src/main/java/io/druid/data/input/ProtoBufInputRowParser.java (check out the method "buildStringKeyMap").

There was some other work here in the past.


I'm not sure if knoguchi is still working on a protobuf3 extension but that seems to be the direction things were going.

Gian

--
You received this message because you are subscribed to the Google Groups "Druid User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-user+unsubscribe@googlegroups.com.
To post to this group, send email to druid...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-user/59a77463-4f47-4ae6-8c9d-2d9be8dbd7ce%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Suhas Anand

Jan 20, 2017, 7:59:40 PM
to Druid User, fan...@imply.io
Thanks for the reply, Gian. I am aware that Druid supports protobuf data. The problem, as you mentioned, is that it only supports simple formats; most enterprises have complex formats that involve extensions.

Thanks for sharing the other work. It looks like none of them provides a solution for enterprise formats :-(
 
-Suhas


Kenji Noguchi

Mar 11, 2017, 10:21:44 PM
to Druid User, fan...@imply.io
I will resume the protobuf extensions work shortly.  

Kenji Noguchi



Kenji Noguchi

Mar 13, 2017, 4:38:20 PM
to Druid User, fan...@imply.io
I created https://github.com/druid-io/druid/pull/4039
Please let me know if this satisfies your use case.

The PB "extension" should work as long as Google's protobuf-java-util package can convert it to nested JSON.

-kenji
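
To illustrate the "nested JSON" idea above: once a message has been converted to nested JSON, the nested fields still have to be mapped onto flat column names before they can be indexed as dimensions or metrics. The following is only a minimal sketch of that flattening step in plain Python. It is not the code from the pull request, and the `flatten` helper, the dotted naming scheme, and the sample JSON are all assumptions made up for this example.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into one level, joining key names with dots.

    Hypothetical helper for illustration only; not part of Druid.
    Non-dict values (including lists) are kept as they are.
    """
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the dotted prefix.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Made-up JSON standing in for what a protobuf-to-JSON converter might emit.
nested_json = (
    '{"timestamp": "2017-03-13T16:38:20Z", "host": "web-1", '
    '"metrics": {"cpu": 0.42, "disk": {"used_gb": 12}}}'
)

row = flatten(json.loads(nested_json))
print(row)
```

Each key in `row` is then a candidate flat column name (for example `metrics.disk.used_gb`). How extension fields are named in the converted JSON depends on the converter, so check the actual output for your own messages before relying on any particular key.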
