- Do you want protobuf support?
- The approach only allows for flat protobuf structures, which is a limitation. Parsing nested messages can, IMHO, only be done by custom implementations. Is there another way of letting the user inject their custom message-parser code, other than having a separate firehose factory?
- Is it right not to create a new KafkaFirehoseFactory and instead add a switch in the existing one?
- I haven't completely understood the Jackson-based wiring yet. What I did works, but it doesn't feel very good, especially the instanceof for instantiating the actual firehose instances. Can you give me a hint here? (A sketch of the polymorphic wiring follows after this list.)
- Regarding configuration: The parser part in realtime.spec would look like this with my implementation:
"parser" : { "type" : "protobuf","timestampSpec" : { "column" : "ts", "format" : "iso" },"data" : { "descriptor" : "MyEvent.desc","format" : "protobuf","dimensions" : ["dim1","dim2","dim3"] }}},
- Note that "protobuf" is mentioned twice. How could I avoid this? (Relates to the wiring, I guess.)
- Is DataSpec the best place to configure the descriptor file?
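For the wiring question above: a minimal sketch, assuming the FasterXML Jackson 2 annotations, of how polymorphic typing could let the "type" field in the spec pick the concrete parser class, with no instanceof in the factory. The subtype registrations and the ProtoBufInputRowParser name are assumptions for illustration, not the actual Druid wiring:

  import com.fasterxml.jackson.annotation.JsonSubTypes;
  import com.fasterxml.jackson.annotation.JsonTypeInfo;

  // Sketch: Jackson reads the "type" field and instantiates the matching
  // subtype, so the firehose factory never has to check the parser's class.
  @JsonTypeInfo(use = JsonTypeInfo.Id.NAME, property = "type")
  @JsonSubTypes({
      @JsonSubTypes.Type(name = "string", value = StringInputRowParser.class),
      @JsonSubTypes.Type(name = "protobuf", value = ProtoBufInputRowParser.class)  // assumed class name
  })
  interface InputRowParser<T>
  {
    InputRow parse(T input);
  }

With the discriminator living on the parser itself, the inner "format" : "protobuf" entry in the data block would become redundant, which would also remove the double mention.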
1) MapInputRowParser doesn't need the full DataSpec. It looks like it just needs a "List<String> customDimensions", and if that is null, it can fall back onto the dimensionExclusions like it does now when it does the hasCustomDimension() check.
2) With that change, you shouldn't need to even have a DataSpec in your protobuf parser anymore. You can elevate the descriptor and dimensions up to fields on the protobuf parser, and that should make it a bit cleaner.
3) The current KafkaFirehose is hard-coding that the messages are Strings, and your instanceof check is working around that by saying that if the InputRowParser is the protobuf one, then it should expect bytes. Instead, we should:
a) Make StringInputRowParser be an InputRowParser<ByteBuffer>.
b) Do all of the current conversion into a String inside the StringInputRowParser, and do the ByteString conversion inside the protobuf parser.
Does that all make sense?
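A minimal sketch of what 3a/3b could look like; the helper methods (parseLine, toInputRow) and the descriptor handling are assumptions for illustration, not a definitive implementation:

  import java.nio.ByteBuffer;
  import java.nio.charset.Charset;

  import com.google.protobuf.ByteString;
  import com.google.protobuf.Descriptors;
  import com.google.protobuf.DynamicMessage;
  import com.google.protobuf.InvalidProtocolBufferException;

  // Both parsers consume raw bytes, so the Kafka firehose no longer needs
  // to know which concrete parser it holds. (One class per file, of course.)
  class StringInputRowParser implements InputRowParser<ByteBuffer>
  {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    @Override
    public InputRow parse(ByteBuffer input)
    {
      // 3b: the byte-to-String conversion moves inside this parser.
      return parseLine(UTF8.decode(input).toString());
    }

    private InputRow parseLine(String line)
    {
      // the existing line-based parsing logic would go here
      throw new UnsupportedOperationException("sketch only");
    }
  }

  class ProtoBufInputRowParser implements InputRowParser<ByteBuffer>
  {
    private final Descriptors.Descriptor descriptor; // e.g. loaded from MyEvent.desc

    ProtoBufInputRowParser(Descriptors.Descriptor descriptor)
    {
      this.descriptor = descriptor;
    }

    @Override
    public InputRow parse(ByteBuffer input)
    {
      // 3b: the ByteString conversion happens inside the protobuf parser.
      try {
        DynamicMessage message =
            DynamicMessage.parseFrom(descriptor, ByteString.copyFrom(input));
        return toInputRow(message);
      }
      catch (InvalidProtocolBufferException e) {
        throw new RuntimeException(e);
      }
    }

    private InputRow toInputRow(DynamicMessage message)
    {
      // map the message fields onto timestamp and dimensions here
      throw new UnsupportedOperationException("sketch only");
    }
  }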
> Does that all make sense?

Yes, I made the changes that way. It looks much better now and seems to work in first simple tests. Still some points I want to clarify with you before testing more intensively:
- Configuration of the protobuf and string parsers in realtime.spec is now slightly different: "string" takes a DataSpec, "protobuf" does not. Is that OK, and should we rename the class DataSpec to StringDataSpec or something like that?
- After changing the parameterization of StringInputRowParser, I found myself changing a lot of client code outside the realtime node. I had to insert ByteBuffer.wrap(line.getBytes()) in S3Firehose, for instance; also affected are HadoopDruidIndexer, IndexGeneratorJob, FlightsFirehose, etc. This introduces intermediate object creation and lots of places where I would have to verify that behaviour hasn't changed. I would like to have another delegate parser implementation only for the realtime Kafka firehose and keep using the old StringInputRowParser in the other places. But there I got stuck on how to interpret the realtime.spec for the cases (type == kafka && parser == string) vs. (type != kafka && parser == string). You see my problem? Any hint here? (One possible shape is sketched below.)
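One possible shape for the delegate idea, a minimal sketch assuming StringInputRowParser keeps its old String-typed parse() everywhere else; the adapter name is made up:

  import java.nio.ByteBuffer;
  import java.nio.charset.Charset;

  // Hypothetical adapter used only by the Kafka firehose: it decodes the
  // message bytes once at the firehose boundary, so S3Firehose,
  // HadoopDruidIndexer, etc. keep the unchanged String-based parser and
  // need no ByteBuffer.wrap(line.getBytes()) calls.
  class ByteBufferDelegatingParser implements InputRowParser<ByteBuffer>
  {
    private static final Charset UTF8 = Charset.forName("UTF-8");
    private final StringInputRowParser delegate; // the old String-typed parser

    ByteBufferDelegatingParser(StringInputRowParser delegate)
    {
      this.delegate = delegate;
    }

    @Override
    public InputRow parse(ByteBuffer input)
    {
      return delegate.parse(UTF8.decode(input).toString());
    }
  }

The wrapping decision could then live inside KafkaFirehoseFactory itself: the Kafka factory wraps any String-typed parser it is handed and uses a ByteBuffer-typed parser directly, so the (type == kafka && parser == string) case never has to be special-cased when interpreting the realtime.spec.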
BTW, the company CLA is almost signed.

--Jan