Batching with Kafka Rest


John Omernik

May 17, 2015, 9:41:21 AM
to confluent...@googlegroups.com
I am working on reading some log files with a script and pushing them to Kafka REST. It's not the cleanest way, but for this event source it's one of the few options. (I wonder if there are options for Flume to integrate with Kafka/Schema Registry... that would be cool.) That said, is it smart to batch records where possible? I am thinking that if I produce in batches of 10 or 50, there must be a good balance between efficiency and the overhead of each request. Thoughts?

Gwen Shapira

May 17, 2015, 10:13:41 AM
to confluent...@googlegroups.com
Flume has Kafka Source/Sink/Channel, but it doesn't integrate with Confluent directly.

I'd love to integrate it with the schema registry, but there are a few issues. Some technical and some not :)
The technical bit is that Confluent's schema registry expects the schema ID in the event body, whereas everything else that does Flume+Avro expects the Avro schema (or ID or URI) in the event header (i.e. the key). So it's an awkward fit, although obviously solvable.
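
For reference, here is a minimal sketch of what "schema ID in the event body" looks like. It assumes the Confluent wire format, in which a serialized message starts with a zero magic byte, followed by the 4-byte big-endian schema ID, followed by the Avro-encoded payload. The helper names `frame` and `unframe` are hypothetical and only illustrate the layout; they are not part of any Flume or Confluent API.

```python
import struct

MAGIC_BYTE = 0  # first byte of a message framed for the schema registry (assumed wire format)
HEADER_SIZE = 5  # 1 magic byte + 4-byte schema ID


def frame(schema_id: int, avro_payload: bytes) -> bytes:
    """Prefix an already Avro-encoded payload with the magic byte and schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload


def unframe(message: bytes):
    """Split a framed message into (schema_id, avro_payload)."""
    if len(message) < HEADER_SIZE:
        raise ValueError("message too short to carry a schema ID")
    magic, schema_id = struct.unpack(">bI", message[:HEADER_SIZE])
    if magic != MAGIC_BYTE:
        raise ValueError("unexpected magic byte: %d" % magic)
    return schema_id, message[HEADER_SIZE:]
```

A Flume integration would have to move the schema reference from the event header into (or out of) this 5-byte prefix, which is where the awkwardness comes from.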

Regarding batches, we usually use significantly larger batches with Flume on high-throughput systems: 100 to 1000 events per batch is a pretty normal range, depending on how many events/second you have.


Gwen

--
You received this message because you are subscribed to the Google Groups "Confluent Platform" group.
To unsubscribe from this group and stop receiving emails from it, send an email to confluent-platf...@googlegroups.com.
To post to this group, send email to confluent...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/confluent-platform/560c602e-72fb-48a4-b9e9-2c5c89453c72%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ewen Cheslack-Postava

May 17, 2015, 6:35:33 PM
to confluent...@googlegroups.com
Agreed that you definitely want to batch if possible. Consider that a lot of logs may only have 50-100 bytes per log line. Even if you batch 100 events together, that's still only about 10 KB, which is a pretty small request. The REST proxy doesn't require too many headers, so the overhead isn't too bad, but it's still overhead. At 100 events, it should definitely be enough to outweigh the overhead of the HTTP request, but increasing to even 1000 if possible is probably ideal.
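
As a rough sketch of bulk loading in batches, the script below groups log lines and sends each group as one produce request. It assumes the proxy's `POST /topics/{topic}` endpoint with the v1 binary embedded format (`application/vnd.kafka.binary.v1+json`, values base64-encoded); newer proxy releases use v2 content types, so check what your version accepts. The proxy address, function names, and default batch size are illustrative assumptions.

```python
import base64
import json
import urllib.request

REST_PROXY_URL = "http://localhost:8082"  # assumed proxy address, adjust as needed
CONTENT_TYPE = "application/vnd.kafka.binary.v1+json"  # assumed v1 binary format


def chunked(items, size):
    """Yield consecutive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_body(lines):
    """Encode log lines as one produce request body with base64 values."""
    records = [
        {"value": base64.b64encode(line.encode("utf-8")).decode("ascii")}
        for line in lines
    ]
    return json.dumps({"records": records}).encode("utf-8")


def send_batch(topic, lines):
    """POST one batch to the proxy and return the HTTP status code."""
    request = urllib.request.Request(
        REST_PROXY_URL + "/topics/" + topic,
        data=build_body(lines),
        headers={"Content-Type": CONTENT_TYPE},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def produce_file(path, topic, batch_size=1000):
    """Read a log file and produce it in batches of `batch_size` lines."""
    with open(path, encoding="utf-8") as handle:
        stripped = (line.rstrip("\n") for line in handle)
        for batch in chunked(stripped, batch_size):
            send_batch(topic, batch)
```

With 100-byte lines and `batch_size=1000`, each request body is on the order of 100 KB plus base64 and JSON overhead, so the fixed HTTP cost is spread over many records.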

But if you're *streaming* logs in rather than bulk loading them after they are created, then you might have different constraints. For example, you might care more about latency. You'll probably want to add a batching scheme similar to what the new producer does, or similar to Nagle's algorithm, where you batch as long as there is an outstanding request and/or during some "linger" period to allow multiple events to pile up before sending a request. By adjusting the parameters that control this batching, you can trade off latency, throughput, and per-request overhead.
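
A minimal sketch of the "linger" idea, assuming a single-threaded caller that adds records as they arrive and polls periodically. The class name, parameter names, and defaults are hypothetical. It covers only the size limit and the linger timer, not the Nagle-style "batch while a request is outstanding" rule.

```python
import time


class LingerBatcher:
    """Buffers records and releases them when the batch is full or has lingered long enough."""

    def __init__(self, max_records=1000, linger_seconds=0.1, clock=time.monotonic):
        self.max_records = max_records
        self.linger_seconds = linger_seconds
        self.clock = clock  # injectable so the timing can be tested
        self._records = []
        self._first_added_at = None

    def add(self, record):
        """Buffer a record; return a full batch if the size limit is reached, else None."""
        if not self._records:
            self._first_added_at = self.clock()
        self._records.append(record)
        if len(self._records) >= self.max_records:
            return self._drain()
        return None

    def poll(self):
        """Return the pending batch once its oldest record has waited the linger period."""
        if self._records and self.clock() - self._first_added_at >= self.linger_seconds:
            return self._drain()
        return None

    def _drain(self):
        batch, self._records = self._records, []
        self._first_added_at = None
        return batch
```

Whenever `add` or `poll` returns a list, that list is sent as one request. A larger `linger_seconds` means fewer, bigger requests at the cost of latency; a smaller `max_records` caps request size under bursts.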

-Ewen


Chirag Patel

Oct 2, 2016, 2:03:45 PM
to Confluent Platform
Hey John,
I am also trying to do the same. I can see that this post is a year old; is there any way to do this now?

George @paytm.com

Oct 3, 2016, 3:08:44 AM
to confluent...@googlegroups.com

In production, we batch the incoming JSON messages and flush them to Kafka REST every 100 ms.


Chirag Patel

Oct 5, 2016, 8:33:36 PM
to Confluent Platform
Hey George,
I am interested in knowing more about Flume and Kafka + Confluent integration.
Is there any documentation that you can suggest?
