Kafka for low memory environments. Guidance required


parco...@gmail.com

Jan 19, 2016, 6:21:31 PM
to Confluent Platform
I am in the process of evaluating Kafka for a single-node, low-memory-footprint architecture. Memory may be as little as 4 GB.

I did some basic throughput measurements and throughput was not a concern. Memory footprint measurements on Ubuntu 14 showed that Kafka + ZooKeeper together used around 250 MB (RSS).
Latency had high variance, from 2 ms to 900 ms, depending on message size, the number of messages, and when the test was run.
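For reference, here is a minimal sketch of how an RSS figure like the one above can be read on Linux. This is an assumption about the measurement method (the thread does not say how it was taken), and it relies on the Linux-specific /proc filesystem:

```python
import os

def rss_kb(pid: str = "self") -> int:
    """Return the resident set size (RSS) of a process in kB,
    read from /proc, or 0 if /proc is unavailable (non-Linux)."""
    path = f"/proc/{pid}/status"
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # value is reported in kB
    return 0

print(f"current process RSS: {rss_kb()} kB")
```

Summing this over the Kafka and ZooKeeper JVM PIDs would give the combined footprint quoted above.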

The main alternative to Kafka in this environment would be ZeroMQ, and I will lay out my thought process here.
Please tell me if this line of thinking makes sense, and if you have experience with either, please chime in.
Much appreciated.

Kafka is a message broker; ZeroMQ is brokerless. The main problem with a brokerless design is that with multiple producers and multiple consumers,
full crossbar connectivity must be configured between all producers and all consumers, which is a painful configuration-management problem, especially
when producers and consumers can be customer apps.
In a brokered architecture, everybody just needs the broker's URI. If we go down the ZeroMQ road, we will end up adding a broker abstraction anyway.
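As a back-of-the-envelope illustration of that configuration burden (the fleet sizes below are hypothetical, not from this thread):

```python
producers, consumers = 10, 8   # hypothetical fleet sizes

# Brokerless (ZeroMQ): every producer must be wired to every consumer.
crossbar_links = producers * consumers   # full crossbar

# Brokered (Kafka): each process only needs the broker's URI.
brokered_links = producers + consumers

print(crossbar_links, brokered_links)  # → 80 vs 18 endpoints to manage
```

The gap widens multiplicatively as either side grows, which is why the crossbar becomes painful once producers and consumers are customer apps you do not control.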

Throughput - Kafka has high throughput, around 500K messages/s. ZeroMQ can handle even more, but at that level, throughput is not my concern.

Latency - ZeroMQ latency is very low, 20 ms - 60 ms; Kafka is more like 500 ms. If we end up building a broker on top of ZeroMQ, its latency will degrade significantly.

Implementation Burden - ZeroMQ is a socket-like library, i.e. too low level. I would end up carrying a lot of implementation burden for any features I need in the future.

Delivery Guarantees - ZeroMQ has no delivery guarantees; it will drop messages under adverse conditions (e.g. a slow consumer). Kafka has at-least-once delivery guarantees, which is good.
Kafka has persistence; ZeroMQ does not, out of the box.
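A toy model of what at-least-once means in practice (a sketch for intuition, not Kafka's actual consumer API): a message is removed only after the consumer acknowledges it, so a crash before the ack causes redelivery (a possible duplicate) rather than loss:

```python
from collections import deque

class AtLeastOnceQueue:
    """Toy at-least-once queue: messages leave only on explicit ack."""
    def __init__(self, messages):
        self.pending = deque(messages)

    def fetch(self):
        # Redelivers the head until it is acknowledged.
        return self.pending[0] if self.pending else None

    def ack(self):
        self.pending.popleft()

q = AtLeastOnceQueue(["m1", "m2"])
seen = []
seen.append(q.fetch())           # consumer crashes before acking m1...
seen.append(q.fetch()); q.ack()  # ...restarts: m1 is delivered again
seen.append(q.fetch()); q.ack()
print(seen)  # → ['m1', 'm1', 'm2']
```

A fire-and-forget transport with no acks, by contrast, simply loses m1 in the crash scenario.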

Memory Usage - Kafka's memory usage is significantly higher than ZeroMQ's, because it provides far more features and is built on the JVM. ZooKeeper adds its own overhead.

Also, any specific thoughts on Kafka's memory usage and latency (and how to reduce them) would be very helpful. Thanks for your time and thoughts; I know this is not a very specific question.

best.

Lars Albertsson

Jan 20, 2016, 3:43:03 AM
to confluent...@googlegroups.com
Retention vs no retention is the main difference between Kafka and
ZeroMQ. If you only care about here and now, and appreciate low
latency, ZeroMQ is likely a better choice.

If your use case is data collection for storage or stream processing,
Kafka's retention makes it easier to build robust pipelines, since you
can recover from consumer process failures, take down consumers for
upgrades, etc. You can also reprocess old data if you discover stream
processing bugs.
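A sketch of why retention enables that reprocessing (a toy model, not Kafka's consumer API): because the log keeps old records, a consumer can simply rewind its offset and replay them after fixing a bug:

```python
retained_log = ["r0", "r1", "r2", "r3"]  # records kept by the retention policy
offset = len(retained_log)               # consumer is fully caught up

# A stream-processing bug is discovered: rewind and reprocess everything.
offset = 0
reprocessed = retained_log[offset:]
print(reprocessed)  # → ['r0', 'r1', 'r2', 'r3']
```

With a non-retaining transport like ZeroMQ, records consumed before the bug was found are simply gone.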

Kafka latency is high on low-volume topics, since it waits for buffers
to fill before making incoming data available for consumption. If you
want to minimise latency, you can turn these knobs:

log.flush.interval.messages
log.flush.interval.ms
log.flush.scheduler.interval.ms
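For example, in the broker's server.properties (the values below are illustrative assumptions, not recommendations; aggressive flushing trades throughput for latency):

```
# Flush the log to disk after every message (illustrative extreme)
log.flush.interval.messages=1
# ...or flush at most every 10 ms
log.flush.interval.ms=10
# How often the flusher checks whether a flush is due
log.flush.scheduler.interval.ms=10
```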

If you search for these strings and/or "Kafka latency" on the web, you
will find discussions with advice on how to tune them, and what
latency to expect.

Regards,


Lars Albertsson
Data engineer consultant
www.mapflat.com
+46 70 7687109

Oded Sofer

May 2, 2018, 2:58:20 AM
to Confluent Platform
Can you please share how you configured it to minimize the footprint?
In our environment it seems to consume around 32 GB, though the volume of consumers/producers is very low.