Kafka message key hashing

Alexandre Berthaud

Dec 11, 2020, 5:38:04 AM
to Warp 10 users

Hey everyone,

Because of an issue we experienced, I found out that the Kafka message keys written by the Ingress are not hashed. They are basically a concatenation of the class and the labels. This means that all points belonging to a class go into the same partition or group of partitions.

Of course, this is fine in most cases.

It can become an issue when the data of one class is significantly bigger than the rest (big strings versus ints / floats): some store threads end up with lots of work to do and others not so much.

Performance concerns aside, do you think hashing the key (which would spread the messages of any class over all partitions while still keeping the order guarantee for a given GTS) could cause problems? Maybe this could be an option?
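To make the idea concrete, here is a minimal sketch of hash-based partitioning. It is not Warp 10's actual code: the CRC32 hash, the partition count, and the key format are all assumptions chosen for illustration. It shows that a stable hash always sends a given key to the same partition (so per-GTS ordering is preserved) while keys of a single class spread over several partitions.

```python
import zlib

NUM_PARTITIONS = 8  # hypothetical partition count


def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition with a stable hash.

    CRC32 is only a stand-in here; the real partitioner may use another hash.
    """
    return zlib.crc32(key) % num_partitions


# Hypothetical keys: one class, 1000 different label sets (i.e. 1000 GTS).
keys = [f"sensor.temperature{{device={i}}}".encode() for i in range(1000)]

# Same key, same partition: messages of a given GTS stay ordered.
same_partition = partition_for(keys[0]) == partition_for(keys[0])

# One class, many partitions: the class no longer pins a single partition.
partitions_used = {partition_for(k) for k in keys}

print(same_partition, len(partitions_used))
```

The key point is that the partition depends only on the key bytes, so ordering per GTS survives as long as the partition count does not change.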

Thanks for your input!

mathias....@gmail.com

Dec 11, 2020, 6:01:41 AM
to Warp 10 users
The Ingress component uses a custom partitioner which already computes a hash of the GTS id:

partitioner.class = io.warp10.continuum.KafkaPartitioner


Alexandre Berthaud

Dec 11, 2020, 6:06:52 AM
to Warp 10 users
Hm, this is weird.

It would mean that the size difference we saw between one partition and the rest is due to a specific GTS, not a class (or to sheer bad luck).
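A rough way to check this hypothesis is to total the bytes per partition and then per key inside the largest partition. The sketch below runs on made-up data: the CRC32 hash, the key names, the payload sizes, and the partition count are all assumptions, not values from our cluster or from Warp 10's partitioner.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Stable hash partitioning (CRC32 as a stand-in for the real hash)."""
    return zlib.crc32(key) % num_partitions


# Made-up traffic as (key, payload size in bytes) pairs:
# 200 small GTS with 10 messages of 8 bytes each, plus one GTS
# carrying 10 messages of 10,000 bytes each (big strings).
heavy_key = b"logs.raw{host=a}"
messages = [
    (f"metric{{id={i}}}".encode(), 8) for i in range(200) for _ in range(10)
]
messages += [(heavy_key, 10_000)] * 10

# Bytes received by each partition.
per_partition = Counter()
for key, size in messages:
    per_partition[partition_for(key)] += size

largest_partition = per_partition.most_common(1)[0][0]

# Bytes per key inside the largest partition, to find the heavy hitter.
per_key = Counter()
for key, size in messages:
    if partition_for(key) == largest_partition:
        per_key[key] += size

top_key = per_key.most_common(1)[0][0]
print(largest_partition, top_key)
```

Even with a hash spreading the keys, the partition holding the heavy GTS dominates, because all messages of one GTS must land in the same partition.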

I will look into this further then.

Thanks.