topic partitioning

2,263 views
Skip to first unread message

Scott Ferguson

unread,
Oct 13, 2016, 10:30:28 AM10/13/16
to debezium
What does debezium provide kafka as the partition key for a given topic?

Apologies if this is in the doc, I haven't been able to find it there.
Thanks,
Scott

Randall Hauch

unread,
Oct 13, 2016, 10:43:25 AM10/13/16
to Scott Ferguson, debezium
Debezium does not specify the partition number for events. That means that Kafka Connect’s producer will then use the default partitioning logic that computes the partition using a consistent hash of the message key, which in Debezium’s case is a struct containing the affected row’s primary/unique key. You can also customize this behavior via the Kafka Connect worker configuration properties that control the producer behavior.

Just be sure that if you’re using more than one partition for a table topic that you understand this will affect the ordering guarantees for the events in that topic, since Kafka only guarantees total order within each topic partition. So, with one partition, all changes events will have the same total order as occurred in the database. With multiple partitions, the order of events in separate partitions will not be guaranteed.

Having said that, some use cases may want to “shard” their events into multiple partitions per topic. This works especially well when downstream consumers care only about the order of change events for each row, but don’t really care how the events for different rows are ordered.

Best regards,

Randall
--
You received this message because you are subscribed to the Google Groups "debezium" group.
To unsubscribe from this group and stop receiving emails from it, send an email to debezium+u...@googlegroups.com.
To post to this group, send email to debe...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/debezium/e1d7722c-b124-435f-ac61-b5bd70f3b071%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Scott Ferguson

unread,
Oct 13, 2016, 11:33:14 AM10/13/16
to debezium
Thanks. That makes sense. We need the ability to shard on a key other than the primary key so being able to specify that is a critical feature. Great to have a little insight into how/where to take care of it.

Nick Adelman

unread,
Aug 11, 2017, 11:12:31 AM8/11/17
to debezium
I am starting to look ingesting Mongo Oplog data with Debezium, and had the same immediate concerns about total ordering. Scott, I'm wondering if you ended up using a custom partitioning strategy, and how that worked out for you. We ultimately want to take that approach, but I have concerns that if we shard on a key that is drawn from a list of reference values (as an example), we will end up having to continually add partitions and refactor the custom partitioning strategy as the possible set of key values grows over time.

Thanks,

Nick Adelman

Scott Ferguson

unread,
Aug 11, 2017, 11:34:27 AM8/11/17
to debe...@googlegroups.com
We ended up partitioning by the value of a specific column in our MySQL databases (there is a mostly common schema). We set the number of partitions for each topic relatively high compared to the number of potential values for the partitioning column (>10x). We've been running this way for 6 months and it works well so far. 

Regarding adding partitions, my understanding of Kafka is that this shouldn't be a concern. A partition is not dedicated to one key, it simply ensures that all data for a key is well ordered. Imagine a case where you were partitioning by the primary key on a table, it's not practical to have a partition per primary key.

I should note that we didn't go with Debezium for CDC. We love the project and would've liked to use it, but we run in AWS RDS and it wasn't ready for that at the time.

Hope this helps. LMK if you have questions/concerns.

--
You received this message because you are subscribed to a topic in the Google Groups "debezium" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/debezium/L0TOC-vkO8Y/unsubscribe.
To unsubscribe from this group and all its topics, send an email to debezium+unsubscribe@googlegroups.com.

To post to this group, send email to debe...@googlegroups.com.

Gunnar Morling

unread,
Aug 12, 2017, 7:03:28 AM8/12/17
to debezium
Hi Scott,

Thanks a lot for getting back to this thread. I'm still catching up with the issues around using Debezium on RDS. Could you perhaps share (here or in a separate mail to me if you like) what is missing for that from your perspective? Running on RDS is definitely an important goal and I'd like to work towards supporting this as good as possible.

Thanks,

--Gunnar
To unsubscribe from this group and all its topics, send an email to debezium+u...@googlegroups.com.

To post to this group, send email to debe...@googlegroups.com.

Scott Ferguson

unread,
Aug 12, 2017, 9:12:37 AM8/12/17
to debe...@googlegroups.com
Hey, RDS support was added around March of this year. It's marked as preliminary so you may need to investigate a little. We haven't had the chance to dig into it yet.

To unsubscribe from this group and all its topics, send an email to debezium+unsubscribe@googlegroups.com.

To post to this group, send email to debe...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages