Seeking advice: I want to remove the "one postgres connector per topic.prefix" limitation


Chris Redekop

Mar 6, 2024, 3:03:53 PM
to debezium
Hello 👋, I'm using the Postgres source connector to capture changes from a very high-traffic Postgres database. The throughput is more than a single connector can handle, so I need to run multiple connectors, each handling a subset of the tables. The problem is that each connector instance needs a unique topic.prefix (as per https://debezium.io/documentation/reference/stable/connectors/postgresql.html#postgresql-property-topic-prefix ), which is unfortunate because all the sinks/consumers are then affected whenever we change how the tables are partitioned across the source connectors. We'd rather have all tables published under a unified prefix regardless of which connector instance is sourcing the data.

I'm fairly new to Debezium, so I'd appreciate any advice if there is a known/common way to accomplish this. If not, I'd be happy to submit a PR to allow it. Any advice on the direction to take and/or gotchas to watch out for? Thanks!

High level approach ideas I had:
1. Simple and straightforward: add a new configuration value that explicitly overrides "topic.prefix" in the places where it is used as some sort of internal key. When this new config value is provided, "topic.prefix" is only used to form the Kafka topic name. With this approach, the multiple connector instances would still need to be configured and managed manually.
2. Complex and fancy: Theoretically we could support "tasks.max" values greater than "1". If you specified a "tasks.max" value of "3", the connector would create 3 replication slots and generate the publications such that the tables are evenly balanced across the 3. This way there is still only a single connector, but it could handle far more throughput than it currently does. Conceptually this makes sense in my head, but I have no idea how feasible it is from a technical perspective. I have next-to-zero knowledge of the connector's internal architecture at this point, so it could be completely unrealistic.
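
To make the "evenly balanced" part of idea 2 concrete, here is a minimal sketch of one way the tables could be split. It is not how the connector works today; `balance_tables` and the table names are hypothetical, and it balances by table count only, not by actual write traffic.

```python
def balance_tables(tables, n_groups):
    """Assign tables round-robin so group sizes differ by at most one.

    Hypothetical helper: balances by count, not by write volume.
    """
    groups = [[] for _ in range(n_groups)]
    # Sort first so the assignment is stable across runs.
    for i, table in enumerate(sorted(tables)):
        groups[i % n_groups].append(table)
    return groups


# Example table names are made up for illustration.
tables = [
    "public.orders",
    "public.customers",
    "public.payments",
    "public.events",
    "public.audit_log",
]

for idx, group in enumerate(balance_tables(tables, 3)):
    # Each group would back one replication slot / publication.
    print(f"group {idx}: {','.join(group)}")
```

A real implementation would probably want to weight tables by traffic, since a count-based split can still leave one slot with all the hot tables.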

Any insights/advice are appreciated, Thanks!

- Chris

Chris Cranford

Mar 6, 2024, 3:09:24 PM
to debe...@googlegroups.com
Hi Chris -

What you want to accomplish can easily be handled with a single message transformation that re-routes the events from one destination topic to another. In each new connector that you deploy with a unique topic.prefix, include configuration for the Topic Routing [1] transformation so that its events are routed to the common topic.

# Rewrite topic names from this connector's prefix to the shared prefix
transforms=reroute
transforms.reroute.type=io.debezium.transforms.ByLogicalTableRouter
transforms.reroute.topic.regex=your_custom_prefix.(.*)
transforms.reroute.topic.replacement=your_common_prefix.$1
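
As a rough sketch of how this could look across several connectors, the Python below builds one config per connector and approximates the topic rename. The helper names, the numbered prefixes, and the slot and publication names are all assumptions for illustration; only the property keys come from the Debezium docs. The rename is simulated with Python's `re`, so Java's `$1` is translated to `\1`, and the full-match behaviour is an assumption about how the transformation applies the regex.

```python
import re

COMMON_PREFIX = "your_common_prefix"  # the prefix sinks/consumers see


def connector_config(index, tables):
    """Build a partial config for one connector (hypothetical naming scheme)."""
    prefix = f"your_custom_prefix_{index}"  # must be unique per connector
    return {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "topic.prefix": prefix,
        # Assumed naming: one replication slot and publication per connector.
        "slot.name": f"debezium_slot_{index}",
        "publication.name": f"dbz_publication_{index}",
        "table.include.list": ",".join(tables),
        "transforms": "reroute",
        "transforms.reroute.type": "io.debezium.transforms.ByLogicalTableRouter",
        "transforms.reroute.topic.regex": prefix + r"\.(.*)",
        "transforms.reroute.topic.replacement": COMMON_PREFIX + ".$1",
    }


def rerouted_topic(config, topic):
    """Approximate the rename in Python; this is not the SMT's own code."""
    pattern = config["transforms.reroute.topic.regex"]
    # Java uses "$1" for group references, Python uses "\1".
    template = config["transforms.reroute.topic.replacement"].replace("$1", r"\1")
    match = re.fullmatch(pattern, topic)
    return match.expand(template) if match else topic


cfg = connector_config(1, ["public.orders", "public.payments"])
print(rerouted_topic(cfg, "your_custom_prefix_1.public.orders"))
```

One thing worth checking in the Topic Routing docs [1]: the transformation has a key.enforce.uniqueness option that, by default, adds an identifier field to the event key. If each table is only ever handled by one connector, you may want to set it to false so the keys your consumers see stay unchanged.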

You can also use the Content-based Router transformation [2] with a similar configuration.

Thanks,
Chris

[1]: https://debezium.io/documentation/reference/stable/transformations/topic-routing.html
[2]: https://debezium.io/documentation/reference/stable/transformations/content-based-routing.html

Chris Redekop

Mar 6, 2024, 4:25:47 PM
to debezium
OMG of course, I don't know how I overlooked that.  Thank you 🙏