Write throughput handling


Rohit Panwar

Sep 4, 2025, 6:30:19 AM
to debezium
Hi community,

Let's say I deploy a Debezium connector for a tenant that has very high write throughput. Is it recommended to have multiple connectors for different tables of the tenant? Could there be a scenario, say during an extreme spike in writes, where my single connector starts falling behind?

Chris Cranford

Sep 4, 2025, 11:42:16 AM
to debe...@googlegroups.com
Hi -

For MySQL and MariaDB, you'd likely see very negligible gains, if any. This is due to how the binlog client works: it reads every binlog event regardless of whether it's for a table in the include list. So adding multiple connectors for this type of database source won't add value, just overhead and the extra management complexity of multiple connectors.
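For concreteness, here's a minimal sketch of what such a MySQL connector config might look like (hostnames, credentials, and table names are hypothetical; property names follow Debezium 2.x). Even with `table.include.list` set, the binlog client still receives every event, and the connector discards the non-matching ones on its side:

    {
      "name": "tenant-a-mysql",
      "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql.tenant-a.internal",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "********",
        "database.server.id": "184054",
        "topic.prefix": "tenant_a",
        "table.include.list": "inventory.orders,inventory.customers",
        "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
        "schema.history.internal.kafka.topic": "schema-changes.tenant_a"
      }
    }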

However, other connectors either push filter configurations down to the database level (PG/Oracle/MongoDB) or can read changes on a per-table basis (SQL Server/Db2). Those connectors would most definitely see more benefit from such a model, but we usually ask that you try a single connector first, measure it, and see what the gain ultimately is. You may find the gain remains negligible and doesn't justify the complexity or the overhead that multiple connectors may put on your database.

-cc



David Cohen

Sep 16, 2025, 8:09:24 AM
to debezium
So on Oracle/PG/Mongo, do multiple connectors actually behave differently than multiple threads in the same connector?
(i.e., on the same database, are there worlds where you get more throughput from 4 connectors with 2 threads than from 2 connectors with 4 threads?)

Chris Cranford

Sep 16, 2025, 10:35:35 AM
to debe...@googlegroups.com
Hi David -

Debezium's source connectors are inherently single-threaded, so if you deploy 4 connectors, each of those is an independent, singular task. SQL Server is one of the only connectors that supports multiple tasks; however, that's only possible when the `database.names` configuration specifies multiple databases. If your SQL Server connector is capturing changes from a single database, the most it will use is 1 task.
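As a sketch, a multi-task SQL Server deployment might look like this (hostnames and database names are hypothetical). With two entries in `database.names` and `tasks.max` set to 2, each task streams changes for one database:

    {
      "name": "tenants-sqlserver",
      "config": {
        "connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
        "database.hostname": "mssql.example.internal",
        "database.port": "1433",
        "database.user": "debezium",
        "database.password": "********",
        "database.names": "tenant_db1,tenant_db2",
        "tasks.max": "2",
        "topic.prefix": "tenants",
        "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
        "schema.history.internal.kafka.topic": "schema-changes.tenants"
      }
    }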

So really this comes down to a question of horizontal scaling of the pipeline across multiple connector deployments.

A database transaction log is predominantly singular, meaning that regardless of which source we're talking about (MySQL/MariaDB/Oracle/PG/SQL Server/MongoDB), deploying 2 or more connectors against the same database means each connector reads the transaction log independently of the other deployments. By extension, in many cases you increase disk IO and CPU utilization by the number of connectors.

For MySQL/MariaDB, these connectors cannot prune changes early, so every binlog event is serialized over the wire and consumed by the connector, even if that event is then discarded. So even if you split the connector deployment out horizontally by tenant based on their activity, all connectors will read the same stream of changes; the connector merely skips events based on your include filters. You would want to measure the difference, but as I indicated, it's very likely to be negligible.

When we look at other connectors like Oracle/PG/SQL Server/Db2, and to a lesser degree MongoDB, you have the ability to specify configurations that impact several key performance aspects.

For PostgreSQL, you can create the publication in filtered mode, meaning PostgreSQL only serializes events over the network to Debezium if they're for a table of interest. This removes the need for Debezium to filter out unwanted events and reduces network overhead. For Oracle, you can enable query filters so that we apply extra predicates to the LogMiner query, limiting the data returned to the tables of interest plus transaction markers. There is also a new experimental CTE query feature that goes even further to reduce the payloads sent over the network. For SQL Server and Db2, we only read changes from the capture tables matching your include filter setup, which minimizes the dataset read by each connector and what is sent over the network.
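As configuration sketches (table names are hypothetical; verify the property names against your Debezium version), the PostgreSQL side would look something like:

    "plugin.name": "pgoutput",
    "table.include.list": "public.orders,public.customers",
    "publication.autocreate.mode": "filtered"

and on Oracle, the LogMiner query predicates can be driven from the include list with:

    "table.include.list": "INVENTORY.ORDERS,INVENTORY.CUSTOMERS",
    "log.mining.query.filter.mode": "in"

With the filtered publication, PostgreSQL only ships matching changes through the replication slot; with the in-mode query filter, LogMiner returns only rows for the listed tables plus transaction markers.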

In all cases, you need to measure whether these really yield better performance than a single deployment. You most likely will with SQL Server/Db2, but for the others, PostgreSQL has an extra replication slot at play and Oracle an extra LogMiner process per connector, which may lead to higher memory consumption, disk IO, and CPU utilization on your database. Then it's a question of whether the performance gain is worth that extra overhead.

-cc