Debezium and complex transactions

210 views
Skip to first unread message

Johan Vandevenne

unread,
Jun 16, 2022, 5:33:00 AM6/16/22
to debezium
Hello,

We are currently evaluating Debezium for synchronizing DB changes from different Oracle schema's to an ODS (also Oracle).

We are curious how Debezium would handle the scenario where you have long-running transactions with updates in multiple tables. 
Are these transactions also replicated in one transaction in the destination database ? How does this work when you have a topic per table ?

kind regards
Johan

Chris Cranford

unread,
Jun 16, 2022, 9:58:47 AM6/16/22
to debe...@googlegroups.com, Johan Vandevenne
Hi Johan -

The Oracle connector maintains a transaction cache.  This cache is managed by a watermark system with a low and high SCN watermark.  The low watermark is the SCN where the oldest, but yet not committed or rolled back transaction began.  Since Oracle tracks all changes in the redo logs, including those that are rolled back, we mine continuously and maintain events in the transaction cache until we observe a COMMIT or ROLLBACK.  The high watermark is the SCN of the most recent COMMIT or ROLLBACK.  This enables the connector to avoid re-sending committed transactions between the low and high SCN watermarks when a connector is restarted. 

So how does that relate to long running transactions?

The biggest drawback to long running transactions is that most Oracle environments do not maintain archive logs indefinitely.  Some environments purge them after 24 hours, others are less aggressive.  So if you have a long running transaction that lasts longer than your archive log retention period, you can quickly see that if the connector is restarted for any reason and the low watermark SCN points to a position in the redo logs inside a log that has been removed, the connector will no longer be capable of starting without adjusting the offsets and accepting data loss.  You can obviously address the data loss by running an Incremental Snapshot after you've manipulated the offsets, but that's not always ideal and depends entirely on the volume of data in question.

And finally, to touch on your last question, the transaction semantics on the destination depends on the sink connector, which isn't Debezium related.  I believe the JdbcSinkConnector likely handles the sink side as one transaction per event.  There may be some ways you can work around this by writing a KStreams application to combine the transaction metadata topic with all the table-specific topics for a given transaction and emit an aggregated event to a new topic for consumption, but you'd be looking at an entirely custom sink connector to handle that aggregate.

Hope that answers your questions,
Chris
--
You received this message because you are subscribed to the Google Groups "debezium" group.
To unsubscribe from this group and stop receiving emails from it, send an email to debezium+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/debezium/81be89fe-7a14-4cef-9984-dd5935dcc795n%40googlegroups.com.

Reply all
Reply to author
Forward
0 new messages