Is there a way to tune performance for Debezium and Oracle 11g with 2 TB of data?


FairyTail1279

Feb 11, 2022, 9:43:10 AM
to debezium
Hi.
How should I configure Debezium to handle migration and CDC of 2 TB of data within 6 hours? Do you have any ideas?

Thanks
FairyTail

Chris Cranford

Feb 11, 2022, 9:53:41 AM
to debe...@googlegroups.com, FairyTail1279
Hi FairyTail,

By migration I assume you mean snapshotting data. Is this a case where you're concerned about streaming changes concurrently, or is the data mostly read-only and you just need to migrate it to another database?

Chris

FairyTail1279

Feb 11, 2022, 10:08:13 AM
to debezium
Yes, the requirement is to take a snapshot of the 2 TB of data first, and then continuously stream data changes from the source database to the destination database without interruption, because the source database will always have new data flowing in.

Fairy

On Friday, February 11, 2022 at 21:53:41 UTC+7, Chris Cranford wrote:

Chris Cranford

Feb 11, 2022, 11:55:43 AM
to debe...@googlegroups.com, FairyTail1279
Hi Fairy,

In that case you have two options, and with Oracle the right choice really depends on whether your DBA can guarantee that the UNDO_RETENTION period is long enough to cover the snapshot of the 2 TB of data. Let me explain.

In Oracle, the traditional snapshot phase that happens at the start of a new connector relies on Flashback queries, which are based on data held in the database's undo area. How much history that undo area retains is governed by the UNDO_RETENTION database parameter. If a Flashback query asks for data as of an SCN that has aged out of the undo area, an exception is raised and the query fails. So when using the traditional snapshot approach, you have to make sure UNDO_RETENTION is large enough that the database retains undo data as far back as the point when the connector started. In other words, if your snapshot were to take, say, 36 hours to complete, UNDO_RETENTION needs to be configured to keep slightly more than 36 hours' worth of data. In addition to UNDO_RETENTION, you also need to guarantee that, for however long the snapshot runs, all the archive logs generated within that same window are retained on the database server, so that when streaming begins it can read changes without losing any change events from the time the snapshot began.
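
For reference, your DBA could check and adjust the retention along these lines. This is only a sketch; the values are illustrative (the 40-hour figure just pads the hypothetical 36-hour snapshot above), and whether you also want RETENTION GUARANTEE is a call for your DBA:

    -- Check the current undo retention (in seconds) and the undo tablespace.
    SHOW PARAMETER undo_retention;
    SELECT tablespace_name, retention FROM dba_tablespaces WHERE contents = 'UNDO';

    -- Raise the retention so it comfortably covers the expected snapshot window,
    -- e.g. roughly 40 hours (144000 seconds) for a snapshot expected to take ~36 hours.
    ALTER SYSTEM SET undo_retention = 144000;

    -- Verify how far back archived redo is still available on the server.
    SELECT MIN(first_time) FROM v$archived_log WHERE deleted = 'NO';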

Sometimes adjusting the UNDO_RETENTION parameter or keeping days' worth of archive logs on the server isn't ideal, and that's okay, because there is an alternative called Incremental Snapshots.

With Incremental Snapshots, we stream changes and emit snapshot entries concurrently. The benefit is that a large UNDO_RETENTION isn't needed, and archive log retention policies can usually stay as they are, assuming they're being properly managed to begin with. There are some minor nuances in how events show up in the topics: they will no longer be all snapshot records followed by all changes, as with the traditional approach, because events for both phases are now emitted concurrently. However, with Oracle there is one caveat you have to be able to honor to use Incremental Snapshots: you cannot change the schema of the tables you are capturing while an incremental snapshot is in progress. Once the incremental snapshot has concluded, you can safely change a captured table's schema, but not while the snapshot is running.
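
As a rough sketch of how that looks in practice: assuming the connector has signal.data.collection pointing at a signaling table (the table and schema names below are made up for illustration, and the signaling table also has to be in table.include.list), you kick off an incremental snapshot by inserting an execute-snapshot signal, something like:

    -- Hypothetical signaling table DEBEZIUM.DEBEZIUM_SIGNAL with the usual
    -- id / type / data columns; adjust names to match your setup. The exact
    -- fully-qualified format in "data-collections" should follow how your
    -- connector identifies tables -- check the docs for your version.
    INSERT INTO DEBEZIUM.DEBEZIUM_SIGNAL (id, type, data)
    VALUES ('adhoc-snapshot-1',
            'execute-snapshot',
            '{"data-collections": ["MYSCHEMA.MY_LARGE_TABLE"]}');
    COMMIT;

The connector picks the signal up as a change event and snapshots the listed tables in chunks while streaming continues.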

In terms of snapshot duration, at least with Oracle the incremental snapshot style will be slightly slower than the traditional approach. This is largely due to how we interact with Oracle LogMiner and how the incremental snapshot signals are received as change events via a poll-style mechanism. You can make the polling faster or slower by adjusting the log.mining.sleep.time.xxx settings; however, be aware that firing LogMiner queries without a small delay can lead to high CPU usage on the Oracle database server, which is why these sleep/poll values exist.
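
For example, and purely as illustrative values (double-check the exact property names and defaults against the documentation for the Debezium version you're running), the knobs in question look roughly like this in the connector configuration:

    # Illustrative values only -- tune these with an eye on database CPU.
    log.mining.sleep.time.default.ms=1000
    log.mining.sleep.time.min.ms=0
    log.mining.sleep.time.max.ms=3000
    log.mining.sleep.time.increment.ms=200

Lower sleep times mean faster signal pickup but more frequent LogMiner queries, so it's a trade-off against load on the database.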

You mentioned 6 hours in your original question; is that the maximum time you can allow the snapshot to run, or is that the duration of a traditional snapshot you've observed? If it's what you observed, that seems pretty reasonable given the amount of data. If it's the maximum duration you can allow a snapshot to run, what is driving that? Is it UNDO_RETENTION or some other factor?

Chris