Oracle connector's Log Swap Frequency


Chris Cranford

Jul 21, 2021, 3:16:00 AM
to debe...@googlegroups.com
Hi everyone -

The Debezium Oracle connector's default mining strategy forces the connector to write the database data dictionary to the redo logs at start-up and any time a log switch is detected.  Depending on the size of the data dictionary, the size of the redo logs, and the number of log groups, this can be an expensive operation and can lead to a small burst of archive logs being generated in quick succession.  What I want to do is see whether we can minimize the burst of archive logs produced by the connector.

Currently, the connector gathers all archive and redo logs that have changes from a point in time going forward.  When a log switch occurs, the connector asks Oracle to rebuild the data dictionary in the redo logs (which generates a small burst of archive logs), and the mining session is then seeded with all the archive logs that were generated, along with any of the current online redo logs used by the Oracle instance.  As logs continue to switch, this process repeats, so the number of logs mined in a given session stays relatively static.
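For illustration, the per-switch flow above could be sketched like this in Python (a simplification, not Debezium's actual Java implementation; `LogFile`, `seed_session`, and the rebuild callback are hypothetical names I'm using just to show the shape of it):

```python
from dataclasses import dataclass

@dataclass
class LogFile:
    name: str
    first_scn: int  # lowest system change number (SCN) this log covers
    next_scn: int   # SCN at which the following log begins (exclusive)

def seed_session(logs, start_scn, rebuild_dictionary):
    """Current strategy: on every log switch, rebuild the dictionary
    (the expensive step that bursts out archive logs), then seed the
    mining session with every log covering start_scn onward."""
    rebuild_dictionary()
    return [log for log in logs if log.next_scn > start_scn]
```

Because the rebuild happens on every switch, its cost scales directly with switch frequency, which is exactly the burst described above.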

What I would like to consider is adjusting this behavior slightly so that we reduce the frequency at which the data dictionary is rebuilt.

To accomplish this, the connector would not only include archive logs and redo logs that contain the starting system change number we want to mine from, but would also look at older archive logs to see whether any within a given age threshold contains the data dictionary.  If such an archive log exists, the connector would add that log and all logs thereafter to the session.  Only if the age threshold has been exceeded, or if no available archive log contains the data dictionary, would we trigger a rebuild of the data dictionary to the redo logs.  This means that as log switches occur, the number of logs mined in a given session would grow over time.  Since the threshold is configurable, we could set it to a sensible default and users could then adjust it based on their system needs.
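As a rough sketch of the proposed selection logic (again illustrative Python rather than actual connector code; the `ArchiveLog` fields and the hours-based threshold are my assumptions from the description above — in Oracle, the `DICTIONARY_BEGIN` column of `V$ARCHIVED_LOG` marks logs that carry a dictionary build):

```python
from dataclasses import dataclass

@dataclass
class ArchiveLog:
    name: str
    age_hours: float      # how long ago this log was archived
    has_dictionary: bool  # e.g. DICTIONARY_BEGIN = 'YES' in V$ARCHIVED_LOG

def select_logs(archive_logs, max_age_hours, rebuild_dictionary):
    """Proposed strategy: reuse the most recent archived dictionary
    within the age threshold instead of rebuilding on every switch.
    `archive_logs` is assumed to be sorted oldest-first."""
    for i in range(len(archive_logs) - 1, -1, -1):
        log = archive_logs[i]
        if log.has_dictionary and log.age_hours <= max_age_hours:
            # Mine from the dictionary log and everything after it, so
            # the mined set grows with each subsequent log switch.
            return archive_logs[i:]
    # Threshold exceeded or no dictionary available: fall back to a
    # rebuild (simplified here to just invoking the callback).
    rebuild_dictionary()
    return archive_logs
```

The trade-off is visible in the return value: the session's log list keeps growing until the threshold forces a fresh rebuild, which is where the PGA question below comes in.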

I'd like to understand from those with Oracle LogMiner experience whether you foresee any issues with this approach.  My biggest concern is how much PGA memory this would consume.

Looking forward to your feedback!
CC




Milo van der Zee

Jul 21, 2021, 3:53:42 AM
to debe...@googlegroups.com
Hello Chris,

What I see is that reading redo logs consumes a lot of disk I/O.  Especially in a cloud environment this can become an issue due to limited storage bandwidth, so reading multiple redo logs just to get the dictionary seems quite expensive.

I think the current behaviour you describe is the better one.  If it leads to too many dictionary rebuilds, it might be better to increase the log file size and thereby limit the number of log switches.

But why do a dictionary update after every log switch?  The dictionary is updated by DDL statements and so stays up to date over time.  Just include the log files containing the dictionary every time; then the number of included log files stays static and you avoid the cost of recreating the dictionary all the time.  Doing an update once a day during low-traffic periods seems optimal.
And why do it at startup?  If the dictionary is available, there should be no need for that.

MAG,
Milo

On Wed, Jul 21, 2021 at 09:16, Chris Cranford <cran...@gmail.com> wrote:
--
You received this message because you are subscribed to the Google Groups "debezium" group.
To unsubscribe from this group and stop receiving emails from it, send an email to debezium+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/debezium/28436ff8-0d9a-bab9-9f30-9b06f835e714%40gmail.com.

Chris Cranford

Jul 21, 2021, 11:18:19 AM
to debe...@googlegroups.com, Milo van der Zee
Hi Milo -

Please see inline.

On 7/21/21 3:53 AM, Milo van der Zee wrote:
Hello Chris,

What I see is that reading redo logs consumes a lot of disk I/O.  Especially in a cloud environment this can become an issue due to limited storage bandwidth, so reading multiple redo logs just to get the dictionary seems quite expensive.

I think the current behaviour you describe is the better one.  If it leads to too many dictionary rebuilds, it might be better to increase the log file size and thereby limit the number of log switches.

I agree.

What drove me to consider some options here was our test suite.  We constantly start and stop the connector, and in a recent run I generated almost 450 archive logs with a single test-suite execution.  This means we must either baby-sit the Oracle instance and clean up the archive space regularly to avoid an ARC process failure, or use the online_catalog mode as an alternative.  Fwiw, the same test-suite run generates 4 archive logs using online_catalog, which is a substantial difference.  I realize that in deployment scenarios people aren't constantly starting and shutting down the connector, so the archive log swap frequency is likely not as severe, but it seems like a problem in either case.


But why do a dictionary update after every log switch?  The dictionary is updated by DDL statements and so stays up to date over time.  Just include the log files containing the dictionary every time; then the number of included log files stays static and you avoid the cost of recreating the dictionary all the time.  Doing an update once a day during low-traffic periods seems optimal.

My original plan was to include only the data dictionary archive log(s) and then any logs that needed to be mined, thus creating a log gap; however, Oracle LogMiner consistently failed with "missing logfile" when starting the mining session.  I assume that when specifying the DDL_DICT_TRACKING option on the LogMiner session, LogMiner cannot tolerate unexpected gaps in the log sequence?  I'll go back and check whether I made a mistake somewhere, because ideally this is what I wanted to do; it seems to mimic precisely what XStream does.
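If that contiguity assumption holds, a simple pre-flight check over the log sequence numbers could surface the gap before LogMiner does (illustrative Python; the requirement itself is an assumption inferred from the "missing logfile" behaviour, not documented fact):

```python
def has_sequence_gap(sequences):
    """Return True if the given log sequence numbers are not contiguous,
    i.e. the condition that appeared to trigger LogMiner's 'missing
    logfile' error when DDL_DICT_TRACKING was enabled."""
    ordered = sorted(sequences)
    return any(b - a != 1 for a, b in zip(ordered, ordered[1:]))
```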


And why do it at startup time? If the dictionary is available there should not be a need for that?

Absolutely.  If I can reduce the dictionary build frequency, and thus the number of archive log swaps caused by connector actions in general, it's a win.  But I want to do it in a way that doesn't introduce any unexpected or unnecessary overhead, either on database resources or on the runtime duration, for it to be a solid win in my book.

MAG,
Milo
