Hi Milo -
Please see inline.
On 7/21/21 3:53 AM, Milo van der Zee wrote:
Hello Chris,
What I see is that reading redo logs consumes a lot of disk
I/O. Especially in a cloud environment this can become an issue
due to limited storage bandwidth, so reading multiple redo
logs just to get the dictionary seems quite expensive.
I think the current behaviour you describe is more optimal.
If this leads to too many dictionary rebuilds it might be
better to increase the log file size and so limit the number
of log switches.
I agree.
What drove me to consider some options here was our test suite.
We constantly start and stop the connector, and in a recent run I
generated almost 450 archive logs with a single test suite
execution. This means we must either baby-sit the Oracle
instance and clean up the archive space regularly to avoid an ARC
process failure, or use the online_catalog mode as an alternative.
Fwiw, the same test suite run generates only 4 archive logs using
online_catalog, which is a pretty substantial difference. I realize
that in deployment scenarios people aren't constantly starting and
shutting down the connector, so the archive log switch frequency is
likely not as pronounced, but it seems like a problem in either case.
But why do a dictionary update after every log switch? The
dictionary is updated by DDL statements and so stays
up-to-date over time. Just include the log files containing
the dictionary every time. Then the number of included log
files will stay static and you avoid the impact of recreating
the dictionary all the time. Doing an update once a day during
low-traffic times seems optimal.
My original plan was to include only the data dictionary archive
log(s) plus any logs that actually needed to be mined, thus creating
a log gap; however, Oracle LogMiner continually failed with "missing
logfile" when starting the mining session. I assume that when the
DDL_DICT_TRACKING option is specified on the LogMiner session,
LogMiner cannot tolerate any unexpected gaps in the log sequence?
I'll go back and check whether I perhaps made a mistake, because
ideally this is what I wanted to do; it seems to mimic precisely
what XStream does.
And why do it at startup time? If the dictionary is
available there should not be a need for that?
Absolutely. If I can reduce the dictionary build frequency, and
thus the number of archive log switches caused by connector actions
in general, it's a win. But I want to do it in a way that introduces
no unexpected or unnecessary overhead, either on database resources
or on the runtime duration; only then is it a solid win in my book.
Hi everyone -
The Debezium Oracle connector's default mining strategy
forces the connector to write the database data dictionary
to the redo logs at start-up and any time a log switch is
detected. Depending on the size of the data dictionary,
the size of the redo logs, and the number of log groups,
this can be an expensive operation that leads to a small
burst of archive logs being generated in quick
succession. What I want to do is see whether we can
minimize the burst of archive logs produced by the
connector.
Currently, the connector gathers all archive and redo logs
that have changes from a point in time going forward. When
a log switch occurs, the connector asks Oracle to rebuild
the data dictionary to the redo logs (which generates a
small burst of archive logs), and the mining session is
then seeded with all the archive logs that were generated,
along with the current online redo logs used by the Oracle
instance. As logs continue to switch, this process repeats,
so the number of logs mined in a given session stays
relatively static.
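For illustration, the per-session log gathering described above can be sketched roughly like this (the ArchiveLog type, its fields, and logs_to_mine are hypothetical names for this sketch, not the connector's actual model):

```python
# Illustrative sketch only; not the Debezium Oracle connector's API.
from dataclasses import dataclass

@dataclass
class ArchiveLog:
    sequence: int   # log sequence number
    next_scn: int   # highest SCN covered by this log

def logs_to_mine(logs, start_scn):
    """Current strategy: include every log whose SCN range reaches past
    the starting SCN, so the session size stays roughly static across
    log switches (the dictionary is rebuilt to the redo logs each time)."""
    return [log for log in sorted(logs, key=lambda l: l.sequence)
            if log.next_scn > start_scn]

logs = [ArchiveLog(1, 100), ArchiveLog(2, 200), ArchiveLog(3, 300)]
print([l.sequence for l in logs_to_mine(logs, 150)])  # -> [2, 3]
```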
What I would like to consider is adjusting this behavior
slightly so that we reduce the frequency at which the data
dictionary is re-built.
To accomplish this, the connector would not only include
the archive and redo logs that contain the starting system
change number we want to mine from, but would also look
back at older archive logs to see whether any within a
given age threshold contains the data dictionary. If such
archive logs exist, the connector would add those logs and
all logs thereafter to the session. Only if the age
threshold has been exceeded, or if no available archive
log contains the data dictionary, would we trigger a
rebuild of the data dictionary to the redo logs. This
means that as log switches occur, the number of logs mined
in a given session would grow over time. Since the
threshold is configurable, we could set it to a sensible
default and users could then adjust it based on their
system needs.
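A minimal sketch of the proposed selection logic, under stated assumptions: the ArchiveLog model and select_logs function are hypothetical, and the age threshold is expressed in hours (a real implementation would presumably consult V$ARCHIVED_LOG, where DICTIONARY_BEGIN marks logs containing a dictionary build):

```python
# Illustrative sketch only; names are hypothetical, not the connector's API.
from dataclasses import dataclass

@dataclass
class ArchiveLog:
    sequence: int
    age_hours: float       # how long ago the log was archived
    has_dictionary: bool   # cf. V$ARCHIVED_LOG.DICTIONARY_BEGIN = 'YES'

def select_logs(logs, max_dict_age_hours):
    """Proposed strategy: if an archive log within the age threshold
    contains a dictionary build, mine from that log forward and reuse it;
    otherwise fall back to rebuilding the dictionary to the redo logs."""
    ordered = sorted(logs, key=lambda l: l.sequence)
    # Walk backwards so we reuse the most recent usable dictionary build.
    for i in range(len(ordered) - 1, -1, -1):
        if (ordered[i].has_dictionary
                and ordered[i].age_hours <= max_dict_age_hours):
            return ordered[i:], False   # logs to mine, no rebuild needed
    return ordered, True                # threshold exceeded: rebuild

logs = [
    ArchiveLog(10, age_hours=30.0, has_dictionary=True),
    ArchiveLog(11, age_hours=5.0, has_dictionary=True),
    ArchiveLog(12, age_hours=1.0, has_dictionary=False),
]
mined, rebuild = select_logs(logs, max_dict_age_hours=24.0)
print([l.sequence for l in mined], rebuild)  # -> [11, 12] False
```

Note that with this approach the session grows by one log per switch until the next dictionary build, which is where the PGA memory question below comes from.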
I'd like to understand from those with Oracle LogMiner
experience whether you foresee any issues with this
approach. My biggest concern is mostly how much PGA
memory this would consume.
Looking forward to your feedback!
CC
--
You received this message because you are subscribed to the
Google Groups "debezium" group.
To unsubscribe from this group and stop receiving emails from
it, send an email to debezium+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/debezium/28436ff8-0d9a-bab9-9f30-9b06f835e714%40gmail.com.