Debezium connector for Oracle - POC


Anuradha Bhat

Jun 5, 2024, 12:48:20 AM
to debezium
Hi Community,

I am working on a POC using the Debezium Oracle connector to capture both DDL and DML change events. My setup is dockerized and runs Oracle 21c. LogMiner reads the redo logs in redo_log_catalog mode, with ARCHIVELOG mode and supplemental logging enabled. The setup runs Docker images (2.2.0.Final) of the Zookeeper, Kafka, and Kafka Connect components, and it was configured by following these blogs:

https://debezium.io/blog/2022/09/30/debezium-oracle-series-part-1/

https://debezium.io/blog/2022/10/06/debezium-oracle-series-part-2/

All is well here. Debezium writes DDL change event messages to the topic named <topic_prefix_property_value> and DML change events to topics named <topic_prefix_property_value>.<schema_name>.<table_name>.

As a next step, I have put together a Java application which uses the embedded Debezium engine. Here, I do not use the Zookeeper/Kafka/Kafka Connect orchestration. My pom.xml pulls in all the necessary jars. My property set is as follows:

config = Configuration.empty().withSystemProperties(Function.identity()).edit()
        .with(EmbeddedEngine.CONNECTOR_CLASS, "io.debezium.connector.oracle.OracleConnector")
        .with(EmbeddedEngine.ENGINE_NAME, APP_NAME)
        .with("schemas.enable", false)
        .build();


final Properties props = config.asProperties();
props.setProperty("name", "engine");
props.setProperty("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore");
props.setProperty("offset.storage.file.filename", "path/set/to/offsets.dat");
props.setProperty("offset.flush.interval.ms", "60000");

/* begin connector properties */
props.setProperty("database.hostname", "localhost");
props.setProperty("database.port", "1521");
props.setProperty("database.dbname", "<DB_NAME>");
props.setProperty("database.user", "<DB_USER>");
props.setProperty("database.password", "<DB_PASSWORD>");
props.setProperty("database.include.list", "<SINGLE_TABLE_NAME>");
props.setProperty("topic.prefix", "fullfilment");
props.setProperty("database.connection.adapter", "logminer");
props.setProperty("log.mining.strategy", "redo_log_catalog");
props.setProperty("include.schema.changes", "true");
props.setProperty("schema.history.internal", "io.debezium.storage.file.history.FileSchemaHistory");
props.setProperty("schema.history.internal.file.filename", "path/set/to/abc.dat");
props.setProperty("database.encrypt", "false");


Now my observation is that Debezium polls for DML changes and delivers event messages, which I am able to process in Java:


try (DebeziumEngine<ChangeEvent<String, String>> engine = DebeziumEngine.create(Json.class)
        .using(props)
        .notifying((records, committer) -> {
            System.out.println("==== Data change found ===" + records);
            // Process records and do stuff here.
        })
        .build()) {

    ExecutorService executor = Executors.newSingleThreadExecutor();
    executor.execute(engine);
}


However, the messages (records) received in the code above do not include DDL changes. All DDL changes are correctly written to the file (abc.dat) that is set as the value of the schema.history.internal.file.filename property.

Now, my question is: how can I configure the Debezium engine so that DDL change event messages are also delivered as ChangeEvents in the engine's response? That way I could process both DML and DDL events/messages inside the lambda function shown in the snippet above. Could you please suggest how I can achieve this?

Thanks and regards-

Anu

Chris Cranford

Jun 5, 2024, 8:38:30 AM
to debe...@googlegroups.com
Hi -

All that is required is "include.schema.changes", and you should be notified in the callback with a SourceRecord destined for the topic named "<topic-prefix-config-value>". As our test suite makes use of the engine and we specifically test this use case across a variety of connectors, I believe something else may be happening here. Can you enable TRACE logging, re-test your scenario, and share the logs at that level, please?
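In the meantime, purely as an illustration (a sketch, not tested against your exact setup), once schema change events are flowing, the callback could separate the two event types by destination topic. This assumes the "fullfilment" topic.prefix value from your configuration and uses the ChangeEvent.destination() accessor; schema change (DDL) events arrive on the topic named after "topic.prefix", while table-level (DML) events arrive on "<topic.prefix>.<schema>.<table>":

final String topicPrefix = "fullfilment"; // must match the "topic.prefix" property

try (DebeziumEngine<ChangeEvent<String, String>> engine = DebeziumEngine.create(Json.class)
        .using(props)
        .notifying((records, committer) -> {
            for (ChangeEvent<String, String> record : records) {
                if (topicPrefix.equals(record.destination())) {
                    // Schema change (DDL) event, emitted when "include.schema.changes" is true
                    System.out.println("==== DDL change ===" + record.value());
                }
                else {
                    // Table-level (DML) change event
                    System.out.println("==== DML change ===" + record.value());
                }
                committer.markProcessed(record);
            }
            committer.markBatchFinished();
        })
        .build()) {
    Executors.newSingleThreadExecutor().execute(engine);
}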

Thanks,
Chris

Anuradha Bhat

Jun 5, 2024, 2:12:03 PM
to debezium
Hi Chris,

Thanks for your response. Also, thanks for the detailed blogs on the topic. 

My program has code paths for both DDL and non-DDL use cases. It turned out that an oversight was sending execution into a code block where "include.schema.changes" was being overwritten to 'false'. I cleaned up my code, restarted Eclipse in a new workspace, and it is now working as expected. My bad!

However, I am just curious about a caution posted in the Debezium documentation for the Oracle connector:


Section: 'LogMiner Strategies'.

The note in the box states:  "If you configure the connector to use the redo_log_catalog mode, do not use multiple Debezium Oracle connectors to capture changes from the same logical database."

I assume that it refers to fault-tolerant operation, say with two instances in active/passive (standby) mode. When the active instance halts or crashes for some reason, the standby instance takes over and polls for changes. Therefore, at any given time only one of the instances would be executing. So, does this note mean that when we run the connector with the redo_log_catalog strategy, we cannot support fault tolerance? Could you please provide some insight?

Thanks and regards-
Anu

Chris Cranford

Jun 6, 2024, 11:10:54 AM
to debe...@googlegroups.com
Hi Anu -

No, this doesn't have anything to do with standby fail-over scenarios.

This note is referring to deploying two or more connectors to stream changes from the same logical Oracle instance, meaning:

    - The same or different pluggable databases (PDBs) on Oracle 12c+ using multitenancy
    - The same Oracle database without multitenancy

In general, a single connector should most often be preferred, as it produces the minimum overhead on the database in terms of CPU utilization and IOPS during the mining steps. Adding connectors increases both of these and, when using the default mining strategy, causes extra contention on archive log creation. So we recommend that if you have a business need to create more than one connector for the same logical Oracle instance, you prefer configuring "log.mining.strategy" with either the "online_catalog" or "hybrid" (Debezium 2.6+ only) mode.
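For example (just a sketch reusing the embedded-engine properties shown earlier in this thread), switching the strategy is a one-line configuration change:

// Prefer a non-default mining strategy when more than one connector mines the same instance
props.setProperty("log.mining.strategy", "online_catalog"); // or "hybrid" on Debezium 2.6+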

Hope that helps.
Chris

Anuradha Bhat

Jun 6, 2024, 12:41:07 PM
to debezium
Hi Chris-

Thank you for the clarification. This is very helpful to us in designing the solution.

Regards-
Anu