Performance Debezium

Sven Vollmer

Aug 1, 2024, 9:50:26 AM
to debezium
We use Debezium in Spring Boot and make a log entry for every SourceRecord with a handler. In my first test against an Oracle DB, the duration between the Oracle DB entry (create time) and the log time in Debezium is between 1 and 3 s. Our requirement is a maximum duration of < 1000 ms. Is that normal, or what is missing in the configuration?


Spring Boot-Config:
    @Bean
    public io.debezium.config.Configuration eventConnector() {
        return io.debezium.config.Configuration.create()
                .with("name", "xx.adapter.cdc2")
                .with("connector.class", "io.debezium.connector.oracle.OracleConnector")
                .with("offset.flush.interval.ms", 6000)
                // try optimizing start
                .with("poll.interval.ms", "500")
                .with("signal.poll.interval.ms", "500")
                .with("snapshot.locking.mode", "none")
                // optimizing end
                //.with("max.batch.size", "128")
                //.with("max.queue.size", "256")
                .with("tasks.max", 1)
                .with("database.hostname", "xxxx")
                .with("database.port", "xxxx")
                .with("database.user", "c##dbzuser")
                .with("database.password", "xxxx") //dbz
                .with("topic.prefix", "xxxx")
                .with("database.dbname", "xxxx")
                .with("database.pdb.name", "xxxx")
                .with("database.server.name", "xxxx")
                //.with("include.schema.changes", "false")
                .with("log.mining.strategy", "online_catalog")
                .with("snapshot.max.threads", 4)
                .with("offset.storage.file.filename", "/tmp/xx_adapter_cdc_storage2.dat")
                .with("database.history.file.filename", "/tmp/xx_adapter_cdc_db_history2.dat")
                .with("database.history", "io.debezium.relational.history.FileDatabaseHistory")
                .with("schema.history.internal", "io.debezium.storage.file.history.FileSchemaHistory")
                .with("schema.history.internal.file.filename", "/tmp/xx_adapter_cdc_schema2.dat")
                .with("table.include.list", "xx.Tabelle")
                .with("transforms", "changes, convertTimezone")
                .with("transforms.changes.type", "io.debezium.transforms.ExtractChangedRecordState")
                .with("transforms.changes.header.changed.name", "Changed")
                .with("transforms.changes.header.unchanged.name", "Unchanged")
                .with("transforms.convertTimezone.type", "io.debezium.transforms.TimezoneConverter")
                .with("transforms.convertTimezone.converted.timezone", "Europe/Berlin")
                //.with("decimal.handling.mode", "double")
                .build();
    }
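As an aside on the measurement itself: Debezium stamps each change event with the commit time of the change in the database (the `ts_ms` field in the event's `source` block), so a handler can compute the end-to-end lag per event instead of comparing against a create-time column. A minimal sketch of that arithmetic (the handler wiring and field extraction are assumed, only the calculation is shown; the class name and the 1000 ms threshold from the post are illustrative):

```java
import java.time.Duration;
import java.time.Instant;

public class CdcLatency {
    // sourceTsMs: the source.ts_ms value from the Debezium event
    // (epoch millis of the change in the database).
    static Duration latency(long sourceTsMs, long nowMs) {
        return Duration.ofMillis(nowMs - sourceTsMs);
    }

    public static void main(String[] args) {
        // Simulate an event whose DB change happened 1.5 s ago.
        long sourceTsMs = Instant.now().minusMillis(1500).toEpochMilli();
        Duration lag = latency(sourceTsMs, Instant.now().toEpochMilli());
        // Flag events that miss the < 1000 ms requirement from the post.
        System.out.println(lag.toMillis() >= 1000 ? "SLA missed" : "SLA ok");
    }
}
```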

Chris Cranford

Aug 1, 2024, 10:06:22 AM
to debe...@googlegroups.com
Hi Sven

There are two possible reasons for the latency you observe. The first is the log.mining.sleep.time.* settings, shown here:

    log.mining.sleep.time.min.ms
    log.mining.sleep.time.max.ms
    log.mining.sleep.time.default.ms
    log.mining.sleep.time.increment.ms

The connector uses an adaptive algorithm to adjust the sleep interval between data collection stages. This avoids putting unnecessary load on the database server, as LogMiner gathering can be IOPS-intensive. The connector starts with the default of 1000 ms and adjusts it up or down by the increment, which is 200 ms. At the minimum, the connector performs no sleep between collection stages; at the maximum it sleeps up to 3000 ms. The actual sleep time depends entirely on where the read position is in relation to the database's last flushed SCN. As the connector falls behind, it reduces the sleep interval to catch up faster. As it reaches near real-time, it increases the interval to put less burden on the database.
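The adaptive behaviour described above can be sketched roughly as follows. This is a simplification of the connector's actual logic, not its implementation, using only the values stated here (1000 ms default, 200 ms increment, 0–3000 ms bounds):

```java
public class AdaptiveSleep {
    static final long MIN_MS = 0;       // log.mining.sleep.time.min.ms
    static final long MAX_MS = 3000;    // log.mining.sleep.time.max.ms
    static final long INCREMENT_MS = 200;

    long sleepMs = 1000;                // log.mining.sleep.time.default.ms

    // After each mining iteration: sleep less when changes were seen
    // (the connector may be falling behind the last flushed SCN),
    // sleep more when the batch was empty (near real-time, so ease
    // the IOPS load on the database).
    long next(boolean sawChanges) {
        sleepMs = sawChanges
                ? Math.max(MIN_MS, sleepMs - INCREMENT_MS)
                : Math.min(MAX_MS, sleepMs + INCREMENT_MS);
        return sleepMs;
    }

    public static void main(String[] args) {
        AdaptiveSleep s = new AdaptiveSleep();
        System.out.println(s.next(true));  // busy batch: 1000 -> 800
        System.out.println(s.next(false)); // empty batch: 800 -> 1000
    }
}
```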

The second is latency imposed by Oracle. When we mine changes using LogMiner, any logs added to the mining queue must be read in their entirety, so the larger your redo logs are, the more data must be mined. LogMiner does not return any data to Debezium until the entire read operation ends, and this step happens in a single thread on the database server side, so we have to wait for it to complete before we can output changes. Because of this, you may observe latency of several seconds even with the sleep settings set to 0. We have seen users with 4 GB and 16 GB redo logs experience latency of 5-20 s simply because of the time it takes those redo entries to be read from disk.
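If you want to trade database load for lower latency, the sleep settings named above can be forced down in the same `.with()` style as the posted config. The values below are purely illustrative, not recommendations:

```java
// Illustrative values only: lower sleep bounds reduce latency but
// increase LogMiner query frequency and IOPS on the database.
.with("log.mining.sleep.time.default.ms", "200")
.with("log.mining.sleep.time.min.ms", "0")
.with("log.mining.sleep.time.max.ms", "500")
.with("log.mining.sleep.time.increment.ms", "100")
```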

Hope that helps.
-cc
--
You received this message because you are subscribed to the Google Groups "debezium" group.
To unsubscribe from this group and stop receiving emails from it, send an email to debezium+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/debezium/8966bb0d-64b0-49d8-b821-62ca88b0f102n%40googlegroups.com.
