[ORACLE] Sink strategy


Vitor Pereira

Feb 5, 2026, 11:32:20 AM
to debezium
Hi,

I hope everything is well.

Today I have a question about the sink process. To avoid the time spent executing the MERGE command in the destination database (our sink connector runs with `insert.mode: upsert`), our idea was to run the initial streaming with `insert.mode: insert` and then change it to `upsert`. The pipeline does not seem to like this and enters a rebalancing cycle.

Is there any way to make this work? It would make our lives much easier.

Best regards,
Vitor Pereira

Chris Cranford

Feb 5, 2026, 4:27:51 PM
to debe...@googlegroups.com
Hi Vitor -

You are free to change the `insert.mode` at any time, but as you point out, a re-balance will be triggered because a reconfiguration will cause the sink connector task(s) to be stopped and restarted. That's normal and entirely expected.

If your issue is more about the fact that you have other connectors on the cluster and you want to avoid re-balancing them, then perhaps putting the sink connector on its own Kafka Connect cluster is the answer.
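For illustration, the switch itself is just a one-key reconfiguration of the sink connector shown later in this thread; everything else stays the same (a fragment, not a complete config):

```json
{
    "insert.mode": "insert"
}
```

Once the initial load completes, the same key is updated to `"insert.mode": "upsert"`; it is that reconfiguration which stops and restarts the task(s) and triggers the re-balance described above.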

But I am more concerned here with the MERGE performance. Does the table in question have a composite primary key, and if so, what is the makeup of this composite key? In addition, how many columns are in this table and what types of column types are present? Any of them CLOB or BLOB?

-cc

--
You received this message because you are subscribed to the Google Groups "debezium" group.
To unsubscribe from this group and stop receiving emails from it, send an email to debezium+u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/debezium/2a94a287-6c89-4304-8f99-16de668365c4n%40googlegroups.com.

Vitor Pereira

Feb 9, 2026, 4:59:21 AM
to debezium
Hi, Chris

So, my question is: Can I change the configuration from "insert" to "upsert" after the initial snapshot without any issue?

Yes, the table in question has a composite PK composed of four columns. The table has 40 columns, and there are no CLOBs or BLOBs, only VARCHARs, DATEs, and NUMBERs.

Best regards,
Vitor Pereira

Chris Cranford

Feb 9, 2026, 8:25:11 AM
to debe...@googlegroups.com
Hi Vitor

You could, but understand that if we receive an event for the same primary key while the insert mode is set to `insert`, the JDBC sink will throw an error because of the constraint violation. But otherwise, you can most certainly do it that way.

-cc

Vitor Pereira

Feb 9, 2026, 10:05:03 AM
to debezium
Hi, Chris. 

Ok, I'm aware of that.

Regarding the MERGE performance, do you have any suggestions to mitigate it?

Best regards,
Vitor Pereira

Chris Cranford

Feb 9, 2026, 12:51:10 PM
to debe...@googlegroups.com
Hi Vitor -

I wonder whether using the "/*+ APPEND */" hint would help in your case, particularly for the stream of inserts; it requests that Oracle use direct-path inserts. But the fact that the table's primary key spans 4 columns is the most likely reason for the slow behavior. If the index is poorly constructed, or if the table has no recently gathered statistics, either will hurt MERGE performance.
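For context, the two write shapes being compared look roughly like this (illustrative table and column names, not the exact SQL the sink connector generates):

```sql
-- Direct-path insert requested via the APPEND hint (bulk/initial load).
INSERT /*+ APPEND */ INTO target_table (pk1, pk2, pk3, pk4, col5)
SELECT pk1, pk2, pk3, pk4, col5 FROM staging_table;

-- Upsert via MERGE, matching on the four-column composite key.
MERGE INTO target_table t
USING (SELECT :pk1 AS pk1, :pk2 AS pk2, :pk3 AS pk3,
              :pk4 AS pk4, :col5 AS col5 FROM dual) s
ON (t.pk1 = s.pk1 AND t.pk2 = s.pk2 AND t.pk3 = s.pk3 AND t.pk4 = s.pk4)
WHEN MATCHED THEN
  UPDATE SET t.col5 = s.col5
WHEN NOT MATCHED THEN
  INSERT (pk1, pk2, pk3, pk4, col5)
  VALUES (s.pk1, s.pk2, s.pk3, s.pk4, s.col5);
```

The MERGE pays an index probe on the four-column PK for every row, which is why the health of that index and its statistics matters so much more here than for the plain insert.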

Can you share how many rows are in the table and how often statistics are gathered?

-cc

Vitor Pereira

Feb 18, 2026, 10:51:11 AM
to debezium
Hi, Chris.

Hope you're doing well.

I had some issues and it was not possible to work on this topic until now.

Regarding the table, it has 21M rows, and the statistics were last gathered on November 19th. Those statistics put the row count at 20.8M.
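(For reference, refreshing those statistics before re-measuring might look like the following; the schema and table names are placeholders:)

```sql
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => 'MY_SCHEMA',   -- placeholder schema
    tabname => 'MY_TABLE',    -- placeholder: the 21M-row table
    cascade => TRUE           -- also refresh index stats, incl. the composite PK index
  );
END;
/
```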

We also thought the process might be impacted by concurrent DML during the sink process. We ran a new test in a controlled environment, and performance remained constant throughout.

I've attached an image showing the bytes in/out during the snapshot+sink process. This test included 7 tables: 5 with just a few rows, 1 with 21M rows, and another with ~6M. It is clear from the image that snapshot performance outperformed sink performance. Can I mitigate this?

I'm running CDC in Docker containers. The machine has 64GB of RAM and 4 cores.

I'll provide an example of my sink connector:


{
    "name": "xxxx",
    "config": {
        "connector.class": "io.debezium.connector.jdbc.JdbcSinkConnector",
        "tasks.max": "1",
        "topics": "TOPICNAME",
        "connection.url": "XXXXX",
        "connection.username": "USER",
        "connection.password": "PW",
        "quote.identifiers": "true",
        "schema.evolution": "none",
        "insert.mode": "upsert",
        "key.converter": "io.confluent.connect.avro.AvroConverter",
        "key.converter.schema.registry.url": "http://schema-registry:8081",
        "value.converter": "io.confluent.connect.avro.AvroConverter",
        "value.converter.schema.registry.url": "http://schema-registry:8081",
        "delete.enabled": "true",
        "primary.key.mode": "record_key",
        "primary.key.fields": "PK1,PK2,PK3,PK4",
        "table.name.format": "XXXXX"
    }
}

Is there anything I can change to achieve better performance? I can also share my Compose file if necessary.

Thanks
snapshot2.png
snapshot.png

Vitor Pereira

Feb 18, 2026, 11:12:37 AM
to debezium
Besides this, in the last few tests there were periods when the sink connector did not send any records to the database even though the lag was above 0. Is this behavior normal?

Chris Cranford

Feb 18, 2026, 12:03:14 PM
to debe...@googlegroups.com
Hi,

If the lag is from the source connector, and IIRC that is Oracle, that makes sense. 

The connector intentionally maintains a ~1s to ~3s latency window to avoid over-burdening the source database, even when there are no changes to be sent to the sink. The source connector also captures transaction markers and other non-DML events related to transactions that may not contain any changes for your captured tables. So that's perfectly normal.

In terms of performance, only one thing comes to mind, and it depends on your data pattern. If you frequently have multiple changes for the same primary key within the same batch, you could consider enabling the reduction buffer, which reduces write overhead by storing only the last event for a given primary key. But if your batches contain mostly unique keys, then unfortunately the reduction buffer won't help very much.
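If the data pattern fits, enabling it is a single property on the sink config (a fragment to merge into the config posted earlier; `use.reduction.buffer` defaults to `false`):

```json
{
    "use.reduction.buffer": "true"
}
```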

The other thing is to ask the DBA whether, under event volume, the MERGE statement is actually using the primary key index. If it isn't for some reason, that would explain the poor write performance. If it is, then what is the composition of the primary key? Is it a bunch of VARCHAR2 fields, or what exactly?
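One way the DBA could check this (a sketch; the `sql_id` has to be looked up first, and the `LIKE` filter assumes the statement text starts with MERGE INTO):

```sql
-- Find the sink's MERGE statement in the shared pool.
SELECT sql_id, executions, elapsed_time
FROM   v$sql
WHERE  sql_text LIKE 'MERGE INTO%';

-- Display its actual execution plan and confirm the
-- four-column primary key index appears as the access path.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR('&sql_id', NULL, 'BASIC'));
```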

-cc

Vitor Pereira

Feb 18, 2026, 7:41:27 PM
to debezium
Hi,

I will respond appropriately in the morning.

I just stopped by to say that I started the snapshot at 12:30 p.m. with 7 tables (5 small ones and 2 with 21M and ~6M rows). It is now 12:37 a.m., and I just checked the progress and found that the consumer lag for the 6M-row table is over 5M.
My sink connector stopped sending rows at 6:30 p.m. Is there anything I can do to fix this?


Best regards
consumer_lag.png

Chris Cranford

Feb 19, 2026, 12:45:44 AM
to debe...@googlegroups.com
Hi -

How many partitions does the source topic have? If it has more than one, have you considered increasing `tasks.max` so that you perform parallel writes to the target table? Running multiple tasks is one of the easiest and best ways to increase consumer throughput.
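For example, if the topic had 4 partitions (an assumption; the actual count needs checking), the consumer-side parallelism could match it with a one-key change to the config posted earlier:

```json
{
    "tasks.max": "4"
}
```

Kafka Connect will cap the effective task count at the number of topic partitions, so setting it higher than the partition count gains nothing.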


-cc