Signal not working as expected to initiate an initial snapshot


Shahul Nagoorkani

Sep 23, 2025, 4:29:32 PM
to debezium
Hello folks,

We are trying hard to make the signal work with our Postgres connector (as well as Oracle), but it's not working as expected.

We use both the source and Kafka channels to issue the signals:

With the source channel: when we insert a record into the signal table, Debezium doesn't pick up the message or start the initial load:

INSERT INTO discard.debezium_discard_configbased_signal (id, type, data)
VALUES (
  'backfill-2025-09-23-01',
  'execute-snapshot',
  '{"type":"incremental","data-collections":["discard.orders","discard.order_items"]}'
);
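For context, this INSERT assumes the three-column signal table layout from the Debezium signaling docs; a minimal sketch of the DDL (column sizes are the documented examples, not requirements):

```sql
-- Sketch of the signal table the INSERT above writes to, following the
-- id/type/data layout from the Debezium signaling documentation.
CREATE TABLE discard.debezium_discard_configbased_signal (
  id   VARCHAR(42)  PRIMARY KEY,  -- unique signal id, echoed in watermark rows
  type VARCHAR(32)  NOT NULL,     -- e.g. 'execute-snapshot'
  data VARCHAR(2048)              -- JSON payload: snapshot type, data-collections
);
```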

With Kafka: when a message is published, we see it get picked up, and two rows are inserted into the database signal table:

"backfill-2025-09-23-01" "execute-snapshot" "{""type"":""incremental"",""data-collections"":[""discard.orders"",""discard.order_items""]}"
"5ae7e50a-8d7f-4245-8042-62de5be72715-open" "snapshot-window-open" "{""openWindowTimestamp"": ""2025-09-23T18:49:58.124690769Z""}"
"5ae7e50a-8d7f-4245-8042-62de5be72715-close" "snapshot-window-close" "{""openWindowTimestamp"": ""2025-09-23T18:49:58.124690769Z"", ""closeWindowTimestamp"": ""2025-09-23T18:49:58.341435949Z""}"

We see a few entries in the logs saying that an incremental snapshot will be started, but we don't see any initial load happening:

{"stream":"stdout","timestamp":1758653398121,"log":"2025-09-23 18:49:58,121 INFO [debezium-connector-olpqa-discard-test|task-0] Requested 'INCREMENTAL' snapshot of data collections '[discard.orders, discard.order_items]' with additional conditions '[]' and surrogate key 'PK of table will be used' (io.debezium.pipeline.signal.actions.snapshotting.ExecuteSnapshot) [debezium-postgresconnector-olpqa-discard-test-SignalProcessor]"}

{"stream":"stdout","timestamp":1758653398303,"log":"2025-09-23 18:49:58,303 INFO [debezium-connector-olpqa-discard-test|task-0] Incremental snapshot for table 'discard.orders' will end at position [REDACTED] (io.debezium.pipeline.source.snapshot.incremental.AbstractIncrementalSnapshotChangeEventSource) [debezium-postgresconnector-olpqa-discard-test-SignalProcessor]"}

Here is our connector configuration:

# Full load on first run, then CDC
snapshot.mode: configuration_based
snapshot.mode.configuration.based.snapshot.schema: true
snapshot.mode.configuration.based.snapshot.data: false
snapshot.mode.configuration.based.start.stream: true
#incremental.snapshot.enabled: true
incremental.snapshot.chunk.size: 10000
snapshot.fetch.size: 20000
snapshot.max.threads: 3

#Debezium Engine/Queing Settings
max.batch.size: 8192
max.queue.size: 65536
max.queue.size.in.bytes: 536870912 #512MB
# Schema refresh for DDL changes
# Capture DDL events
include.schema.changes: true
plugin.name: pgoutput
database.hostname: <dbhostname>
database.port: 5432
database.user: ${secrets:dba-debezium/colp-dba-secret:username}
database.password: ${secrets:dba-debezium/colp-dba-secret:password}
database.dbname: olpqa-db-aurora
publish.via.partition.root: true
topic.prefix: olpqa-discard-test
topic.creation.enable: true
topic.creation.default.replication.factor: -1
topic.creation.default.partitions: -1
topic.creation.default.max.message.bytes: 4194304
schema.include.list: discard
table.include.list: discard.orders,discard.order_items
# Replication slot configuration
slot.name: app_pub
# keep failing on real errors
errors.tolerance: none
# be more patient with transient failures (incl. SR connect failures)
errors.retry.timeout: 600000 # 10m total retry budget (pick your SLO)
errors.retry.delay.max.ms: 30000 # backoff cap between retries (e.g., 30s)
# optional: better logs while it retries
errors.log.enable: true
errors.log.include.messages: true

# Signal configuration for incremental snapshots
signal.kafka.bootstrap.servers: <server names>
signal.kafka.topic: olpqa-discard.signal
signal.enabled.channels: source,kafka
signal.data.collection: discard.debezium_discard_configbased_signal
signal.consumer.security.protocol: SASL_SSL
signal.consumer.sasl.mechanism: SCRAM-SHA-512
signal.consumer.sasl.jaas.config: org.apache.kafka.common.security.scram.ScramLoginModule required username=cdo password=${secrets:dba-debezium/msk-secret-kafka:kafka-connect-msk-password};
signal.producer.security.protocol: SASL_SSL
signal.producer.sasl.mechanism: SCRAM-SHA-512
signal.producer.sasl.jaas.config: org.apache.kafka.common.security.scram.ScramLoginModule required username=cdo password=${secrets:dba-debezium/msk-secret-kafka:kafka-connect-msk-password};
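As a sanity check on the setup above, these catalog queries show whether the replication slot is active and which tables the publication actually covers (a sketch; `dbz_publication` is Debezium's default publication name and is an assumption here, substitute your own):

```sql
-- Is the slot from slot.name present and being consumed?
SELECT slot_name, active, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name = 'app_pub';

-- Which tables does the publication include? The signal table must be listed
-- for source-channel signals to reach the connector via logical decoding.
SELECT pubname, schemaname, tablename
FROM pg_publication_tables
WHERE pubname = 'dbz_publication';
```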

Any idea why the initial snapshot is not getting triggered? This is a blocker for us at the moment, since we need to trigger the initial snapshot for several huge tables. We had issues with the "initial" snapshot mode: the snapshot failed a couple of times for various reasons, and the connector kept redoing the full load from the beginning. If the signal works as expected, we can trigger incremental snapshots, and even across connector restarts the full load will resume from where it left off, avoiding a restart from the beginning.

Regards,
Shahul Nagoorkani

Shahul Nagoorkani

Sep 23, 2025, 5:27:32 PM
to debezium

Enabled trace logging; the log is attached.
search-results-2025-09-23T14_25_09.094-0700.csv.zip

Chris Cranford

Sep 24, 2025, 12:24:07 AM
to debe...@googlegroups.com
Hi -

The incremental snapshots were executed:

    Finished exporting 3000 records for window of table 'discard.orders'; total duration '00:00:00.031'

But it does seem from the logs that there is some connectivity issue between Kafka Connect and the Kafka broker.

    [Producer clientId=connector-producer-debezium-connector-olpqa-discard-test-0] Node 1 disconnected.
    [Producer clientId=connector-producer-debezium-connector-olpqa-discard-test-0] Node 3 disconnected.
    [Consumer clientId=0137068c-57ed-4414-808c-e9f70a825d28, groupId=kafka-signal] Disconnecting from node -1 due to socket connection setup timeout. The timeout value is 11308 ms.

Have you checked the network stability between Kafka Connect and the broker?

-cc

Shahul Nagoorkani

Sep 24, 2025, 1:12:54 AM
to debezium
Hello Chris,

Thanks for your response.

The signal that we sent to Debezium was to start a full load of two tables, discard.orders and discard.order_items. We have issued at least 6 such messages since this morning; only a couple were picked up so far, and those processed just one of the two tables named in the signal instead of running the initial load for both.

In between, we inserted a few thousand records into both tables, and CDC hasn't picked up those changes either.

Both Kafka Connect and the broker are deployed in the same AWS account and VPC, but we will check whether any disconnects are happening.

The signal processor is certainly not working as expected. We just issued another signal to run the initial load for only the second table, "discard.order_items", and haven't seen any initial load start so far.

Regards,
Shahul Nagoorkani

Shahul Nagoorkani

Sep 24, 2025, 2:04:37 AM
to debezium
Attaching the latest logs, where the incremental snapshot never kicked off for the other table.

Regards,
Shahul
search-results-2025-09-23T23_02_09.554-0700.csv.zip

Mario Fiore Vitale

Sep 24, 2025, 4:17:08 AM
to debezium
Hi Shahul, 

just a question: was the publication/slot created manually on Postgres?

Thanks, 
Mario.

Shahul Nagoorkani

Sep 24, 2025, 10:44:07 AM
to debezium
Hello Mario,

The publication was created manually with two tables involved. The slot was automatically created when the connector first started. But we generally create the slots manually before starting the connector. 

Regards,
Shahul Nagoorkani

Shahul Nagoorkani

Sep 24, 2025, 5:07:50 PM
to debezium
Adding to what I shared earlier: it looks like the signal processor is stuck on the same message. Although it logs "closing the window", it only does the initial load of one table. Every time we restart the connector, it goes through the same loop: the signal processor picks up the same "Message6" again and again.

{"stream":"stdout","timestamp":1758738071768,"log":{"method":"init","@timestamp":"2025-09-24T18:21:11.768Z","logger_name":"io.debezium.pipeline.source.snapshot.incremental.AbstractIncrementalSnapshotChangeEventSource","source_host":"kafka-connect-dba-colp-connect-1","line_number":"232","message":"Incremental snapshot in progress, need to read new chunk on start","class":"io.debezium.pipeline.source.snapshot.incremental.AbstractIncrementalSnapshotChangeEventSource","file":"AbstractIncrementalSnapshotChangeEventSource.java","@version":1.0,"level":"INFO","thread_name":"debezium-postgresconnector-olpqa-discard-test-change-event-source-coordinator","mdc":{"dbz.connectorContext":"streaming","connector.context":"[debezium-connector-olpqa-discard-test|task-0] ","dbz.connectorName":"olpqa-discard-test","dbz.databaseName":"olpqa-db-aurora","dbz.connectorType":"Postgres","dbz.taskId":"0"}}

{"stream":"stdout","timestamp":1758738071389,"log":{"method":"getPreviousOffsets","@timestamp":"2025-09-24T18:21:11.388Z","logger_name":"io.debezium.connector.common.BaseSourceTask","source_host":"kafka-connect-dba-colp-connect-1","line_number":"532","message":"Found previous partition offset PostgresPartition [sourcePartition={server=olpqa-discard-test}]: {incremental_snapshot_correlation_id=Message6, lsn_proc=4771510602344, messageType=INSERT, lsn_commit=4771431222312, lsn=4771431222312, incremental_snapshot_maximum_key=aced0005757200135b4c6a6176612e6c616e672e4f626a6563743b90ce589f1073296c0200007870000000017372000e6a6176612e6c616e672e4c6f6e673b8be490cc8f23df0200014a000576616c7565787200106a6176612e6c616e672e4e756d62657286ac951d0b94e08b02000078700000000000000bb8, txId=3153713697, incremental_snapshot_collections=[{\"incremental_snapshot_collections_id\":\"discard.orders\",\"incremental_snapshot_collections_additional_condition\":null,\"incremental_snapshot_collections_surrogate_key\":null},{\"incremental_snapshot_collections_id\":\"discard.order_items\",\"incremental_snapshot_collections_additional_condition\":null,\"incremental_snapshot_collections_surrogate_key\":null}], ts_usec=1758686851701432, incremental_snapshot_primary_key=aced000570}","class":"io.debezium.connector.common.BaseSourceTask","file":"BaseSourceTask.java","@version":1.0,"level":"INFO","thread_name":"task-thread-debezium-connector-olpqa-discard-test-0","mdc":{"connector.context":"[debezium-connector-olpqa-discard-test|task-0] "}}}

Mario Fiore Vitale

Sep 25, 2025, 2:37:49 AM
to debezium
Hi Shahul, 

the signal table must have CDC enabled, so you need to include it in the publication. Can you give it a try?
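A sketch of what that looks like for a manually created publication (the publication name `dbz_publication` is an assumption; use your own):

```sql
-- Add the signal table to the existing publication so its inserts
-- (including the open/close watermark rows) flow through logical decoding.
ALTER PUBLICATION dbz_publication
  ADD TABLE discard.debezium_discard_configbased_signal;
```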

Thanks, 
Mario.

Shahul Nagoorkani

Sep 25, 2025, 10:12:19 AM
to debezium
Hi Mario,

We have tried that before, though not with this connector, and never had much success with the signals. For the latest 3.2 connector, most of the sources we found say that adding the "signal" table to the publication list is not required and that it is handled automatically by the connector.

Here is the excerpt from our chatgpt friend:

with debezium 3.2 and strimzi, do we need to include the source signal table to the include list for aurora postgres connector
ChatGPT said:

Short answer: No.
With Debezium 3.2 on Strimzi for an Aurora PostgreSQL connector, you do not need to add the source signal table to table.include.list. You only need to:

  1. Create the signal table in Postgres and

  2. Point the connector to it with signal.data.collection="<schema>.<table>".

Why: in 3.2 the source signaling channel is supported natively; Debezium reads the signal table directly (not via CDC for Postgres), and the doc’s special CDC requirement applies to Db2/SQL Server only.

When we use Kafka as the signal source, we see the table get picked up for the incremental load: it reads the first chunk from the database based on "incremental.snapshot.chunk.size", but it doesn't seem to proceed further, neither writing the incremental chunks to Kafka nor reading the next chunk. Although we see a watermarking record inserted into the signal table, the last record in the notification channel topic is "IN_PROGRESS(Incremental Snapshot $last_processed_key 10000", and the connector never gets past that stage.

We just added the signal table to the publication as well, to see if it changes the behavior. Right now, we have 4 tables in the publication:

orders
order_items
discard.heartbeat_table
discard.debezium_discard_configbased_signal

Is it possible to have a zoom call to go over this issue?

Regards,
Shahul

Shahul Nagoorkani

Sep 25, 2025, 12:45:05 PM
to debezium
Hello Mario,

Adding the signal table to the publication and the table include list seems to have helped so far. We are trying out a few more things, like adding a new table to the connector configuration and triggering an initial snapshot. Hopefully there won't be any problems.

Are there any suggestions for performing the incremental initial load on tables which don't have primary keys?

We had similar problems in Oracle as well. We are going to do similar tests in Oracle and will confirm if it goes well. 

Regards,
Shahul Nagoorkani

Chris Cranford

Sep 26, 2025, 12:36:47 AM
to debe...@googlegroups.com
Hi,

So the answer from ChatGPT isn't incorrect: the signal table is not required in `table.include.list`. However, if you manually create the publication, it's important that you add the signal table to the publication's table list, or let the connector manage the publication using either `filtered` or `all_tables` mode, depending on the permissions you can grant to the Debezium user.

For tables with no primary key, you have a few options:

    - Specify a surrogate-key column in the signal payload if there is one column that is unique for all rows.
    - Configure the table using `message.key.columns` in the connector configuration if there are multiple columns that make a unique key.
    - Define a unique index on the table, if possible.

If you cannot do one of the above three options, I'm afraid incremental snapshots aren't possible for that table, and you'll have to rely on a blocking snapshot for it instead.
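As a sketch of the surrogate-key option (the table and column names here are hypothetical, for illustration only), the signal payload carries a `surrogate-key` field naming the unique column:

```sql
-- Hypothetical example: snapshot a PK-less table by declaring a unique
-- column ('order_ref') as the surrogate key in the signal payload.
INSERT INTO discard.debezium_discard_configbased_signal (id, type, data)
VALUES (
  'backfill-orders-nopk-01',
  'execute-snapshot',
  '{"type":"incremental","data-collections":["discard.orders_archive"],"surrogate-key":"order_ref"}'
);
```

For the second option, the equivalent connector setting would be along the lines of `message.key.columns: discard.orders_archive:order_ref` (again with hypothetical names), which defines a synthetic key for the table in the connector configuration instead of per signal.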

Thanks,
-cc

Shahul Nagoorkani

Sep 26, 2025, 10:08:44 AM
to debezium
Thanks Chris for the suggestions. We will try the options and see how it goes.

Regards,
Shahul Nagoorkani
