When Debezium marks a snapshot as complete


Bhupendra Baraiya

Jul 7, 2022, 12:04:19 AM
to debezium
I observed that if I pass multiple tables in the table.whitelist property, I still see only one snapshot_completed = true in the offset topic. I have a few questions about when the snapshot is marked complete:

1) If I pass 5 tables in the table.whitelist property with snapshot = initial, the connector loads the data and I get snapshot_completed = true in the offset topic. If I then run the job again for the same 5 tables with snapshot = initial, will Debezium take the snapshot for those 5 tables again?

2) If I run the job with whitelisting at the schema level using the schema.whitelist property and snapshots are loaded for all tables, and a new table is later added to the same schema, will Debezium treat this as a config change and reload the snapshot for all tables?

3) I configured around 181 tables in the whitelist. The job ran fine for 148 tables and I got snapshot_completed = true, but the job got stuck for a few tables, so I re-ran it. This time I first saw a "snapshot skipped" message, then it started loading the remaining tables again, and I again saw snapshot_completed = false in my offset topic.
Below is the CloudWatch log for your reference.
Does this mean we can have multiple snapshot_completed = true entries for the same list of tables?

[2022-07-06 18:46:02,755] INFO A previous offset indicating a completed snapshot has been found. Neither schema nor data will be snapshotted. (io.debezium.connector.sqlserver.SqlServerSnapshotChangeEventSource:63)

[2022-07-06 18:46:05,594] INFO [Consumer clientId="Name removed", groupId="Name removed"] Adding newly assigned partitions: "history topic name"-0 (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:288)

[2022-07-06 18:46:05,602] INFO [Consumer clientId="Name removed", groupId="Name removed"] Found no committed offset for partition "history topic name"-0 (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:1348)

[2022-07-06 18:46:05,603] INFO [Consumer clientId="Name removed", groupId="Name removed"] Resetting offset for partition "history topic name"-0 to offset 0. (org.apache.kafka.clients.consumer.internals.SubscriptionState:397)

[2022-07-06 18:46:05,676] INFO According to the connector configuration both schema and data will be snapshotted (io.debezium.connector.sqlserver.SqlServerSnapshotChangeEventSource:70)

[2022-07-06 18:46:05,677] INFO Previous snapshot was cancelled before completion; a new snapshot will be taken. (io.debezium.relational.RelationalSnapshotChangeEventSource:90)

Chris Cranford

Jul 7, 2022, 8:42:07 AM
to debe...@googlegroups.com, Bhupendra Baraiya
Hi Bhupendra, see inline.


On 7/7/22 00:04, Bhupendra Baraiya wrote:
I observed that if I pass multiple tables in the table.whitelist property, I still see only one snapshot_completed = true in the offset topic. I have a few questions about when the snapshot is marked complete:

1) If I pass 5 tables in the table.whitelist property with snapshot = initial, the connector loads the data and I get snapshot_completed = true in the offset topic. If I then run the job again for the same 5 tables with snapshot = initial, will Debezium take the snapshot for those 5 tables again?

No. The snapshot phase runs only if there are no offsets, or if the offsets indicate that a prior snapshot did not conclude successfully; otherwise the snapshot phase exits immediately and the streaming phase begins.


2) If I run the job with whitelisting at the schema level using the schema.whitelist property and snapshots are loaded for all tables, and a new table is later added to the same schema, will Debezium treat this as a config change and reload the snapshot for all tables?

No, under no circumstance will a connector perform a snapshot on its own (outside of the when_needed snapshot mode with MySQL) if a prior snapshot finished successfully. Once a connector has successfully completed a snapshot, it will only ever stream changes. When a new table is added that is matched by the connector's filter configuration, the table's CREATE DDL registers the new table in the connector's relational model, and any further changes (inserts/updates/deletes) will be captured by the connector. In this use case there shouldn't be a need for a snapshot, but you can always trigger an incremental snapshot for any table that requires one, provided the connector is configured to support signals.
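As a concrete (hypothetical) illustration of such a signal, an ad-hoc incremental snapshot for one table is requested by inserting a row into the configured signal table; the table name dbo.debezium_signal and the captured table dbo.MyTable below are placeholders, and the signal table must be registered via the connector's signal.data.collection property:

```sql
-- Request an ad-hoc incremental snapshot of a single table.
-- id is any unique string; data is a JSON payload naming the tables to snapshot.
INSERT INTO dbo.debezium_signal (id, type, data)
VALUES ('adhoc-snapshot-1',
        'execute-snapshot',
        '{"data-collections": ["dbo.MyTable"], "type": "incremental"}');
```

The connector picks up the signal row from its change stream and snapshots only the listed tables, without interrupting streaming.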


3) I configured around 181 tables in the whitelist. The job ran fine for 148 tables and I got snapshot_completed = true, but the job got stuck for a few tables, so I re-ran it. This time I first saw a "snapshot skipped" message, then it started loading the remaining tables again, and I again saw snapshot_completed = false in my offset topic.
Below is the CloudWatch log for your reference.
Does this mean we can have multiple snapshot_completed = true entries for the same list of tables?

The offset topic is shared by multiple connectors, so if you have several connectors running, a dump of the topic will contain different records whose state depends on the success of each connector's snapshot. Additionally, if you change certain aspects of the connector configuration, such as the name of the connector deployment, the offsets are written under a different key in the offset topic; a dump of the topic will then show different values because the key of each offset changed along with the name.
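To make that keying concrete, here is a minimal Python sketch (the connector and server names are made up) of why a renamed deployment misses its previously stored offset: Kafka Connect keys source offsets by the connector name plus the source partition.

```python
import json

def offset_key(connector_name: str, partition: dict) -> str:
    # Kafka Connect keys source offsets by [connector name, source partition],
    # serialized as JSON. This is a sketch, not Connect's actual code path.
    return json.dumps([connector_name, partition])

# Offsets stored by the original deployment:
store = {
    offset_key("inventory-connector", {"server": "db1"}): {"snapshot_completed": True}
}

# The same source partition, read under a renamed deployment:
new_key = offset_key("inventory-connector-v2", {"server": "db1"})
print(new_key in store)  # the renamed connector sees no prior offset -> False
```

Because the lookup key embeds the name, the old record still sits in the topic; the renamed connector simply never reads it.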

Regarding the log entries below, is it possible this is from two different connectors?  I don't see how those two log entries would occur for the same connector.

[2022-07-06 18:46:02,755] INFO A previous offset indicating a completed snapshot has been found. Neither schema nor data will be snapshotted. (io.debezium.connector.sqlserver.SqlServerSnapshotChangeEventSource:63)

[2022-07-06 18:46:05,594] INFO [Consumer clientId="Name removed", groupId="Name removed"] Adding newly assigned partitions: "history topic name"-0 (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:288)

[2022-07-06 18:46:05,602] INFO [Consumer clientId="Name removed", groupId="Name removed"] Found no committed offset for partition "history topic name"-0 (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:1348)

[2022-07-06 18:46:05,603] INFO [Consumer clientId="Name removed", groupId="Name removed"] Resetting offset for partition "history topic name"-0 to offset 0. (org.apache.kafka.clients.consumer.internals.SubscriptionState:397)

[2022-07-06 18:46:05,676] INFO According to the connector configuration both schema and data will be snapshotted (io.debezium.connector.sqlserver.SqlServerSnapshotChangeEventSource:70)

[2022-07-06 18:46:05,677] INFO Previous snapshot was cancelled before completion; a new snapshot will be taken. (io.debezium.relational.RelationalSnapshotChangeEventSource:90)

--
You received this message because you are subscribed to the Google Groups "debezium" group.
To unsubscribe from this group and stop receiving emails from it, send an email to debezium+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/debezium/4371d99f-ebfe-452b-8b8b-f9fdb8b3b77en%40googlegroups.com.

Gaurav Gehlot

Jan 11, 2024, 4:59:33 AM
to debezium

Hi

I have the same issue with the MySQL connector: the snapshot status is captured as SKIPPED when I hit the following error:

ERROR Encountered change event 'Event{header=EventHeaderV4{timestamp=1704950917000, eventType=TABLE_MAP, serverId=1146506, headerLength=19, dataLength=72, nextPosition=130997902, flags=0}, data=TableMapEventData{tableId=229033, database='profilemod', table='USER_PROFILE_HISTORY_228', columnTypes=3, 3, 3, 3, 15, 15, -4, 3, 17, 15, columnMetadata=0, 0, 0, 0, 1, 500, 2, 0, 0, 20, columnNullability={3, 4, 5, 6, 9}, eventMetadata=null}}' at offset {transaction_id=null, file=n5plnanaapp59_profilemod.006301, pos=130997743, server_id=1146506, event=1} for table profilemod.USER_PROFILE_HISTORY_228 whose schema isn't known to this connector. One possible cause is an incomplete database history topic. Take a new snapshot in this case.

After renaming or deleting the history topic in the connector configuration, the worker logs show:

[2024-01-11 11:57:50,026] INFO Snapshot ended with SnapshotResult [status=COMPLETED, offset=MySqlOffsetContext [sourceInfoSchema=Schema{io.debezium.connector.mysql.Source:STRUCT}, sourceInfo=SourceInfo [currentGtid=null, currentBinlogFilename=n5plnanaapp59_profilemod.006301, currentBinlogPosition=130955380, currentRowNumber=0, serverId=0, sourceTime=2024-01-11T06:27:50.015Z, threadId=-1, currentQuery=null, tableIds=[null], databaseName=profilemod], partition={server=Database-Debezium-Connector-profilemod}, snapshotCompleted=true, transactionContext=TransactionContext [currentTransactionId=null, perTableEventCount={}, totalEventCount=0], restartGtidSet=null, currentGtidSet=null, restartBinlogFilename=n5plnanaapp59_profilemod.006301, restartBinlogPosition=130955380, restartRowsToSkip=1, restartEventsToSkip=2, currentEventLengthInBytes=0, inTransaction=false, transactionId=null]] (io.debezium.pipeline.ChangeEventSourceCoordinator:111)

Before fixing the topic, if I repost the config after deleting the connector, the issue recurs as below:

[2024-01-11 11:45:12,210] INFO Snapshot ended with SnapshotResult [status=SKIPPED, offset=MySqlOffsetContext [sourceInfoSchema=Schema{io.debezium.connector.mysql.Source:STRUCT}, sourceInfo=SourceInfo [currentGtid=null, currentBinlogFilename=n5plnanaapp59_profilemod.006301, currentBinlogPosition=130955380, currentRowNumber=0, serverId=0, sourceTime=null, threadId=-1, currentQuery=null, tableIds=[], databaseName=null], partition={server=Database-Debezium-Connector-profilemod}, snapshotCompleted=false, transactionContext=TransactionContext [currentTransactionId=null, perTableEventCount={}, totalEventCount=0], restartGtidSet=null, currentGtidSet=null, restartBinlogFilename=n5plnanaapp59_profilemod.006301, restartBinlogPosition=130955380, restartRowsToSkip=1, restartEventsToSkip=2, currentEventLengthInBytes=0, inTransaction=false, transactionId=null]] 

Could you clarify why the snapshot is marked completed when I delete or rename the history topic, but not when the prior config is simply reposted?

thanks


jiri.p...@gmail.com

Jan 15, 2024, 2:04:44 AM
to debezium
Hi,

In the case of MySQL, have you tried the `schema_only_recovery` snapshot mode?
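For reference, a minimal config fragment enabling that mode might look as follows (using the pre-2.0 `database.history.*` property names that match this thread; the broker and topic values are placeholders):

```json
{
  "snapshot.mode": "schema_only_recovery",
  "database.history.kafka.bootstrap.servers": "kafka:9092",
  "database.history.kafka.topic": "schema-history.profilemod"
}
```

With this mode, a connector whose schema history topic is missing or incomplete rebuilds the history by snapshotting only the schema, without re-reading table data.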

Jiri

Gaurav Gehlot

Jan 15, 2024, 4:14:38 AM
to debe...@googlegroups.com
Yes,

I have used that configuration.


jiri.p...@gmail.com

Jan 16, 2024, 1:02:01 AM
to debezium
Hi,

if you look at the internal schema history topic, is there a message/definition related to the missing table?

Jiri

Gaurav Gehlot

Jan 16, 2024, 2:14:49 AM
to debe...@googlegroups.com
Hi,

I checked when I got the issue: the history topic did not contain the schema of the table that was throwing the "schema isn't known" error whenever there was an update for that table.

I believe this was due to an incomplete snapshot when the connector was restarted, but I want to know why this was fixed by deleting or renaming the history topic, and when the snapshot status is marked completed.

Thanks 


jiri.p...@gmail.com

Jan 17, 2024, 6:54:48 AM
to debezium
Hi,

I don't think this was due to an incomplete snapshot, but rather to part of the data being lost from the topic. It was fixed because when the history topic is lost and the schema_only_recovery snapshot mode is enabled, Debezium re-takes the snapshot of the schema.

J.

Gaurav Gehlot

Jan 17, 2024, 10:00:52 AM
to debe...@googlegroups.com
Hi, 
 
Agreed. One more point: when the config was deleted and reposted without any change, before deleting the history topic, it did retake the snapshot, but the error was not fixed and the snapshot status showed SKIPPED, as in the logs shared above.

Thanks & Regards
