Snapshot mode and possible data loss


Frans Guelinckx

May 4, 2021, 5:41:21 AM
to debezium

We have Debezium connected to a multi-master MySQL cluster with snapshot mode set to 'initial'. Sometimes we get this error:

org.apache.kafka.connect.errors.ConnectException: The connector is trying to read binlog starting at GTIDs 6372424d-a375-ee14-49d4-68734c2fd7fb:1-377469,9aa5d14c-5c8b-11eb-a22a-005056a27f08:1 and binlog file 'mysql-bin.000095', pos=58856251, skipping 0 events plus 0 rows, but this is no longer available on the server.
Reconfigure the connector to use a snapshot when needed.

First of all, we would like to better understand what is happening here. Is it 'normal' for this exception to occur occasionally in a Percona/MySQL multi-master setup? For example, when one of the MySQL nodes gets cut off from the other nodes and Debezium was connected to that node?

When looking for a solution, I understand from the documentation that we need some kind of snapshot mode and 'when_needed' seems to be the most suitable in our case. Some aspects of it are still unclear to us though.
- Is data loss possible with this snapshot mode?
- Or the other way around: is it possible that some events get sent out twice by Debezium?

Thanks in advance for your reply!

May 5, 2021, 7:40:14 AM
to debezium

This error implies that:

1) Streaming has already been started
2) Debezium has already processed some transactions and recorded their GTIDs
3) Some transactions that Debezium has not seen yet are no longer available on the server to which it is connected

I recommend checking the purged GTID set on the server to see if it is really the culprit.
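
A rough sketch of how that check could be done by hand, in case it helps. On MySQL the purged set can be read with SELECT @@GLOBAL.gtid_purged; and compared against the GTIDs from the error message (the server-side GTID_SUBTRACT() function can do the same comparison). The Python below only illustrates the comparison and is not Debezium's own code. The helper names parse_gtid_set and missing_transactions are made up for this example, and the purged value is hypothetical: replace it with what your server actually reports.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:3' into {uuid: [(start, end), ...]} (inclusive ranges)."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        ranges = result.setdefault(uuid.lower(), [])
        for interval in intervals:
            start, _, end = interval.partition("-")
            ranges.append((int(start), int(end or start)))
    return result


def missing_transactions(purged, processed):
    """Return the purged transactions that are NOT in the processed set."""
    missing = {}
    for uuid, purged_ranges in purged.items():
        seen = sorted(processed.get(uuid, []))
        for start, end in purged_ranges:
            cursor = start  # first transaction id not yet accounted for
            for s, e in seen:
                if e < cursor:
                    continue  # range lies entirely before the cursor
                if s > end:
                    break  # range lies entirely after the purged range
                if s > cursor:
                    missing.setdefault(uuid, []).append((cursor, s - 1))
                cursor = max(cursor, e + 1)
            if cursor <= end:
                missing.setdefault(uuid, []).append((cursor, end))
    return missing


# GTIDs the connector had recorded, copied from the error message above.
processed = parse_gtid_set(
    "6372424d-a375-ee14-49d4-68734c2fd7fb:1-377469,"
    "9aa5d14c-5c8b-11eb-a22a-005056a27f08:1"
)

# HYPOTHETICAL value of @@GLOBAL.gtid_purged, for illustration only.
purged = parse_gtid_set(
    "6372424d-a375-ee14-49d4-68734c2fd7fb:1-377000,"
    "9aa5d14c-5c8b-11eb-a22a-005056a27f08:1-25"
)

gaps = missing_transactions(purged, processed)
if gaps:
    print("Purged but never processed by the connector:", gaps)
else:
    print("Nothing the connector still needs has been purged.")
```

If the result is non-empty, the server has purged transactions the connector never saw, which would match point 3 above. If it is empty, the purged set is probably not the culprit and something else is going on (for example, the connector now talking to a different node).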

