Hi, I have a few questions related to running Debezium Server against SQL Server:
1. Is it correct that poll.interval.ms determines how often the connector checks the change tables for records with an LSN greater than the connector's max LSN? What I'm getting at is that SQL Server CDC seems to have its own batch cycle for populating the change tables, and the Debezium SQL Server connector has a separate cycle on which it polls those change tables for new records. I just want to make sure I understand that correctly.
2. Is it correct that these are the most important factors in understanding total latency?
- SQL Server CDC parameters: maxscans, maxtrans, and pollinginterval
- Connector config: poll.interval.ms, max.queue.size, max.batch.size, and snapshot.fetch.size
So I would expect latency to be (very) roughly equivalent to pollinginterval (MSSQL) + poll.interval.ms (Connector), given a situation where there is no backlog on either the SQL Server or Connector side?
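To make that estimate concrete, here is a minimal sketch of the arithmetic I have in mind. It assumes pollinginterval is given in seconds and poll.interval.ms in milliseconds, that changes arrive at random points in each cycle, and that scan and processing time are negligible. The function name and the example values (5 s and 500 ms) are just for illustration.

```python
def rough_latency_ms(cdc_polling_interval_s, connector_poll_interval_ms):
    """Very rough no-backlog latency estimate: (average_ms, worst_case_ms).

    Assumption: a change waits for the next CDC capture cycle, then for the
    next connector poll. Scan time, network time, and sink time are ignored.
    """
    cdc_ms = cdc_polling_interval_s * 1000
    # Worst case: the change just misses both cycles and waits a full interval of each.
    worst_case_ms = cdc_ms + connector_poll_interval_ms
    # Average case: on average the change waits half of each interval.
    average_ms = worst_case_ms / 2
    return average_ms, worst_case_ms


# Illustrative values only: pollinginterval = 5 s, poll.interval.ms = 500
average, worst = rough_latency_ms(5, 500)
print(f"average ~{average:.0f} ms, worst case ~{worst} ms")
```

If that model is right, the sum of the two intervals is an upper bound rather than the typical latency.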
3. From step 2 of Offline Schema Updates, how do you know when the Debezium connector has streamed all unstreamed change event records? Is this accomplished by observing the MilliSecondsSinceLastEvent streaming metric to make sure there's no ongoing activity, or is there a procedure to compare max LSNs between the capture instance and the connector offset storage? I'm trying to determine how I can be sure that it's safe to delete an old capture instance when a new one is created after a schema change.
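For the LSN comparison option, this is roughly the check I was imagining. It is only a sketch: I'm assuming both LSNs are available as colon-separated hex strings, that the connector's stored offset exposes a commit LSN in that form, and the function names and sample values are made up. I don't know whether this is the recommended procedure.

```python
def parse_lsn(lsn):
    """Convert a colon-separated hex LSN string into a tuple of ints.

    Assumption: the LSN looks like '00000027:00000ac0:0002'.
    Tuples compare element by element, which matches LSN ordering.
    """
    return tuple(int(part, 16) for part in lsn.split(":"))


def connector_caught_up(connector_commit_lsn, old_instance_max_lsn):
    """True if the connector's offset has reached the old capture instance's max LSN."""
    return parse_lsn(connector_commit_lsn) >= parse_lsn(old_instance_max_lsn)


# Hypothetical values: the connector is still behind the old capture instance here.
print(connector_caught_up("00000027:00000ab0:0001", "00000027:00000ac0:0002"))
```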
Thanks!
Adam