Heartbeat Implementation Details

Josh Ribera

Jul 6, 2021, 2:18:12 PM
to debezium

I was hoping to get some clarification on how the heartbeat works. After looking through the code I see that the heartbeat is emitted once a data change event occurs. What we are hoping to do is have the connector emit to the heartbeat topic on the defined interval. I found a few posts about this on Gitter, but they're from a few years ago, so I'm not sure if the recommendations are still valid (create a dummy table, create some change event every X seconds/minutes/hours). I mostly wanted to refresh this topic and see if there are any new recommendations.


-- Josh

Chris Cranford

Jul 6, 2021, 8:36:07 PM
to debe...@googlegroups.com
Hi Josh -

There are two heartbeat concepts in Debezium, and which one applies depends entirely on which problem you're trying to solve.

In PostgreSQL, we have "heartbeat.action.query", which is meant as a way to keep events periodically coming into Debezium from PostgreSQL. This keeps advancing the replication slot's LSN so that the database transaction log doesn't grow out of proportion. This is needed particularly in cases where the table being captured changes infrequently relative to other tables in the database.

The other concept revolves around the "heartbeat.interval.ms" setting, which controls how frequently the connector emits a simple event to Kafka. The goal of the heartbeat event is to prevent the connector's offset data from becoming stale in Kafka. This is particularly useful for a connector that receives periodic events from the source and needs to update its offsets, but where the events read don't cause anything to be emitted to Kafka that would sync the offsets. Emitting a heartbeat event on a semi-regular interval when necessary keeps Kafka's offset data aligned with the connector, so that if the connector is restarted, it starts where it left off rather than at some earlier point due to old offset data.
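
To make the two settings concrete, here is a minimal sketch of how they might appear together in a PostgreSQL connector configuration. The property names "heartbeat.interval.ms" and "heartbeat.action.query" are the ones discussed above; the heartbeat table name, its columns, and the 10-second interval are assumptions for illustration only, and a real configuration needs the usual connection properties as well.

```python
import json

# Hypothetical heartbeat table (an assumption for this sketch); it would have
# to exist in the source database with an "id" primary key and a "ts" column.
HEARTBEAT_TABLE = "public.debezium_heartbeat"

config = {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    # How often the connector should emit a heartbeat event, in milliseconds.
    "heartbeat.interval.ms": "10000",
    # Statement run against the source database so that a change keeps
    # flowing even when the captured tables are idle (illustrative SQL).
    "heartbeat.action.query": (
        f"INSERT INTO {HEARTBEAT_TABLE} (id, ts) VALUES (1, now()) "
        "ON CONFLICT (id) DO UPDATE SET ts = EXCLUDED.ts"
    ),
}

# Only the heartbeat-related part of the configuration is shown here.
config_json = json.dumps(config, indent=2)
print(config_json)
```

The upsert keeps the hypothetical table at a single row, so the heartbeat writes don't accumulate over time.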

Now, it is important to note that how "heartbeat.interval.ms" is managed varies by connector depending on the integration with the source system's transaction log.

In MongoDB as an example, we use a non-blocking loop to check whether any new events have been written to its log. If no events are written, we do nothing but pause briefly and then re-poll. If an event is read and the event is processed by the connector, we again don't emit any heartbeat events. We only emit heartbeats if we read an event but it's not an event the connector cares about. This allows us to replicate the oplog position in the offsets to Kafka even though the event wasn't anything of interest.
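
As a rough illustration of the MongoDB behavior just described, the three cases can be sketched as a small decision function. This is not the connector's actual code: the `Event` type, the function name, and the returned labels are all made up for this sketch, and the interval timer is left out to keep it short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    position: int   # stand-in for an oplog position (hypothetical)
    relevant: bool  # True if the connector captures this event


def mongo_style_action(event: Optional[Event]) -> str:
    """Return what the polling loop does for one poll result."""
    if event is None:
        # Nothing new in the log: pause briefly, then re-poll.
        return "pause"
    if event.relevant:
        # The change event itself carries the offset; no heartbeat is needed.
        return "emit_change"
    # Event read but filtered out: a heartbeat carries the new position.
    return "emit_heartbeat"


actions = [mongo_style_action(e) for e in (None, Event(1, True), Event(2, False))]
print(actions)
```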

In PostgreSQL we also use a non-blocking IO call to check for changes. But unlike above, if no event is returned and we have processed at least one event since the connector started, a heartbeat will always be emitted as long as the heartbeat's interval timer has expired.
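
The PostgreSQL rule for an empty poll can be sketched the same way. Again, this is only an illustration of the description above, not the connector's implementation: the function and parameter names are hypothetical, and treating an interval of 0 as "heartbeats disabled" is my understanding of the default rather than something stated in this thread.

```python
def heartbeat_on_empty_poll(events_processed: int,
                            ms_since_last_heartbeat: int,
                            interval_ms: int) -> bool:
    """Decide whether to emit a heartbeat when a poll returned no event."""
    if interval_ms <= 0:
        # Assumption: a non-positive interval means heartbeats are turned off.
        return False
    if events_processed < 1:
        # No event processed since the connector started: nothing to sync yet.
        return False
    # Emit only once the interval timer has expired.
    return ms_since_last_heartbeat >= interval_ms


# Connector just started, no events seen yet.
print(heartbeat_on_empty_poll(0, 20_000, 10_000))
# At least one event processed and the timer has expired.
print(heartbeat_on_empty_poll(1, 20_000, 10_000))
# At least one event processed but the timer has not expired yet.
print(heartbeat_on_empty_poll(1, 5_000, 10_000))
```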

If you look at other connectors, you'll see that how heartbeats are emitted again varies depending on the use case.

