Hello, and thank you for pointing out that option.
The following is a summary of how one might go about replaying events directly from the WAL:
1. debezium.source.flush.lsn.source is set to false in the Debezium server configuration
2. From the Debezium docs related to this option: "User is expected to handle the acknowledgement of LSN outside Debezium"
Without doing this, the WAL would (could?) grow without bound and consume disk space. One option is to execute:
select pg_replication_slot_advance('SLOT_NAME', 'PG_LSN');
Here SLOT_NAME is the logical replication slot to which Debezium has subscribed, and PG_LSN is a WAL location representing some point in the past. For example, to retain two days of WAL, a scheduled script could run each day and a) persist the result of `select pg_current_wal_lsn();` along with a date or timestamp, and b) look up the LSN recorded two days earlier, wherever the script stores its data, and pass it to the pg_replication_slot_advance function. Calling that function does not seem to work while Debezium is connected, however, so Debezium would need to be stopped temporarily and restarted after the slot has been advanced.
3. If for whatever reason we need or want to replay directly from the WAL (assuming the target point is within the retention window, e.g. 2 days, of the script described above), that would require:
a) pausing the Debezium server process
b) finding the correct values for editing Debezium's offsets.dat file. I haven't been able to find much information, discussion, or documentation on this (without diving into Debezium's Postgres connector code), but it would involve editing one, some, or all of: lsn_proc, messageType, lsn_commit, lsn, txId, ts_usec
In the few tests I've done thus far on this, I updated each of the variables above to match values captured in the past, and it indeed worked.
Debezium replayed and re-published messages to its PubSub destination topic.
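For illustration, here is a minimal Python sketch of the lookup half of the daily script from step 2: given the (timestamp, LSN) pairs the script has persisted, it picks the LSN to pass to pg_replication_slot_advance. The function name, the in-memory list of tuples, and the sample LSN values are all hypothetical. A real script would read the history from wherever it stores it and would run the two SQL statements through a Postgres client.

```python
from datetime import datetime, timedelta, timezone


def lsn_to_advance_to(records, now, retention=timedelta(days=2)):
    """Return the newest recorded LSN captured at or before (now - retention).

    `records` is an iterable of (timestamp, lsn_string) pairs, i.e. the
    history of `select pg_current_wal_lsn();` results saved by the daily job.
    Returns None when nothing is old enough yet, in which case the slot
    should be left alone.
    """
    cutoff = now - retention
    eligible = [(ts, lsn) for ts, lsn in records if ts <= cutoff]
    if not eligible:
        return None
    # Pick the most recent entry that is still outside the retention window.
    return max(eligible, key=lambda rec: rec[0])[1]


# Hypothetical history: one LSN captured per day (values are made up).
history = [
    (datetime(2024, 1, 1, tzinfo=timezone.utc), "0/1000000"),
    (datetime(2024, 1, 2, tzinfo=timezone.utc), "0/2000000"),
    (datetime(2024, 1, 3, tzinfo=timezone.utc), "0/3000000"),
]

target = lsn_to_advance_to(history, now=datetime(2024, 1, 4, tzinfo=timezone.utc))
if target is not None:
    # The statement a real script would execute, with parameters bound by
    # the database driver rather than interpolated into the string.
    statement = "select pg_replication_slot_advance(%s, %s);"
    params = ("SLOT_NAME", target)
    print(statement, params)
```

Returning None when no entry is old enough means the first couple of runs simply skip the advance instead of moving the slot to a more recent position than intended.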
It would be great if anyone had any insight on two questions: 1) is it really necessary to stop Debezium before running pg_replication_slot_advance? and 2) when modifying offsets.dat directly, must all of those variables be reset to match a past LSN?
thanks