First of all many thanks for this very useful project.
correctly managed by Debezium, but it could be a source of problems in general.
- No DDL or PostgreSQL instance issues occurred around the event.
- No publication auto-creation is used.
- No slot auto-drop is configured.
- The same happens whether the source connector replicates from the primary or from a secondary.
If there is no workaround, does someone have an example of how to manually flush the slot outside of Debezium when configuring flush.lsn.source=false?
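For what it's worth, a possible starting point (an assumption based on standard PostgreSQL functions, not something confirmed by the Debezium docs) would be to advance the slot manually once an LSN is known to be committed downstream:

```sql
-- Assumptions: the slot is named 'pg_slot' and the target LSN has
-- already been confirmed as committed downstream (e.g. read back from
-- the Kafka Connect offsets topic). pg_replication_slot_advance()
-- moves the slot's confirmed position forward without emitting the
-- skipped changes; it is available since PostgreSQL 11.
SELECT pg_replication_slot_advance('pg_slot', '14/C5468F90');

-- Verify the slot position afterwards:
SELECT slot_name, restart_lsn, confirmed_flush_lsn
  FROM pg_replication_slots
 WHERE slot_name = 'pg_slot';
```

Whether this is safe to run while the connector is streaming is exactly the part I am unsure about, hence the question.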
Log when the source connector finds that the recorded LSN is no longer available in the replication slot:
21:34:58 INFO Obtained valid replication slot ReplicationSlot [active=false, latestFlushedLsn=LSN{14/C5468F90}, catalogXmin=1638961]
21:34:58 WARN Last recorded offset is no longer available on the server.
21:34:58 INFO The last recorded offset is no longer available but we are in when_needed snapshot mode. Attempting to snapshot data to fill the gap.
21:34:58 INFO Found previous offset PostgresOffsetContext [sourceInfoSchema=Schema{io.debezium.connector.postgresql.Source:STRUCT}, sourceInfo=source_info[server='connector.source'db='pg_db', lsn=LSN{14/C5468F90}, txId=1638961, messageType=UPDATE, lastCommitLsn=LSN{14/C5468F90}, timestamp=2025-10-14T21:15:26.271652Z, snapshot=FALSE, schema=, table=], lastSnapshotRecord=false, lastCompletelyProcessedLsn=LSN{14/C546BA20}, lastCommitLsn=LSN{14/C5468F90}, streamingStoppingLsn=null, transactionContext=TransactionContext [currentTransactionId=null, perTableEventCount={}, totalEventCount=0], incrementalSnapshotContext=IncrementalSnapshotContext [windowOpened=false, chunkEndPosition=null, dataCollectionsToSnapshot=[], lastEventKeySent=null, maximumKey=null]]
At this point the lag between lastCompletelyProcessedLsn=LSN{14/C546BA20} and lastCommitLsn=LSN{14/C5468F90} is 11 kB, calculated as:
select pg_size_pretty(pg_wal_lsn_diff('14/C546BA20', '14/C5468F90')) lag;
When the initial snapshot runs, PostgreSQL logs the SELECT statements, which take nearly 15 minutes, followed by the restart of replication using the same replication slot:
21:49:45 [502757] [Debezium General] pg_user@pg_db LOG: Query: SELECT … FROM "pg_db"."pg_table2"
21:50:43 [502757] [Debezium General] pg_user@pg_db LOG: Query: SELECT … FROM "pg_db"."pg_table1"
21:54:39 [502757] [Debezium General] pg_user@pg_db LOG: Query: SELECT … FROM "pg_db"."pg_table3"
21:54:39 [504370] [Debezium Streaming] pg_user@pg_db LOG: starting logical decoding for slot "pg_slot"
21:54:39 [504370] [Debezium Streaming] pg_user@pg_db DETAIL: Streaming transactions committing after 14/C5468F90, reading WAL from 14/C5468F90.
21:54:39 [504370] [Debezium Streaming] pg_user@pg_db STATEMENT: START_REPLICATION SLOT "pg_slot" LOGICAL 14/C54F7768 ("proto_version" '1', "publication_names" 'pg_slot', "messages" 'true')
21:54:39 [504370] [Debezium Streaming] pg_user@pg_db LOG: logical decoding found consistent point at 14/C5468F90
21:54:39 [504370] [Debezium Streaming] pg_user@pg_db DETAIL: There are no running transactions.
21:54:39 [504370] [Debezium Streaming] pg_user@pg_db STATEMENT: START_REPLICATION SLOT "pg_slot" LOGICAL 14/C54F7768 ("proto_version" '1', "publication_names" 'pg_slot', "messages" 'true')
At this point the lag between the lastCommitLsn at disconnection time and the position of the initial snapshot is 570 kB, calculated as:
select pg_size_pretty(pg_wal_lsn_diff('14/C54F7768', '14/C5468F90')) lag;
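Both lag figures can be cross-checked outside PostgreSQL by treating each LSN as a 64-bit value (high word and low word in hex, separated by the slash), which is effectively what pg_wal_lsn_diff computes. A minimal sketch:

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN like '14/C546BA20' to a 64-bit integer.

    The part before the slash is the high 32 bits, the part after it
    the low 32 bits, both hexadecimal.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)

def lsn_diff(a: str, b: str) -> int:
    """Byte difference between two LSNs, like pg_wal_lsn_diff(a, b)."""
    return lsn_to_int(a) - lsn_to_int(b)

# Lag at disconnection time: 10896 bytes, i.e. ~11 kB
print(lsn_diff("14/C546BA20", "14/C5468F90"))

# Lag between the snapshot restart position and lastCommitLsn:
# 583640 bytes, i.e. ~570 kB
print(lsn_diff("14/C54F7768", "14/C5468F90"))
```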
The source connector stops; in my experience this is often due to Apicurio:
21:33:06 INFO [connector.source|task-0] Obtained valid replication slot ReplicationSlot [active=true, latestFlushedLsn=LSN{14/C5468F90}, catalogXmin=1638961] (io.debezium.connector.postgresql.connection.PostgresConnection) [debezium-postgresconnector-connector.source-change-event-source-coordinator]
21:33:06 INFO [connector.source|task-0] 50 records sent during previous 00:24:30.541, last recorded offset of {server=connector.source} partition is {lsn_proc=89209556904, xmin=1638961, messageType=UPDATE, lsn_commit=89209556848, lsn=89209556904, txId=1639003, ts_usec=1760477586173970} (io.debezium.connector.common.BaseSourceTask) [task-thread-connector.source-0]
21:33:06 ERROR [connector.source|task-0] Failed to update cache value for key: {"type":"record","name":"Key","namespace":"….pg_table3","fields":[{"name":"…","type":"string"}],"
connect.name":"….pg_table3.Key"} (io.apicurio.registry.resolver.ERCache) [task-thread-connector.source-0]
io.apicurio.registry.rest.client.exception.RestClientException: HTTP/1.1 header parser received no bytes
…
21:33:06 INFO [connector.source|task-0] Stopping down connector (io.debezium.connector.common.BaseSourceTask) [task-thread-connector.source-0]
21:33:06 INFO [connector.source|task-0] Connection gracefully closed (io.debezium.jdbc.JdbcConnection) [pool-11783-thread-1]
21:33:06 INFO [connector.source|task-0] Connection gracefully closed (io.debezium.jdbc.JdbcConnection) [pool-11784-thread-1]
21:33:06 INFO [connector.source|task-0] Finished streaming (io.debezium.pipeline.ChangeEventSourceCoordinator) [debezium-postgresconnector-connector.source-change-event-source-coordinator]
21:33:06 INFO [connector.source|task-0] Connected metrics set to 'false' (io.debezium.pipeline.ChangeEventSourceCoordinator) [debezium-postgresconnector-connector.source-change-event-source-coordinator]
21:33:06 INFO [connector.source|task-0] SignalProcessor stopped (io.debezium.pipeline.signal.SignalProcessor) [task-thread-connector.source-0]
21:33:06 INFO [connector.source|task-0] Debezium ServiceRegistry stopped. (io.debezium.service.DefaultServiceRegistry) [task-thread-connector.source-0]
21:33:06 INFO [connector.source|task-0] Connection gracefully closed (io.debezium.jdbc.JdbcConnection) [pool-11785-thread-1]
21:33:06 INFO [connector.source|task-0] [Producer clientId=connector-producer-connector.source-0] Closing the Kafka producer with timeoutMillis = 30000 ms. (org.apache.kafka.clients.producer.KafkaProducer) [task-thread-connector.source-0]
…
The source connector restarts in less than 2 minutes after the error:
21:34:58 INFO [connector.source|task-0] Kafka version: 3.9.0 (org.apache.kafka.common.utils.AppInfoParser) [DistributedHerder-connect_name:8083-1]
21:34:58 INFO [connector.source|task-0] Kafka commitId: a60e31147e6b01ee (org.apache.kafka.common.utils.AppInfoParser) [DistributedHerder-connect_name:8083-1]
21:34:58 INFO [connector.source|task-0] Kafka startTimeMs: 1760477698102 (org.apache.kafka.common.utils.AppInfoParser) [DistributedHerder-connect_name:8083-1]
21:34:58 INFO [connector.source|task-0] [Producer clientId=connector-producer-connector.source-0] Cluster ID: V57zawTbT5O3pQgJtUoQrA (org.apache.kafka.clients.Metadata) [kafka-producer-network-thread | connector-producer-connector.source-0]
21:34:58 INFO [Worker clientId=connect_name:8083, groupId=connect-cluster] Task 'connector.source-0' restart successful (org.apache.kafka.connect.runtime.distributed.DistributedHerder) [DistributedHerder-connect_name:8083-1]
21:34:58 INFO [Worker clientId=connect_name:8083, groupId=connect-cluster] Completed plan to restart 1 of 1 tasks for restart request for {connectorName='connector.source', onlyFailed=true, includeTasks=true} (org.apache.kafka.connect.runtime.distributed.DistributedHerder) [DistributedHerder-connect_name:8083-1]
21:34:58 INFO [connector.source|task-0] Starting PostgresConnectorTask with configuration:
snapshot.locking.mode = none
connector.class = io.debezium.connector.postgresql.PostgresConnector
topic.creation.default.partitions = 3
key.converter.apicurio.registry.url = http://apicurio:8080/apis/registry/v2
value.converter.apicurio.registry.url = http://apicurio:8080/apis/registry/v2
slot.name = pg_slot
tasks.max = 1
publication.name = pg_slot
value.converter.apicurio.registry.auto-register = true
topic.creation.bigtables.include = ^.*pg_table1$,^.*pg_table2$
database.sslmode = require
topic.prefix = …
topic.creation.bigtables.cleanup.policy = compact,delete
topic.creation.default.replication.factor = 1
flush.lsn.source = true
value.converter = io.apicurio.registry.utils.converter.AvroConverter
key.converter.apicurio.registry.auto-register = true
key.converter = io.apicurio.registry.utils.converter.AvroConverter
key.converter.apicurio.registry.find-latest = true
publication.autocreate.mode = disabled
database.dbname = pg_db
database.user = pg_user
topic.creation.default.compression.type = zstd
topic.creation.default.cleanup.policy = compact,delete
xmin.fetch.interval.ms = 60000
notification.enabled.channels = log, sink
snapshot.isolation.mode = repeatable_read
database.port = 5432
plugin.name = pgoutput
topic.creation.bigtables.compression.type = zstd
value.converter.apicurio.registry.find-latest = true
notification.sink.topic.name = dbznotification.topic_pfx
topic.creation.groups = bigtables
task.class = io.debezium.connector.postgresql.PostgresConnectorTask
database.hostname = pg_host
topic.creation.bigtables.partitions = 12
database.password = ********
schema.name.adjustment.mode = avro_unicode
name = connector.source
skipped.operations = t
table.include.list = pg_db.pg_table2,pg_db.pg_table1,pg_db.pg_table3
snapshot.mode = when_needed
(io.debezium.connector.common.BaseSourceTask) [task-thread-connector.source-0]
21:34:58 INFO [connector.source|task-0] Loading the custom source info struct maker plugin: io.debezium.connector.postgresql.PostgresSourceInfoStructMaker (io.debezium.config.CommonConnectorConfig) [task-thread-connector.source-0]
21:34:58 INFO [connector.source|task-0] Loading the custom topic naming strategy plugin: io.debezium.schema.SchemaTopicNamingStrategy (io.debezium.config.CommonConnectorConfig) [task-thread-connector.source-0]
21:34:58 INFO [connector.source|task-0] Connection gracefully closed (io.debezium.jdbc.JdbcConnection) [pool-11787-thread-1]
Finally, the unwanted initial snapshot:
21:34:58 INFO [connector.source|task-0] Obtained valid replication slot ReplicationSlot [active=false, latestFlushedLsn=LSN{14/C5468F90}, catalogXmin=1638961] (io.debezium.connector.postgresql.connection.PostgresConnection) [task-thread-connector.source-0]
21:34:58 WARN [connector.source|task-0] Last recorded offset is no longer available on the server. (io.debezium.connector.common.BaseSourceTask) [task-thread-connector.source-0]
21:34:58 INFO [connector.source|task-0] The last recorded offset is no longer available but we are in when_needed snapshot mode. Attempting to snapshot data to fill the gap. (io.debezium.connector.common.BaseSourceTask) [task-thread-connector.source-0]
21:34:58 INFO [connector.source|task-0] Found previous offset PostgresOffsetContext [sourceInfoSchema=Schema{io.debezium.connector.postgresql.Source:STRUCT}, sourceInfo=source_info[server='connector.source'db='pg_db', lsn=LSN{14/C5468F90}, txId=1638961, messageType=UPDATE, lastCommitLsn=LSN{14/C5468F90}, timestamp=2025-10-14T21:15:26.271652Z, snapshot=FALSE, schema=, table=], lastSnapshotRecord=false, lastCompletelyProcessedLsn=LSN{14/C546BA20}, lastCommitLsn=LSN{14/C5468F90}, streamingStoppingLsn=null, transactionContext=TransactionContext [currentTransactionId=null, perTableEventCount={}, totalEventCount=0], incrementalSnapshotContext=IncrementalSnapshotContext [windowOpened=false, chunkEndPosition=null, dataCollectionsToSnapshot=[], lastEventKeySent=null, maximumKey=null]] (io.debezium.connector.postgresql.PostgresConnectorTask) [task-thread-connector.source-0]
21:34:58 INFO [connector.source|task-0] user 'pg_user' connected to database 'pg_db' on PostgreSQL 17…
role 'pg_user' [superuser: false, replication: true, inherit: true, create role: false, create db: false, can log in: true] (io.debezium.connector.postgresql.PostgresConnectorTask) [task-thread-connector.source-0]