Postgres replication slots grow indefinitely


Ori Popowski

Sep 28, 2018, 12:21:25 AM
to debezium
Hi,

The Postgres replication slot size keeps growing even though Debezium is consuming the WAL. As a consequence, we run out of disk space.


A bug has been opened here: https://issues.jboss.org/browse/DBZ-926

Gunnar Morling

Sep 28, 2018, 4:00:13 AM
to debe...@googlegroups.com
Could it be you only have changes on non-whitelisted tables? If so, the connector will never emit an event by default, causing no offsets to be committed and the logs to be retained. In that case you can work around the issue using the recently added heartbeat topic.
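Roughly, enabling the heartbeat boils down to a single connector property (the interval value here is just an example):

    # emit a heartbeat event periodically so offsets get committed even when
    # no whitelisted table receives changes
    heartbeat.interval.ms=10000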

--Gunnar

Ori Popowski

Sep 28, 2018, 8:24:02 AM
to debezium
Hi,

No. We use a database with one table, and this table is definitely whitelisted, and we see the records there reaching our Kinesis stream and from there to S3.

The Postgres connector seems to never update the confirmed_flush_lsn value.

I actually cannot find the part in the source code responsible for it. If you can point me in the right direction I might be able to debug and give more insights.

Also, if you happen to reproduce or find the root cause, let me know. Generally, if you run

    select slot_name, pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_location(), restart_lsn)) as replicationSlotLag, active from pg_replication_slots;

and see an ever-increasing lag, it means the problem is indeed real.

Thanks

Gunnar Morling

Oct 1, 2018, 3:53:51 AM
to debezium
Hi,


On Friday, September 28, 2018 at 14:02:24 UTC+2, Ori Popowski wrote:
Hi,

No. We use a database with one table, and this table is definitely whitelisted, and we see the records there reaching our Kinesis stream and from there to S3.

OK, thanks for clarifying. It does seem like something isn't working as expected then.
 
The Postgres connector seems to never update the confirmed_flush_lsn value.

I actually cannot find the part in the source code responsible for it. If you can point me in the right direction I might be able to debug and give more insights.

Please see PostgresConnectorTask.commit(), which in turn calls RecordsStreamProducer.commit(). PostgresConnectorTask.commit() itself should be called regularly by Kafka Connect as long as the connector produces records. Maybe this helps get you started exploring what's going wrong here?
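For orientation, the call chain has roughly this shape (only SourceTask.commit() is the actual Kafka Connect API here; the field and producer names are placeholders, not the real Debezium code):

    // Kafka Connect calls SourceTask.commit() after it has flushed the offsets.
    // For the Postgres connector this is the hook where the last processed LSN
    // is acknowledged back to the server via the replication stream.
    @Override
    public void commit() throws InterruptedException {
        Long lsn = lastProcessedLsn;     // highest LSN handed to Kafka Connect so far
        if (lsn != null) {
            streamProducer.commit(lsn);  // ends up flushing the LSN on the PG replication stream
        }
    }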

Ori Popowski

Oct 1, 2018, 3:58:02 AM
to debe...@googlegroups.com
Thanks, I will try this.

Two questions:
  1. Were you guys able to reproduce?
  2. Is it possible that PostgresConnectorTask.commit() is not implemented correctly and doesn't actually update the confirmed_flush_lsn value?


Gunnar Morling

Oct 1, 2018, 4:10:03 AM
to debezium


On Monday, October 1, 2018 at 09:58:02 UTC+2, Ori Popowski wrote:
Thanks, I will try this.

Two questions:
  1. Were you guys able to reproduce?
I haven't tried yet. Last time I looked into this (this was in the context of not committing LSNs if only non-whitelisted tables see changes), this worked as expected.
  2. Is it possible that PostgresConnectorTask.commit() is not implemented correctly and doesn't actually update the confirmed_flush_lsn value?
Everything is possible :) Maybe you can launch the connector in debug mode and see whether commit() is called and whether it does the right thing?
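If it helps, this is roughly the logging configuration I'd use for that (logger names are the Debezium package names; adjust the syntax to whatever logging backend you run, this sketch assumes plain log4j properties):

    # show offset commits and LSN flushing from the connector and the embedded engine
    log4j.logger.io.debezium.connector.postgresql=DEBUG
    log4j.logger.io.debezium.embedded=DEBUG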

Btw. which version of Debezium are you on? Can you please try 0.8.3.Final in case you're still on an older one?

Ori Popowski

Oct 1, 2018, 1:01:52 PM
to debezium
Hi!

There is some development in the investigation.
  • Running locally, the offsets are committed and everything works fine.
  • Running (in debug) against RDS with wal2json, we don't even get to the PostgresConnectorTask.commit() method.
This happens because in EmbeddedEngine.commitOffsets() the method returns early here:

    if (!offsetWriter.beginFlush()) return;

and never even gets to task.commit().

https://github.com/debezium/debezium/blob/8426f71fc13cf41e5399a1946d15098e80e47043/debezium-embedded/src/main/java/io/debezium/embedded/EmbeddedEngine.java#L825

We're using 0.8.3.

Why would this happen?

Ori Popowski

Oct 1, 2018, 1:36:47 PM
to debezium

My last post was a mistake.

Offsets are indeed committed, but something weird is going on. The replication slot jumps in size very quickly, and after an offset commit it only shrinks a little.
This is the timeline from running

    select slot_name, pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_location(), restart_lsn)) as replicationSlotLag, active from pg_replication_slots;

over time:

       slot_name        | replicationslotlag | active
------------------------+--------------------+--------
 debezium               | 152 kB             | t
(1 row)

       slot_name        | replicationslotlag | active
------------------------+--------------------+--------
 debezium               | 147 kB             | t
(1 row)

       slot_name        | replicationslotlag | active
------------------------+--------------------+--------
 debezium               | 16 MB              | t
(1 row)

The replication slot has only one table whitelisted, example123, which is fully controlled by me, and I have inserted only three rows.

I'm really at a loss here. I've no idea what's going on, and in the meantime the replication slot on our production system keeps growing...

Ori Popowski

Oct 1, 2018, 2:22:42 PM
to debezium

Okay, I found the problem.

The log is full of these errors:

Timed out waiting to flush EmbeddedConnector{id=svc-debezium-kinesis} offsets to storage

Why is this happening?

Gunnar Morling

Oct 1, 2018, 3:49:49 PM
to debezium
Ah, that's some good progress. What OffsetBackingStore implementation are you using (via the "offset.storage" option)? In any case it seems to be the culprit, as it fails to write the offsets in a timely manner.
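For comparison, the stock file-based setup for the embedded engine looks roughly like this (the path is illustrative):

    # built-in alternative to a custom store: persist offsets in a local file
    offset.storage=org.apache.kafka.connect.storage.FileOffsetBackingStore
    offset.storage.file.filename=/var/lib/debezium/offsets.dat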

Ori Popowski

Oct 2, 2018, 1:39:45 AM
to debezium
Hi,

It's actually a custom implementation of ours.

The question is:
  1. How to increase the timeout of the operation?
  2. What does it even have to do with the replication slots growing? I was under the impression that updating the last_processed_lsn value on the Postgres server was timing out, not the update of our own offset store.

Just to make sure we're on the same page:
The Postgres replication slot grows indefinitely because the connector fails to update the transaction log position on the Postgres database, correct?

Ori Popowski

Oct 2, 2018, 1:43:50 AM
to debezium

Okay.

I'm going to increase offset.flush.timeout.ms and let it run for some time.
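Concretely, something along these lines (the value itself is just a guess on my side):

    # default is 5000 ms; give our custom offset store more time to persist the offsets
    offset.flush.timeout.ms=60000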

I will report progress.

Thanks

Gunnar Morling

Oct 2, 2018, 2:52:15 AM
to debezium
Answers inline.


On Tuesday, October 2, 2018 at 07:39:45 UTC+2, Ori Popowski wrote:
Hi,

It's actually a custom implementation of ours.

The question is:
  1. How to increase the timeout of the operation?
By specifying "offset.flush.timeout.ms" as you already found out.
  2. What does it even have to do with the replication slots growing? I was under the impression that updating the last_processed_lsn value on the Postgres server was timing out, not the update of our own offset store.
No, the call for persisting the offsets timed out as per the log message you shared. 

Just to make sure we're on the same page:
The Postgres replication slots growing indefinitely happens because the connector fails to update the transaction log position on the Postgres database, correct?

Yes, exactly.

And this happens because the embedded engine never gets to call commit() on the task (which would acknowledge the consumed LSNs with the database), because the offset flush fails beforehand, raising an exception. See here: https://github.com/debezium/debezium/blob/master/debezium-embedded/src/main/java/io/debezium/embedded/EmbeddedEngine.java#L840-L855

So if you increase the timeout for writing the offsets, then everything should be fine. That said, the default is 5 sec, which seems like plenty of time to write a few simple values, so I'm wondering whether something odd is going on with your OffsetBackingStore?

If this indeed resolves the issue, can you also close the JIRA issue you opened? Thanks!

Ori Popowski

Oct 2, 2018, 3:33:15 AM
to debe...@googlegroups.com
Hi,

Yes, I'll close the issue as soon as I see the lag go down.

Ori Popowski

Oct 2, 2018, 8:06:01 AM
to debezium
Hi,

Unfortunately the problem still persists.
  • Error logs are now completely clean
  • The offset.dat file shows that there are periodic offset updates (file attached)
  • The replication lag keeps growing (as in image)

offset.dat

Gunnar Morling

Oct 2, 2018, 9:02:20 AM
to debezium
:(

I'm running a bit out of ideas here. Do you see the debug log message "flushing offsets to server..." (from RecordsStreamProducer.commit(long)) in the logs?

If so, then we're definitely acknowledging the LSNs with the server and I wouldn't know what else to do. Maybe you can run through this in debug mode in your IDE to make sure an increasing LSN value is committed (I'll change the log message to contain the value, too).

The one thing I could then still think of is that PG perhaps doesn't handle this request synchronously, so it might just take a while after we call setFlushedLSN()/setAppliedLSN() on the PGReplicationStream until you see the log files released on the server.
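One way to check would be to poll the slot for a while and see whether the confirmed LSN eventually catches up (slot name taken from your earlier output):

    -- run repeatedly and watch whether confirmed_flush_lsn advances some time after a flush
    select slot_name, restart_lsn, confirmed_flush_lsn
    from pg_replication_slots
    where slot_name = 'debezium';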

Ori Popowski

Oct 2, 2018, 10:23:11 AM
to debezium
Hi,

Running in debug. I can confirm that the "flushing offsets to server..." is showing regularly in the log file.

On the other hand, the confirmed_flush_lsn value has been stuck at 39/8E029F58 for a whole day.

select confirmed_flush_lsn from pg_replication_slots;
 confirmed_flush_lsn
---------------------
 39/8E029F58

It looks like the command is sent to the Postgres server but isn't carried out.
Maybe the implementation is incorrect?

Also, I want to note that the exact same problem occurs in two separate setups (different DB server, Debezium instance, etc.), so it seems like a bug.

Gunnar Morling

Oct 2, 2018, 10:30:19 AM
to debezium
> Running in debug. I can confirm that the "flushing offsets to server..." is showing regularly in the log file.

OK, I see. And is the LSN increasing there? I.e., is the "lastProcessedLsn" in PostgresConnectorTask changing?

If it is, I think everything is right on our side AFAICS, and I'd suggest you open an issue with the PG JDBC driver (at https://github.com/pgjdbc). That's what we call in order to perform the acknowledgement (see PostgresReplicationConnection.createReplicationStream(...).new ReplicationStream() {...}.flushLsn(long)):

    PGReplicationStream stream = ...;    // logical replication stream from the PG JDBC replication API
    stream.setFlushedLSN(lsn);           // record that everything up to lsn has been flushed on our side
    stream.setAppliedLSN(lsn);           // record it as applied as well
    stream.forceUpdateStatus();          // push the status update to the server immediately

Perhaps something else must be done?

Ori Popowski

Oct 2, 2018, 10:40:27 AM
to debezium

Oh, sorry, I forgot to stop at a breakpoint there.
I'll do it later and report whether the LSN is increasing.

If it's increasing I'll open a bug with PG JDBC; if not, I'll tell you :)

If I find out the usage of PG JDBC is incorrect, I'll also keep you posted.

Gunnar Morling

Oct 2, 2018, 10:57:10 AM
to debe...@googlegroups.com
So FWIW, I can't reproduce the issue locally. If I run

    select slot_name, pg_size_pretty(pg_xlog_location_diff(pg_current_xlog_location(),restart_lsn)) as replicationSlotLag, active, confirmed_flush_lsn from pg_replication_slots;

then I see how the lag increases and decreases and how confirmed_flush_lsn advances (to the values the connector is confirming). I'm wondering whether there's something special about your environment.

Ori Popowski

Oct 3, 2018, 2:33:41 AM
to debezium

Hi,

I will do the debug-breakpointing later today so we'll have better insights.

Our environment is quite regular:
Postgres 9.4 on RDS with wal2json

Ori Popowski

Oct 3, 2018, 3:02:51 AM
to debezium
I meant 9.6.6

Ori Popowski

Oct 5, 2018, 5:41:54 AM
to debezium
Hi,

I ran a snapshot version of the 0.8 branch with the debug printing of "Flushing LSN to server".

The log shows that the offsets are increasing with each printing:


{"thread":"pool-4-thread-1","level":"DEBUG","loggerName":"io.debezium.connector.postgresql.RecordsStreamProducer","message":"Flushing LSN to server: LSN{3C/A300035F}","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","instant":{"epochSecond":1538732026,"nanoOfSecond":265000000},"threadId":16,"threadPriority":5}
{"thread":"pool-4-thread-1","level":"DEBUG","loggerName":"io.debezium.connector.postgresql.RecordsStreamProducer","message":"Flushing LSN to server: LSN{3C/A3002C4F}","endOfBatch":false,"loggerFqcn":"org.apache.logging.slf4j.Log4jLogger","instant":{"epochSecond":1538732053,"nanoOfSecond":331000000},"threadId":16,"threadPriority":5}


However, now the problem does not reproduce, and the confirmed_flush_lsn is updated:


        slot_name        |  plugin  | slot_type | datoid |     database     | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
------------------------+----------+-----------+--------+------------------+--------+------------+------+--------------+-------------+---------------------
 debeziumstagingsandbox | wal2json | logical   |  26998 | sandbox_location | t      |        603 |      |       261157 | 3C/A3002C00 | 3C/A3002C4F

I have no idea what to do. It seems the problem is inconsistent.

I had hoped to get a glimpse of the origin of the problem, but right now it isn't reproducing.

Jiri Pechanec

Oct 8, 2018, 12:01:22 AM
to debezium
Hi,

Thanks for trying. Has anything else (like the traffic pattern) changed on your database instance? Does the problem reappear when you go back to the original Debezium version?

J.

Ori Popowski

Oct 8, 2018, 2:47:14 AM
to debezium
Hi,

  1. The traffic pattern hasn't changed between the version with the debug logging and 0.8.3.
  2. When I used the 0.8-SNAPSHOT with the manually added debug logging, the offset was successfully and periodically flushed to the Postgres DB. The replication slot did not grow.
  3. Going back to the 0.8.3 version caused the lag to return.
  4. The only difference between the versions is the debug logging: the 0.8-SNAPSHOT was built from the same commit as 0.8.3, with only the debug logging added.
  5. Since returning to the original version there's been no change. The replication slot keeps growing, and restarts don't solve it.
It's a real puzzle for me. I've no idea why it happens. I wouldn't go as far as saying that the mere log printing somehow solves the problem, but anything is possible.
There's a small chance that other dependencies changed between the two versions, since I added the JARs manually. However, I double-checked that I used the same versions.

If you need more details, examination, etc. let me know. I'll try out and report.

Jiri Pechanec

Oct 8, 2018, 3:00:38 AM
to debezium
Hi,

What if you take the SNAPSHOT version and disable the debug logging, hence bypassing the new code?

J.

Ori Popowski

Oct 8, 2018, 3:04:32 AM
to debe...@googlegroups.com
I will try it

Hunor Kovács

Jan 8, 2020, 3:15:52 AM
to debezium
Anything new about this issue? I'm using Postgres and Debezium 1.0, and the replication slot keeps growing.
I tried restarting and setting a heartbeat interval; nothing has worked yet. Should I try 0.8-SNAPSHOT?

Thanks,
Hunor


Gunnar Morling

Jan 8, 2020, 3:21:19 AM
to debe...@googlegroups.com
Hunor,

Definitely use 1.0; 0.8 isn't maintained any more. This topic is described in the docs here:


Typically, it's caused by a combination of high-traffic and low-traffic tables (or, worse, databases) and only capturing the low-traffic ones. The docs describe the applicable counter-measures; a rough sketch of one of them follows.
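As a rough illustration of one such counter-measure (table and column names made up): periodically write to a small table that is captured, so the connector always has something to consume and acknowledge:

    -- create once on the captured database
    create table if not exists debezium_heartbeat (id int primary key, ts timestamptz);

    -- run periodically, e.g. from a scheduled job
    insert into debezium_heartbeat (id, ts) values (1, now())
      on conflict (id) do update set ts = excluded.ts;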

--Gunnar

