AWS Aurora Postgres Connector keeps failing


Prabhat Bhattarai

Aug 22, 2023, 12:32:43 PM
to debezium
Hello,
We are running into an issue where the replication slot gets flushed/corrupted on AWS Aurora, which in turn kills our connector. Has anyone run into this before? Any ideas or suggestions we could try?
Thank you,
Prabhat

org.postgresql.util.PSQLException: ERROR: unexpected pageaddr DD8/A0862000 in log segment 0000000100000DD800000029, offset 8790016
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2552)
    at org.postgresql.core.v3.QueryExecutorImpl.processCopyResults(QueryExecutorImpl.java:1211)
    at org.postgresql.core.v3.QueryExecutorImpl.readFromCopy(QueryExecutorImpl.java:1111)
    at org.postgresql.core.v3.CopyDualImpl.readFromCopy(CopyDualImpl.java:44)
    at org.postgresql.core.v3.replication.V3PGReplicationStream.receiveNextData(V3PGReplicationStream.java:160)
    at org.postgresql.core.v3.replication.V3PGReplicationStream.readInternal(V3PGReplicationStream.java:125)
    at org.postgresql.core.v3.replication.V3PGReplicationStream.readPending(V3PGReplicationStream.java:82)
    at io.debezium.connector.postgresql.connection.PostgresReplicationConnection$1.readPending(PostgresReplicationConnection.java:472)
    at io.debezium.connector.postgresql.PostgresStreamingChangeEventSource.processMessages(PostgresStreamingChangeEventSource.java:207)
    at io.debezium.connector.postgresql.PostgresStreamingChangeEventSource.execute(PostgresStreamingChangeEventSource.java:169)

Chris Cranford

Aug 22, 2023, 2:58:25 PM
to debe...@googlegroups.com
Hi -

My initial thought is that this may be related to WAL corruption, perhaps due to a disk or hardware failure; however, after reading this article [1], it seems there may be other causes. I'd reach out to the PG team about this directly, as it is more related to PG than to Debezium.

Thanks,
Chris

[1]: https://postgrespro.com/list/thread-id/2333665

Prabhat Bhattarai

Aug 24, 2023, 12:35:44 PM
to debezium
Just an update on this. This is what we got back from AWS and are testing now. I will update if this resolves our problem.

There is a known issue (in the case of Debezium streaming) regarding a WAL sender error when decoding.
There is a workaround for this issue: turn off the write-through cache by modifying your parameter group, setting the rds.logical_wal_cache parameter to 0, and then restarting your writer instance.
Without the write-through cache, Aurora PostgreSQL uses the Aurora storage layer in its implementation of the native PostgreSQL logical replication process.
It does so by writing WAL data to storage and then reading the data back from storage to decode it and send (replicate) it to its targets (subscribers).
This can result in a small performance bottleneck during logical replication for Aurora PostgreSQL DB clusters, but it does not cause any loss of data from existing logical replication.
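
For anyone who wants to script the workaround above rather than click through the console, here is a minimal sketch using boto3. It is not something AWS sent us. The assumptions are: that rds.logical_wal_cache lives in a custom DB cluster parameter group (the AWS note only says "parameter group"), that "pending-reboot" is the right apply method since a restart is required, and that the group and instance names used in the usage note are hypothetical placeholders.

```python
def build_wal_cache_parameters(value=0):
    """Build the Parameters payload that sets rds.logical_wal_cache.

    ApplyMethod "pending-reboot" is an assumption, chosen because the
    workaround says the writer instance has to be restarted.
    """
    return [
        {
            "ParameterName": "rds.logical_wal_cache",
            "ParameterValue": str(value),
            "ApplyMethod": "pending-reboot",
        }
    ]


def disable_logical_wal_cache(rds_client, parameter_group, writer_instance):
    """Set rds.logical_wal_cache to 0, then reboot the writer instance.

    rds_client is expected to behave like boto3.client("rds").
    Assumes the parameter is in a DB *cluster* parameter group.
    """
    rds_client.modify_db_cluster_parameter_group(
        DBClusterParameterGroupName=parameter_group,
        Parameters=build_wal_cache_parameters(0),
    )
    # The reboot is what makes the new value take effect on the writer.
    rds_client.reboot_db_instance(DBInstanceIdentifier=writer_instance)
```

Usage would look something like disable_logical_wal_cache(boto3.client("rds"), "my-aurora-cluster-params", "my-writer-instance"), with both names replaced by your own. In practice you may need to wait until the parameter change is registered on the instance before rebooting; that wait is not shown here.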

Thank you,