Upgrading RDS PostgreSQL 11.6 to 12.2


Takao Magoori

Apr 8, 2020, 4:35:56 AM
to debezium
Hi,

My Debezium is running in the following environment.

- Amazon RDS PostgreSQL 11.6
- Debezium 1.1.0.Final with "snapshot.mode" = "initial"

I am planning to upgrade my RDS PostgreSQL from 11.6 to 12.2, which can be done on the AWS web console without a full dump.

During the upgrade, I will stop my Debezium connector. But I am worried that the LSN may change or be reset, which would result in errors or another snapshot because of "snapshot.mode" = "initial".
Is it OK to keep my Debezium settings unchanged across the PostgreSQL upgrade?

- Current Debezium settings

"connector.class": "io.debezium.connector.postgresql.PostgresConnector"
,"tasks.max": 1
,"plugin.name": "pgoutput"
,"slot.name": "XXX"
,"slot.drop.on.stop": false
,"publication.name": "XXX"
,"database.hostname": "rds-hostname"
,"database.port": "5432"
,"database.user": "username"
,"database.password": "password"
,"database.dbname": "dbname"
,"database.server.name": "server_name"
,"schema.whitelist": "client_\\d+,common_\\d+"
,"table.whitelist": "^client_\\d+\\.obx_.+$|^client_\\d+\\.inv_daily_\\d+$|^common_\\d+\\.heartbeat$"
,"database.sslmode": "disable"
,"database.tcpKeepAlive": true
,"tombstones.on.delete": false
,"column.propagate.source.type": ".+"
,"snapshot.mode": "initial"
,"max.queue.size": 2000
,"max.batch.size": 500
,"poll.interval.ms": 1000
,"include.unknown.datatypes": true
,"heartbeat.interval.ms": 21000
,"snapshot.fetch.size": 500
,"transforms": "AggTable,AggSchema"
,"transforms.AggTable.type": "io.debezium.transforms.ByLogicalTableRouter"
,"transforms.AggTable.topic.regex": "(.+)\\.(?:(common)_\\d+|(.+))\\.(?:(obx_.+)|(inv_daily)_\\d+|(.+))$"
,"transforms.AggTable.topic.replacement": "$1.$2$3.$4$5$6"
,"transforms.AggSchema.type": "org.apache.kafka.connect.transforms.RegexRouter"
,"transforms.AggSchema.regex": "(.+)\\.(.+)\\.(.+)"
,"transforms.AggSchema.replacement": "XXX.$2"
,"key.converter": "io.confluent.connect.avro.AvroConverter"
,"key.converter.schema.registry.url": "http://schema-registry-url:8081"
,"key.converter.auto.register.schemas": true
,"key.converter.key.subject.name.strategy": "io.confluent.kafka.serializers.subject.RecordNameStrategy"
,"value.converter": "io.confluent.connect.avro.AvroConverter"
,"value.converter.schema.registry.url": "http://schema-registry-url:8081"
,"value.converter.auto.register.schemas": true
,"value.converter.value.subject.name.strategy": "io.confluent.kafka.serializers.subject.RecordNameStrategy"
,"producer.override.acks": "all"
,"producer.override.buffer.memory": 536870912
,"producer.override.compression.type": "snappy"
,"producer.override.batch.size": 131072
,"producer.override.max.block.ms": 120000
,"producer.override.max.request.size": 52428800
,"producer.override.max.in.flight.requests.per.connection": 1
,"consumer.override.max.poll.interval.ms": 1800000
,"status.update.interval.ms": 7000


Thanks.

---
takaomag

Jiri Pechanec

Apr 8, 2020, 8:11:36 AM
to debezium
Hi,

What is important is to stop all writes to the database as well. Then you should re-create the replication slot.

IMHO the easiest approach is deleting the connector and its offsets and recreating it with snapshot mode set to never. The connector should then resume from the replication slot LSN.

The more complicated but still workable approach is editing the offsets and changing the LSN to the value of the replication slot.
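The offset-editing route can be sketched as follows. The key/value layout below reflects what Debezium 1.x stores in Kafka Connect's offsets topic, but the exact field names ("lsn", "txId") and the connector/server names here are assumptions for illustration; inspect your own offsets topic before writing anything back.

```python
import json

def build_offset_record(connector_name, server_name, old_value_json, new_lsn):
    """Build the key and updated value for a Kafka Connect offsets-topic
    record, replacing the stored LSN with the replication slot's LSN.
    The value layout is an assumption based on Debezium 1.x offsets."""
    # Kafka Connect keys source offsets by [connector name, source partition];
    # Debezium's Postgres partition is {"server": <database.server.name>}.
    key = json.dumps([connector_name, {"server": server_name}])
    value = json.loads(old_value_json)
    value["lsn"] = new_lsn  # LSN stored as a 64-bit integer
    return key, json.dumps(value)

# Example: rewrite a stored offset to the slot's confirmed LSN.
key, value = build_offset_record(
    "my-connector",                    # hypothetical connector name
    "server_name",                     # database.server.name from the config
    '{"lsn": 23396416, "txId": 555}',  # illustrative old offset value
    23500000,
)
```

The resulting key/value pair would then be produced back to the connector's offsets topic (with the same partitioning) before restarting the connector.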

J.

Gunnar Morling

Apr 8, 2020, 8:38:35 AM
to debe...@googlegroups.com
Hi,

On Wed, Apr 8, 2020 at 2:11 PM Jiri Pechanec
<jiri.p...@gmail.com> wrote:
>
> Hi,
>
> What is important is to stop all writes to the database as well. Then you should re-create the replication slot.

I suppose that'd be kinda given implicitly when upgrading the database server?

>
> IMHO the easiest approach is deleting the connector and its offsets and recreate it with snapshot mode set to never. Then the connector should resume from the replication slot LSN.

+1, that'd be easiest when you have a window without any writes. It's
important to set up the new connector + slot before resuming writes.

>
> The more complicated but still workable approach is editing the offsets and changing the LSN to the value of the replication slot.

Maybe nothing at all is even needed, in case the server keeps the slot
state across the server update.

@Takao, I'd definitely recommend you try this out in a testing
environment first. Would love to hear back from you on any insights
you may gain doing this, would surely be good material to add to the
documentation.

--Gunnar

Takao Magoori

Apr 8, 2020, 8:54:58 PM
to debezium
Thanks, J and Gunnar.

I want to test it in a testing environment if I can.
These are the steps I will follow:


1. Stop all writes to my database.

2. Stop my debezium.

3. Check the current LSN and the replication slot state with 'SELECT pg_current_wal_lsn(); SELECT * FROM pg_replication_slots;'.

4. Upgrade the database on AWS web console.

5. Check the debezium's replication slot state after the upgrade.

6. If the replication slot is kept and its offset is a reasonable value, I will start my Debezium unchanged.

7. If not, I will delete my Debezium connector (by deleting the offset, status, and config topics) and recreate it with "snapshot.mode" = "never".

8. Restart writes to the database.
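For steps 3 and 6, judging whether the slot's offset is "a reasonable value" means comparing pg_current_wal_lsn() with the slot's confirmed_flush_lsn. A pg_lsn value is just two hex halves of a 64-bit WAL position, so the comparison can also be done client-side; a small sketch (the helper names are mine):

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL pg_lsn string like '16/B374D848' to a
    64-bit integer: the part before the slash is the upper 32 bits,
    the part after is the lower 32 bits."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def lag_bytes(current_lsn: str, confirmed_flush_lsn: str) -> int:
    """Client-side equivalent of pg_wal_lsn_diff(current, confirmed):
    how many bytes of WAL the slot still has to catch up on.
    This should be at (or near) 0 once all writes are stopped."""
    return lsn_to_int(current_lsn) - lsn_to_int(confirmed_flush_lsn)

# e.g. lag_bytes("0/1609E28", "0/1609DD8") -> 80 bytes behind
```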
Thanks,
---
takaomag


On Wednesday, April 8, 2020 at 5:35:56 PM UTC+9, Takao Magoori wrote:

Jiri Pechanec

Apr 9, 2020, 1:29:38 AM
to debezium
Sounds good and safe!

J.

Takao Magoori

Apr 10, 2020, 7:21:51 PM
to debezium
Hi,

I upgraded RDS PostgreSQL 11.6 to 12.2 on the AWS web console.
As it turns out, if a database has one or more logical replication slots, the major upgrade precheck fails and the procedure stops.
The warning logs are as follows.

Upgrade could not be run on Fri Apr 10 13:18:08 2020

The instance could not be upgraded from 11.6.R1 to 12.2.R1 because of following reasons. Please take appropriate action on databases that have usages incompatible with requested major engine version upgrade and try again.

- Following usages in database 'database_name' need to be corrected before upgrade:

-- The instance could not be upgraded because one or more databases have logical replication slots. Please drop all logical replication slots and try again.


In the end, the correct approach is:
1. Stop all writes to the database.

2. Check if all previous writes are processed by the Debezium connector.

3. Delete the Debezium connector and its offset topic.

4. Drop the replication slot:
SELECT pg_drop_replication_slot('the_replication_slot_name');

5. Upgrade the database on AWS web console.

6. Recreate the Debezium connector with snapshot mode set to 'never'.
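Step 6 amounts to re-POSTing the original configuration with only the snapshot mode changed. A sketch of building that payload for the Kafka Connect REST API (the connector name and the trimmed config here are placeholders):

```python
import json

def recreate_payload(name: str, old_config: dict) -> str:
    """Build the JSON body for POST /connectors: the same config as
    before, but with snapshot.mode set to 'never' so Debezium resumes
    from the replication slot's LSN instead of snapshotting again."""
    config = dict(old_config)
    config["snapshot.mode"] = "never"
    return json.dumps({"name": name, "config": config})

payload = recreate_payload(
    "my-connector",  # placeholder connector name
    {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "slot.name": "XXX",
        "snapshot.mode": "initial",
        # ... the rest of the original settings, unchanged
    },
)
# POST this payload to http://<connect-host>:8083/connectors with
# Content-Type: application/json, e.g. via curl.
```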


Thanks,

---
takaomag


On Wednesday, April 8, 2020 at 5:35:56 PM UTC+9, Takao Magoori wrote:
Hi,

Jiri Pechanec

Apr 14, 2020, 1:38:10 AM
to debezium
Thanks for the validation! Would you be willing to contribute this information to our docs? Probably the FAQ?

Thanks

J.

Adriano Gigliuto

Oct 26, 2021, 6:17:53 AM
to debezium
Hi! 

A few days ago we upgraded our RDS cluster from 11 to 12 too. After some tries in a test environment we ended up with the following procedure:
  1. scale down all writers (except for the Kafka Connect cluster where Debezium is running).
  2. use the following query to ensure that replication lag is near 0 (if Debezium is lagging, this gives us some buffer to avoid data loss):
    SELECT slot_name, pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn) as confirmedLag
    FROM pg_replication_slots
  3. scale down Kafka Connect too.
  4. drop all replication slots.
  5. upgrade RDS cluster.
  6. drop the topic connector_offsets (Kafka Connect's offset.storage.topic), because in the test environment Debezium was not able to resume its job, probably due to the difference in LSN after dropping the slots in step 4.
  7. scale up Kafka Connect and use the query in step 2 to ensure that all previous slots were recreated and lag is not increasing.
  8. scale up all other services.
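The gate in steps 2 and 7 can be automated by feeding the rows of that query through a small check; the function name and the zero-byte threshold below are my own choices:

```python
def lagging_slots(rows, threshold_bytes=0):
    """Given (slot_name, confirmed_lag_bytes) rows from the
    pg_wal_lsn_diff query above, return the slots still lagging
    beyond the threshold. An empty result means it is safe to
    proceed to the next step."""
    return [name for name, lag in rows if lag > threshold_bytes]

# Proceed only when this list is empty, e.g.:
# lagging_slots([("debezium", 0), ("other", 128)]) -> ["other"]
```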
Everything worked fine and we didn't notice any data loss or duplicated messages.
I'm wondering if you have any suggestions to improve this procedure, because scaling down all writers causes a lot of downtime.

I considered creating a new RDS cluster with the new version, then using DMS to stream changes from the old cluster to the new one, but I'm wondering how to handle Debezium and the different LSNs stored in Kafka. Maybe I could use the Kafka Connect REST API to change the connectors' config and stream data to Kafka from the target RDS (the one with the new version), but I'm not sure how to safely handle the difference in LSN between the source and target RDS.
Do you have any suggestions to share?

Thanks 
Adriano