Debezium not read PostgreSQL multi-level partitions of tables

827 views
Skip to first unread message

Jan Kopic

unread,
Dec 8, 2021, 8:07:51 AM12/8/21
to debezium
We use lot of tables on PostgreSQL (13.5) and Debezium (1.7.0) read changes from WAL logs and save it to Kafka (kafka_2.13-2.8.0) topics.
This work fine for table with one level partition of table, for example:
-- Root table
CREATE TABLE stage.d_v_operator (
    df_operator_key int4 NOT NULL,
    df_druh_hry varchar NOT NULL,
    df_trigger_off_sdc2_group_history bool NULL,
    df_created timestamptz NOT NULL DEFAULT now(),
    df_valid bool NOT NULL DEFAULT true,
    df_v_operator_key int8 NOT NULL,
    idoperator varchar NULL,
    operatornazev varchar NULL,
    df_group_00_df_master_package_key int8 NULL,
    CONSTRAINT d_v_operator_pk PRIMARY KEY (df_operator_key, df_druh_hry, df_created, df_v_operator_key)
)
PARTITION BY RANGE (df_operator_key, df_druh_hry, df_created);

-- 1. level partition
CREATE TABLE stage.d_v_operator_11223344_t_2112 PARTITION OF stage.d_v_operator FOR VALUES FROM (11223344, 'T', '2021-12-01 00:00:00+01') TO (11223344, 'T', '2021-12-31 23:59:59.999+01');

Every table has using index of replica identity, for example:
ALTER TABLE stage.d_v_operator REPLICA IDENTITY USING INDEX d_v_operator_pk;
Now we have table with multi level partitions. Root table has 3 level partitions, for example below:
-- Root table
CREATE TABLE stage.s_v_operator (
    df_package_key int8 NOT NULL,
    df_operator_key int4 NOT NULL,
    df_druh_hry varchar NOT NULL,
    obdobi tstzrange NULL,
    idoperator varchar NULL,
    operatornazev varchar NULL,
    df_v_operator_key int8 NOT NULL,
    rowhash text NOT NULL,
    CONSTRAINT s_v_operator_pkey PRIMARY KEY (df_package_key, df_v_operator_key, df_operator_key, df_druh_hry)
)
PARTITION BY LIST (df_druh_hry);

-- 1. level partition
CREATE TABLE stage.s_v_operator_default PARTITION OF stage.s_v_operator DEFAULT;
CREATE TABLE stage.s_v_operator_t PARTITION OF stage.s_v_operator FOR VALUES IN ('T') PARTITION BY LIST (df_operator_key);

-- 2. level partition
CREATE TABLE stage.s_v_operator_t_11223344 PARTITION OF stage.s_v_operator_t FOR VALUES IN (11223344) PARTITION BY LIST (df_package_key);

-- 3. level partition
CREATE TABLE stage.s_v_operator_t_11223344_269811 PARTITION OF stage.s_v_operator_t_11223344 FOR VALUES IN ('269811');
Debezium not load changes to kafka for this tables. Log have not errors. We have schema and tables in white list in connector properties and this tables we have added to publication of PostgreSQL.
Do you have idea why can not read changes? Support multilevel partitions of table Debezium?  

Thank you very much,
Jan


Chris Cranford

unread,
Dec 8, 2021, 8:44:55 AM12/8/21
to debe...@googlegroups.com, Jan Kopic
Hi Jan -

I'm going to assume you're likely using PgOutput and they were not originally supported with logical replication.  This wasn't added until PostgreSQL 13. 

How did you register the partitioned table with the publication?  Is the publication configured for partitions using the "WITH publish_via_partition_root = true" setup or is it using the default setup to emit from the replicated leaf tables?  If you didn't define the publish_via_partition_root then perhaps the issue is the leaf node tables aren't in the include list?

Chris
--
You received this message because you are subscribed to the Google Groups "debezium" group.
To unsubscribe from this group and stop receiving emails from it, send an email to debezium+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/debezium/5aa8b28b-b484-41e3-8b0d-31bd3758252cn%40googlegroups.com.

Jan Kopic

unread,
Dec 8, 2021, 9:05:12 AM12/8/21
to debezium
Hi Chris,
yes, we using PgOutput and this process work for us from Postgres version 11. All tables with one level partitions work (like example up with table stage.d_v_operator).

This is my complete process of source database configuration below. This process is same for tables with one or multi level partitions.
CREATE PUBLICATION my_pub WITH (publish = 'insert, update, delete, truncate', publish_via_partition_root = true);
SELECT pg_create_logical_replication_slot('my_slot', 'pgoutput');
ALTER PUBLICATION my_pub ADD TABLE stage.d_v_operator; -- table with one level partition
ALTER PUBLICATION my_pub ADD TABLE stage.s_v_operator; -- table with three level partition

Jan

Dne středa 8. prosince 2021 v 14:44:55 UTC+1 uživatel Chris Cranford napsal:

Chris Cranford

unread,
Dec 8, 2021, 11:47:34 PM12/8/21
to debe...@googlegroups.com, Jan Kopic
Hi Jan -

So when you're looking for the replication events from stage.s_v_operator; I just want to be sure you're looking in the stage.d_v_operator topic.  It's my understanding that the publish_via_root_partition enabled option emits those additional partition layers as though they're part of the root partition; thus allowing you to replicate changes from all partitions as if they're actually part of a single partition on the source side.

Assuming we're on the same page you could try to use pg_recvlogical and verify that you're seeing the emitted events over the replication slot for the third level partition.  If those events are indeed visible then this is most definitely a problem either with configuration or processing by Debezium and I would suggest you raise a Jira issue.  If you are not seeing the events emitted by the replication slot then this is a PG issue.  If that's true, then you could reach out to the PG guys.

HTH,
Chris

Jan Kopic

unread,
Dec 9, 2021, 10:11:14 AM12/9/21
to debezium
Hi Chris,
The publish_via_root_partition enabled parameter works exactly as you wrote. If the stage.s_v_operator table contains 200 partitions, the changes are written as a change to the root table. In the connect-debezium-postgres-source.properties configuration file, you do not need to use a regular expression to grouping the partition into one topic. This feature is already taken over by Postgres.

We consulted our problem with PostgreSQL replication developers.

We reproduced the changes in the stage.s_v_operator table using test_decoding, which we read via pg_logical_slot_get_changes. More in Postgres documentation https://www.postgresql.org/docs/13/logicaldecoding-example.html

All operations were displayed correctly, so I would unfortunately see the problem on the Debezium side. For completeness, I am take here our configuration file

name=connect-debezium-postgres-source
plugin.name=pgoutput
connector.class=io.debezium.connector.postgresql.PostgresConnector
publication.name=my_pub
slot.name=my_slot
snapshot.mode=never
database.hostname=*.*.*.*
database.port=5432
database.user=myuser
database.password=****
database.dbname=mydb
database.server.name=myservername
schema.include.list=ldz_master,rdt,stage,df_master
table.include.list=ldz_master.df_package,df_master.lst_provozovatel,rdt.kurs_cnb,stage.d_v_operator,stage.s_v_operator
delete.topic.enable=true
decimal.handling.mode=string
toasted.value.placeholder={"message":"__debezium_unavailable_value"}
topic.creation.default.replication.factor=-1
topic.creation.default.partitions=-1
topic.creation.default.cleanup.policy=delete
topic.creation.default.compression.type=snappy


Our database has almost 40 TB so we do not use a snapshot .

Jan
Dne čtvrtek 9. prosince 2021 v 5:47:34 UTC+1 uživatel Chris Cranford napsal:

Chris Cranford

unread,
Dec 9, 2021, 10:14:40 AM12/9/21
to debe...@googlegroups.com, Jan Kopic
Hi Jan -

Thank you very much for confirming this is on the Debezium side.   Could you please raise a Jira issue with the details of this conversation if you haven't already and we''ll take a look to see what could be the problem?

Chris

Jan Kopic

unread,
Dec 9, 2021, 11:08:08 AM12/9/21
to debezium
Thank you very much Chris. I created issue in your JIRA - https://issues.redhat.com/browse/DBZ-4433

Jan

Dne čtvrtek 9. prosince 2021 v 16:14:40 UTC+1 uživatel Chris Cranford napsal:
Reply all
Reply to author
Forward
0 new messages