Use of Debezium embedded client for postgres


Sumit Pal

Jun 29, 2018, 9:46:19 AM
to debezium
I'm working towards a solution where I need to move all relational data from Postgres to Elasticsearch. I intend to use Debezium as the CDC tool, but I have some limitations around using Kafka, so I'm planning to use the Debezium embedded client. To start with, I have some questions and would be grateful if you could clarify my doubts:

1. Can embedded Debezium be used with postgres?

2. What is the CDC data format that Debezium embedded connector returns?

3. Do I need to parse the SourceRecord to retrieve all the fields as key/value pairs?

4. Does Debezium embedded client provide the initial data dump when connected for the first time like the normal connector?

5. Can you provide me a sample config file for Postgres? I have one for MySQL but need one for Postgres.

6. How can I set max.batch.size for Postgres? If I set max.batch.size to 1 and offset.flush.interval.ms to 0, does that guarantee exactly-once delivery of each source record?

Thanks,
Sumit

Gunnar Morling

Jul 2, 2018, 12:03:15 PM
to debezium
Hi Sumit,


On Friday, June 29, 2018 at 15:46:19 UTC+2, Sumit Pal wrote:
I'm working towards a solution where I need to move all relational data from Postgres to Elasticsearch. I intend to use Debezium as the CDC tool, but I have some limitations around using Kafka, so I'm planning to use the Debezium embedded client. To start with, I have some questions and would be grateful if you could clarify my doubts:

1. Can embedded Debezium be used with postgres?


Yes.
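For reference, a minimal bootstrap of the embedded engine with the Postgres connector might look like the sketch below, following the `EmbeddedEngine` builder API described at http://debezium.io/docs/embedded/. All hostnames, credentials, and file paths are placeholders:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import io.debezium.config.Configuration;
import io.debezium.embedded.EmbeddedEngine;

public class PostgresCdcRunner {
    public static void main(String[] args) {
        // Engine-level and connector-level settings; values are examples only.
        Configuration config = Configuration.create()
                .with("name", "pg-embedded-engine")
                .with("connector.class", "io.debezium.connector.postgresql.PostgresConnector")
                .with("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore")
                .with("offset.storage.file.filename", "/tmp/pg-offsets.dat")
                .with("offset.flush.interval.ms", 60000)
                .with("database.hostname", "localhost")
                .with("database.port", 5432)
                .with("database.user", "postgres")
                .with("database.password", "postgres")
                .with("database.dbname", "inventory")
                .with("database.server.name", "my-pg-server")
                .build();

        // Each change event is delivered to this callback as a SourceRecord.
        EmbeddedEngine engine = EmbeddedEngine.create()
                .using(config)
                .notifying(record -> System.out.println(record))
                .build();

        // The engine is a Runnable; run it on its own thread.
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(engine);
    }
}
```

The `notifying(...)` callback is where you would hand each event to your Elasticsearch writer instead of printing it.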
 

2. What is the CDC data format that Debezium embedded connector returns?


It's the same as when deploying the Debezium connectors into Kafka Connect. See http://debezium.io/docs/connectors/postgresql/#events for the details.
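To illustrate, the value of an update event follows the usual Debezium envelope; abbreviated here, with the `source` block trimmed to a few representative fields (the exact contents depend on your table and connector version):

```json
{
  "before": { "id": 1, "name": "old name" },
  "after":  { "id": 1, "name": "new name" },
  "source": { "name": "my-pg-server", "db": "inventory", "schema": "public", "table": "customers" },
  "op": "u",
  "ts_ms": 1530280000000
}
```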
 

3. Do I need to parse the SourceRecord to retrieve all the fields as key/value pairs?


I'm not quite sure what you mean by "parse" in this context. When sinking change events into Elasticsearch, you'll be interested in the contents of the "after" field of the change events. You can use this SMT ("single message transform") to just extract that part: http://debezium.io/docs/configuration/event-flattening/
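If you consume the raw `SourceRecord`s directly rather than applying that SMT, there is no text parsing involved: the record value is a Kafka Connect `Struct`, so the fields can be read programmatically. A sketch, assuming the default event envelope (the class and method names here are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.connect.data.Field;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.source.SourceRecord;

public class AfterExtractor {
    // Returns the row state after the change as a field-name -> value map,
    // or null when there is no "after" part (e.g. for delete events).
    public static Map<String, Object> afterAsMap(SourceRecord record) {
        Struct value = (Struct) record.value();
        Struct after = value.getStruct("after");
        if (after == null) {
            return null;
        }
        Map<String, Object> fields = new HashMap<>();
        for (Field field : after.schema().fields()) {
            fields.put(field.name(), after.get(field));
        }
        return fields;
    }
}
```

The resulting map is already close to what an Elasticsearch document API expects.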
 

4. Does Debezium embedded client provide the initial data dump when connected for the first time like the normal connector?


Yes.
 

5. Can you provide me a sample config file for Postgres? I have one for MySQL but need one for Postgres.


 
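A Postgres configuration uses the same engine-level properties as the MySQL example in the embedded-engine docs; only the connector class and the database properties change. A minimal sketch in properties-file form, with all values as placeholders:

```properties
name=pg-embedded-engine
connector.class=io.debezium.connector.postgresql.PostgresConnector
offset.storage=org.apache.kafka.connect.storage.FileOffsetBackingStore
offset.storage.file.filename=/tmp/pg-offsets.dat
offset.flush.interval.ms=60000
database.hostname=localhost
database.port=5432
database.user=postgres
database.password=postgres
database.dbname=inventory
database.server.name=my-pg-server
```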

6. How can I set max.batch.size for Postgres? If I set max.batch.size to 1 and offset.flush.interval.ms to 0, does that guarantee exactly-once delivery of each source record?


You can specify "max.batch.size" as any other connector property, only that you don't register the connector via JSON but use the builder API of the embedded client as described at http://debezium.io/docs/embedded/
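Concretely, applying the settings from question 6 through the builder might look like this sketch (assuming `Configuration.edit()` from the embedded-engine API; note that shrinking the batch size this far trades away a lot of throughput):

```java
import io.debezium.config.Configuration;

public class BatchTuning {
    // Batching knobs are passed like any other connector property.
    public static Configuration withSmallBatches(Configuration base) {
        return base.edit()
                .with("max.batch.size", 1)
                .with("offset.flush.interval.ms", 0)
                .build();
    }
}
```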

See http://debezium.io/docs/embedded/#handling_failures regarding batching and flushing. In general, you're better off if your consumers can handle duplicates, as that allows for much higher throughput.




Hth,

--Gunnar 