Monitoring Postgres connection failures for Debezium


Marc MacGonagle

Jun 21, 2021, 10:49:16 AM
to debezium

Hello,

According to the Debezium documentation, “When the connector is running, the PostgreSQL server that it is connected to could become unavailable for any number of reasons. If this happens, the connector fails with an error and stops. When the server is available again, restart the connector.”

Is there a way to surface the fact that the connector has failed? The reason I ask is, I’ve run an experiment where I’ve 

  • started a Postgres instance in a docker compose
  • configured a debezium connector in Kafka-connect running in another docker compose
  • checked the status using http://localhost:8083/connectors/debezium-spike/status and saw that it was : {"name":"debezium-spike","connector":{"state":"RUNNING","worker_id":"kafka-connect:8083"},"tasks":[{"id":0,"state":"RUNNING","worker_id":"kafka-connect:8083"}],"type":"source"}
  • stopped the Postgres docker compose
  • rechecked the status and it hadn’t changed
Furthermore, the Kafka Connect log showed no sign of a connection issue. It only contained entries like these:

DEBUG polling records... (io.debezium.connector.base.ChangeEventQueue)

DEBUG no records available yet, sleeping a bit...(io.debezium.connector.base.ChangeEventQueue)

DEBUG checking for more records... (io.debezium.connector.base.ChangeEventQueue) 


I am missing some config that would tell debezium to surface when there's a connection problem?

Thanks,

Marc.



Marc MacGonagle

Jun 21, 2021, 11:31:08 AM
to debezium
Sorry, that should be 'Am I missing some config that would tell debezium to surface when there's a connection problem?'

Gunnar Morling

Jun 21, 2021, 11:50:28 AM
to debe...@googlegroups.com
Hey Marc,

The Debezium Postgres connector automatically does an (internal) restart when it loses the connection, which is why the exposed connector status is always "RUNNING". You should see exceptions in the Kafka Connect log while the connector is trying to reconnect.

I just tried this myself using the tutorial example (https://github.com/debezium/debezium-examples/) and can confirm that exceptions showed up in the log. Note that the connector won't retry upon encountering UnknownHostException, as this seems rather unlikely to be mitigated by simply restarting. In that case it will transition to the "FAILED" state. We might add retrying in that case, though, if folks think that's a good idea.

Hth,

--Gunnar



Marc MacGonagle

Jun 21, 2021, 12:31:12 PM
to debezium
Hi Gunnar,

Thanks for the quick reply. 

Given what you said, should the connector not have displayed 'FAILED' in my case? The database had been taken down. Or did the fact that it was running in a Docker container cause the problem?

Also, what if the automatic restart fails for whatever reason? Would it eventually display the FAILED state after multiple retries?

Thanks,
Marc.

Chris Cranford

Jun 21, 2021, 7:27:13 PM
to debe...@googlegroups.com
Hi Marc,

Debezium builds on top of the Kafka Connect retry/back-off framework, which is based around RetriableException [1]. The idea is that when the connector throws an exception wrapped in a RetriableException, it's a special indicator to Kafka Connect that it should honor the retry attempts and back-off strategy. Normally this means that the connector will be restarted almost instantaneously, but if repeated restarts fail, Kafka Connect will back off retrying based on its configured strategy.
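To make the retry/back-off idea concrete, here is a minimal Python sketch of the general pattern. It is not Kafka Connect's implementation: `RetriableError`, `run_with_backoff`, and the exponential delay schedule are all assumptions made for illustration, and the thread does not say which back-off schedule Kafka Connect actually uses.

```python
import time


class RetriableError(Exception):
    """Hypothetical stand-in for Kafka's RetriableException."""


def run_with_backoff(task, max_retries=5, base_delay=0.5, max_delay=30.0,
                     sleep=time.sleep):
    """Call task(); on RetriableError, retry with exponential back-off.

    Any other exception propagates immediately, which mirrors a connector
    being marked FAILED without a retry. The delay schedule is an
    assumption, not Kafka Connect's real strategy.
    """
    attempt = 0
    while True:
        try:
            return task()
        except RetriableError:
            if attempt >= max_retries:
                raise  # retries exhausted: give up
            # Double the wait after each failed attempt, up to max_delay.
            delay = min(base_delay * (2 ** attempt), max_delay)
            sleep(delay)
            attempt += 1
```

The `sleep` parameter is injectable only so the schedule can be inspected without actually waiting.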

The PostgreSQL connector only restarts automatically under the following error conditions:

    "Database connection failed when writing to copy"
    "Database connection failed when reading from copy"

If the exception message is anything but those, the connector will not wrap the exception in a RetriableException and thus will be marked FAILED immediately.
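As a rough illustration of that decision, the check can be thought of as a substring match on the exception message. This is a Python sketch rather than the connector's actual Java code; the function name `is_retriable` and the sample messages in the usage lines are made up for the example.

```python
# The two messages listed above; anything else is treated as fatal.
RETRIABLE_MESSAGES = (
    "Database connection failed when writing to copy",
    "Database connection failed when reading from copy",
)


def is_retriable(message: str) -> bool:
    """Return True if the exception message matches a retriable condition."""
    return any(fragment in message for fragment in RETRIABLE_MESSAGES)


# Illustrative messages only, not copied from a real log.
print(is_retriable("Database connection failed when reading from copy"))
print(is_retriable("UnknownHostException: postgres"))
```

Under this sketch the first message would be retried and the second would lead straight to FAILED, which matches the UnknownHostException behaviour Gunnar described.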

Lastly, if the maximum number of retries in the retry configuration is reached, Kafka Connect will stop attempting to restart the connector, and it will then also be marked as FAILED.
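Once a connector or task does reach FAILED, one way to surface it is to poll the Kafka Connect REST status endpoint used earlier in this thread and inspect the states. The Python sketch below assumes the response has the shape shown in Marc's first message; the helper names `fetch_status` and `failed_components` are hypothetical.

```python
import json
import urllib.request
from typing import List


def fetch_status(url: str) -> dict:
    """GET the connector status JSON from the Kafka Connect REST API."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)


def failed_components(status: dict) -> List[str]:
    """List the parts of a connector status that report FAILED."""
    failures = []
    if status.get("connector", {}).get("state") == "FAILED":
        failures.append("connector")
    for task in status.get("tasks", []):
        if task.get("state") == "FAILED":
            failures.append(f"task {task.get('id')}")
    return failures


# Status payload as posted at the start of this thread.
sample = json.loads(
    '{"name":"debezium-spike",'
    '"connector":{"state":"RUNNING","worker_id":"kafka-connect:8083"},'
    '"tasks":[{"id":0,"state":"RUNNING","worker_id":"kafka-connect:8083"}],'
    '"type":"source"}'
)
print(failed_components(sample))
```

In practice this would be called as `failed_components(fetch_status("http://localhost:8083/connectors/debezium-spike/status"))` from a scheduled job. It can only report what the REST API exposes, so a connector that stays RUNNING while retrying internally will not show up here.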

HTH,
Chris

[1]: https://kafka.apache.org/27/javadoc/index.html?org/apache/kafka/common/errors/RetriableException.html

Marc MacGonagle

Jun 22, 2021, 8:47:47 AM
to debezium

Hi Chris. 


Thanks for your input. It’s really useful to know how the RetriableException works. Where is the maximum number of retries configured, by the way?


Getting back to the point about displaying FAILED: I’ve tried using the examples that Gunnar linked to (https://github.com/debezium/debezium-examples/tree/master/tutorial) and I was able to reproduce the issue with the following steps (I'm running on a Mac, in case that's relevant):

  • export DEBEZIUM_VERSION=1.5
  • docker-compose -f docker-compose-postgres.yaml up
  • curl -i -X POST -H "Accept:application/json" -H  "Content-Type:application/json" http://localhost:8083/connectors/ -d @register-postgres.json

I now have:
[Screenshot 2021-06-22 at 13.35.16.png]
[Screenshot 2021-06-22 at 13.33.38.png]
Now I run
  • docker stop df84f0e286fc

And I can see:
[Screenshot 2021-06-22 at 13.38.31.png]
[Screenshot 2021-06-22 at 13.37.33.png]
[Screenshot 2021-06-22 at 13.39.32.png]

Why is it that RUNNING is still displayed?

Thanks,
Marc.

Gunnar Morling

Jun 23, 2021, 4:45:07 AM
to debezium
Hi Marc,

So I tried again, and indeed something doesn't look right here.

This time I also observed that after the DB was stopped, the connector was still RUNNING, without any indication of reconnection attempts in the logs. Timing seems to play a role here, as this doesn't happen every time. Could you log a Jira issue please, so we can investigate this (https://issues.redhat.com/browse/DBZ)?

Thx,

--Gunnar

Marc MacGonagle

Jun 23, 2021, 7:22:31 AM
to debezium
Hi Gunnar,

Thank you so much for taking a second look. I've raised the following ticket: https://issues.redhat.com/browse/DBZ-3655. Please let me know if this doesn't meet the required standards and I'll edit it.

Much appreciated,
Marc.
