Microservice is not responsive due to R2DBC connection loss


akila dharmadasa

Sep 22, 2024, 11:32:25 PM

to r2dbc

Hi,

I have several Spring WebFlux microservices connected to the same PostgreSQL database on GCP Cloud SQL (a Google-managed service).

I keep running into an issue where one of the services connected to the database randomly becomes unresponsive.
When an API request is sent to the service in that state, the request waits a long time for a response, and nginx finally returns an error once the request reaches the maximum keep-alive time.

I suspect that this is an R2DBC connection issue because all service-layer logs up to the point where a database call is made are written to the log file, but no logs appear after that.

I have even enabled R2DBC debug logs, but no query execution logs appeared.
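
For context, a minimal sketch of how this kind of debug logging is typically switched on in a Spring Boot application.yml. This is an illustration, not necessarily the exact settings used here. The QUERY and PARAM logger names come from the r2dbc-postgresql driver; the io.r2dbc.pool entry assumes that r2dbc-pool logs under its package name.

```yaml
logging:
  level:
    io.r2dbc.postgresql.QUERY: DEBUG   # executed SQL statements
    io.r2dbc.postgresql.PARAM: DEBUG   # bind parameter values
    io.r2dbc.pool: DEBUG               # assumed: connection pool activity
```

If the pool logger shows a connection being requested but no statement is ever logged by the driver, that would suggest the request is stuck while acquiring a connection rather than while running the query.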

I cannot reproduce or debug this because it happens at random. I even ran a load test, but the service worked fine during it.

This is how the R2DBC connection is configured in all services. I have adjusted the pool settings based on suggestions I got from other sources.

r2dbc:
  url: r2dbc:postgresql://**.**.**.**:****/v1
  username: ******
  password: **********
  pool:
    initial-size: 10
    max-size: 20
    max-idle-time: 1m
    max-acquire-time: 1m
    max-validation-time: 1m
    max-create-connection-time: 1m
    max-life-time: 5m
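
One setting related to max-validation-time that is not present in this configuration is a validation query. The fragment below is only a sketch of how it could be added, assuming the block sits under spring: so that the key resolves to Spring Boot's spring.r2dbc.pool.validation-query property. Whether it has any effect on the hang described here is untested.

```yaml
r2dbc:
  pool:
    # Assumption: run a trivial query to check a connection before it is handed out.
    validation-query: SELECT 1
    # Existing setting from above; bounds how long that validation may take.
    max-validation-time: 1m
```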

I have given below the expected logs and the logs from the unresponsive service for your reference.

Expected:
Screenshot from 2024-09-23 09-00-32.png

INFO Fetching configuration for factoryId: TEST001, appType: WEB,version: v1
is a service-layer log written just after the API is hit and before the database call, and
2024-09-17 04:20:11 [,] - INFO Parsing configurations Map: {FactoryName=TEST001, ..............................................}
2024-09-17 04:20:11 [,] -DEBUG Retrieved configuration configurations: JsonByteArrayInput{{"FactoryName": "TEST001", ...................................................}

are service-layer logs written after the database call.

Faulty log:
This is the log when the same API is hit multiple times while the service is unresponsive.

As you can see, not even the query is executed.

When I hit the actuator/health endpoint while a service is unresponsive, I get the same result.
When I hit the actuator/restart endpoint, the service breaks entirely. (Sorry, I do not have a log of this.)

Has this happened to anyone else?

Can someone explain why this might happen or give any pointers?

Thank you in advance.
