Hi,
I have several Spring WebFlux microservices connected to the same Postgres database on GCP Cloud SQL (a Google-managed service).
I keep running into an issue where one of the services connected to the database randomly becomes unresponsive.
When an API request is sent to the service at such a time, it hangs waiting for a response for a long time, and nginx eventually returns an error once the request exceeds the maximum keep-alive time.
I suspect this is an R2DBC connection issue because all service-layer logs up to the point where a database call is made are written to the log file, but nothing is logged after that.
I have even enabled R2DBC debug logs, but no query execution logs appear.
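For reference, this is roughly how the debug logging was enabled (a sketch; `io.r2dbc.postgresql.QUERY` and `io.r2dbc.postgresql.PARAM` are the logger names documented by the r2dbc-postgresql driver for SQL statements and bind parameters):

```yaml
# application.yml - enable driver-level query logging (sketch)
logging:
  level:
    io.r2dbc.postgresql.QUERY: DEBUG   # logs executed SQL statements
    io.r2dbc.postgresql.PARAM: DEBUG   # logs bound parameter values
```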
I cannot reproduce or debug this because it happens at random; I even ran a load test, but the service worked fine at that time.
This is how the R2DBC connection is configured in all services. I have adjusted the connection pool based on suggestions I got from other sources.
r2dbc:
  url: r2dbc:postgresql://**.**.**.**:****/v1
  username: ******
  password: **********
  pool:
    initial-size: 10
    max-size: 20
    max-idle-time: 1m
    max-acquire-time: 1m
    max-validation-time: 1m
    max-create-connection-time: 1m
    max-life-time: 5m
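For completeness, the same pool settings expressed programmatically with the r2dbc-pool builder API would look roughly like this (a sketch; the host, port, and credentials are placeholders, and `PoolConfig` is just an illustrative class name):

```java
// Sketch: programmatic equivalent of the YAML pool settings above,
// using io.r2dbc.pool from the r2dbc-pool library.
import io.r2dbc.pool.ConnectionPool;
import io.r2dbc.pool.ConnectionPoolConfiguration;
import io.r2dbc.spi.ConnectionFactories;
import io.r2dbc.spi.ConnectionFactory;
import java.time.Duration;

public class PoolConfig {

    public static ConnectionPool createPool() {
        // Placeholder connection string - host, port, user, and password are masked
        ConnectionFactory factory = ConnectionFactories.get(
                "r2dbc:postgresql://<host>:<port>/v1?user=<user>&password=<password>");

        ConnectionPoolConfiguration configuration = ConnectionPoolConfiguration.builder(factory)
                .initialSize(10)                                // initial-size: 10
                .maxSize(20)                                    // max-size: 20
                .maxIdleTime(Duration.ofMinutes(1))             // max-idle-time: 1m
                .maxAcquireTime(Duration.ofMinutes(1))          // max-acquire-time: 1m
                .maxValidationTime(Duration.ofMinutes(1))       // max-validation-time: 1m
                .maxCreateConnectionTime(Duration.ofMinutes(1)) // max-create-connection-time: 1m
                .maxLifeTime(Duration.ofMinutes(5))             // max-life-time: 5m
                .build();

        return new ConnectionPool(configuration);
    }
}
```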
I have given the expected logs and the faulty logs below for your reference.
expected :

INFO Fetching configuration for factoryId: TEST001, appType: WEB, version: v1
is a service-layer log written just after the API is hit and before the database call, and
2024-09-17 04:20:11 [,] - INFO Parsing configurations Map: {FactoryName=TEST001, ..............................................}
2024-09-17 04:20:11 [,] - DEBUG Retrieved configuration configurations: JsonByteArrayInput{{"FactoryName": "TEST001", ...................................................}
are service-layer logs written after the database call.
faulty log :
This is the log when the same API is hit multiple times while the service is unresponsive.

As you can see, not even the query is executed.
When I hit the actuator/health endpoint while the service is unresponsive, I get the same result.
When I hit the actuator/restart endpoint, the service breaks entirely (sorry, I do not have a log of this).
Has this happened to anyone?
Can someone explain why this could happen, or give me any pointers?
Thank you in advance.