Cloud SQL proxy causes significant additional latency


Daniel Alm

May 30, 2018, 7:32:26 AM
to Google Cloud SQL discuss
(This may be related to https://github.com/GoogleCloudPlatform/cloudsql-proxy/issues/87.)

I've been experimenting with using Cloud SQL Proxy for my PostgreSQL database vs. connecting directly via TCP. To illustrate, here's my Stackdriver Trace latency graph:

[Screenshot: Stackdriver Trace latency graph, 2018-05-27 20:22:32]

Around 18:15 I switched from connecting directly via TCP from GKE to Cloud SQL over to a Cloud SQL Proxy sidecar that my service connects to via a Unix socket (connecting to the proxy via TCP yields very similar results). As one can see, there are a few latency spikes while the connection pools warm up, and after that the latency distribution remains significantly above what it was with direct connections.
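For reference, here is a minimal sketch of the two connection styles being compared, using Go's database/sql with the lib/pq driver; the instance connection name, IP, and credentials are placeholders, not my actual configuration:

    package main

    import (
        "database/sql"
        "log"

        _ "github.com/lib/pq"
    )

    func main() {
        // Via the proxy sidecar's Unix socket, assuming the proxy was started
        // with -dir=/cloudsql; lib/pq treats "host" as the socket directory.
        proxyDSN := "host=/cloudsql/my-project:europe-west3:my-db user=app password=secret dbname=app sslmode=disable"

        // Direct TCP to the Cloud SQL instance's IP (placeholder address).
        directDSN := "host=10.0.0.5 port=5432 user=app password=secret dbname=app sslmode=disable"

        for _, dsn := range []string{proxyDSN, directDSN} {
            db, err := sql.Open("postgres", dsn)
            if err != nil {
                log.Fatal(err)
            }
            if err := db.Ping(); err != nil {
                log.Printf("connection failed: %v", err)
            }
            db.Close()
        }
    }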

My Stackdriver Trace spans also indicate that the increase in overall latency can be attributed to an increase in database call latency. Trivial database calls that normally take <=10ms via a direct connection take up to 100ms through the proxy.
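To illustrate what I mean by a trivial call, here is a sketch (not my actual tracing setup) that warms the pool first so the handshake is excluded from the measurement; the DSN is a placeholder:

    package main

    import (
        "database/sql"
        "log"
        "time"

        _ "github.com/lib/pq"
    )

    func main() {
        db, err := sql.Open("postgres", "host=/cloudsql/my-project:europe-west3:my-db user=app password=secret dbname=app sslmode=disable")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // Warm the pool so connection setup is not part of the measurement.
        if err := db.Ping(); err != nil {
            log.Fatal(err)
        }

        var one int
        start := time.Now()
        if err := db.QueryRow("SELECT 1").Scan(&one); err != nil {
            log.Fatal(err)
        }
        log.Printf("trivial query took %v", time.Since(start))
    }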

Is this expected, can I do anything about it, and, if there is no remedy, what would be the best way to connect to the DB via TCP from GKE? I'm currently using a /16 netmask to allow connections from GCE, but that's probably bad for production. Also, I'm not sure whether direct traffic to Cloud SQL is encrypted; if you could provide insight into that, it would be appreciated.

FYI, the GKE cluster and the Cloud SQL database are located in the same zone (europe-west3-a IIRC). The database is only ~500 MB right now (fits completely into memory and has appropriate indexes) and has low load. Also, as stated above, I am using connection pooling and have confirmed that that works.

nau...@google.com

May 31, 2018, 2:46:39 PM
to Google Cloud SQL discuss
Hello Daniel

This appears to be related to the GitHub issue you mentioned.

Currently, I am not aware of any workaround for the Cloud SQL Proxy latency caused by connection handshakes. However, you mentioned that things work fine with connection pooling, which makes sense because pooling reduces the number of new connection handshakes. A Cloud SQL instance can also be accessed directly from GKE by using the instance's external IP from within the container, with appropriate routing. In addition, you can store the Cloud SQL connection information in secrets and use them in the application; here is a third-party link describing this process. It will also require managing the whitelisted IPs for the Cloud SQL connections.
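For illustration, the application side could read those secret-provided values roughly as follows; the environment variable names and DSN are assumptions (a pod spec would populate them from a Kubernetes Secret), not a prescribed setup:

    package main

    import (
        "database/sql"
        "fmt"
        "log"
        "os"

        _ "github.com/lib/pq"
    )

    func main() {
        // Variable names are made up; they would be set from a Secret via
        // env / valueFrom / secretKeyRef in the pod spec. See the SSL note
        // below for encrypting this direct connection.
        dsn := fmt.Sprintf("host=%s port=5432 user=%s password=%s dbname=%s sslmode=disable",
            os.Getenv("CLOUDSQL_IP"),
            os.Getenv("CLOUDSQL_USER"),
            os.Getenv("CLOUDSQL_PASSWORD"),
            os.Getenv("CLOUDSQL_DB"))

        db, err := sql.Open("postgres", dsn)
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()
        if err := db.Ping(); err != nil {
            log.Fatal(err)
        }
        log.Println("connected to Cloud SQL over its external IP")
    }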

By default, direct connections to Cloud SQL may not be encrypted; SSL needs to be configured as described in this documentation. I also found this discussion thread, which provides more insight on the same topic.
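For example, a direct connection that verifies the server over SSL might be configured roughly like this with lib/pq; the IP and certificate file paths are placeholders, and the certificate files would come from the instance's SSL configuration (server CA plus a client certificate):

    package main

    import (
        "database/sql"
        "log"

        _ "github.com/lib/pq"
    )

    func main() {
        dsn := "host=10.0.0.5 port=5432 user=app password=secret dbname=app " +
            "sslmode=verify-ca " +
            "sslrootcert=/secrets/server-ca.pem " +
            "sslcert=/secrets/client-cert.pem " +
            "sslkey=/secrets/client-key.pem"

        db, err := sql.Open("postgres", dsn)
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()
        if err := db.Ping(); err != nil {
            log.Fatal(err)
        }
        log.Println("connected with TLS verification")
    }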

Our engineering team is working on cloudsql-proxy to reduce the latency spike that occurs roughly every hour, by refreshing the SSL certificate before its one-hour lifetime expires. You can keep track of this GitHub feature request or the Google public feature request for further updates on this.

Daniel Alm

May 31, 2018, 2:57:28 PM
to Google Cloud SQL discuss
Hello, thanks for your reply. My issue is NOT related to the SSL handshake. I am ALWAYS using connection pooling, yet latency is much higher when connecting via the proxy than via direct connections, even after the connection pools have been filled. Could you please take another look at what could be going on here?

nau...@google.com

Jun 1, 2018, 2:08:09 PM
to Google Cloud SQL discuss
Hello 

cloudsql-proxy uses SSL, and each new connection goes through an SSL handshake, which adds some delay; a direct connection to Cloud SQL (with SSL disabled), on the other hand, avoids the SSL overhead and therefore connects faster. Apart from these possible causes, I would suggest checking the performance of your pods to make sure they are not hitting a CPU or memory bottleneck, since resource constraints on the pods could add latency to SSL handshakes and other work.

Daniel Alm

Jun 1, 2018, 3:34:15 PM
to Google Cloud SQL discuss
I am using cross-request connection pooling (i.e. I only connect to the database ONCE overall), so the handshake and its extra delay should only happen ONCE, right? However, I see higher latency for every single request, even though only the first request creates a new PostgreSQL connection. So what could be the reason for that?
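As a sketch of what I mean by cross-request pooling (Go's database/sql shown here; the driver, limits, and lifetime are illustrative, not necessarily my exact setup):

    package pool

    import (
        "database/sql"
        "time"

        _ "github.com/lib/pq"
    )

    // NewPool is called once at startup; the returned *sql.DB is shared across
    // all requests, so new connections (and their handshakes) are created only
    // when the pool grows, not on every query.
    func NewPool(dsn string) (*sql.DB, error) {
        db, err := sql.Open("postgres", dsn)
        if err != nil {
            return nil, err
        }
        db.SetMaxOpenConns(10)                  // illustrative upper bound
        db.SetMaxIdleConns(10)                  // keep idle connections warm between requests
        db.SetConnMaxLifetime(30 * time.Minute) // recycle connections well before they go stale
        return db, nil
    }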

Also, my nodes are almost idle, so overloaded CPU should not be the reason, either.

Daniel Alm

Jun 2, 2018, 12:27:28 PM
to Google Cloud SQL discuss
Hi, I checked again, and it turns out that removing my Pods' Kubernetes CPU limits has indeed resolved the issue. So the problem was not caused by connection pooling, nor by the Cloud SQL Proxy itself. Thanks for your patience!