Connection issues using cloudsqlproxy and Cloud SQL 2 gen HA


Jan Schnurle

Oct 22, 2019, 10:50:39 AM
to Google Cloud SQL discuss
Hi there!

We recently ran into networking issues between our app servers on Compute Engine and our Cloud SQL MySQL instance (2nd gen, HA).
This has now affected us twice, and during each incident we were unable to connect to the instance either via cloudsqlproxy or directly via IP.
During the incident we also noticed several updates being applied to the instance, even though this was outside the scheduled update window (evidence further below).

Please note that we had not faced any networking issues since the infrastructure was deployed ~14 months ago.

Evidence of connection issues:
  • 2019/10/17 07:18:32 couldn't connect to "#######-website:us-east1:#####mysql01": dial tcp ########:3307: getsockopt: connection timed out
Example of logs collected on the Cloud SQL console:
  • Aborted connection ###### to db: '######' user: '#######' host: 'cloudsqlproxy~########' (Got an error reading communication packets)
  • Slave I/O for channel: error reconnecting to master 'cloudsqlreplica@#########:3306' - retry-time: 60  retries: 1, Error_code: 2003
Several updates being applied during incident:
image.png
Updates being applied outside planned window:
image.png
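
For anyone who wants to check whether the timeout in the first log line is a plain TCP reachability problem, here is a minimal sketch of a probe. It is not part of cloudsqlproxy; the function names `can_connect` and `probe` are made up for illustration, and the host is a placeholder. Port 3307 is the one from the proxy log above, and 3306 would be the port for a direct MySQL connection.

```python
import socket
import time


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return False


def probe(host, port, attempts=3, delay=1.0, timeout=3.0):
    """Try up to `attempts` times and return one True/False result per attempt.

    Stops at the first success; sleeps `delay` seconds between failures.
    """
    results = []
    for i in range(attempts):
        ok = can_connect(host, port, timeout)
        results.append(ok)
        if ok:
            break
        if i < attempts - 1:
            time.sleep(delay)
    return results


if __name__ == "__main__":
    # "INSTANCE_IP" is a placeholder; replace it with the instance address.
    print(probe("INSTANCE_IP", 3307))
```

If every attempt fails from the app servers, the problem is below the MySQL protocol level, which matches the "connection timed out" message rather than an authentication or query error.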

Solution:
The only way to reestablish the connection between the app servers and the Cloud SQL MySQL instance was to trigger a failover manually.
Once the failover completes, the connection is reestablished and works as expected.

Did anybody out there face the same issue?

Thank you very much!




Yasser Karout

Oct 22, 2019, 1:31:27 PM
to Google Cloud SQL discuss
Hello,

Did anything change in terms of the rate of connections to the instance? Aborted connections usually mean that connections to the database are not being terminated properly. Please see these docs [1][2].
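
To illustrate what "terminated properly" means, here is a minimal sketch in Python. It assumes a DB-API style driver; `sqlite3` is used only as a stand-in so the example runs without a MySQL server, and `fetch_one` is a made-up helper name. The point is that the connection and cursor are closed explicitly even if the query raises, instead of being left for the server to time out.

```python
import sqlite3
from contextlib import closing


def fetch_one(connect, sql):
    """Open a connection, run one query, and always close cursor and connection.

    `connect` is any zero-argument callable returning a DB-API connection.
    """
    with closing(connect()) as conn:
        with closing(conn.cursor()) as cur:
            cur.execute(sql)
            return cur.fetchone()


if __name__ == "__main__":
    # sqlite3 stands in for a MySQL driver here (assumption for illustration).
    print(fetch_one(lambda: sqlite3.connect(":memory:"), "SELECT 1"))
```

With a real MySQL driver the same pattern would apply to its own connect function; the driver and its arguments are left out here on purpose.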

A restart could be fixing the issue only temporarily; the instance would then run into the same problem once connections flood in again.

But if you believe this is on the Google side, you can open a new issue here so that we can investigate further [3].

Jan Schnurle

Oct 22, 2019, 1:39:27 PM
to Google Cloud SQL discuss

Here are the images that broke in my first message.

Several updates being applied during incident:

scheduled-mantainance.png



Several updates being applied during incident:

updates.png

Jan Schnurle

Oct 23, 2019, 9:31:01 AM
to Google Cloud SQL discuss
Hello Yasser, thank you very much for your reply!

Did anything change in terms of rate of connections to the instance?

No changes; the number of connections has been the same for the last few months.
 
Aborted connections usually means connections to the database are not being terminated properly. Please see these docs [1][2].

Thanks! I will look into it. The app in question is a WordPress site.
We have noticed that this error appears from time to time, but not constantly.
 
The restart could be fixing the issue temporarily then the instance will do the same thing after the connections flood again.

During the incident, restarting did not solve our issue.
We had no connections at all between app and db servers.

 
But if you believe this is on the Google side, you can open a new issue here so that we can investigate further [3].

I will do that, thank you very much.