After Google Maintenance CloudSQL stopped working


Arvind Stutzen

23 Feb 2020, 01:32:26
to Google Cloud SQL discuss
After Google maintenance, CloudSQL stopped working. The error logs show the server restarting repeatedly.

In parallel, I also tried to create a new instance and restore the old automated backup, but even that fails partway through the restore :(

We created a ticket under our paid support subscription, but it has not been addressed so far. We can't accept a production DB being down for 12 hours. Since this is a SaaS DB, numerous customers are affected. Are there any suggestions for getting it resolved?

Arvind Stutzen

23 Feb 2020, 03:08:59
to Google Cloud SQL discuss

The log (attached below) shows that this problem started after Google updated the CloudSQL version. There is still no response from Tech Support on the P1 case we created, and our business has been seriously affected by this 13+ hours of downtime :( Any suggestions on how it can be solved?

dbupgradefailure.png


Hans Ravnaas

24 Feb 2020, 11:23:22
to Google Cloud SQL discuss
Arvind, I'm seeing SQL issues as well after the maintenance, so posting here in case it's related:

We are using Java 11 and Postgres 11. Our code has not changed in the last two weeks, but after the maintenance on 2/23, we started seeing this error in the SQL server logs:

LOG: SSL error: DATA_LENGTH_TOO_LONG

Screen Shot 2020-02-24 at 8.18.44 AM.png

Any chance this is what you are seeing too?

Hans

Devin Homan

24 Feb 2020, 14:08:49
to Google Cloud SQL discuss
I have been having intermittent connection issues since yesterday's maintenance. It looks like it might be an internal problem with mysqld. Perhaps the maintenance introduced bad linked libraries.

key_buffer_size = 8388608
read_buffer_size = 131072
sort_buffer_size = 262144
max_threads = 4030

key_buffer_size + (read_buffer_size + sort_buffer_size) * max_threads = 1609833 K bytes of memory.

The log suggests that one of the linked libraries may be corrupt. I see:

Thread pointer: 0x34b20d125000
Cannot determine thread, fp=7f71fabfe770
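
For anyone wanting to sanity-check the memory estimate in the crash log above, here is a minimal Python sketch that evaluates the printed formula literally with the posted values. The function name `estimated_memory_kib` is hypothetical and only for illustration. Evaluated this way the formula gives 1555712 KiB, somewhat below the 1609833 K that mysqld printed; how mysqld arrives at its own figure is not shown in the log, so treat this only as a rough cross-check.

```python
# Values copied from the crash log quoted in the post above.
key_buffer_size = 8_388_608
read_buffer_size = 131_072
sort_buffer_size = 262_144
max_threads = 4_030


def estimated_memory_kib(key_buf: int, read_buf: int, sort_buf: int, threads: int) -> int:
    """Evaluate key_buffer_size + (read_buffer_size + sort_buffer_size) * max_threads, in KiB.

    Hypothetical helper for illustration; it applies the formula exactly as
    printed in the log and makes no claim about mysqld's internal accounting.
    """
    total_bytes = key_buf + (read_buf + sort_buf) * threads
    return total_bytes // 1024


estimate = estimated_memory_kib(
    key_buffer_size, read_buffer_size, sort_buffer_size, max_threads
)
print(f"{estimate} KiB")  # 1555712 KiB
```

Either way the worst-case estimate is roughly 1.5 GiB, which may be worth comparing against the memory of the instance tier in use.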

Devin Homan

24 Feb 2020, 16:03:13
to Google Cloud SQL discuss
We were able to replicate the issue on a local MySQL instance.  There was a query being run that was causing a pointer exception.  We resolved the issue by changing the query.  It's possible that one of the MySQL libraries that Google updated has a bug.  This code had been working in CloudSQL before.

Chris Rasco

25 Feb 2020, 08:03:49
to Google Cloud SQL discuss
I had the same issue. Maintenance on Saturday morning at 1am killed my Cloud SQL instance. I tried contacting support, to no avail. I woke up this morning, after restoring from backup to a new instance, and it's magically healthy now.

Saturday I attempted to restart, add HA, do anything I could to get the database back online. Nothing worked. Eventually it stopped reporting CPU metrics completely. At the maintenance start, it spiked at 100% as you can see.
sql operations.png
sql util.png

George (Cloud Platform Support)

25 Feb 2020, 11:42:24
to Google Cloud SQL discuss
Has your paid support ticket been addressed meanwhile? 

This is a known issue, and Engineering is working on it right now. I'll keep you informed in this thread on temporary workarounds, fixes, and recommendations. 

Hans Ravnaas

25 Feb 2020, 11:46:23
to Google Cloud SQL discuss
George, I can't answer whether Arvind's case has been addressed, but can you please elaborate on exactly what the known issue is and what the symptoms are? A number of us have experienced issues since Saturday, and I'm still spending hours troubleshooting this. I'd rather not if I knew what you are working on addressing.

Thanks,
Hans

George (Cloud Platform Support)

25 Feb 2020, 11:53:44
to Google Cloud SQL discuss
The general issue can be simply described as CloudSQL having stopped working after Google maintenance. There is no doubt that Engineering is working on it. I'll provide more detail shortly. 

Hans Ravnaas

25 Feb 2020, 12:00:48
to Google Cloud SQL discuss
Thanks, George. A couple of suggestions that would really help us service consumers. Please update https://status.cloud.google.com/ to reflect that SQL is having issues. Also, I feel we were all stonewalled here for 24 hours or more, scrambling to figure out what was up. Being a bit more proactive and acknowledging much sooner that you are aware of the issue would be really, really helpful. I have already wasted a full day on this when it was already a known issue :-(

Chris Rasco

26 Feb 2020, 09:57:32
to Google Cloud SQL discuss
This is what I got from Google Support yesterday.

Thanks for contacting Google Cloud Platform support. I understand that your Cloud SQL instance (xxxxxxx) did get stuck on an operation on 2020-02-22 at 1:00 AM EST which resulted in “unknown error” and prevented you from taking any further actions. I understand that this situation raises concern and I will do my best to investigate the root cause of the issue. I do also apologize for the delay in response, as this was initially created on Feb 22 and was actioned in a timely manner.

The case was initially routed to a different team and since the support package on your account does not include the weekends, this case stayed unattended during that time, however, it is now in the right department and I’ll be providing support on this case.

I noticed that the project name is missing from the case and I had to make sure I am looking at the right project and Instance. By checking your email and the projects assigned to your email, I came to the conclusion that the project in question is ‘xxxxxxxxx’. Please correct me if I'm looking at the wrong project.

Furthermore, By reviewing the cloud SQL instance status (xxxxxx), I can confirm that it is currently up and in the Running state. By looking at the recent operations, I noticed that there was a roll out scheduled for 1AM EST on Feb 22, however, it did not complete and therefore the instance got stuck on that operation up until Feb 24 at 11:03 AM. Once the instance is in the middle of a maintenance, no further action can be taken, unless the operation is cancelled by a Cloud SQL specialist or it is completed. Our Cloud SQL specialist was able to cancel it on Feb 24 at 11:03 AM EST and that enabled the instance to go back to Running status. This should no longer be an issue and should not be a concern anymore.  Please let me know if you still experience any issue on your instance and I shall investigate it further.

Once again, I do apologize for the delay in responding to your request as this was due to inappropriate assignment of the case. Please let me know if you have further questions or require further clarification regarding the issue and I shall address them accordingly.