Project suspended

Michael Ibanes

Apr 20, 2017, 8:54:21 PM
to gce-discussion
Hello,


I have been using Google Cloud for a couple of years and like it a lot. As I work on many customers' VMs, I dragged quite a few over from AWS, Rackspace, etc.

Tonight I received a 'project suspended' email from Google concerning my own project: no warning, no explanation other than "appears to violate our Terms of Service". The project is not accessible and the VMs are off. All I can access is an appeal submission form. There is no way to contact Google, and I don't even have a working email address to receive updates because my server is offline! It's been hours, and the form says it can take 2 days! I couldn't even upgrade support to get phone access if I wanted to!

OK, so maybe a server got hacked? Who knows. But it must be life-threatening to shut down a whole project. And if it were life-threatening, you would think someone would shut down the affected VM or service and let you know there is an issue so you can fix it!

So, it's been hours and I'm already getting complaint calls. In 2 days I won't have much of a business left, and besides, I can't even access any backups to try to get some sites back up somewhere else. But take it to the next level: what am I going to tell the customers I set up projects for on Google, whom I told this type of thing would never happen? I might only have a couple of VMs, but those customers have 15 or so VMs and other services per project that interact with each other. If something like this happened to them, it would fall right back on me, and considering the losses they would incur, I would probably have to flee the country!

And I'm going to have to come up with a big lie tomorrow to justify why I didn't do my work; otherwise I'll be spending the next 2 months moving their services out of Google for free.

Forgive me, but that's totally insane: all those great services for redundancy and fail-safe solutions, and the whole thing is just turned off with a click, without a warning, a reason, a contact, or even a way to get hold of your data?
Whatever the reason might be, it's a deadly flaw.

I can't be contacted; my email is down. I'm using Google Cloud!

Michael Ibanes

Michael Basilyan

Apr 20, 2017, 9:12:55 PM
to Michael Ibanes, gce-discussion
Hi Michael, 
Thanks for reaching out. I will make sure your email reaches the right folks.

Thanks,
Mike



--
© 2017 Google Inc. 1600 Amphitheatre Parkway, Mountain View, CA 94043
 
Email preferences: You received this email because you signed up for the Google Compute Engine Discussion Google Group (gce-discussion@googlegroups.com) to participate in discussions with other members of the Google Compute Engine community and the Google Compute Engine Team.
---
You received this message because you are subscribed to the Google Groups "gce-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gce-discussion+unsubscribe@googlegroups.com.
To post to this group, send email to gce-discussion@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gce-discussion/ed9a41ce-288a-4912-8a08-f59162997052%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Message has been deleted
Message has been deleted
Message has been deleted

Michael Ibanes

Apr 21, 2017, 1:58:27 PM
to gce-discussion
Follow up

No resolution, and still no hint or contact. Service unavailable so far:
[conn] [RED Duration : 23h:17m:34s]

Michael

Michael Ibanes

Apr 21, 2017, 2:19:48 PM
to gce-discussion

Google seems to have deleted my previous post by mistake, so I'm reposting it.
------------------------------

Thanks for your help. The account is still disabled. I wouldn't even know if I had been contacted anyway: I have no email, and my mobiles' batteries are drained from complaint calls.


An eye-opening experience, I must say: 20 years as a sysadmin and I didn't see that one coming. Google is the pretty girl or guy with nice features who dumps you by text message. Great to play and have fun with, but if you think you can do serious business with them, you pay a high price.


I can't do much for my own stuff right now other than helplessly watch the disaster unfold from a front-row seat. I'm about to start migrating other customers' services, which are on separate projects/accounts, to more business-safe platforms after informing them of what could happen to them overnight. It's a complete waste of my time for weeks to come, but I don't want to be personally responsible; the savings are not worth the risk.


I have since found in some tech-related forums that quite a few businesses got burnt the same way. I wish I had found those before. I've added my experience to the pool; it might save a few souls.


Thank you again for trying to help.


Michael Ibanes

Paul Nash

Apr 21, 2017, 4:44:05 PM
to gce-discussion
Hi Michael,

I’m sorry to hear about the trouble this has caused you.

We were made aware of a serious terms of service violation with the project. After investigation, we sent a notification directly to the project owners, including the email address you are posting from. Unfortunately we did not receive a response, and as a result it became necessary to suspend the project. 

Our team has reached out to you to discuss the situation privately, and we hope we can reach a resolution quickly. We can’t discuss the details of such issues on public forums, but we do want to respond to your concerns.

Please look for an email from our team referencing case number 9-9452000015827.

Thanks,
- Paul Nash, GCE Product Manager

PS - it does appear that several posts are missing from this thread. I apologize to you and to this community for that - it is absolutely not our policy to delete anything. Unfortunately it looks like the rapid rate of posting on the thread may have triggered a spam protection threshold. I have confirmed with our team that nobody actively deleted posts. 

Michael Ibanes

Apr 21, 2017, 7:14:07 PM
to gce-discussion
Hi Paul,

Thank you for restarting this project. 'Trouble' would be an understatement; it's more like an unrecoverable situation now.
Yes, please, I would like Google to respond to my concerns, which happen to be the concerns of a whole bunch of other Google customers now.

A legitimate message (from an outraged person) featuring the totally inappropriate photos of paintings from a recent 'Art' expo in France was posted on a site hosted on one of the VMs.
It violated the Terms of Service. I'm told I was notified about it by email several days ago. I obviously didn't get that email; it was probably flagged as junk, ironically due to its content.

It took me 10 minutes to delete that forum message once I was made aware of it earlier, and I even thought of blocking port 80 before starting the VM, which is amazing after 2 days without sleep.


Google blocked a whole project, which could have contained dozens of VMs and services, and took 1 DAY and 2 HOURS to actually tell me WHY everything was off. Because of 1 missed email! And that's after I exhausted every possible avenue to get someone to just tell me WHY. Are you aware that even if you are desperate enough to pay the high price of support for that one phone call that could save your business, you can't on a blocked project?

The last time I had a server down for more than 4 hours was 25 years ago, when I had to fly interstate to get parts, and I still got it back online in half the time it took Google to send me 1 email and give me access to my services.
I didn't think this could ever happen in 2017, so I didn't have a fallback plan. I'm now making sure it will never happen again, and I'm sure you can guess how.


So, most importantly, what would you like me to tell the larger customers I set up on Google, who start losing money and calling me when their service is unavailable for 5 minutes? To quadruple-check their junk folder, because if they miss an email from Google their 200 employees might have to play solitaire for a couple of days?

Seriously, I'd like Google to tell me what those customers can expect. Is it the same treatment I received today, and that's just how it is with Google, or do they have other options? Would Platinum Support even make a difference?
 

Thanking you for helping.


Mike Hardy

May 12, 2017, 1:44:39 PM
to gce-discussion
Looks like you're not alone, Michael. We're facing this EXACT same issue. It has been over 26 hours and literally no response. We are 100% confident that no email was ever sent. We've scoured every corner of every single email account. Nada.

Google, seriously, this is really, really bad, and I am sure you know this, but how does the problem still persist after Michael highlighted this massive issue weeks ago? There is a silver lining - you did highlight the need for a disaster recovery plan. Unfortunately, we'll be sure to implement that plan somewhere else.

Michael Ibanes

May 12, 2017, 2:45:51 PM
to gce-discussion
And there have been a few others since! I'm still waiting for an answer to my last post; it's been 20-some days. I think that says it all.
It's obviously not a safe and reliable platform for business. I'm slowly moving customer VMs away; I'm not willing to risk it, and I don't know anybody who would if they knew about it in the first place. I'm making sure to warn other people in much higher-traffic places than these forums. It's the least I can do while I'm waiting for an answer to my post :-)

And if you think you're cranky now, wait until you find out why they suspended it; probably some ridiculous reason too.

Paul Nash

May 12, 2017, 3:21:26 PM
to Michael Ibanes, gce-discussion
Folks,

I appreciate that this is an extremely frustrating experience for each of you, and we are working on a more formal response. However, the most effective way for us to resolve a specific case is to respond to the customer's situation directly, not generically in a public forum.

Both of your situations, and those of anybody else who chooses to post here, are actually incredibly rare and unique cases, and generalizing from them to what all customers might experience is not realistic.

To be clear, we don't like doing takedowns of any kind. We take it incredibly seriously, and it is an option of very last resort. Yet the processes we use and controls we have in place work very well for the vast majority of cases where we have to apply them. That said, it is not acceptable to us that there are *any* cases where customers have a bad experience, and we routinely investigate each one of those to learn how it could have been avoided.

I would be happy to continue this conversation in a productive vein of discussing what we're doing to make improvements. We cannot however discuss the details of what our criteria are, because that helps bad actors (not you, others) take advantage. We can reiterate some relevant parts of our Terms of Service, if that would be valuable. We cannot effectively engage in hypotheticals like "what should we tell all customers that would consider using Google."

Please feel free to post your specific questions or concerns here, and we will do our best to address them.

(And, note, Mike's account has been reinstated while we continue discussing the issue with him)

Thanks,
-P

Mike Hardy

May 12, 2017, 6:04:12 PM
to gce-discussion, goo...@techsoup.com.au
In defense of Paul and his team, I will say that once I found them, they were quite helpful and very quick to respond. I just wish that it were easier to find these helpful people, because, quite simply, this is an extremely time sensitive issue and a 2 business day turnaround could be enough to bankrupt a business or at least cause debilitating harm. 


Steve Wright

May 17, 2017, 6:54:57 PM
to gce-discussion
Hah, yeah one day quite randomly my RADIUS server vanished after a routine system upgrade and reboot.  You can imagine the consequences of that...  I found the fault and fixed it myself, and now they get shitty with me because I won't tell them what the fault was... 

Paul Nash

May 17, 2017, 8:25:46 PM
to Steve Wright, gce-discussion
Hi Steve, unfortunately I'm not able to tell what you're alluding to, or who "they" is, so I'll follow up privately. I'd like to understand what you're saying and if there's something we can be doing better, to make sure the right people have that feedback.

--

Paul R. Nash | Group Product Manager, Compute Engine | paul...@google.com | 206-876-1620

Steve Wright

May 17, 2017, 8:43:38 PM
to gce-discussion, stevew...@gmail.com
https://issuetracker.google.com/issues/38303466

Here ya go, have a read. I PAY to use their stuff - they can EITHER be more helpful OR pay me, OR refund me, I don't mind which. This ain't no open-source game, this is business, and I use GCE because it's SUPPOSED TO BE stress-free! Stress-free my arse.. Damn lucky I had a spare RADIUS server up my sleeve..

S


Paul Nash

May 17, 2017, 9:20:59 PM
to gce-discussion, stevew...@gmail.com
Steve,

I'm sorry to hear that you're frustrated about the issue linked below. As I understand it, you had a problem with VM startup, and our support team tried to help you debug the issue. You were able to find the cause of the issue yourself, and clearly feel that it is Google's fault. You stated that you feel we should refund you for your time and effort, because you pay for the service, but you're not willing to explain why you feel that way. You feel that we're being unreasonable because we haven't credited you an unspecified amount of money for this issue that we still don't have information about. So, some thoughts about that:
  • gce-discussion and our public issue tracker are free public support forums, staffed by hard working Google employees, but they are not a substitute for paid support. If you have an urgent business critical support need or timeframe, you might want to check into our paid premium support options where you could file a case and get a more dedicated response. This is a common model in the industry.
  • You seem to be suggesting that we should have found the problem as fast as you did. In our free support channels, it is not common that our staff would have the time or detailed information needed to attempt setting up and testing an exact reproduction of every user reported issue in order to determine the fix themselves. It happens sometimes, depending on the issue.
  • Regardless of whether the product is open source, the public issue tracker *is* a public community forum for reporting and discussing problems. If you're not interested in participating in the community or documenting something you found out, that is certainly your choice, but please don't criticize Google simply for asking if you would share your knowledge.
  • Were this an issue covered by our SLA (it doesn't sound like it is, though), you would still need to explain to us what happened in order for us to make a credit determination.
I certainly understand that you had a problem with a piece of GCE, feel that it was not as easy to fix as it should have been (or maybe that the problem shouldn't have happened in the first place), and are frustrated about that. We would absolutely like to understand more about what happened, and learn what we can do better in the future. But I also hope that you can put yourself in our shoes for just a second, and see why it doesn't make sense for us to give service credits to customers who merely claim it's our fault that they had an issue with a VM, or who "threaten" to broadcast on social media otherwise.


Note: I'm changing the subject to reflect that the thread you've added onto is about a different and unrelated issue to project suspensions. As I've mentioned, these public forums are provided for free, and staffed for the benefit of our community of users. Common standards of community conduct do apply, including avoiding cross posting, thread hijacking, or use of profanity. We welcome your participation and even your criticism, as long as it's respectful.

Aliaksandr Valialkin

May 18, 2017, 12:31:26 PM
to gce-discussion
We just received the email from google-clou...@google.com with the following contents:

Action required: Critical problem with your Google Cloud Platform / API project ClickHouse Test (id: clickhouse-test)



Dear Developer,
We have detected that your Google Cloud Project ClickHouse Test (id: clickhouse-test) has been committing denial of service (DoS) attacks via 35.185.60.76 between 2017-05-18 00:43 and 2017-05-18 04:18.
You can fix the problem by stopping the instance(s) as soon as possible. Verify the outgoing traffic usage of your instance and if the behavior is intentional, please provide a business justification for this.
Meanwhile, to protect our users, we have set an outbound bandwidth rate limit on your instance. Please note that as the project owner you are responsible for securing the software installed on your machine. To learn more about securing your instance visit the Securing Instances section of the Cloud Security Help Center.
We will suspend your project in 3 days unless you correct the problem and respond to this email by submitting an appeal. Please note that you should be logged in as the project owner to access the appeals page. For more help on submitting an appeal or to learn more about the process check the Policy Violation FAQ.
If the behavior of your instance starts affecting the service or other users in an egregious manner, we may have to suspend the project before the warning window expires. Please get back to us as soon as possible to help prevent that situation.

Right after that, two of the 13 instances in our DB cluster were limited in network traffic to a few kbps (the second one is 104.196.177.224). It looks like the network bandwidth limit also applies to persistent disk reads/writes, because these servers completely stopped reading and writing data. Rebooting didn't help: after the reboot, the DB failed to prefetch data from the persistent disks.

We store up to 400 TB of data in this DB. The data is sharded among 13 instances, so we effectively lost access to the 2/13 of our data stored on the suspended instances and had to reconfigure the DB cluster to write data to the remaining 11 instances.

These instances communicate with the outside world through only a single TCP port, 9000 (ClickHouse), and accept TCP packets on only a few ports: 22 (SSH), 9100 (Prometheus), and 8123 and 9000 (ClickHouse). Access to ports 8123 and 9000 is restricted to a small set of our IPs via both the cloud firewall and the ClickHouse configs. We SSH into these hosts only with public keys; password authentication is disabled. So we are confident that these instances could not have participated in the DDoS mentioned in the email from google-clou...@google.com above.

This looks very weird.
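The compliance notice quoted above asks the owner to "verify the outgoing traffic usage of your instance". One way to do that on a Linux VM is to read the kernel's cumulative per-interface transmit counter twice and compute an average outbound rate. The Python sketch below is an illustration only, not a Google-provided tool: it assumes the standard `/proc/net/dev` layout (eight receive columns followed by the transmit columns) and a hypothetical interface name `ens4`, since the actual name depends on the image.

```python
import time

PROC_NET_DEV = "/proc/net/dev"


def parse_tx_bytes(proc_net_dev_text: str, iface: str) -> int:
    """Return the cumulative transmitted-bytes counter for `iface`."""
    # The first two lines of /proc/net/dev are column headers.
    for line in proc_net_dev_text.splitlines()[2:]:
        name, sep, fields = line.partition(":")
        if sep and name.strip() == iface:
            # 8 receive columns come first; transmit bytes is the 9th value.
            return int(fields.split()[8])
    raise KeyError(f"interface {iface!r} not found")


def tx_rate_bps(before: str, after: str, iface: str, interval_s: float) -> float:
    """Average outbound bits per second between two /proc/net/dev snapshots."""
    delta_bytes = parse_tx_bytes(after, iface) - parse_tx_bytes(before, iface)
    return delta_bytes * 8 / interval_s


def sample_tx_rate(iface: str = "ens4", interval_s: float = 5.0) -> float:
    """Read the counters twice, `interval_s` seconds apart (Linux only).

    `ens4` is a hypothetical default; pass the VM's real interface name.
    """
    with open(PROC_NET_DEV) as f:
        before = f.read()
    time.sleep(interval_s)
    with open(PROC_NET_DEV) as f:
        after = f.read()
    return tx_rate_bps(before, after, iface, interval_s)
```

Calling `sample_tx_rate()` on the affected VM returns the average outbound rate in bits per second over the sampling window. It only gives a per-interface total; attributing traffic to individual ports or processes would need other tooling.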

Swati Kulshreshth

May 18, 2017, 12:46:05 PM
to Aliaksandr Valialkin, gce-discussion
Hi Aliaksandr,

Please respond to the message by clicking on the "Appeals" link in the email and someone from my team will get back to you.

Thanks,
Swati
Aliaksandr Valialkin

May 18, 2017, 12:55:50 PM
to gce-discussion, val...@gmail.com
Thanks, Swati,

Already did this.

Additionally, only one of the 7 project owners received the email from google-cloud-compliance@: the last one in the list on the 'appeal form' page. Thank God this person wasn't out of office, so we didn't suffer an instant, unexpected account suspension like Michael above :)

This looks like a serious bug in your mail delivery system.

Hope you'll resolve both issues quickly.


Aliaksandr Valialkin

May 19, 2017, 6:22:34 AM
to gce-discussion, val...@gmail.com
Hi there,

Here is a follow up (see ticket #12773581 for details):

The network bandwidth restrictions on two of our instances have been lifted.

But the persistent disks attached to these instances (both root disks and data disks) contained errors and didn't mount properly after the reboot; they were mounted in read-only mode. So we had to spend a few hours fixing filesystem errors. It looks like the network bandwidth restrictions also apply to persistent disks, and this resulted in disk corruption.

The broken filesystems led to broken database files holding more than 60 TB of business-critical data. Thanks to the ingenious filesystem data layout in ClickHouse, almost all of the data was recovered; only the last hour of data before the restrictions started was lost.

Our requests to Google Cloud Platform:

- To provide more details on the incident, so we can investigate it and justify the downtime of our services.
- To provide guidance on how to avoid such incidents in the future. We have started thinking about moving our services outside Google Cloud Platform, and we will definitely migrate away from GCP if such an incident happens again.
- To provide compensation for the downtime and data loss.
- To figure out how to avoid persistent disk corruption when network bandwidth restrictions are applied.
- To investigate why only one of the seven owners of the project received the notification from google-clou...@google.com about the incident. Judging by this thread, next time the notification email may not reach any project owner at all, leading to sudden project termination.
- To establish a saner procedure for reaching project owners before applying restrictions to project resources.

Paul Nash

May 19, 2017, 9:29:15 AM5/19/17
to Aliaksandr Valialkin, gce-discussion
Hi Aliaksandr,

I'm sorry to hear about the issues you encountered while recovering your instances. We are investigating the circumstances of your report (which is our normal procedure), but as we've mentioned in previous threads, this forum is not the channel to discuss the details of particular customers' support cases.

We will contact you directly to discuss your requests in more detail. We find that the circumstances surrounding a warning or suspension are often quite unique to each customer, but if any general insights or guidelines are appropriate to publish for all users, we will do so in due time (for example, via our documentation pages).


Thanks,
-P



Dattatreya Yalsangiker

Aug 30, 2018, 1:36:21 AM8/30/18
to gce-discussion
Same experience here. The Google people don't tell you WHY the project is suspended, and I have to wait for days to get my instances back. Rather than paying a hefty price, it is better to move away from such an unhealthy support environment. I do not recommend Google Cloud anymore.