The region us-west2 does not have enough resources - on deployment


Pip Jones

Mar 12, 2022, 2:05:40 PM
to Google App Engine
When trying to deploy my App Engine Flex app today, I am getting this error, after the build has completed.

`ERROR: (gcloud.app.deploy) Error Response: [9] An internal error occurred while processing task /app-engine-flex/flex_await_healthy/flex_await_healthy>2022-03-12T14:09:32.742Z6575.hg.2: The region us-west2 does not have enough resources available to fulfill the request. Please try again later.`

After about 10 attempts over about 2 hours, it killed off my previous running version's two instances, so now my company's app is down and I cannot bring it back up.

I get this error pretty regularly when deploying new versions (almost every time) but it usually succeeds after a couple of attempts, so I've just lived with it. But now my site is down and it's 4 hours later, and I'd really like to know if there's something I can do to fix it, or is it just a case of waiting for the zone to have more capacity?
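For anyone who ends up retrying by hand as described above, here is a minimal sketch of automating the retries with exponential backoff. The function name, attempt count and delays are my own assumptions, not anything App Engine prescribes, and retrying cannot help while the zone stays exhausted; it only saves re-running the command manually.

```python
import subprocess
import time


def deploy_with_retry(cmd, attempts=5, base_delay=60.0,
                      run=subprocess.run, sleep=time.sleep):
    """Run cmd until it exits with status 0, doubling the wait each time.

    Returns the number of the attempt that succeeded.
    `run` and `sleep` are injectable so the logic can be tested offline.
    """
    for attempt in range(1, attempts + 1):
        result = run(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            # Waits base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(f"deploy failed after {attempts} attempts")


# Example call (hypothetical settings, adjust to your project):
# deploy_with_retry(["gcloud", "app", "deploy", "--quiet", "--no-promote"])
```

The `--no-promote` flag in the example deploys the new version without routing traffic to it, so a successful deploy can be checked before switching over.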

Each newly deployed version shows up in the console (and in the command-line version list) for a while, but then disappears on its own.

I have checked my quotas under IAM & Admin, and nothing is above 30% allocation at most. Besides, now that my site is not running, I don't have many resources in use.

I noticed the previous version had 2 instances which seemed stuck in "restarting" state from a couple of weeks ago. I killed them off manually in the console thinking these might have been consuming resources. I wonder if this has somehow skewed the auto-scaler? I was hoping it would eventually repair itself, as it seems GCP sometimes takes a while to do stuff in the background.

I have tried restarting the previous version in the console, but it just sits at 0 instances. It's autoscaled, but the autoscaling has stopped working. I tried stopping it, waiting, then restarting it. 

I have checked the Stackdriver logs and it's definitely a ZONE_RESOURCE_POOL_EXHAUSTED error.
e.g.
serviceName: "compute.googleapis.com"
status: {
  code: 8
  details: [
    0: {
      value: {
        zoneResourcePoolExhausted: {
          resource: {
            project: {
              canonicalProjectId: "XXX"
            }
            resourceName: "us-west2-b"
            resourceType: "ZONE"
            scope: {
              scopeName: "global"
              scopeType: "GLOBAL"
            }
          }
        }
      }
    }
  ]
  message: "ZONE_RESOURCE_POOL_EXHAUSTED"
}
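To check many entries like the one above without reading each by hand, something along these lines could pull out the exhausted zone names. This is only a sketch: it assumes the entries have been exported as JSON and loaded into Python dicts with the same field names as shown in the log viewer, and `exhausted_zones` is a name I made up.

```python
def exhausted_zones(entry):
    """Return the zone names reported as exhausted in one log entry dict."""
    zones = []
    status = entry.get("status", {})
    for detail in status.get("details", []):
        exhausted = detail.get("value", {}).get("zoneResourcePoolExhausted")
        if not exhausted:
            continue
        resource = exhausted.get("resource", {})
        if resource.get("resourceType") == "ZONE":
            zones.append(resource.get("resourceName"))
    return zones


# The entry from this post, rewritten as a dict (assumed export shape).
entry = {
    "serviceName": "compute.googleapis.com",
    "status": {
        "code": 8,
        "details": [
            {"value": {"zoneResourcePoolExhausted": {"resource": {
                "project": {"canonicalProjectId": "XXX"},
                "resourceName": "us-west2-b",
                "resourceType": "ZONE",
                "scope": {"scopeName": "global", "scopeType": "GLOBAL"},
            }}}}
        ],
        "message": "ZONE_RESOURCE_POOL_EXHAUSTED",
    },
}

print(exhausted_zones(entry))  # ['us-west2-b']
```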


I have tried increasing app_start_timeout_sec under readiness_check, along with failure_threshold and the other timeouts, in case the health check was borderline. But judging by the logs, the instance never even starts booting (because the VM is never allocated).

I tried re-deploying the previous version again.

I tried stopping the current version (which previously was "SERVING" but with 0 instances) and then deploying, but this doesn't help. So at this point I'm deploying over nothing running at all in my project, confirming it cannot be quotas.

I did notice seemingly inconsistent reports of the number of instances in my service logs, though. This doesn't make sense because there are NO instances running either before or after deployment.

2022-03-12 13:09:55.422 GMT  The number of running VMs for version 20220211t141828 changed from 2 to 1
2022-03-12 13:10:19.089 GMT  The number of running VMs for version 20220211t141828 changed from 1 to 3
2022-03-12 13:10:25.919 GMT  The number of running VMs for version 20220211t141828 changed from 3 to 4
2022-03-12 13:10:41.160 GMT  The number of running VMs for version 20220211t141828 changed from 4 to 2
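To see whether these count messages at least agree with each other, a small parser can compare each entry's "from" value with the previous entry's "to" value. This is a rough sketch that assumes all entries belong to one version and arrive in time order; the function names are my own.

```python
import re

VM_CHANGE = re.compile(
    r"The number of running VMs for version (\S+) changed from (\d+) to (\d+)"
)


def vm_changes(log_text):
    """Extract (version, old_count, new_count) tuples from raw log text."""
    return [(m.group(1), int(m.group(2)), int(m.group(3)))
            for m in VM_CHANGE.finditer(log_text)]


def discontinuities(changes):
    """Indexes where an entry's old count differs from the previous new count."""
    return [i for i in range(1, len(changes))
            if changes[i][1] != changes[i - 1][2]]


sample = """
The number of running VMs for version 20220211t141828 changed from 2 to 1
The number of running VMs for version 20220211t141828 changed from 1 to 3
The number of running VMs for version 20220211t141828 changed from 3 to 4
The number of running VMs for version 20220211t141828 changed from 4 to 2
"""

changes = vm_changes(sample)
print(discontinuities(changes))  # []
```

For the four entries above the result is empty: every "from" matches the preceding "to", so the messages form one continuous series even though they disagree with what the console shows.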

I tried deploying the app under a different service name (and was going to change my dispatch rules to reroute to it), but that service's deployment failed with the same error.

The status pages look OK. 

I've tried --verbosity=debug which didn't reveal any extra info.

I've read every post I can find (including my own previous post in this group, where it was a "prerequisite" error caused by quotas), and the only option I seem to be left with is migrating my app to a new project in a more reliable region like us-central. However, this will be a lot of work, as I'm using GCS, Functions, and networking to other providers, which will all have to be migrated.

Is there any way to get more detailed information on the resource problem?

thanks
Pip Jones

Pip Jones

Mar 13, 2022, 1:09:22 PM
to Google App Engine
FYI. The site automatically restarted the instances after 18 hours of outage at about 6am GMT.

So I presume it simply was the zone being exhausted and nothing to do with my specific instances, and my 10 hours of trying to fix it was all in vain. Next time I won't bother.

Andres Fiesco Casasola

Mar 14, 2022, 1:55:09 PM
to Google App Engine

Pip Jones

Mar 14, 2022, 3:42:12 PM
to Google App Engine
Yes, that's correct (as I mentioned in my last paragraph). But the difficulty there is that App Engine doesn't allow you to just move a VM into another region. You have to create a new project and rebuild the infrastructure: App Engine, GCS, Cloud Functions, PubSub, networking, service account permissions, enabled APIs, DNS, billing, etc. I'm doing that now and looking at using something like Terraform to automate it.

Lluis Munoz Ladron de Guevara

Mar 15, 2022, 9:29:43 AM
to Google App Engine
If you suspect that you might be affected by an outage, I encourage you to check the Google Cloud Status Dashboard. However, please note that if the incident is affecting a limited number of users it might not be shown on the Dashboard; more details can be found here

If you suspect that you might be affected by an incident on the GCP side, please file a support case.

