Sporadic server error since March 1 (Python)

Viktor Bresan

Mar 3, 2022, 10:57:03 AM
to Google App Engine
For the past two days I have been experiencing sporadic server errors: for a period of time everything works fine, and then for a period of time all calls to my Python scripts fail.

The following message is displayed in the browser:

     Error: Server Error
     The server encountered an error and could not complete your request.

     Please try again in 30 seconds.

The log files show a latency of 10 seconds on failed calls.

This started happening after March 1, and neither the code nor the configuration had been modified for weeks before that.

Any ideas?

Thanks in advance for your help!

Rogelio Monter Rodriguez

Mar 3, 2022, 6:14:43 PM
to Google App Engine

The error you’re sharing could have many different causes. Please check the Logs Viewer in Google Cloud, as described in the Using Logs Viewer documentation, and share what you see for further troubleshooting. You could also troubleshoot the latencies with Cloud Trace.

Viktor Bresan

Mar 5, 2022, 12:11:16 PM
to Google App Engine

Many thanks for your tips! A trace is not available for the failed requests, though it is for all others. I didn't paste the log earlier, and perhaps I should have, because (in my opinion) it doesn't show anything. Here it is (attached).

Meanwhile, I have switched to instance class F2, and so far the problem has not recurred. I no longer see latency above 3 s, and that only when a new process is started. Still, I don't think the problem should have happened in the first place; otherwise F1 is useless.


log.txt

David (Cloud Platform Support)

Mar 7, 2022, 4:59:04 PM
to Google App Engine
Glad to hear that switching to an F2 instance class fixed the issue. It’s hard to say whether the issue was caused by a lack of resources, even after looking at the log you provided. That is why I would recommend contacting GCP support if this issue happens again even after upgrading the instance type, since they can inspect your GAE service and provide you with more useful information.

Viktor Bresan

Mar 8, 2022, 6:33:43 PM
to Google App Engine
I had a similar problem a few months ago, when extra instances were suddenly created for another app that wasn't receiving any extra traffic. And I am not the only one who has experienced that. Something is obviously happening on the Google side; it's ridiculous that I can't serve a simple request with an F1 instance.

And it would be ridiculous to spend $29 a month for support while the total running cost of my apps is $0.08 a month.

Horace (Cloud Platform Support)

Mar 9, 2022, 6:26:59 PM
to Google App Engine

Thank you for sharing your experience with us. As I understand it, you feel that your workload should not need more than an F1 instance and that you did not expect to have to upgrade to class F2. I will try to help.

There are several ways to mitigate this error [1], such as:

  1. Predict spikes and preemptively load instances. Warmup requests [2] are designed specifically to combat situations that involve predictable, frequent, sudden spikes. With warmup requests enabled, App Engine loads your application code into a new instance before live requests reach it, which helps avoid cold boots during spikes.

  2. Make cold boots faster [3]. You can speed up cold-boot loading by keeping the code simple and loading fewer libraries. There is an interesting article [4] on how to improve loading performance.

  3. Provision more resources. This is one of the easier solutions, for example keeping idle instances to avoid cold boots; however, it might not be ideal as it could increase your costs.

  4. Use a retry strategy. If your app can tolerate a certain number of transient failures (within our SLO), you can simply catch those failures with a retry and your app can function without any issue.
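For option 4, a client-side retry could be sketched as follows, using only the Python standard library. The function name, the attempt count, and the delays are illustrative assumptions rather than recommended values; the 15-second timeout is chosen only because the failed calls in this thread were reported to take about 10 seconds.

```python
import random
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, attempts=4, base_delay=1.0, timeout=15.0,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch url, retrying on HTTP 5xx responses and network-level errors.

    `opener` and `sleep` are injectable so the function can be tested
    without real network calls or real waiting.
    """
    for attempt in range(attempts):
        last_attempt = attempt == attempts - 1
        try:
            with opener(url, timeout=timeout) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # 4xx means the request itself is wrong; retrying will not help.
            if err.code < 500 or last_attempt:
                raise
        except OSError:
            # URLError, timeouts and connection resets are OSError subclasses.
            if last_attempt:
                raise
        # Exponential backoff plus a little jitter: roughly 1 s, 2 s, 4 s, ...
        sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```

A call such as `fetch_with_retry("https://your-app.example/lookup")` (a made-up URL) would then ride out a short run of "Server Error" responses, at the cost of extra latency for the caller.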

Lastly, I concur with David’s suggestion to raise an official case with GCP support [5], as the error message [1] can have multiple causes (e.g. sudden spiky traffic, backend issues, etc.) and we do have the tools to diagnose such issues. With the right diagnosis of your case, we can determine recommendations that might avoid resorting to changing the machine type and increasing your costs.


[1] "logMessage": "Request was aborted after waiting too long to attempt to service your request."

[2] https://cloud.google.com/appengine/docs/standard/go111/configuring-warmup-requests

[3] https://cloud.google.com/appengine/docs/standard/go/how-instances-are-managed#loading_requests

[4] https://cloud.google.com/blog/products/gcp/best-practices-for-app-engine-startup-time-google-cloud-performance-atlas

[5] https://cloud.google.com/support-hub

Viktor Bresan

Mar 10, 2022, 3:52:37 AM
to Google App Engine
In this case the cold boot cannot be made any faster by simplifying the code. It's just a simple db lookup (the db has fewer than 10 rows) and a response. If an F1 instance cannot serve that from a cold boot, it is useless.

I would also like to add that for such simple queries (which are called only a few times a day, without any predictable pattern or spikes) there is no point in wasting resources to keep an instance active.

I have already commented on paying for support. This time I will add that it's a known fact that quality assurance for all products that come from Google is actually done by users. That's why support in cases such as this should be free.

Over and out.