HTTP Cloud Load Balancer intermittent 502 responses; requests don't reach our backend services


Jorge Barrachina

Mar 10, 2017, 8:00:37 AM
to Google App Engine

Description of the problem:


The HTTP Cloud Load Balancer returns intermittent 502 responses and doesn't let requests reach our backend services. The problem seems to come up randomly. Maybe it is related to https://groups.google.com/forum/#!topic/google-appengine-downtime-notify/C_fCwHb73wc, issued on February 4th?


Also, while looking for alternative solutions, we found this post, which may be correlated with this issue and shed some light on what happened:


https://blog.percy.io/tuning-nginx-behind-google-cloud-platform-http-s-load-balancer-305982ddb340#.rw4tbv6gl




Issue time period:


March 9th, 14:12:29.697 (UTC+1) until March 10th, 09:28:36.085 (UTC+1)


Projects affected:


oa-staging


Services affected:


clientaccount (versions: release-2017-10-a, release-2017-8-a, release-2017-07-a)



Service URL:


https://clientaccount-dot-oa-staging.appspot-preview.com



Deployment details of our stack:



app.yaml


api_version: 1
service: clientaccount
runtime: python
env: flex
entrypoint: gunicorn 'client_account.wsgi:load_app("prod")'

runtime_config:
  python_version: 3

automatic_scaling:
  min_num_instances: 1
  max_num_instances: 5
  cool_down_period_sec: 120  # default value
  cpu_utilization:
    target_utilization: 0.8



requirements.txt


bingads==v10.4.11
boto3==1.3.0
botocore==1.4.10
docutils==0.12
Flask==0.10.1
Flask-Cors==2.1.2
Flask-Security==1.7.5
flask-swagger==0.2.12
gcloud==0.13.0
google-api-python-client==1.5.0
googleads==5.0.0
gunicorn==19.4.1
itsdangerous==0.24
Jinja2==2.8
jmespath==0.9.0
MarkupSafe==0.23
mongoengine==0.10.6
pymongo==3.2.2
python-dateutil==2.5.2
PyYAML==3.11
recurly==2.4.2
requests==2.9.1
sendgrid==3.0.1
six==1.10.0
Werkzeug==0.11.4
wheel==0.24.0



Transaction example details:


If you want to check the transaction in the Google Cloud Console, here is the link:


https://console.cloud.google.com/logs/viewer?project=oa-staging&hl=es&minLogLevel=0&expandAll=false&resource=http_load_balancer&advancedFilter=resource.type%3D%22http_load_balancer%22%0Aresource.labels.zone%3D%22global%22%0Aresource.labels.project_id%3D%22oa-staging%22%0Atimestamp%3D%222017-03-09T15:35:43.842872262Z%22%0AinsertId%3D%221snolhbg22kclfq%22&timestamp=2017-03-09T15:35:43.842872262Z


GET  https://clientaccount-dot-oa-staging.appspot-preview.com/check_token/**********LONG_TOKEN*************



This request arrived at the HTTP Cloud Load Balancer at 16:35:43.842 (Madrid time).



Trace log from the HTTP Cloud Load Balancer:


{
  "insertId": "1snolhbg22kclfq",
  "jsonPayload": {
    "statusDetails": "failed_to_connect_to_backend",
    "@type": "type.googleapis.com/google.cloud.loadbalancing.type.LoadBalancerLogEntry"
  },
  "httpRequest": {
    "requestMethod": "GET",
    "requestUrl": "https://clientaccount-dot-oa-staging.appspot-preview.com/check_token/*********LONG_TOKEN*************",
    "requestSize": "1370",
    "status": 502,
    "responseSize": "421",
    "remoteIp": "34.197.229.75"
  },
  "resource": {
    "type": "http_load_balancer",
    "labels": {
      "url_map_name": "",
      "forwarding_rule_name": "",
      "backend_service_name": "",
      "target_proxy_name": "",
      "zone": "global",
      "project_id": "oa-staging"
    }
  },
  "timestamp": "2017-03-09T15:35:43.842872262Z",
  "severity": "WARNING",
  "logName": "projects/oa-staging/logs/requests"
}



The connection to our backend (https://clientaccount-dot-oa-staging.appspot-preview.com) was dropped. If you look at the jsonPayload of the trace above, it says "failed_to_connect_to_backend", meaning the HTTP load balancer could not reach our backend, so the request was never processed.
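To get a feel for how often this was happening, entries like the trace above can be counted by their statusDetails value. A rough sketch (assuming the load balancer logs have been exported as one JSON object per line; the field path matches the trace above):

```python
import json
from collections import Counter

def count_status_details(path):
    """Count load balancer log entries by jsonPayload.statusDetails."""
    counts = Counter()
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            entry = json.loads(line)
            detail = entry.get("jsonPayload", {}).get("statusDetails", "unknown")
            counts[detail] += 1
    return counts
```

Entries with "failed_to_connect_to_backend" are the ones where the balancer never reached an instance at all.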


This is odd, because we didn't change our stack and it was working in the days before the dates reported.

Could you tell us what happened, or provide some logs for our team to review?


Thanks for your support


Adam (Cloud Platform Support)

Mar 11, 2017, 2:21:41 PM
to Google App Engine
I've posted a response to the public issue you filed at https://issuetracker.google.com/36144028, please feel free to direct any replies there.

julien silverston

Sep 29, 2017, 7:05:25 PM
to Google App Engine
Adam, access is denied to the issue you mentioned. Thank you.

Christian Aquino

Nov 3, 2017, 4:52:16 PM
to Google App Engine
Hi Jorge, I was wondering if you ever got to the bottom of this issue?


Thanks,
Christian



Jorge Barrachina

Nov 6, 2017, 5:04:22 PM
to Google App Engine
Yes,

I'll copy the response from the support team (sorry, this was solved a long time ago and I can't remember all the details):

-------
Getting back to your app configuration, the part which stands out the most is this:


    entrypoint: gunicorn 'client_account.wsgi:load_app("prod")'

It looks like you're using gunicorn with the default config, which is one sync worker; that limits your app to serving a single request at a time. This can cause health check pings and other requests to time out if the app is currently busy. One quick fix is simply to spawn more worker processes to handle concurrent requests, e.g.:

    entrypoint: gunicorn -w 4 'client_account.wsgi:load_app("prod")'
---------

In summary, gunicorn has to start with several workers. By default it launches only one, so if a health-check request and a regular request arrive at almost the same time, gunicorn cannot handle both of them. That's why the 5xx HTTP status codes appear.
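The worker count in the suggested entrypoint follows gunicorn's usual rule of thumb of (2 × cores) + 1 sync workers. A small sketch of that formula (the heuristic is the one from the gunicorn docs; the instance sizes below are just examples):

```python
import multiprocessing

def recommended_workers(cores=None):
    """Gunicorn's rule of thumb for sync workers: (2 * cores) + 1."""
    if cores is None:
        cores = multiprocessing.cpu_count()
    return 2 * cores + 1

# A 1-vCPU flex instance would get 3 workers, a 2-vCPU instance 5, e.g.:
#   entrypoint: gunicorn -w 3 'client_account.wsgi:load_app("prod")'
```

With more than one worker, a health-check ping no longer blocks behind a slow application request.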

At the time of this issue, tracking down these kinds of problems in the App Engine logs was a nightmare. I hope it's better now.

Hope this answer helps you in some way,

Cheers

julien silverston

Nov 7, 2017, 8:39:37 AM
to Google App Engine
Hi,

I think I've solved this issue. No HTTP 502s so far.

First, my load balancer config was wrong.
Check your named port, and check your health checks on both the instance groups and the load balancer (disable them if necessary to troubleshoot).
I also applied the following tuning:
- enabled gzip (it already was)
- increased the keepalive timeout in nginx
- added the GCP IP ranges to the firewall rules

Take a look at the same blog post from Percy.io; thanks to them:

https://blog.percy.io/tuning-nginx-behind-google-cloud-platform-http-s-load-balancer-305982ddb340
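For reference, the nginx side of that tuning is roughly this (values taken from the Percy.io post above; the idea is that nginx's keepalive must outlive the load balancer's ~600-second idle timeout, so the balancer never reuses a connection nginx has already closed; treat the exact numbers as a starting point):

```nginx
http {
    gzip on;

    # longer than the GCP HTTP(S) LB's ~600 s keepalive, so nginx
    # never closes a connection the LB still considers open
    keepalive_timeout 650;
    keepalive_requests 10000;
}
```

The firewall part means allowing the load balancer and health-check source ranges (130.211.0.0/22 and 35.191.0.0/16 at the time; check the current GCP docs) to reach your instances.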

Cheers