Load Balancer cookie session affinity debugging.


Michael March

Jun 28, 2018, 12:33:33 PM
to gce-discussion
I have an application using a load balancer with the following configuration:

Session affinity: Generated cookie
Affinity cookie TTL: 43200 seconds
Connection draining timeout: 300 seconds
Security policy: None
Identity-Aware Proxy: disabled

My users are being bounced back and forth between nodes, sometimes within minutes.

What are the best tools in the Google Cloud Console to debug this problem?



Michael March

Jun 28, 2018, 5:37:39 PM
to gce-discussion
More info:

I set up a static app on four nodes. The only thing it does is report its node name.

Requests to that app do not stay on the same node for more than a few minutes.
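
One way to reproduce this from outside the Cloud Console is a small probe that sends repeated requests through a single cookie jar and records which node answers. The following is a minimal sketch, assuming the static app returns only its node name in the response body; the function names, the URL, and the request count and delay are hypothetical and not part of the original setup.

```python
import http.cookiejar
import time
import urllib.request


def fetch_node_names(url, count=20, delay=1.0):
    """Request `url` `count` times with one shared cookie jar.

    Returns the stripped response bodies, assumed here to be node names.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    names = []
    for _ in range(count):
        with opener.open(url, timeout=10) as resp:
            names.append(resp.read().decode("utf-8", "replace").strip())
        time.sleep(delay)
    return names


def find_switches(names):
    """Return (index, previous, current) for every change of node."""
    return [
        (i, names[i - 1], names[i])
        for i in range(1, len(names))
        if names[i] != names[i - 1]
    ]


# Hypothetical usage against the load balancer's address:
# names = fetch_node_names("http://LOAD_BALANCER_ADDRESS/", count=60, delay=5)
# print(find_switches(names))
```

Because the cookie jar is shared across requests, any entry returned by `find_switches` after the first response suggests affinity was lost even though the client kept sending the cookie it was given.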

Michael March

Jun 29, 2018, 12:12:33 PM
to gce-discussion
Also, the health checks seem to be happening four times per second, per host. Is that normal?


On Thursday, June 28, 2018 at 9:33:33 AM UTC-7, Michael March wrote:

Dinesh (Google Platform Support)

Jun 30, 2018, 2:50:40 PM
to gce-discussion
Did you try to find some clue in the Stackdriver logs[1] of the back-end services and Cloud Load Balancer? It seems your application's users are losing session affinity frequently. In general, a user can lose affinity with a back-end instance when the instance runs out of capacity, a health check fails, or autoscaling adds or removes instances from the instance group, as described in this documentation[2]. You may want to check each of these causes.

Regarding your health check query, you can view the health check configuration in use from the Cloud Console or with the gcloud command, as described in this documentation[3]. Generally, the health check interval must be an integer from 1 to 300 seconds. I am not sure how you are getting four checks per second, per host, as this looks a bit strange. If you consider this an issue or potential bug, you may raise it on the public issue tracker[4], which is meant for tracking bugs and feature requests.

I hope the above information helps.

Regards,

Michael March

Jul 1, 2018, 1:13:00 PM
to gce-discussion
I was able to fix the check frequency. It checks the services a lot less often now, but when it does a check it still does A LOT of checks in a cluster, then stops.

As I said in one of my previous posts, I've replaced my application with four web servers that serve static pages, and the sessions still keep changing nodes.

Dinesh (Google Platform Support)

Jul 2, 2018, 12:30:23 PM
to gce-discussion
Unfortunately, your message is not clear. Can you please elaborate on what you mean by "but when it does a check it still does A LOT of checks in a cluster, then stops"? From your post, I understand that your health check issue has been resolved. Please let me know if I misunderstood or if you still have further questions.

Regarding the LB session affinity issue, did you find anything in the suggested Stackdriver logs for the back-end services and Cloud Load Balancer? Have you checked all the suggested causes of lost session affinity (back-end instance capacity, health check failure, autoscaling adding or removing instances, etc.)?

Michael March

Jul 2, 2018, 12:45:37 PM
to gce-discussion


See my responses below.

On Monday, July 2, 2018 at 9:30:23 AM UTC-7, Dinesh (Google Platform Support) wrote:
Unfortunately, your message is not clear. Can you please elaborate on what you mean by "but when it does a check it still does A LOT of checks in a cluster, then stops"? From your post, I understand that your health check issue has been resolved. Please let me know if I misunderstood or if you still have further questions.


I mean the health checks are less frequent. It was 4 per second per node; now it is 4 per second for 3 seconds (per node), then a 15-second pause between bursts. Does that make more sense?
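
The burst pattern described above can be measured from a node's access log. This is only a sketch, assuming the health check request times have already been extracted as seconds (for example from the static web server's log); the function names and the 1-second gap threshold are hypothetical choices.

```python
def group_bursts(timestamps, gap=1.0):
    """Split probe timestamps (in seconds) into bursts.

    A new burst starts whenever the time since the previous probe
    exceeds `gap` seconds.
    """
    bursts = []
    for t in sorted(timestamps):
        if bursts and t - bursts[-1][-1] <= gap:
            bursts[-1].append(t)
        else:
            bursts.append([t])
    return bursts


def summarize(bursts):
    """Return (probe_count, duration, pause_before) for each burst.

    `pause_before` is None for the first burst.
    """
    summary = []
    previous_end = None
    for burst in bursts:
        pause = None if previous_end is None else burst[0] - previous_end
        summary.append((len(burst), burst[-1] - burst[0], pause))
        previous_end = burst[-1]
    return summary
```

Fed with one node's probe times, `summarize(group_bursts(times))` gives the number of checks per burst and the pause between bursts, which can then be compared with the interval configured on the health check.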
 
Regarding the LB session affinity issue, did you find anything in the suggested Stackdriver logs for the back-end services and Cloud Load Balancer? Have you checked all the suggested causes of lost session affinity (back-end instance capacity, health check failure, autoscaling adding or removing instances, etc.)?

I'm not sure where in Stackdriver to check for that. I can confirm (because I'm running a static web server on each node):

1) The scaling group is static, no nodes are being added.
2) The instances have NO load on them.
3) The health checks are not failing (from the health check logs POV).

Dinesh (Google Platform Support)

Jul 3, 2018, 6:22:16 PM
to gce-discussion
You can check the Stackdriver logs for the back-end services and Cloud Load Balancer by following the previously given documentation[1]. Going forward, if you consider client sessions losing affinity to be a bug, please raise it on the public issue tracker[2], as suggested in my previous post, since this Google Group is mainly for product-related discussion.

Zhou Liang

Jan 22, 2020, 1:17:27 PM
to gce-discussion
I'm running into the same issue. We configured "Generated cookie" session affinity and set the affinity cookie TTL to 0 (use a session cookie), but we found that the LB sometimes routes traffic to different instances. I can confirm that the GCLB cookie is passed with the same value in every request. I don't see a good way to debug this. In the Cloud Load Balancer Stackdriver logs I don't see any cookie; I'm not sure whether it was not received or is just not displayed. Any ideas?

Gagandeep Toor

Jan 23, 2020, 8:37:59 AM
to gce-discussion
For GENERATED_COOKIE, if a cookie is not present in a request, the proxy picks a backend without affinity applied, but adds a Set-Cookie header to the response, with the cookie value based on the IP addresses of the client and the selected backend. The default TTL value of 0 means the cookie has no fixed expiry time (it is a session cookie).
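
To check whether a response carries a fresh affinity cookie, the Set-Cookie header can be inspected on the client side. The sketch below assumes the cookie is named GCLB, as mentioned earlier in this thread; the function name and the sample header values are hypothetical.

```python
from http.cookies import SimpleCookie


def affinity_cookie(set_cookie_header, name="GCLB"):
    """Parse one Set-Cookie header value and return the affinity cookie.

    Returns None if no cookie called `name` is present; otherwise a dict
    with the cookie value and any Max-Age and Path attributes found.
    """
    cookie = SimpleCookie()
    cookie.load(set_cookie_header)
    morsel = cookie.get(name)
    if morsel is None:
        return None
    return {
        "value": morsel.value,
        "max_age": morsel["max-age"] or None,
        "path": morsel["path"] or None,
    }
```

If the load balancer keeps answering with a new Set-Cookie value while the client already sends a cookie, that is a hint that the proxy did not honor the cookie it received, for example because the original backend was considered unavailable.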