Backend service - connection draining

Pavel Madr

Sep 13, 2017, 8:59:37 AM
to gce-discussion
Hello,

I have an internal compute.v1.regionBackendService using the TCP protocol. It is connected to a managed instance group with one instance.
The service has a connection draining timeout of 45 s.

I see some strange behavior:
1. If the VM is deleted directly from the Compute Engine | VM Instances page, all existing connections to the VM are closed immediately. Is that expected?

2. If the VM is deleted from the instance group, existing connections are kept alive as expected. But the VM also accepts new connections; I saw that in a log file.

Am I doing something wrong? Should I hook into the VM's shutdown event somehow and force the health check to fail?

Carlos (Cloud Platform Support)

Sep 13, 2017, 3:45:05 PM
to gce-discussion
Hi Pavel,

1- I think the behavior is expected. According to the article, connection draining is triggered by a deleteInstances() group action, which translates into the following HTTP request:

POST https://www.googleapis.com/compute/v1/projects/project/zones/zone/instanceGroupManagers/instanceGroupManager/deleteInstances


When you delete the VM from the console, the HTTP request is not a group action but a direct instance deletion, i.e.:

DELETE https://www.googleapis.com/compute/v1/projects/example-project/zones/us-central1-f/instances/example-instance
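
To make the difference between the two calls concrete, here is a minimal Python sketch that only builds the two requests and sends nothing. The helper names and the sample group name are hypothetical. The `instances` body field reflects the `instanceGroupManagers.deleteInstances` API reference as best understood here, so check it against the current documentation before relying on it.

```python
import json

API_ROOT = "https://www.googleapis.com/compute/v1"


def group_delete_request(project, zone, group, instance):
    """Build the group action: instanceGroupManagers.deleteInstances."""
    url = (f"{API_ROOT}/projects/{project}/zones/{zone}"
           f"/instanceGroupManagers/{group}/deleteInstances")
    # Assumption: the body lists the instances to delete as partial URLs.
    body = {"instances": [f"zones/{zone}/instances/{instance}"]}
    return "POST", url, json.dumps(body)


def direct_delete_request(project, zone, instance):
    """Build the direct instance delete, which is not a group action."""
    url = f"{API_ROOT}/projects/{project}/zones/{zone}/instances/{instance}"
    return "DELETE", url, None


if __name__ == "__main__":
    # "example-group" is a made-up group name used only for illustration.
    print(group_delete_request("example-project", "us-central1-f",
                               "example-group", "example-instance"))
    print(direct_delete_request("example-project", "us-central1-f",
                                "example-instance"))
```

Only the first request goes through the instance group manager, which is the path the thread associates with connection draining.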



2- I do not believe you need any additional configuration.

I tried to reproduce the behavior, but in my case it worked as expected. In my test I used netcat on the backend VM to listen on one particular port (5000). I also ran a packet capture (tcpdump port 5000) on the VM to verify incoming connections.

I removed the VM from the managed instance group using the console and noticed the following: it took about 5 minutes to fully remove the VM from the instance group. During those 5 minutes, whenever I connected to the internal load balancer on port 5000, I could see the incoming connection in the tcpdump output. Once the Cloud Console no longer showed the VM, I saw no new incoming connections. I believe your logs show new connections because the instance may still have been part of the managed instance group at that time. The best way to test is to capture packets on the backend while monitoring the console to verify that the VM has been fully removed.
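
If netcat and tcpdump are not convenient, a small timestamping listener can play the same role. The following is a minimal Python sketch, not the setup used in the test above: the function names are hypothetical, port 5000 is taken from the description, and accepted connections are deliberately kept open so that established connections can be watched during draining. Health check probes from the load balancer may also appear in the log if they target the same port.

```python
import socket
import threading
import time


def serve(listener, log, stop):
    """Accept connections until `stop` is set, recording when each arrives."""
    listener.settimeout(0.2)  # wake up regularly to check the stop flag
    open_conns = []
    try:
        while not stop.is_set():
            try:
                conn, peer = listener.accept()
            except socket.timeout:
                continue
            log.append((time.time(), peer))
            print(f"{time.strftime('%H:%M:%S')} new connection "
                  f"from {peer[0]}:{peer[1]}")
            open_conns.append(conn)  # keep it open to observe draining
    finally:
        for conn in open_conns:
            conn.close()


def start_listener(port=5000, host="0.0.0.0"):
    """Start the listener in a background thread and return its handles."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen()
    log, stop = [], threading.Event()
    thread = threading.Thread(target=serve, args=(listener, log, stop),
                              daemon=True)
    thread.start()
    return listener, log, stop, thread


if __name__ == "__main__":
    listener, log, stop, thread = start_listener()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        stop.set()
        thread.join()
        listener.close()
```

Comparing the timestamps printed here with the moment the console stops showing the VM gives a rough picture of when new connections stop arriving.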

Pavel Madr

Sep 14, 2017, 2:22:52 AM
to gce-discussion
Hello Carlos,

Thank you for your explanation. Topic 1 is ok.

Topic 2 is still not very clear to me. I thought that removing a VM from the instance group marked that VM as unusable for new connections. Hence the VM should not accept incoming connections during those 5 minutes.
So I don't understand the meaning of the connection draining duration. How does it affect the 5 minutes in your case? Is the draining period included in them? Is the VM's shutdown postponed by that period? Or is it the period that the VM still lives after its complete removal from the instance group?

Carlos (Cloud Platform Support)

Sep 14, 2017, 5:02:24 PM
to gce-discussion
Hi Pavel


On Thursday, September 14, 2017 at 12:22:52 AM UTC-6, Pavel Madr wrote:
> Hello Carlos,
>
> Thank you for your explanation. Topic 1 is ok.
>
> Topic 2 is still not very clear to me. I thought that removing a VM from the instance group marked that VM as unusable for new connections.

Your statement is right. I am sorry I was not able to explain myself clearly.

> Hence the VM should not accept incoming connections during those 5 minutes.
> So I don't understand the meaning of the connection draining duration. How does it affect the 5 minutes in your case? Is the draining period included in them? Is the VM's shutdown postponed by that period? Or is it the period that the VM still lives after its complete removal from the instance group?

No, the draining period is not included in those 5 minutes. I think that yesterday I experienced a long delay, and it took about 5 minutes for the VM to be taken out of the instance group. GCP is a distributed platform and some commands might take a couple of minutes to fully commit. The draining period started counting only after those 5 minutes.

I repeated the test today. After I clicked the button to abandon my VM from the group, it still took about two minutes to fully commit (that is, until the Cloud Console showed no VM in the managed instance group). Again, I think that during those two minutes my VM was still part of the group and was therefore still receiving new connections. After the command was committed, no new connections were received, and established connections were held for what I believe was the time set in the connection draining feature.

Working with multiple windows and issuing different connection requests did not allow me to track the times accurately. I believe you will have more visibility into the behavior if you test with a larger connection draining timeout.
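
The timeline described above can be separated into its two phases with a little arithmetic. The following is a minimal Python sketch under stated assumptions: the function name is hypothetical, the time of the last accepted new connection is used as an approximation of when the removal was committed, and the timestamps in the usage example are made up for illustration (a 300 s draining timeout), not measurements from this thread.

```python
from datetime import datetime


def split_timeline(removal_requested, last_new_connection, established_closed):
    """Return (propagation_delay, drain_duration) in seconds.

    propagation_delay: time between requesting the removal and the last new
        connection seen on the VM (approximates the commit delay).
    drain_duration: time between that point and the moment the established
        connection was closed (should be close to the draining timeout).
    """
    propagation = (last_new_connection - removal_requested).total_seconds()
    drain = (established_closed - last_new_connection).total_seconds()
    return propagation, drain


if __name__ == "__main__":
    # Illustrative timestamps only.
    requested = datetime(2017, 9, 14, 10, 0, 0)
    last_new = datetime(2017, 9, 14, 10, 2, 0)
    closed = datetime(2017, 9, 14, 10, 7, 0)
    print(split_timeline(requested, last_new, closed))
```

With a timeout much larger than the commit delay, the second number should stand out clearly from the first, which makes the two phases easier to tell apart.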
