Google Cloud: Health Check is not removing failed instance from UDP Internal Load Balancer


Oussama Hammami

Feb 1, 2018, 9:00:33 AM
to gce-discussion


Hi,

I'm working on a project to move our SIP infrastructure to GCP.

I'm using a UDP internal load balancer with a private IP to route calls from Asterisk to my Kamailio SBCs. Asterisk is configured with the load balancer's IP address as its single outgoing endpoint.




My internal UDP load balancer has a frontend on port 5060, a backend with 2 SBCs, and a basic HTTP health check on port 80.

On each Kamailio SBC, my application listens on port 5060 and an Apache server listens on port 80 for the health check, so stopping httpd changes the instance's status to unhealthy.

forwarding-rules

    # gcloud compute forwarding-rules describe ip-gateway-internal-lb-local-fontend --region=europe-west3
    IPAddress: 10.156.0.15
    IPProtocol: UDP
    backendService: https://www.googleapis.com/compute/v1/projects/My-Project/regions/europe-west3/backendServices/My-gateway-internal-lb-bservices
    creationTimestamp: '2018-01-30T10:20:19.564-08:00'
    description: ''
    id: 'XXXXXXXXXXXXX'
    kind: compute#forwardingRule
    loadBalancingScheme: INTERNAL
    name: ip-gateway-internal-lb-local-fontend
    network: https://www.googleapis.com/compute/v1/projects/My-Project/global/networks/default
    ports:
    - '5060'
    region: https://www.googleapis.com/compute/v1/projects/My-Project/regions/europe-west3
    selfLink: https://www.googleapis.com/compute/v1/projects/My-Project/regions/europe-west3/forwardingRules/ip-gateway-internal-lb-local-fontend
    subnetwork: https://www.googleapis.com/compute/v1/projects/My-Project/regions/europe-west3/subnetworks/default

backend-service

    # gcloud compute backend-services describe My-gateway-internal-lb-bservices --region=europe-west3
    backends:
    - balancingMode: CONNECTION
      description: ''
      group: https://www.googleapis.com/compute/v1/projects/My-Project/zones/europe-west3-a/instanceGroups/My-gateway-1xx
    connectionDraining:
      drainingTimeoutSec: 0
    creationTimestamp: '2018-01-30T10:15:10.688-08:00'
    description: ''
    fingerprint: XXXXXXXXX
    healthChecks:
    - https://www.googleapis.com/compute/v1/projects/My-Project/global/healthChecks/basic-check-internal-http
    id: 'XXXXXXXXX'
    kind: compute#backendService
    loadBalancingScheme: INTERNAL
    name: My-gateway-internal-lb-bservices
    protocol: UDP
    region: https://www.googleapis.com/compute/v1/projects/My-Project/regions/europe-west3
    selfLink: https://www.googleapis.com/compute/v1/projects/My-Project/regions/europe-west3/backendServices/My-gateway-internal-lb-bservices
    sessionAffinity: NONE
    timeoutSec: 3


health-check


    # gcloud compute health-checks describe basic-check-internal-http
    checkIntervalSec: 3
    creationTimestamp: '2018-01-31T01:13:25.030-08:00'
    description: ''
    healthyThreshold: 2
    httpHealthCheck:
      host: ''
      port: 80
      proxyHeader: NONE
      requestPath: /
    id: 'XXXXXXXXXXXXXXXXXXXX'
    kind: compute#healthCheck
    name: basic-check-internal-http
    selfLink: https://www.googleapis.com/compute/v1/projects/My-Project/global/healthChecks/basic-check-internal-http
    timeoutSec: 3
    type: HTTP
    unhealthyThreshold: 2


All timeouts are set to 3 s. However, the routing entries that the internal UDP LB creates for session affinity (persistence) are not removed immediately when an instance becomes unhealthy; it takes about 15 minutes (without any traffic) for them to be removed.

The same happens when an instance becomes healthy again: it takes about 15 minutes for the LB to take it into account and start sending it traffic.
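
As a rough sanity check on these numbers, the sketch below estimates how quickly a backend should change state from the health-check settings shown earlier (checkIntervalSec 3, healthyThreshold 2, unhealthyThreshold 2) and compares that with the roughly 15 minutes observed. The interval-times-threshold formula is a simplifying assumption for illustration, not a documented timing guarantee.

```python
# Values copied from the health-check description in this post.
check_interval_sec = 3
unhealthy_threshold = 2
healthy_threshold = 2

# Assumption: a state change needs `threshold` consecutive probe results,
# one probe per interval, so detection takes about interval * threshold.
expected_unhealthy_after_sec = check_interval_sec * unhealthy_threshold
expected_healthy_after_sec = check_interval_sec * healthy_threshold

# Delay actually observed before traffic stopped or resumed (about 15 minutes).
observed_delay_sec = 15 * 60

print("expected time to mark unhealthy:", expected_unhealthy_after_sec, "s")
print("expected time to mark healthy:", expected_healthy_after_sec, "s")
print("observed delay:", observed_delay_sec, "s")
print("observed / expected:", observed_delay_sec // expected_unhealthy_after_sec)
```

Under that assumption the health check itself should react within seconds, which suggests the delay comes from how existing flows are tracked rather than from the health-check timers.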

I didn't have this problem when I was using a UDP load balancer with an external IP address, because the Asterisk addresses sending the traffic are NATed, so the 5-tuple hash is different for each call.

But with a UDP LB using an internal IP, the 5-tuple hash is always the same (same src/dst IP:port). So how can I configure the timeout of the session affinity (persistence) rules, or force the LB to flush its memory?
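
To illustrate why a constant 5-tuple always lands on the same backend, here is a minimal sketch of hash-based backend selection. It is not Google's actual algorithm: the SHA-256 hash, the backend names, and the source address 10.156.0.20 are all hypothetical, and only the load balancer address 10.156.0.15 and port 5060 come from the configuration above.

```python
import hashlib

# Hypothetical backend names standing in for the two SBC instances.
BACKENDS = ["sbc-1", "sbc-2"]


def pick_backend(src_ip, src_port, dst_ip, dst_port, proto, backends=BACKENDS):
    """Map a 5-tuple to a backend with a stable hash (illustrative only)."""
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}/{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


# Internal case: the sender always uses the same source IP and port,
# so every packet produces the same 5-tuple and the same backend.
fixed = {
    pick_backend("10.156.0.20", 5060, "10.156.0.15", 5060, "UDP")
    for _ in range(100)
}

# NAT-like case: the source port changes per call, so the hash input
# changes and the calls spread across the backends.
varied = {
    pick_backend("10.156.0.20", port, "10.156.0.15", 5060, "UDP")
    for port in range(20000, 20100)
}

print("backends used with a fixed 5-tuple:", len(fixed))
print("backends used with varying source ports:", len(varied))
```

With a fixed 5-tuple only one backend is ever chosen, so if the load balancer keeps honoring that mapping after the instance fails its health check, all traffic continues to reach the failed instance.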

Maybe it's a bug! Has anyone run into the same problem?
Thanks in advance to anyone who can help me out with this issue.

BR
/Oussama

Marcel Manz

Feb 1, 2018, 2:05:47 PM
to gce-discussion

Hi Oussama

I think you are affected by the same bug as we are. In our case we have a syslog server receiving UDP input via an internal LB, and it keeps receiving traffic even when the instance is marked unhealthy. I reported this to Cloud Support and they opened a case. Please star the following issue and comment there as well:

Google Cloud Support has added a comment to your Support Case #14732682 - Loadbalancer forwards UDP traffic to unhealthy instance:

Hello Marcel, I've consulted to the specialist team and it is a known issue which they are already working in. There isn't a ETA of when it will be done, so I would recommend the workaround that we talked about with the TCP load balancer if it is affecting the connection. You can follow the updates about the investigation in https://issuetracker.google.com/issues/72491707


BR

Marcel

Oussama Hammami

Feb 2, 2018, 9:02:53 AM
to gce-discussion
Hi Marcel,

Thank you very much for taking the time to answer my question.

I've added a comment indicating that we are affected by the same issue. I hope it will be fixed soon.

BR
/Oussama