I've been experimenting with the xDS client side and noticed odd behavior during a large scale-down on the server side. If an entire locality disappears from the endpoint (EDS) update, the gRPC client does not remove any of that locality's hosts from the load balancer. I've seen this with both the Go client and the Java client. For example, if us-east-1c disappears, the client keeps trying to connect to its endpoints.
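To make the expected behavior concrete, here is a minimal Go sketch that compares two EDS updates and lists the localities that disappeared, along with the addresses that should therefore be dropped from the load balancer. `Locality`, `LocalityEndpoints`, and `removedLocalities` are hypothetical simplified stand-ins, not grpc-go or go-control-plane APIs, and the addresses are made up.

```go
package main

import "fmt"

// Locality mirrors the region/zone fields of an xDS locality.
// Hypothetical simplified type, not the go-control-plane proto.
type Locality struct {
	Region string
	Zone   string
}

// LocalityEndpoints is a hypothetical stand-in for one entry of
// ClusterLoadAssignment.endpoints: a locality and its endpoint addresses.
type LocalityEndpoints struct {
	Locality  Locality
	Addresses []string
}

// removedLocalities returns the localities present in the previous EDS update
// but absent from the current one, together with the addresses they held.
// Under the expected behavior, all of these addresses should be removed from
// the load balancer once the current update is applied.
func removedLocalities(prev, curr []LocalityEndpoints) map[Locality][]string {
	present := make(map[Locality]bool, len(curr))
	for _, le := range curr {
		present[le.Locality] = true
	}
	removed := make(map[Locality][]string)
	for _, le := range prev {
		if !present[le.Locality] {
			removed[le.Locality] = append(removed[le.Locality], le.Addresses...)
		}
	}
	return removed
}

func main() {
	// Hypothetical addresses; the real ones are redacted in the logs.
	prev := []LocalityEndpoints{
		{Locality{"us-east-1", "us-east-1c"}, []string{"10.0.3.10:6565"}},
		{Locality{"us-east-1", "us-east-1a"}, []string{"10.0.1.10:6565"}},
	}
	curr := []LocalityEndpoints{
		{Locality{"us-east-1", "us-east-1a"}, []string{"10.0.1.10:6565"}},
	}
	for loc, addrs := range removedLocalities(prev, curr) {
		fmt.Printf("%s/%s: %v\n", loc.Region, loc.Zone, addrs)
	}
}
```

In the scenario described here, the us-east-1c addresses would be the ones this comparison reports, yet the client keeps dialing them.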
2023/05/17 23:56:24 INFO: [xds] [xds-client 0xc00083cdc0] ADS response received: {
"versionInfo": "2023-05-17T22:41:11Z/3164",
"resources": [
{
"@type": "type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment",
"clusterName": "outbound|6565||myservice.svc.cluster.local",
"endpoints": [
{
"locality": {
"region": "us-east-1",
"zone": "us-east-1c"
}
Then I remove us-east-1c, and the client receives:
2023/05/17 23:58:32 INFO: Resource with name: outbound|6565||myservice.svc.cluster.local, type: *endpointv3.ClusterLoadAssignment, contains: {
"clusterName": "outbound|6565||myservice.svc.cluster.local",
"endpoints": [
{
"locality": {
"region": "us-east-1",
"zone": "us-east-1a"
},
After that, I see a lot of log lines like these:
2023/05/18 00:01:43 INFO: [xds] [weighted-target-lb 0xc000eac660] Balancer state update from locality {"region":"us-east-1","zone":"us-east-1c"}, new state: {ConnectivityState:TRANSIENT_FAILURE Picker:0xc0005439c0}
2023/05/18 00:01:44 WARNING: [core] [Channel #1 SubChannel #12] grpc: addrConn.createTransport failed to connect to {
"Addr": "IP_HERE:6565",
"ServerName": "myservice.svc.cluster.local:6565",
"Attributes": {},
"BalancerAttributes": {},
"Type": 0,
"Metadata": null
}. Err: connection error: desc = "transport: error while dialing: dial tcp IP_HERE:6565: i/o timeout"
2023/05/18 00:01:44 INFO: [core] [Channel #1 SubChannel #12] Subchannel Connectivity change to TRANSIENT_FAILURE
2023/05/18 00:01:44 INFO: [balancer] base.baseBalancer: handle SubConn state change: 0xc0007d0020, TRANSIENT_FAILURE
2023/05/18 00:01:44 INFO: [xds] [weighted-target-lb 0xc000eac660] Balancer state update from locality {"region":"us-east-1","zone":"us-east-1c"}, new state: {ConnectivityState:TRANSIENT_FAILURE Picker:0xc00076efc0}
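A minimal Go sketch for confirming the symptom from these logs: it scans the `weighted-target-lb` "Balancer state update from locality" lines and counts updates coming from localities that are absent from the latest EDS update. The regular expression assumes the exact log format shown above, `staleLocalityUpdates` is a hypothetical helper, and treating us-east-1a as the only remaining locality is an assumption, since the update above is truncated.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"regexp"
	"strings"
)

// Locality matches the JSON locality printed in the weighted-target log lines.
type Locality struct {
	Region string `json:"region"`
	Zone   string `json:"zone"`
}

// stateUpdateRE captures the locality JSON from a weighted-target state update
// line. Assumes the log format shown in this report.
var stateUpdateRE = regexp.MustCompile(`Balancer state update from locality (\{[^}]*\})`)

// sampleLog holds lines copied from the logs above (IPs redacted upstream).
const sampleLog = `2023/05/18 00:01:43 INFO: [xds] [weighted-target-lb 0xc000eac660] Balancer state update from locality {"region":"us-east-1","zone":"us-east-1c"}, new state: {ConnectivityState:TRANSIENT_FAILURE Picker:0xc0005439c0}
2023/05/18 00:01:44 INFO: [core] [Channel #1 SubChannel #12] Subchannel Connectivity change to TRANSIENT_FAILURE
2023/05/18 00:01:44 INFO: [xds] [weighted-target-lb 0xc000eac660] Balancer state update from locality {"region":"us-east-1","zone":"us-east-1c"}, new state: {ConnectivityState:TRANSIENT_FAILURE Picker:0xc00076efc0}`

// staleLocalityUpdates counts, per locality, the state updates logged by
// localities that are NOT in current (i.e. should have been removed).
// Hypothetical helper for illustration.
func staleLocalityUpdates(log string, current []Locality) (map[Locality]int, error) {
	known := make(map[Locality]bool, len(current))
	for _, l := range current {
		known[l] = true
	}
	stale := make(map[Locality]int)
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		m := stateUpdateRE.FindStringSubmatch(sc.Text())
		if m == nil {
			continue // not a weighted-target state update line
		}
		var loc Locality
		if err := json.Unmarshal([]byte(m[1]), &loc); err != nil {
			return nil, err
		}
		if !known[loc] {
			stale[loc]++
		}
	}
	return stale, sc.Err()
}

func main() {
	// Assumption: us-east-1a is the only locality left after the scale-down.
	current := []Locality{{Region: "us-east-1", Zone: "us-east-1a"}}
	stale, err := staleLocalityUpdates(sampleLog, current)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for loc, n := range stale {
		fmt.Printf("%s/%s still reported %d state updates\n", loc.Region, loc.Zone, n)
	}
}
```

A non-empty result means the balancer is still managing a locality that the control plane no longer sends, which matches what the logs above show for us-east-1c.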