Outlier detection not working as expected

Boris

Jul 28, 2025, 3:34:33 AM
to envoy-users
Hello,
I have a question about outlier detection.
According to the documentation, the consecutiveGatewayErrors field is the number of gateway errors before a host is ejected from the connection pool.
Under some conditions, however, a failed host is removed from load balancing not after the number of errors specified in consecutiveGatewayErrors, but only after more errors than that.

Example for Istio:

Steps to reproduce:
1) Unpack the archive:
```
tar -zxvf circuitbreaker_error_1.tgz
```

2) Create the namespace:
```
kubectl apply -f circuitbreaker_error_1/namespace.yaml
```
3) Install the Helm chart:
```
helm upgrade --install circuitbreaker ./circuitbreaker_error_1/  -n circuitbreak-ns --disable-openapi-validation
```

4) Find the IP address of the circuitbreaker-cb service:
```
kubectl get service -n circuitbreak-ns | grep circuitbreaker-cb # for example output: circuitbreaker-cb          ClusterIP   10.101.216.10    <none>        80/TCP    3m2s
```

Let's say the IP address is 10.101.216.10.

5) Find the name of the pod:
```
kubectl get pods -n circuitbreak-ns | grep circuitbreaker-curl-test # for example output: circuitbreaker-curl-test-5bd5db75c8-blknw   2/2     Running   0          10m
```

Let's say the pod name is circuitbreaker-curl-test-5bd5db75c8-blknw.

6) Start a loop that calls curl from the circuitbreaker-curl-test pod:
```
kubectl exec -it circuitbreaker-curl-test-5bd5db75c8-blknw -n circuitbreak-ns -- bash # connect to pod circuitbreaker-curl-test
while true; do date '+%H:%M:%S.%N' | tr -d '\n' && echo ' ' | tr -d '\n' && curl -s -o /dev/null -w "%{http_code} - " 10.101.216.10/cb; echo ""; sleep 0.3; done
```
where:
circuitbreaker-curl-test-5bd5db75c8-blknw is the pod name found in step 5
10.101.216.10 is the IP address of the circuitbreaker-cb service found in step 4

Now we can see that the first time the host is ejected after three 504 responses, but the second and subsequent times it takes more than three (usually 6-7).

(If you set sleep 0.1, the number of 504 responses grows further, to about 10.) Example output:


```
13:51:47.654364655 504 -
13:51:48.138242219 504 -
13:51:48.458132684 504 - # first ejection after the 3rd error
13:51:48.848290380 503 - # node successfully ejected
13:51:49.214445804 503 -
13:51:49.578717342 503 -
13:51:49.889832045 503 -
13:51:50.205591815 503 -
13:51:50.518209774 503 -
13:51:50.835189384 503 -
13:51:51.156772362 503 -
13:51:51.467665356 503 -
13:51:51.784570410 503 -
13:51:52.103821062 503 -
13:51:52.422360591 503 -
13:51:52.740877626 503 -
13:51:53.060479926 503 -
13:51:53.377271233 503 -
13:51:53.690316806 503 -
13:51:54.012977750 504 -
13:51:54.341917621 504 -
13:51:54.674083544 504 - # second ejection should happen after the 3rd error, but
13:51:54.999237352 504 - # the ejection did not happen
13:51:55.376940911 504 - # and the errors continue
13:51:55.707559580 503 - # node ejected only here (+2 extra errors)
13:51:56.021592779 503 -
13:51:56.331106068 503 -
13:51:56.641656580 503 -
13:51:56.959535086 503 -
13:51:57.273492888 503 -
13:51:57.592223290 503 -
13:51:57.905952860 503 -
13:51:58.221809715 503 -
13:51:58.542112303 503 -
13:51:58.856106757 503 -
13:51:59.165310981 503 -
13:51:59.487668697 503 -
13:51:59.810092022 503 -
13:52:00.123043221 503 -
13:52:00.436563766 503 -
13:52:00.752536958 503 -
13:52:01.069240086 503 -
13:52:01.385609095 503 -
13:52:01.700142085 503 -
13:52:02.010799504 503 -
13:52:02.330169037 503 -
13:52:02.649629853 503 -
13:52:02.967611889 503 -
13:52:03.282549729 503 -
13:52:03.605739171 503 -
13:52:03.924744586 503 -
13:52:04.237313468 503 -
13:52:04.550787021 503 -
13:52:04.863660469 503 -
13:52:05.179390842 504 -
13:52:05.504509983 504 -
13:52:05.827119898 504 -
13:52:06.154289897 504 - # same problem
13:52:06.470791941 504 -
13:52:06.794727709 504 -
13:52:07.122505288 503 -
13:52:07.437308588 503 -
13:52:07.752739993 503 -
13:52:08.066411713 503 -
13:52:08.386081707 503 -
```
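
To put numbers on the observation, the length of each run of consecutive 504 responses can be counted from saved loop output. The following is only a minimal sketch: the function name count_504_runs and the file name cb.log are made up for illustration, and it assumes each line has the timestamp in the first field and the HTTP status code in the second, as in the output above.

```shell
# Hypothetical helper (not part of the attached archive): reads loop output
# on stdin and prints the length of every run of consecutive 504 responses,
# one number per line.
count_504_runs() {
  awk '
    $2 == "504" { run++; next }             # extend the current run of 504s
    run > 0     { print run; run = 0 }      # any other status closes the run
    END         { if (run > 0) print run }  # run still open at end of input
  '
}

# Possible usage, assuming the loop output was saved to cb.log:
#   count_504_runs < cb.log
```

For the output posted above this prints 3, 5 and 6, which can be compared directly with the configured consecutiveGatewayErrors value.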
circuitbreaker_error_1.tgz

Boris

Aug 5, 2025, 5:17:50 AM
to envoy-users
Can anyone explain this behavior?

On Monday, July 28, 2025 at 10:34:33 UTC+3, Boris wrote:

Yan Avlasov

Aug 12, 2025, 4:05:01 PM
to Boris, envoy-users
I suggest filing an Issue on GitHub to see if anyone is interested in debugging this behavior.
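
When filing the issue, it may help to attach Envoy's own outlier detection counters next to the curl output, so that the moment Envoy detects and enforces an ejection can be compared with what the client sees. The sketch below is only an illustration. The helper name outlier_gateway_stats is made up, the usage line assumes the sidecar container is named istio-proxy, and these counters may not be exposed by the sidecar unless they are enabled in the proxy stats configuration.

```shell
# Hypothetical helper: reads an Envoy stats dump on stdin and keeps only the
# counters for ejections caused by consecutive gateway failures.
outlier_gateway_stats() {
  grep -E 'outlier_detection\.ejections_(detected|enforced)_consecutive_gateway_failure'
}

# Possible usage with the pod and namespace from the reproduction steps
# (assumes the sidecar container is named istio-proxy):
#   kubectl exec circuitbreaker-curl-test-5bd5db75c8-blknw -n circuitbreak-ns \
#     -c istio-proxy -- pilot-agent request GET stats | outlier_gateway_stats
```

Running this before and after each burst of 504 responses would show whether the detected counter increases at the third error while the enforced counter lags behind.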
