Using outlier detection with a large number of downstream hosts

Manvendra Singh

Nov 24, 2022, 1:00:23 PM
to envoy-users
Hey folks,
We currently run Envoy at the host level for our data plane. A typical cluster has around 500 to 600 nodes, and therefore 500 to 600 Envoy instances. The issue we are running into is that even when a service has enough RPS to trigger outlier detection, the way clients are spread across the cluster (it is normal for a service to receive calls from 200 different downstreams) means Envoy's outlier detection does not trigger.

Even if a service (an Envoy cluster) has high throughput (10k RPS), because clients are distributed throughout the cluster, the individual Envoy processes on the downstream side rarely see more than 3 to 4 RPS. This makes outlier detection quite unreliable: in many cases, even when the total number of errors for a service is high, the bad backends are not ejected because each Envoy stays just under the thresholds (5/10 = 0.5 rps).
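To put rough numbers on this, here is a minimal Python sketch of the arithmetic. The 10k RPS and 200 downstreams are the figures above; the backend count of 15 is a hypothetical value chosen only because it lands in the 3 to 4 RPS range, and the function name is made up for illustration. The threshold constants are what I understand to be Envoy's defaults for `interval` (10s), `success_rate_request_volume` (100) and `failure_percentage_request_volume` (50); please check them against the docs for your Envoy version.

```python
# Rough estimate of how many requests a single downstream Envoy sends to a
# single upstream host within one outlier-detection interval.

# Assumed Envoy defaults (verify against your version's OutlierDetection docs).
INTERVAL_S = 10.0
SUCCESS_RATE_REQUEST_VOLUME = 100
FAILURE_PERCENTAGE_REQUEST_VOLUME = 50


def requests_per_host_per_interval(total_rps, downstream_envoys,
                                   upstream_hosts, interval_s=INTERVAL_S):
    """Requests one Envoy sends to one upstream host per interval,
    assuming traffic is spread evenly across Envoys and hosts."""
    per_envoy_rps = total_rps / downstream_envoys
    per_host_rps = per_envoy_rps / upstream_hosts
    return per_host_rps * interval_s


# 10k RPS and 200 downstreams come from this thread; 15 backends is hypothetical.
seen = requests_per_host_per_interval(10_000, 200, 15)

print(f"requests per host per interval, per Envoy: {seen:.1f}")
print("enough for success-rate detection:",
      seen >= SUCCESS_RATE_REQUEST_VOLUME)
print("enough for failure-percentage detection:",
      seen >= FAILURE_PERCENTAGE_REQUEST_VOLUME)
```

Under these assumptions each Envoy sees only a few dozen requests per host per interval, which is below both request-volume minimums, so the statistical detectors would skip that host even though the service as a whole is handling 10k RPS.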

Thanks,
Manvendra

Yan Avlasov

Nov 28, 2022, 3:06:54 PM
to Manvendra Singh, envoy-users
Just to clarify, are you running Envoy as a sidecar to your services?

Manvendra Singh

Nov 28, 2022, 3:14:10 PM
to Yan Avlasov, envoy-users
No, we run Envoy as a host service. Clients connect to the local Envoy through various interfaces (either envoy-service-port with the destination service passed as a header, or a dedicated proxy_port for each service). Envoy is also present on the egress path only, so the connection flow looks like this:

Client ---> Envoy (running on the same host as the client) ---> Server (mostly a pod running on k8s)

Yan Avlasov

Dec 1, 2022, 10:34:04 AM
to Manvendra Singh, envoy-users
Sorry, I have no suggestions. Your setup is very similar to an Istio service mesh. You may want to ask on the Istio forums for solutions.
