Should I create a separate Hystrix thread pool for each remote call?

Matli Pushpa Sekhar Reddy

May 14, 2016, 1:16:05 PM

to HystrixOSS

Hi Hystrix Gurus,

I have a question about Hystrix thread pool configuration for remote calls made to a given dependency.

We are using Hystrix in our microservices environment to execute remote calls from an upstream service (Edge Service) to downstream dependencies (Middle Services). A given upstream service makes several independent remote calls (up to 5) to each downstream dependency (Edge Service -> Middle Service). We have grouped the remote calls to each downstream dependency under a separate HystrixCommandGroupKey, and we use HystrixThreadPoolKey to define a thread pool for each remote call, with 'coreSize', 'QueueSize', etc. set based on the formula outlined here:

https://github.com/Netflix/Hystrix/wiki/Operations
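
For reference, the sizing rule of thumb given in the Hystrix wiki is: peak requests per second when healthy, multiplied by the 99th-percentile latency in seconds, plus some breathing room. A minimal sketch of that arithmetic follows. The class and method names are made up for illustration, and the inputs (30 requests/second, 200 ms, 4 spare threads) are example values, not measurements from this thread.

```java
// Hypothetical helper illustrating the thread pool sizing rule of thumb.
class PoolSizeEstimate {

    // peakRps:          peak requests per second while the dependency is healthy
    // p99LatencyMillis: 99th-percentile latency of the call, in milliseconds
    // breathingRoom:    extra threads added on top of the computed minimum
    static int coreSize(int peakRps, int p99LatencyMillis, int breathingRoom) {
        // Threads busy at peak = rate x latency; integer math, rounded up.
        long busyThreads = ((long) peakRps * p99LatencyMillis + 999) / 1000;
        return (int) busyThreads + breathingRoom;
    }

    public static void main(String[] args) {
        // Example inputs only: 30 rps, 200 ms p99, 4 spare threads.
        System.out.println("coreSize = " + coreSize(30, 200, 4));
    }
}
```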

Now the question is: which of the following options is recommended for handling Hystrix thread pools? Or is there a better approach than these two?

Each remote call to a downstream dependency is configured with a separate Hystrix thread pool and its own pool config. The advantage of this approach is that each call (to a given dependency) has its own thread pool, so heavy load on one remote call is isolated and does not cause the other calls to fail. The downside, in our case, is that we may end up with several different thread pools (up to 5) for a given downstream dependency.

OR

Have one thread pool for all remote calls to a given downstream dependency (so all remote calls share the same thread pool), and size that pool to handle the peak load of all the remote calls to that dependency. The advantage is simple config and one thread pool per dependency. The downside is that if one remote call is heavily used, it can exhaust the thread pool and the other remote calls would fail.
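
The two options above differ only in which thread pool key each command is given. The sketch below shows one way this could look with `HystrixCommand.Setter`, assuming hystrix-core is on the classpath. The class name, the "MiddleServiceB" key, and the sizes passed in are placeholders for illustration, not recommendations.

```java
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import com.netflix.hystrix.HystrixCommandKey;
import com.netflix.hystrix.HystrixThreadPoolKey;
import com.netflix.hystrix.HystrixThreadPoolProperties;

// Hypothetical helper: builds command settings for calls to one downstream dependency.
class MiddleServiceSetters {

    // All calls to the same dependency share one group key (placeholder name).
    private static final HystrixCommandGroupKey GROUP =
            HystrixCommandGroupKey.Factory.asKey("MiddleServiceB");

    // Option 1: each remote call gets its own thread pool, keyed by the call name.
    static HystrixCommand.Setter dedicatedPool(String callName, int coreSize) {
        return HystrixCommand.Setter.withGroupKey(GROUP)
                .andCommandKey(HystrixCommandKey.Factory.asKey(callName))
                .andThreadPoolKey(HystrixThreadPoolKey.Factory.asKey("MiddleServiceB-" + callName))
                .andThreadPoolPropertiesDefaults(
                        HystrixThreadPoolProperties.Setter().withCoreSize(coreSize));
    }

    // Option 2: every remote call to the dependency uses the same thread pool key.
    static HystrixCommand.Setter sharedPool(String callName, int coreSize) {
        return HystrixCommand.Setter.withGroupKey(GROUP)
                .andCommandKey(HystrixCommandKey.Factory.asKey(callName))
                .andThreadPoolKey(HystrixThreadPoolKey.Factory.asKey("MiddleServiceB"))
                .andThreadPoolPropertiesDefaults(
                        HystrixThreadPoolProperties.Setter().withCoreSize(coreSize));
    }
}
```

If no thread pool key is set at all, Hystrix derives the pool from the command group key, so commands in the same group already share a pool by default.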

Thanks in advance for your help.

Thanks

Sekhar

Matt Jacobs

May 17, 2016, 7:03:45 PM

to HystrixOSS

Sekhar -

Good question - here's what I've observed from going through this exercise many times at Netflix.

1) If you call a service via multiple commands, then it's likely that those commands' failure rates are correlated. That is, if you're configuring Edge Service A to call Midtier Service B with 5 different commands, then any general problem on Service B manifests as excess failure rate/latency on all 5 commands simultaneously. If they are correlated, then providing a separate thread pool for each doesn't provide any extra resilience.

2) We generally tune thread pools for something like a 99.99% success rate and 0.01% rejection in steady state. If you tune all 5 commands separately in this manner, then the sum of those thread pools will be greater than if you placed them together and then tuned for 99.99%. This is because the chance that all 5 commands are simultaneously at their 99.99th-percentile latency is (0.0001)^5. Put another way, summing the distributions will allow fewer overall threads to handle outlier latency.
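
The pooling effect in point 2 can be checked with a small calculation. The sketch below is a simplified model, not how Hystrix sizes anything: it assumes the number of busy threads for each command is an independent Poisson variable, and the mean of 4 busy threads per command is a made-up value. It finds the smallest pool that covers 99.99% of the distribution, once per command and once for the combined load.

```java
// Hypothetical model: compares five separately sized pools with one shared pool.
class PooledVsSeparate {

    // Smallest n such that P(X <= n) >= target, for X ~ Poisson(mean).
    static int poissonQuantile(double mean, double target) {
        double pmf = Math.exp(-mean); // P(X = 0)
        double cdf = pmf;
        int n = 0;
        while (cdf < target) {
            n++;
            pmf *= mean / n;          // P(X = n) from P(X = n - 1)
            cdf += pmf;
        }
        return n;
    }

    public static void main(String[] args) {
        int commands = 5;
        double meanBusyPerCommand = 4.0; // assumed average busy threads per command
        double target = 0.9999;          // 99.99% of the time the pool is large enough

        // Size each pool on its own, then add the five sizes together.
        int separate = commands * poissonQuantile(meanBusyPerCommand, target);
        // Size one pool for the combined load (sum of independent Poissons is Poisson).
        int shared = poissonQuantile(commands * meanBusyPerCommand, target);

        System.out.println("five separate pools: " + separate + " threads in total");
        System.out.println("one shared pool:     " + shared + " threads");
    }
}
```

Under these assumptions the shared pool needs fewer threads than the five separate pools combined, because each separate pool carries its own headroom for outliers that rarely coincide.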

So my general rule is to combine commands that hit the same system into the same threadpool, unless you have a good reason not to. One such good reason is that a certain command is much more business-critical, or has no decent fallback. If you need 4 9's on Command A, but 2 9's is fine for Commands B-E, then it likely makes sense to move Command A into a dedicated threadpool so that it has better guarantees about its success rate.

Hope that helps!

-Matt

Sambit Dixit

Jan 5, 2019, 1:57:07 AM

to HystrixOSS

@Matt Jacobs
I have a setup where we use Hystrix to fail over requests to another version of a service if the primary version is erroring out. Basically, I have 10 services, each with 3 versions. Right now I'm creating one command per combination of service and version, using that combination as the command key. That means I have 30 commands, and each of them has an internal fallback strategy to fall back to the other versions. For example, if a call to service A version 1 fails, the command ServiceAv1 falls back to ServiceAv2.

With this approach, what I want to achieve is to isolate latency issues and open circuit breakers per service version, so that if there is an issue with an API call to one service version, say ServiceAv1, then calls to ServiceAv2, ServiceBv1, etc. are not impacted or starved of resources such as available threads.
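
A rough sketch of the setup described above, assuming hystrix-core is on the classpath. The class name `VersionedCall`, the `Supplier` standing in for the real remote call, and the constructor parameters are all hypothetical. The thread pool name is passed in so the same command works with either a shared pool or one pool per service version.

```java
import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import com.netflix.hystrix.HystrixCommandKey;
import com.netflix.hystrix.HystrixThreadPoolKey;
import java.util.function.Supplier;

// Hypothetical command for one service version, e.g. service "ServiceA", version "v1".
class VersionedCall extends HystrixCommand<String> {

    private final Supplier<String> remoteCall;        // stand-in for the real client call
    private final HystrixCommand<String> nextVersion; // fallback command, or null if none

    VersionedCall(String service, String version, String poolName,
                  Supplier<String> remoteCall, HystrixCommand<String> nextVersion) {
        super(Setter.withGroupKey(HystrixCommandGroupKey.Factory.asKey(service))
                // Command key per service version, so each gets its own circuit breaker.
                .andCommandKey(HystrixCommandKey.Factory.asKey(service + version))
                // Pool name decides whether threads are shared or isolated.
                .andThreadPoolKey(HystrixThreadPoolKey.Factory.asKey(poolName)));
        this.remoteCall = remoteCall;
        this.nextVersion = nextVersion;
    }

    @Override
    protected String run() {
        return remoteCall.get();
    }

    @Override
    protected String getFallback() {
        if (nextVersion == null) {
            throw new UnsupportedOperationException("No fallback version configured");
        }
        // Runs the other version's command, on whatever pool that command was given.
        return nextVersion.execute();
    }
}
```

One thing this makes visible: if the fallback command runs on the same thread pool as the failing one, a saturated pool would reject the fallback as well, which matters when weighing the layouts.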

In this scenario, what is the best approach to configuring the thread pool? One option is a single default thread pool with a maxSize of 200, assuming 200 is the overall QPS across commands. The other is a thread pool for each command (i.e. each service-version combination) with a core size of 20 threads, since that is the average QPS for a service version at a time, whereas across all 30 service-version calls it is around 200 QPS.

Which approach is better: a default thread pool with a core/max size of 200, or small thread pools with a core size of 20, one per command, where each command is a service-version-specific endpoint?

Regards
Sambit
