Circuit closed before sleepWindowInMilliseconds have passed

345 views
Skip to first unread message

Tobias

unread,
Nov 25, 2015, 5:34:54 AM11/25/15
to HystrixOSS
Hi,

I am running performance tests and testing some values on the sleepWindowInMilliseconds. What I have noticed is that it doesn't sleep for the time set for the property sleepWindowInMilliseconds.

I set it to 500000ms(5 minutes) but looking at the dashboard the circuit closes much faster. Some times after just a qouple of seconds. 

I have attached some data from the hystrix stream of a hystrix command when the circuit is open and right after a command stating it is closed but the time it was open was just a couple of seconds. What I notice is that all errorPercentage  data is cleared. Why is that? Is somehow the sleepwindow setting overridden by some other parameters? I guess that has something to do with it?

CB open

Thread:
data: {"type":"HystrixThreadPool","name":"main.thread","currentTime":1448444261526,"currentActiveCount":10,"currentCompletedTaskCount":246,"currentCorePoolSize":10,"currentLargestPoolSize":10,"currentMaximumPoolSize":10,"currentPoolSize":10,"currentQueueSize":0,"currentTaskCount":256,"rollingCountThreadsExecuted":0,"rollingMaxActiveThreads":0,"rollingCountCommandRejections":0,"propertyValue_queueSizeRejectionThreshold":5,"propertyValue_metricsRollingStatisticalWindowInMilliseconds":10000,"reportingHosts":1}

HystrixCommand:
data: {"type":"HystrixCommand","name":"main.thread","group":"main.thread","currentTime":1448444262026,"isCircuitBreakerOpen":true,"errorPercentage":100,"errorCount":48,"requestCount":48,"rollingCountBadRequests":0,"rollingCountCollapsedRequests":0,"rollingCountEmit":0,"rollingCountExceptionsThrown":48,"rollingCountFailure":0,"rollingCountEmit":0,"rollingCountFallbackFailure":0,"rollingCountFallbackRejection":0,"rollingCountFallbackSuccess":0,"rollingCountResponsesFromCache":0,"rollingCountSemaphoreRejected":0,"rollingCountShortCircuited":48,"rollingCountSuccess":0,"rollingCountThreadPoolRejected":0,"rollingCountTimeout":0,"currentConcurrentExecutionCount":10,"rollingMaxConcurrentExecutionCount":11,"latencyExecute_mean":20962,"latencyExecute":{"0":20864,"25":20917,"50":20952,"75":20989,"90":21114,"95":21148,"99":21165,"99.5":21165,"100":21165},"latencyTotal_mean":20962,"latencyTotal":{"0":20864,"25":20917,"50":20952,"75":20989,"90":21114,"95":21148,"99":21165,"99.5":21165,"100":21165},"propertyValue_circuitBreakerRequestVolumeThreshold":20,"propertyValue_circuitBreakerSleepWindowInMilliseconds":300000,"propertyValue_circuitBreakerErrorThresholdPercentage":50,"propertyValue_circuitBreakerForceOpen":false,"propertyValue_circuitBreakerForceClosed":false,"propertyValue_circuitBreakerEnabled":true,"propertyValue_executionIsolationStrategy":"THREAD","propertyValue_executionIsolationThreadTimeoutInMilliseconds":45000,"propertyValue_executionTimeoutInMilliseconds":45000,"propertyValue_executionIsolationThreadInterruptOnTimeout":true,"propertyValue_executionIsolationThreadPoolKeyOverride":null,"propertyValue_executionIsolationSemaphoreMaxConcurrentRequests":10,"propertyValue_fallbackIsolationSemaphoreMaxConcurrentRequests":10,"propertyValue_metricsRollingStatisticalWindowInMilliseconds":10000,"propertyValue_requestCacheEnabled":true,"propertyValue_requestLogEnabled":true,"reportingHosts":1,"threadPool":"main.thread"}

CB closed

Thread:
data: {"type":"HystrixThreadPool","name":"main.thread","currentTime":1448444262026,"currentActiveCount":10,"currentCompletedTaskCount":246,"currentCorePoolSize":10,"currentLargestPoolSize":10,"currentMaximumPoolSize":10,"currentPoolSize":10,"currentQueueSize":0,"currentTaskCount":256,"rollingCountThreadsExecuted":0,"rollingMaxActiveThreads":0,"rollingCountCommandRejections":0,"propertyValue_queueSizeRejectionThreshold":5,"propertyValue_metricsRollingStatisticalWindowInMilliseconds":10000,"reportingHosts":1}

HystrixCommand:
data: {"type":"HystrixCommand","name":"main.thread","group":"main.thread","currentTime":1448444262526,"isCircuitBreakerOpen":false,"errorPercentage":0,"errorCount":0,"requestCount":0,"rollingCountBadRequests":0,"rollingCountCollapsedRequests":0,"rollingCountEmit":0,"rollingCountExceptionsThrown":2,"rollingCountFailure":0,"rollingCountEmit":0,"rollingCountFallbackFailure":0,"rollingCountFallbackRejection":0,"rollingCountFallbackSuccess":0,"rollingCountResponsesFromCache":0,"rollingCountSemaphoreRejected":0,"rollingCountShortCircuited":0,"rollingCountSuccess":0,"rollingCountThreadPoolRejected":2,"rollingCountTimeout":0,"currentConcurrentExecutionCount":10,"rollingMaxConcurrentExecutionCount":11,"latencyExecute_mean":20962,"latencyExecute":{"0":20864,"25":20917,"50":20952,"75":20989,"90":21114,"95":21148,"99":21165,"99.5":21165,"100":21165},"latencyTotal_mean":20962,"latencyTotal":{"0":20864,"25":20917,"50":20952,"75":20989,"90":21114,"95":21148,"99":21165,"99.5":21165,"100":21165},"propertyValue_circuitBreakerRequestVolumeThreshold":20,"propertyValue_circuitBreakerSleepWindowInMilliseconds":300000,"propertyValue_circuitBreakerErrorThresholdPercentage":50,"propertyValue_circuitBreakerForceOpen":false,"propertyValue_circuitBreakerForceClosed":false,"propertyValue_circuitBreakerEnabled":true,"propertyValue_executionIsolationStrategy":"THREAD","propertyValue_executionIsolationThreadTimeoutInMilliseconds":45000,"propertyValue_executionTimeoutInMilliseconds":45000,"propertyValue_executionIsolationThreadInterruptOnTimeout":true,"propertyValue_executionIsolationThreadPoolKeyOverride":null,"propertyValue_executionIsolationSemaphoreMaxConcurrentRequests":10,"propertyValue_fallbackIsolationSemaphoreMaxConcurrentRequests":10,"propertyValue_metricsRollingStatisticalWindowInMilliseconds":10000,"propertyValue_requestCacheEnabled":true,"propertyValue_requestLogEnabled":true,"reportingHosts":1,"threadPool":"main.thread"}

Matt Jacobs

unread,
Nov 30, 2015, 12:44:52 PM11/30/15
to HystrixOSS
Thanks for the report, I'll investigate this. I think in the 1.4.x series, the metrics getting reset is expected, but the sleep window config is not.  That seems like a bug.  If I can repro it, I'll open a Github issue and link it here.

-Matt

Matt Jacobs

unread,
Dec 2, 2015, 2:40:28 PM12/2/15
to HystrixOSS
I'm not able to reproduce this behavior in a unit test.  See https://github.com/Netflix/Hystrix/pull/1001.  Is anything in your application calling a method that might cause configuration to get recalculated?  Something like Hystrix.reset() or HystrixCircuitBreaker.Factory.reset()?

Tobias

unread,
Dec 4, 2015, 10:54:27 AM12/4/15
to HystrixOSS
Hi,

no I don't call any reset or any other recalculate method. 

my properties are:
hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds = 30000
hystrix.threadpool.default.coreSize = 30
hystrix.command.default.circuitBreaker.sleepWindowInMilliseconds = 300000

As you can see I have many threads and long timeouts. Could it be that it exposes first under heavy load? 

Tobias

unread,
Dec 7, 2015, 1:29:39 AM12/7/15
to HystrixOSS
Hi,

you say also that the metrics getting reset is expected. Why is that? I was doing a performance test with many Hystrix commands, how is it possible that it suddenly clears all its history? 

Matt Jacobs

unread,
Dec 9, 2015, 12:41:18 AM12/9/15
to HystrixOSS
I still don't have any ideas on why the config gets reset.  I'll try writing some unit tests with a super-long timeout to see what that does.

As for the metrics getting reset, this happens because it's a straightforward way to make sure that the circuit health is zero when the circuit does transition back from OPEN -> CLOSED.  If the previous failures were still in the metrics, then it would be prone to open again quickly, and that's undesirable.  This is a bit heavy-handed, and I think I've got a better solution that will separate out health metrics from general counters in 1.5

-Matt
Reply all
Reply to author
Forward
0 new messages