(accidentally sent privately to Stuart, re-sending to group)
There are 2 ways we do this at Netflix:
1) Directly using HystrixCommand (within request/response context)
For simple use cases we have applications that use HystrixCommand.queue() directly and never call .get() on it (just like what you said you're doing).
If a network timeout occurs (not a thread timeout, since we don't block on get()) then an exception is thrown and it's seen as a failure that can result in the circuit tripping. The thread-pool size automatically constrains throughput, rejects excess work, and trips the circuit if the backend becomes saturated.
A possible functional problem with this is that generally the HystrixRequestContext is scoped to an HTTP request/response loop and fire-and-forget can result in something executing after the response is returned and thus after the HystrixRequestContext has been cleared. If you're not using RequestContext for anything (most people aren't by default) then this may not matter. It can however have an impact on request caching and request collapsing if you use either of those. If any of these items matter then option #2 below is the approach we take.
Regarding the thread timeout problem you mentioned ... there are a few ways a command can time out (when using thread isolation):
a) queue().get()
The get() itself times out and the underlying Callable is cancelled (either while still in the queue or while it's running on a thread).
b) fire-and-forget: queue() without get()
When the Callable gets picked up by a thread, if the elapsed time since calling queue() exceeds the timeout, the work is skipped and the metrics are incremented to record a timeout.
c) queue().get() with a race condition on timeout
A race condition could occur between the get() timeout and the Callable getting scheduled. In that case both the (a) and (b) scenarios happen in a race on the same command. The timeout logic and metrics capture are handled atomically for this scenario.
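To make scenario (b) concrete, here is a rough JDK-only sketch of "skip the work if it sat in the queue past the timeout". It is not Hystrix's actual implementation; the class name StaleOnPickup, the skipIfStale helper and the timeouts counter are made up for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of scenario (b); not how Hystrix itself is written.
final class StaleOnPickup {
    // Stand-in for the "timeout" metric.
    static final AtomicInteger timeouts = new AtomicInteger();

    // Wraps work so it is skipped if it is picked up more than timeoutMs after being wrapped.
    static <T> Callable<T> skipIfStale(Callable<T> work, long timeoutMs) {
        final long queuedAt = System.nanoTime(); // taken just before the task is queued
        return () -> {
            long waitedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - queuedAt);
            if (waitedMs > timeoutMs) {
                timeouts.incrementAndGet(); // record the timeout and skip the work
                return null;
            }
            return work.call();
        };
    }

    // Queues a task behind a slow one on a single thread; returns whether the queued work ran.
    static boolean demo() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicBoolean ran = new AtomicBoolean(false);

        // The first task occupies the only thread for about 200 ms.
        pool.submit(() -> { Thread.sleep(200); return null; });

        // The second task has a 50 ms budget and nobody ever calls get() on it.
        Callable<String> guarded = skipIfStale(() -> { ran.set(true); return "done"; }, 50);
        pool.submit(guarded);

        pool.shutdown();
        try {
            pool.awaitTermination(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ran.get();
    }

    public static void main(String[] args) {
        System.out.println("work ran: " + demo() + ", timeouts recorded: " + timeouts.get());
    }
}
```

Note that the check only runs at pickup time. Once the work has started, nothing in this sketch stops it.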
However, none of these account for the use case you refer to which is canceling the underlying network call itself via a Future.cancel/timeout when get() is not invoked. I am not aware of a way to do that with Futures and Executors since I cannot submit a Callable with a "maximum execution time". It is only on a blocking get() call that I can choose to timeout and then cancel the task or via another thread (such as a timer) calling Future.cancel().
The way we handle this for fire-and-forget use cases is that the underlying network activity must still have its own applicable timeout, since we never trigger an interrupt on the thread (via Future.cancel or Future.get(timeout)). In other words, for the fire-and-forget use case the thread timeout value is mostly useless: it only takes effect if the time from queueing to a thread picking up the work exceeds the timeout value.
I am not aware of a mechanism to set a timeout value on a Future/ExecutorService that automatically cancels the Future after a certain time has passed without another thread blocking on the get() method. Unless there is one, it is up to the underlying run() method implementation performing the network call to ensure the network timeout is set correctly.
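For reference, the "another thread (such as a timer) calling Future.cancel()" alternative could look roughly like the sketch below, using only java.util.concurrent. The names WatchdogCancel and submitWithDeadline are hypothetical, and this is not something Hystrix does for you.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper: a separate timer thread cancels the Future, so no caller has to block on get().
final class WatchdogCancel {
    // Single daemon timer thread shared by all submissions.
    static final ScheduledExecutorService TIMER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "watchdog-timer");
                t.setDaemon(true);
                return t;
            });

    static <T> Future<T> submitWithDeadline(ExecutorService pool, Callable<T> work, long timeoutMs) {
        Future<T> future = pool.submit(work);
        // cancel(true) interrupts the worker thread if the task is already running;
        // it is a no-op if the task has already completed.
        TIMER.schedule(() -> { future.cancel(true); }, timeoutMs, TimeUnit.MILLISECONDS);
        return future;
    }

    // Fire-and-forget a 5 s sleep with a 100 ms deadline; returns true if it was cancelled and interrupted.
    static boolean demo() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicBoolean interrupted = new AtomicBoolean(false);
        Callable<Void> slow = () -> {
            try {
                Thread.sleep(5_000);
            } catch (InterruptedException e) {
                interrupted.set(true); // the work noticed the interrupt and stopped early
            }
            return null;
        };
        Future<Void> future = submitWithDeadline(pool, slow, 100);
        pool.shutdown();
        try {
            pool.awaitTermination(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return future.isCancelled() && interrupted.get();
    }

    public static void main(String[] args) {
        System.out.println("cancelled and interrupted: " + demo());
    }
}
```

This only helps if the work responds to interruption. Thread.sleep() does, but a blocking read on a classic java.io socket generally does not, which is why the network timeout still has to be set in run().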
If you want to use the HystrixCommand timeout value for the network timeout you can retrieve it within the run() method via:
getProperties().executionIsolationThreadTimeoutInMilliseconds()
That way you don't have two different configs and can leverage the dynamic updates of Hystrix properties. Of course this assumes that your network client allows you to inject a timeout value on each request.
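As a sketch of "inject a timeout value on each request", assuming the client is the JDK's HttpURLConnection (your client may differ): timeoutMs is whatever the property above returned inside run(), and the PerRequestTimeout class and the URL are made up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper: applies one timeout value to a single outgoing request.
final class PerRequestTimeout {
    // timeoutMs is assumed to be the value read from the Hystrix property inside run().
    static HttpURLConnection open(String url, int timeoutMs) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(timeoutMs); // limit on establishing the connection
            conn.setReadTimeout(timeoutMs);    // limit on each blocking read, not on the whole response
            return conn;                       // nothing is sent until connect() or a read is triggered
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn = open("http://localhost:8080/ping", 250);
        System.out.println(conn.getConnectTimeout() + " / " + conn.getReadTimeout());
    }
}
```

Because the connect and read limits apply separately, the total time spent can exceed the single configured value, so you may want to split the budget between them.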
2) Via a separate queue (in a separate context)
For use cases where the request context is a problem, or we want to get closer to ensuring delivery via queuing instead of dropping on the floor, we'll fire-and-forget into a queue and then have a background thread (or thread-pool) pick up the work and execute the command using HystrixCommand.execute() synchronously with the request context lifecycle managed correctly but decoupled from the user request/response.
It does not sound like you need this but I figured I'd mention it.
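In case it is useful, the shape of option 2 is roughly the following JDK-only sketch. BackgroundDispatcher is a made-up name and the jobs are plain Runnables; in our case each job would wrap a synchronous HystrixCommand.execute() call, with the request context set up and torn down around it on the worker thread.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: request threads enqueue work, one background thread executes it.
final class BackgroundDispatcher {
    private final BlockingQueue<Runnable> queue;

    BackgroundDispatcher(int capacity) {
        queue = new ArrayBlockingQueue<>(capacity);
        Thread worker = new Thread(this::drain, "background-dispatcher");
        worker.setDaemon(true);
        worker.start();
    }

    // Called on the request thread: never blocks; false means the queue is full.
    boolean enqueue(Runnable job) {
        return queue.offer(job);
    }

    private void drain() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                Runnable job = queue.take();
                try {
                    // With Hystrix, the per-job request context would be initialized here
                    // and shut down in a finally block around the synchronous execute().
                    job.run();
                } catch (RuntimeException e) {
                    // A failed job must not kill the worker thread.
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Enqueues the given number of jobs and returns how many completed within 2 seconds.
    static int demo(int jobs) {
        BackgroundDispatcher dispatcher = new BackgroundDispatcher(16);
        AtomicInteger done = new AtomicInteger();
        CountDownLatch latch = new CountDownLatch(jobs);
        for (int i = 0; i < jobs; i++) {
            dispatcher.enqueue(() -> { done.incrementAndGet(); latch.countDown(); });
        }
        try {
            latch.await(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println("jobs completed: " + demo(5));
    }
}
```

A bounded queue with offer() keeps the request thread from ever blocking. The caller decides what a full queue means (drop, log, or fall back).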
I hope the above explanation of options helps.
Ben