Understand the Hystrix Collapser

JD Rosensweig

Aug 27, 2015, 3:36:49 PM
to HystrixOSS
Hey Hystrix Group,

   I've been trying to work my way through the Hystrix Collapser model and really understand how to successfully implement collapsing in my program.  I've been able to build a little example program to illustrate the concept to myself, and it seems to be working except when I enable the HystrixRequestContext.  If I disable the context, the sample code runs flawlessly and collapses the commands as it should.  If I enable it, I end up hitting the log size limit and I get many errors in my log.

   From reading the wiki, I understand that we have to manage the lifecycle of this object, specifically this portion: https://github.com/Netflix/Hystrix/wiki/How-To-Use#RequestContextSetup.  This seems to indicate that the scoped request context should only exist for the lifetime of the request.  What I don't understand is how exactly I know when to invoke these commands.  My example code is located here:


   To run the above simply follow these steps:
  1. tar jxf collapser-test.tar.bz2
  2. cd collapser-test
  3. mvn install
  4. mvn exec:java -Dexec.mainClass=com.xyleolabs.test.collapser.ExampleCollapserMain
   In my example my commands each represent a number and the batch job simply adds them all up.  In my real code I create a unified payload for my external service.  Would I just wrap that "adding" code with the RequestContext setup and teardown?

   Would appreciate any insight.

Thanks,

JD


Matt Jacobs

Aug 28, 2015, 8:04:10 PM
to HystrixOSS
JD -

In a case like a Tomcat app, the HystrixRequestContext object should get initialized on an incoming HTTP request, and cleaned up when the HTTP response is outgoing.  See this for code you can use directly (or fit to your specific container): https://github.com/Netflix/Hystrix/tree/master/hystrix-contrib/hystrix-request-servlet.

Using default config, a Hystrix collapser will only collapse requests that occur within the same request context.  To make this concrete, a use-case we have at Netflix is: for a given API request, we need to grab 100 movie IDs that make up your homepage, and then for each, get the user rating.  Rather than explicitly handle the batching in application-layer code, we can just make 100 calls to the collapser that gets a rating for a specific (movie, user ID) pair.  Then the collapser compresses these into batches to be sent to our downstream ratings microservice.
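
As a rough illustration of the lifecycle described above (set up when the request comes in, torn down when the response goes out), here is a minimal plain-Java sketch. It does not use Hystrix: RequestScope and RequestHandler are hypothetical stand-ins, and the thread-local storage is an assumption made for illustration. In real code the equivalent calls would be the HystrixRequestContext.initializeContext() and context.shutdown() pair quoted later in this thread, typically placed in a servlet filter.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a request-scoped context; not part of Hystrix. */
final class RequestScope implements AutoCloseable {
    private static final ThreadLocal<RequestScope> CURRENT = new ThreadLocal<>();
    private final List<String> log = new ArrayList<>();

    private RequestScope() {}

    /** Called once when a request arrives. */
    static RequestScope open() {
        RequestScope scope = new RequestScope();
        CURRENT.set(scope);
        return scope;
    }

    static boolean isActive() {
        return CURRENT.get() != null;
    }

    /** Records a command against the scope bound to the calling thread. */
    static void record(String commandName) {
        RequestScope scope = CURRENT.get();
        if (scope == null) {
            throw new IllegalStateException("no request scope on this thread");
        }
        scope.log.add(commandName);
    }

    List<String> executedCommands() {
        return new ArrayList<>(log);
    }

    /** Called once when the response is written; releases per-request state. */
    @Override
    public void close() {
        CURRENT.remove();
    }
}

/** Hypothetical request handler that brackets its work with the scope. */
final class RequestHandler {
    static List<String> handle(List<String> commandNames) {
        try (RequestScope scope = RequestScope.open()) {
            for (String name : commandNames) {
                RequestScope.record(name); // work done on behalf of this request
            }
            return scope.executedCommands();
        } // the scope is closed here even if the work throws
    }
}
```

The point of the try-with-resources block is that per-request state never outlives the request, which is what keeps a per-request log or cache bounded.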

Does that help clear things up?

JD Rosensweig

Aug 31, 2015, 2:00:34 AM
to HystrixOSS
Hi Matt,

    Thanks for the response.  I'm still a little bit unclear on how to handle the lifetime of the context.  The messages are all collapsed transparently by Hystrix.  From my point of view I'm just running Hystrix commands and they get batched together.  All of my network interactions occur within the Hystrix batch command.  How can I know when the HTTP command has finished?  There are callbacks for each individual command, but I would need a callback on batch completion.  Additionally, while my batch job is running, I would want another batch to start building up messages for the next run.  Would I simultaneously need a new RequestContext for that one?

    In my use case I have a continuous stream of events coming in that need to be processed and I'm collapsing them to reduce network requests.  I just don't get how to properly tie that into the RequestLog lifecycle.  I've seen the examples you mentioned, but I can't seem to connect them to my use case.  In the bookmarks case it seems like it gets a user request and wants to batch the fetching of the bookmark metadata.  So this all fits neatly into the lifecycle you describe.  But I have a continuous stream of requests that I'm trying to collapse over a time range.

    I'm sure I'm just missing something simple here.  My example code illustrates what I'm trying to do a little bit better than I can explain.

Thanks,

JD

Matt Jacobs

Sep 1, 2015, 1:26:05 PM
to HystrixOSS
OK, so it sounds like your case is different from the one I described. For your case, it sounds like you don't need a HystrixRequestContext object at all, as you have no requests.  Given that a HystrixRequestLog is request-scoped, you shouldn't be using that either.

What should be happening in lieu of HystrixRequestContexts is:

A) You should define your collapser to have HystrixCollapser.Scope == GLOBAL (The default is REQUEST)
B) As collapsers are invoked in your stream-processing code, they don't trigger a network call, but get placed in a batch (Hystrix handles this)
C) Hystrix sets up a background thread to turn these batches into HystrixCommands, which then make your network call with a batch of arguments.
D) When that HystrixCommand returns, your supplied demultiplexing code (mapResponseToRequests) will return a single value to each collapser invoked in B)
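
Steps B) through D) can be modeled in plain Java to make the flow concrete. This is a sketch of the idea only, not Hystrix code and not how Hystrix is implemented: WindowCollapser, submit, flush and the batchCall function are hypothetical names, and the one-response-per-argument, same-order contract is an assumption. In Hystrix itself, the batching and timer are provided by HystrixCollapser with Scope GLOBAL, and the demultiplexing lives in mapResponseToRequests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical model of a globally scoped, time-windowed collapser. */
final class WindowCollapser<A, R> {
    /** One caller's argument plus the future that caller is waiting on. */
    private static final class Pending<A, R> {
        final A argument;
        final CompletableFuture<R> result = new CompletableFuture<>();
        Pending(A argument) { this.argument = argument; }
    }

    private final Object lock = new Object();
    private List<Pending<A, R>> batch = new ArrayList<>();
    private final Function<List<A>, List<R>> batchCall;
    private final ScheduledExecutorService timer;

    /** batchCall is assumed to return one response per argument, in order. */
    WindowCollapser(long windowMillis, Function<List<A>, List<R>> batchCall) {
        this.batchCall = batchCall;
        this.timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "collapser-timer");
            t.setDaemon(true);
            return t;
        });
        // Step C: a background thread periodically turns the batch into one call.
        timer.scheduleAtFixedRate(this::flush, windowMillis, windowMillis,
                TimeUnit.MILLISECONDS);
    }

    /** Step B: no network call happens here; the argument just joins the batch. */
    CompletableFuture<R> submit(A argument) {
        Pending<A, R> pending = new Pending<>(argument);
        synchronized (lock) {
            batch.add(pending);
        }
        return pending.result;
    }

    /** Drains the current batch, makes one batched call, and fans results out. */
    void flush() {
        List<Pending<A, R>> drained;
        synchronized (lock) {
            if (batch.isEmpty()) {
                return;
            }
            drained = batch;
            batch = new ArrayList<>(); // the next window starts filling immediately
        }
        try {
            List<A> arguments = new ArrayList<>();
            for (Pending<A, R> p : drained) {
                arguments.add(p.argument);
            }
            List<R> responses = batchCall.apply(arguments);
            // Step D: hand each caller the single value that belongs to it.
            for (int i = 0; i < drained.size(); i++) {
                drained.get(i).result.complete(responses.get(i));
            }
        } catch (RuntimeException e) {
            for (Pending<A, R> p : drained) {
                p.result.completeExceptionally(e); // no-op if already completed
            }
        }
    }

    void shutdown() {
        timer.shutdownNow();
    }
}
```

Note that the batch is swapped out under the lock before the batched call runs, so new arguments keep accumulating for the next window while the current batch is in flight, and no request context is involved anywhere.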

Hope that helps!

JD Rosensweig

Sep 2, 2015, 6:46:19 PM
to HystrixOSS
Hi Matt,

   I tried your steps, but when I run my example with the request log enabled I still see:

61756 WARN  [HystrixTimer-7][HystrixRequestLog] RequestLog ignoring command after reaching limit of 1000. See https://github.com/Netflix/Hystrix/issues/53 for more information.

   My collapser implementation works great with the RequestLog disabled.  I understand how the model works.  I just don't understand how the request context fits into my use case.

   I've attached a diff of my changes to the original code I posted.  Should I still be starting the request context?  Is there some clean up that needs to happen?

Thanks,

JD
request_context_global.diff

Matt Jacobs

Sep 4, 2015, 3:22:25 PM
to HystrixOSS
I'm running through your code now.  I'll have an answer for you shortly.

Matt Jacobs

Sep 4, 2015, 3:38:19 PM
to HystrixOSS
JD - Thanks for your patience.  

The problem here is that you're initializing a Request Context, and your use case doesn't require one.  Request Contexts are useful when it's important to share memory/state across threads that participate in a request.  Since your sample app has no requests, you don't need it.  If you comment out 'HystrixRequestContext context = HystrixRequestContext.initializeContext();' and the corresponding 'context.shutdown()', nothing gets written to the request log, and you don't run into the issue described above.  Also note that you must remove the reference to 'HystrixRequestLog.getCurrentRequest()', as that no longer exists without a request context.

Let me know if that solves the problem.  Also, since all of my experience running Hystrix in production is on apps which use Request Context, I'm sure the documentation around cases like yours is not as good as it could be. If you could point out any bits of documentation that led you astray/could be improved, that would be a great outcome for all future Hystrix users.

Thanks!

-Matt

JD Rosensweig

Sep 8, 2015, 6:04:16 PM
to HystrixOSS
Hi Matt,

    Thanks for the quick reply.  So I tried it myself and it appears that if I don't invoke the request context at all it works just fine.  I still am a little bit confused on a few things.

    First, you mention that the Request context can be used to share memory/state across threads.  In all the examples I've found of request collapsing, I've never seen this request context get used for anything other than initializing a context and shutting it down.  Is this inter-thread usage something that happens under the covers, or is it explicitly used in the application code?

    Second, the HystrixRequestLog seems like a useful item for debugging and getting insight into the various request results.  I guess since I'm not using the RequestContext, I won't be able to use the request log?  Is there a reason those are coupled?

    As for your question about the documentation, I feel like it led me to believe that I was required to use the request context in one form or another.  It wasn't very clear to me that collapsing would work without the context.  The way I read it, the context was required for various mechanisms under the hood to work.  This section especially confused me:


    Perhaps my use case could be listed as an alternative that doesn't require scoped or global request contexts?  

    Once again, I appreciate your help and follow up with this discussion.  It's helped me to better understand this content.

Thanks,

JD

Matt Jacobs

Sep 9, 2015, 9:35:00 AM
to HystrixOSS
JD -

Yeah, the request context is generally used internally only.  It is useful when your application has request bounds that are short-lived compared to the lifetime of the application.  Once in place, it is used for the request log (record of all commands executed in a request), request cache (deduplicating requests with the same argument in a request), and request-scoped collapsing (batching multiple requests with different arguments in the same request).  

In an application with no request boundaries, these all make less sense as currently implemented.  A request log would be never-ending and would have to be drained in order to avoid memory-leaks.  A request-cache has no TTL in the current implementation, since request scope takes care of that for us. Instead of request-scoped collapsing, globally-scoped collapsing is available, which will periodically drain the batch of arguments.
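
To see why a request log tied to a context that never ends runs into trouble, consider this small plain-Java model of a capped log. It is only an illustration of the behavior behind the "ignoring command after reaching limit of 1000" warning quoted earlier in the thread: BoundedCommandLog and its methods are hypothetical names, not Hystrix classes, and the real HystrixRequestLog internals may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical capped log; models the effect of a per-request entry limit. */
final class BoundedCommandLog {
    private final int limit;
    private final List<String> entries = new ArrayList<>();
    private int ignored;

    BoundedCommandLog(int limit) {
        this.limit = limit;
    }

    /** Returns false (and counts the drop) once the cap has been reached. */
    synchronized boolean add(String commandName) {
        if (entries.size() >= limit) {
            ignored++;
            return false;
        }
        entries.add(commandName);
        return true;
    }

    synchronized int size() {
        return entries.size();
    }

    synchronized int ignoredCount() {
        return ignored;
    }
}
```

With short-lived requests the log is discarded long before the cap matters. With one context spanning a continuous stream of events, every command after the first 1000 is dropped, and without the cap the log would instead grow without bound.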

There is likely a request-less version of HystrixCommandLog possible that acts more like an infinite stream of metrics, rather than a single object available per-request.  But it doesn't exist today.

Thanks for the feedback on the documentation - I agree that that section is ambiguous.  When it mentions "request collapsing", "request" is mostly referring to scoping the collapser to a request, but there's no way for a documentation consumer to know that.  I'll take a pass through the docs with an eye for your use case.