Implementing Load Balancing Spray Client

Elmar Weber

Feb 11, 2016, 4:46:00 PM

to spray.io User List

Hello,

We are using ES with Spray via REST. We are currently moving towards a cluster and are also testing load balancing and failure scenarios.
We have a thin layer on top of the spray pipeline that takes care of load balancing requests (either randomly or based on a simple histogram of response times). We are still looking into how to detect downed nodes, when to kick nodes out of the pool, and how to re-add them reliably.
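For readers following along, here is a minimal sketch of latency-based selection in plain Scala. It is not the layer described above: it swaps the histogram for an exponentially weighted moving average per host, and `LatencyBalancer`, `record`, `pick`, `alpha` and `initialMillis` are all hypothetical names and values chosen for illustration. Nothing here is part of spray's API.

```scala
import scala.util.Random

// Hypothetical helper, not part of spray. Tracks a moving average of
// response times per host and prefers hosts that have been faster.
final class LatencyBalancer(hosts: Seq[String],
                            alpha: Double = 0.2,           // assumed smoothing factor
                            initialMillis: Double = 100.0) // assumed starting estimate
{
  require(hosts.nonEmpty, "at least one host is required")

  private var averages: Map[String, Double] = hosts.map(_ -> initialMillis).toMap

  // Fold one observed response time into the host's moving average.
  def record(host: String, millis: Double): Unit = synchronized {
    averages.get(host).foreach { old =>
      averages = averages.updated(host, alpha * millis + (1 - alpha) * old)
    }
  }

  def average(host: String): Option[Double] = synchronized(averages.get(host))

  // Pick a host with probability proportional to 1 / average latency.
  def pick(rnd: Random = Random): String = synchronized {
    val weights = hosts.map(h => h -> 1.0 / math.max(averages(h), 1.0))
    var r = rnd.nextDouble() * weights.map(_._2).sum
    weights.find { case (_, w) => r -= w; r <= 0 }
      .map(_._1)
      .getOrElse(hosts.last) // guards against floating-point rounding
  }
}
```

The thin layer would call `pick()` before sending a request and `record(host, elapsedMillis)` when the response arrives. Weighted random selection, rather than always choosing the fastest host, keeps some traffic flowing to slower nodes so their averages can recover.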

Because this sounds like a solved problem, I was wondering whether there is already an implementation out there, or whether anyone has experience doing this with spray client?

Thanks,
Elmar

Age Mooij

Feb 12, 2016, 8:35:07 AM

to spray...@googlegroups.com

Hi Elmar

I've built an API gateway that uses an Akka router per logical endpoint. Each router routes to a set of child actors, each of which manages a single HostConnector and falls back to IO(Http) for brief periods in case the HostConnector gets killed.

We use a completely external system to detect downed nodes, i.e. we use Consul for health monitoring and a Consul watch to update the gateway config at runtime so we can reconfigure the Akka router actor. Of course you can do this yourself by monitoring a health endpoint on each node but detecting new nodes might be a little harder that way.
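For the do-it-yourself variant mentioned above (probing a health endpoint on each node), a rough sketch of the bookkeeping could look like the following. It only tracks probe results; how the probe itself is sent is left out. `HealthPool`, `report`, `available` and both thresholds are hypothetical and not taken from the gateway described in this post.

```scala
// Hypothetical helper. A host is taken out of the pool after
// `failThreshold` consecutive failed probes and put back after
// `riseThreshold` consecutive successful ones (both values assumed).
final class HealthPool(hosts: Set[String],
                       failThreshold: Int = 3,
                       riseThreshold: Int = 2) {

  private case class State(up: Boolean, streak: Int)

  private var states: Map[String, State] =
    hosts.map(_ -> State(up = true, streak = 0)).toMap

  // Feed in the outcome of one health probe for one host.
  def report(host: String, healthy: Boolean): Unit = synchronized {
    states.get(host).foreach { s =>
      val needed = if (s.up) failThreshold else riseThreshold
      val next =
        if (healthy == s.up) s.copy(streak = 0)        // probe agrees with current status
        else if (s.streak + 1 >= needed) State(up = !s.up, streak = 0) // flip status
        else s.copy(streak = s.streak + 1)             // count one more contrary probe
      states = states.updated(host, next)
    }
  }

  // Hosts currently considered healthy.
  def available: Set[String] = synchronized {
    states.collect { case (h, s) if s.up => h }.toSet
  }
}
```

Requiring several consecutive contrary probes before flipping a host's status avoids flapping on a single slow or dropped health check. Discovering brand-new nodes is not covered, which is the part an external system such as Consul handles.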

There are still a lot of features that we want to add to this strategy though, mostly around failure detection/handling and red-blue deployment of endpoint nodes.

If this is something that is interesting to you I can dump some of the code into a gist.

Hope this helps

Age


Elmar Weber


Mar 18, 2016, 11:59:39 AM

to spray.io User List

Hi Age,

Thanks, yes. Sorry for only replying now; this was on my to-do-later list, but I came back to it because we also ran into some performance issues and finally set up Kamon in production for distributed tracing. We bundled it all into an approach similar to what you described.

We included load balancing, performance measurement, error logging and cross-service tracking in it, and we feed all this data into our monitoring infrastructure. Our downtime detection right now is basically this: if response times go up or we run into too many timeouts, clients back off from the node incrementally. We also use it to track all metrics, identify slow calls, etc. Right now it's not highly configurable, as it makes assumptions about our own API layer classes, but that's the next step. After that, we will add an API endpoint where we can change the configuration on the fly, without downtime, to manage new nodes and to manually remove nodes when we do upgrades or maintenance.
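To illustrate what incremental back-off could look like, here is a small sketch. It is not our actual implementation: `BackoffTracker`, its method names and the 500 ms / 60 s limits are assumptions made for the example, and the current time is passed in explicitly so the logic is easy to test.

```scala
// Hypothetical helper. Each consecutive timeout doubles the pause during
// which a host is skipped, up to `maxMillis`; a success clears the penalty.
final class BackoffTracker(baseMillis: Long = 500L,   // assumed first pause
                           maxMillis: Long = 60000L)  // assumed upper bound
{
  private case class Penalty(failures: Int, retryAt: Long)

  private var penalties: Map[String, Penalty] = Map.empty

  // Call when a request to `host` timed out at time `now` (in millis).
  def onTimeout(host: String, now: Long): Unit = synchronized {
    val failures = penalties.get(host).map(_.failures).getOrElse(0) + 1
    // Cap the shift so the doubling cannot overflow for the default base.
    val pause = math.min(maxMillis, baseMillis << math.min(failures - 1, 20))
    penalties = penalties.updated(host, Penalty(failures, now + pause))
  }

  // Call when a request to `host` succeeded; forgets earlier timeouts.
  def onSuccess(host: String): Unit = synchronized {
    penalties -= host
  }

  // True if the host has no penalty or its pause has expired.
  def isUsable(host: String, now: Long): Boolean = synchronized {
    penalties.get(host).forall(_.retryAt <= now)
  }
}
```

A client would filter candidate nodes with `isUsable(host, System.currentTimeMillis())` before balancing across them. Once the pause expires the node receives traffic again, so a node that is still down simply earns a longer pause on its next timeout.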

Cheers,
Elmar
