[2.1] Performance issue with play.libs.WS


Loic

May 20, 2013, 10:05:54 AM
to play-fr...@googlegroups.com
I have a Play 2.1 Java web service that calls another web service.
My service serves JSON produced by transforming the XML response of the other web service.

Initially, I was using the Jersey client library to call the second service.
As this is a blocking call, I modified the settings to allow 500 threads in parallel.
I had good results with it: with a Gatling test, I easily reach 500 requests per second.

Then I decided to use only asynchronous calls to optimize the I/O.
Now I use the play.libs.WS client instead of the blocking Jersey client, with the default thread pool parameters.
I use promises to get my results, but the rest of my code is exactly the same as before.

Strangely, the performance in my Gatling test (just a loop that sends some HTTP GET requests) has dropped to only 250 requests per second...
I've tried to increase the number of threads for the "default-dispatcher" but it doesn't get better.

Did I miss something?

Thanks,
Loïc

Loic

May 20, 2013, 10:38:56 AM
to play-fr...@googlegroups.com
Could it be because a single thread processes the XML and JSON transformations after the WS call?

The app is structured as follows:

// controller:
Promise<List<Entity>> promise = myServiceMethod();

return async(promise.map(new Function<List<Entity>, Result>() {
    public Result apply(List<Entity> entities) {
        return ok(jsonp(callback, toJson(entities)));
    }
}));

myServiceMethod is defined like this:

public Promise<List<Entity>> myServiceMethod() {
    Promise<Response> result = WS.url(url.toString()).get();
    return result.map(new Function<Response, List<Entity>>() {
        public List<Entity> apply(Response response) {
            // XPath and DOM parsing to create a List<Entity>
        }
    });
}
 

Note: myServiceMethod is part of a Spring service singleton, and my routes/controllers use Spring with the @ notation.

Is there a way to do better?

Loic

May 21, 2013, 1:30:57 AM
to play-fr...@googlegroups.com
Just a few clarifications and corrections:

With the synchronous calls, 500 requests/second was the average; with the Play async WS client, 250 is the max I reach.
If we look at the average response times, it's more than 10x slower with the WS client:
- 30 seconds to run 10,000 requests with Gatling (the synchronous version)
- between 500 and 600 seconds to run the same scenario with the async code (sometimes with some KOs because of request timeouts)

I've tried to put all the XML decoding / JSON encoding into the "async()" method of my controller, but it gives the same results...

Are there any play.libs.WS benchmarks that perform better than mine?

Guillaume Bort

May 21, 2013, 4:25:14 AM
to play-fr...@googlegroups.com
Hi,

It is likely that the `// XPath and DOM parsing to create a List<Entity>` part is the bottleneck. When writing asynchronous code, the bottlenecks occur in the `map` functions: they must be absolutely non-blocking. If these functions are costly, they may need to run on a dedicated thread pool; unfortunately, in the Java API there is no way to inject a custom ExecutionContext for each operation.

One way to solve it is to run the costly code yourself in another thread pool:

return result.flatMap(new Function<Response, Promise<List<Entity>>>() {
    public Promise<List<Entity>> apply(final Response response) {
        return futureExecutionInACustomThreadPool(new Function<Response, List<Entity>>() {
            public List<Entity> apply(Response r) {
                // XPath and DOM parsing to create a List<Entity>
            }
        });
    }
});
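
(`futureExecutionInACustomThreadPool` is not an existing Play helper. A minimal sketch of what it could look like with the Play 2.1 Java API, taking a Callable for simplicity and assuming a custom Akka dispatcher named `xml-parsing-context` declared in application.conf:)

import java.util.concurrent.Callable;
import scala.concurrent.ExecutionContext;
import akka.dispatch.Futures;
import play.libs.Akka;
import play.libs.F.Promise;

public static <T> Promise<T> futureExecutionInACustomThreadPool(Callable<T> work) {
    // Look up a dispatcher configured in application.conf
    // (see the Play/Akka thread pool documentation for the exact configuration keys).
    ExecutionContext ctx = Akka.system().dispatchers().lookup("xml-parsing-context");
    // Run the costly work on that pool and expose it as a Play Promise.
    return Akka.asPromise(Futures.future(work, ctx));
}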




--
Guillaume Bort, http://guillaume.bort.fr

Loic

May 21, 2013, 4:31:54 AM
to play-fr...@googlegroups.com
Thanks, I'll try that. 
But if I understand correctly, it will force me to increase the number of threads to handle the XPath and DOM parsing in parallel, so would there be any gain compared to the fully synchronous version (using the Jersey client)?

I will post my new Gatling results here as soon as possible.

Loïc

Guillaume Bort

May 21, 2013, 5:05:32 AM
to play-fr...@googlegroups.com

On Tue, May 21, 2013 at 10:31 AM, Loic <loic.d...@gmail.com> wrote:
But if I understand correctly, it will force me to increase the number of threads to handle the XPath and DOM parsing in parallel, so would there be any gain compared to the fully synchronous version (using the Jersey client)?

It depends on what you are trying to do. Handling IO asynchronously allows you to break the dependency between the number of threads you have in the JVM and the number of incoming requests you can handle. If your backend can serve 10,000 requests per second then you could serve 10,000 requests per second as well, given that your CPU/memory is powerful enough to handle the 10,000 XML transformations per second.

Basically, using asynchronous code will always have some overhead compared to synchronous code. What you gain is scalability in the number of concurrent connections you can handle (because it is no longer directly tied to the number of available threads), and possibly some memory, since you don't have to keep all the thread stacks in memory (but it's a tradeoff, since you will have to keep some Future state around instead, so it really depends on your use case).

So if you are sure that you will never have more than 500 concurrent connections on your app, you can handle them synchronously, given that you have enough memory on your server for 500 thread stacks (on the order of 500 MB with a typical 1 MB default stack size) and enough CPU to schedule them all. 500 threads is an OK number for a JVM on a modern server.

If you have to handle 10,000 concurrent connections, you have no other choice than to handle them asynchronously.

Loic Descotte

May 21, 2013, 5:18:13 AM
to play-fr...@googlegroups.com
Ok, thanks for your help.

The thing I'm not sure I understand is that, as I have only one kind of action in my service, creating another thread pool would be the same as using the default one: I'm not trying to avoid blocking some "common" user actions.
If I use "Akka.future" to parse the XML in a separate thread, it will still be blocking, just in another thread, am I right?
So I would have to increase the default pool size, and I would still be synchronous?
What did I miss in my reasoning?

I'm not sure I will never have more than 500 connections per second, so I would like to be as "reactive" as possible.

Thanks,
Loïc



Loic

May 21, 2013, 6:33:53 AM
to play-fr...@googlegroups.com
As promised, here are my first test results:

I use Akka futures for the XML parsing to run it in separate threads.
It's not really faster; it seems that as a lot of requests come in, the XML parsing needs a lot of threads and I reach the max (500) very quickly.
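
(The actual application code isn't shown, but the pattern described is presumably along these lines; parseEntities is a hypothetical placeholder for the XPath/DOM work:)

// Sketch only: run the XML parsing in an Akka future (on the default dispatcher) after the WS call.
Promise<List<Entity>> entities = WS.url(url.toString()).get().flatMap(
    new Function<Response, Promise<List<Entity>>>() {
        public Promise<List<Entity>> apply(final Response response) {
            return Akka.future(new Callable<List<Entity>>() {
                public List<Entity> call() {
                    return parseEntities(response.asXml()); // hypothetical parsing helper
                }
            });
        }
    });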

Using the default thread pool, is it possible that I reach the max number of threads faster because some HTTP requests "take" some of the available threads?
Maybe that would not happen with a separate thread pool, so the Play IO would use only a few (= number of cores) threads...
It would be quite strange, because as the controller is not blocking, it should free the threads very quickly, even if a lot of threads are available...

I haven't tried using another pool yet because I'm not really sure it would make a difference, and it's quite verbose to do in Java...

Chanan Braunstein

May 21, 2013, 10:31:36 AM
to play-fr...@googlegroups.com
Hi Loic,

Did you mean to paste some numbers in the first line? If so they are missing. I am really curious to see your findings.

Loic Descotte

May 21, 2013, 10:52:46 AM
to play-fr...@googlegroups.com
Hi,
Nothing is missing :) The results are what is explained below the first line (not faster than before).

Loïc



Chanan Braunstein

May 21, 2013, 11:20:40 AM
to play-fr...@googlegroups.com
Were you getting timeouts on the Async version that you weren't getting with the sync version?

Loic Descotte

May 21, 2013, 11:21:51 AM
to play-fr...@googlegroups.com
Yes, I do when I send too many requests.



Guillaume Bort

May 21, 2013, 11:29:29 AM
to play-fr...@googlegroups.com

On Tue, May 21, 2013 at 4:52 PM, Loic Descotte <loic.d...@gmail.com> wrote:
Nothing is missing :) The results are what is explained below the first line (not faster than before).

Using an asynchronous strategy will never make your code faster... it's not a magic wand that suddenly improves the power of your CPU and makes your server serve requests faster.

If the bottleneck in this case is your CPU, using async calls won't help you. Non-blocking IO only helps when waiting on the IO is the bottleneck.

For example, given this situation where you would use synchronous IO and synchronous code:

- get a thread
- call Task1
- block 1s on IO for the Task1 result
- call Task2
- block 1s on IO for the Task2 result
- combine result 1 and result 2 and serve the response.
- free the thread

In this case you have a response time of 2s for each request. Also, assuming that the backends computing tasks 1 & 2 are not the bottleneck, the total throughput of your application would depend on the total number of threads you have on the JVM. For example, with 2000 threads (probably the max you could configure on a standard JVM), you could serve 1000 requests/s.

Now, if you can do non-blocking IO and manage asynchronous code:

- get a thread
- call Task1 & Task2
- free the thread
- when result 1 and result 2 is available:
  - get a thread
  - combine result 1 and result 2 and serve the response.
  - free the thread

Now you would have a response time of 1s for each request. And the total application throughput would not be directly dependent on the number of threads you have on the JVM. Given that the only CPU-intensive tasks in this case are "call Task1 & Task2" and "combine result 1 and result 2 and serve the response", if they are cheap enough you could serve 10,000 requests/s without any problem.
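
(As a sketch of that second scenario with the Play 2.1 Java API — backend1Url, backend2Url and combine() are placeholders, not from the original code:)

// Both calls start immediately; no thread is blocked while waiting for the backends.
final Promise<Response> task1 = WS.url(backend1Url).get();
final Promise<Response> task2 = WS.url(backend2Url).get();

Promise<Result> response = task1.flatMap(new Function<Response, Promise<Result>>() {
    public Promise<Result> apply(final Response r1) {
        return task2.map(new Function<Response, Result>() {
            public Result apply(Response r2) {
                // Combine result 1 and result 2 and serve the response.
                return ok(combine(r1, r2)); // combine() is a hypothetical helper
            }
        });
    }
});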

You should not see async and NIO as a way to improve (at least not directly) your code performance, but as a way to parallelize some computation when it is functionally possible, and as a way to improve the SCALABILITY of your application. 

Loic Descotte

May 21, 2013, 11:43:23 AM
to play-fr...@googlegroups.com

On Tue, May 21, 2013 at 5:29 PM, Guillaume Bort <guillau...@gmail.com> wrote:
Using an asynchronous strategy will never make your code faster...

Yes, sorry, that's not what I meant. I meant that after some modifications it's not faster than my first tries with the asynchronous version.
My problem here is that the code is more than 10x slower than the synchronous version and that I can handle fewer requests with the async version (so scalability is my issue).

But now we agree that the problem is in the XML parsing. I just need to find the right way to do it to avoid my performance issues.
It must be a common issue: having an async WS client API and not being able to parse the XML response without blocking is quite disturbing... at least with the Java API. That's why I'm wondering (cf. my previous question) if a separate thread pool would really solve the issue in my case.

Guillaume Bort

May 21, 2013, 11:46:13 AM
to play-fr...@googlegroups.com
In the 2.2 Java API there will be a way to specify a custom ExecutionContext for each map call. 

In your case, which ExecutionContext are you using for the XML parsing?


Loic Descotte

May 21, 2013, 11:54:27 AM
to play-fr...@googlegroups.com
Good news :)
I'm using the default dispatcher to run the XML decoding in an Akka future. But as my app does only one thing, there is no risk of blocking other "common" user actions.
All users do the same thing: call a web service and decode XML.

My problem is that when I run my Gatling tests, it seems the number of threads is not enough: after 500 connections it slows down and I get some timeout errors. The strange thing is that with the fully synchronous version (with a Jersey client) I have no issue at all. I can use that version, but I would like to understand more...

So I'm still wondering if using another execution context would help: do the HTTP requests "take" too many of the free threads, so that there is nothing left for the XML decoding?

Thanks,

Loïc



Chanan Braunstein

May 21, 2013, 12:06:09 PM
to play-fr...@googlegroups.com
Hi Guillaume,

My concern is that with the sync version Loic wasn't getting timeouts, but with the async version he was. I would have hoped that the async version, while slower, would have queued the requests better and not timed out.

Loic Descotte

May 21, 2013, 12:24:45 PM
to play-fr...@googlegroups.com

It's a Gatling timeout, not a Play timeout.


Alexandru Nedelcu

May 22, 2013, 12:14:01 PM
to play-fr...@googlegroups.com

On Tue, May 21, 2013 at 1:33 PM, Loic <loic.d...@gmail.com> wrote:
It's not really faster; it seems that as a lot of requests come in, the XML parsing needs a lot of threads and I reach the max (500) very quickly.

If CPU-bound operations, like XML parsing, are really the bottleneck, then using a lot of threads is pointless and actually makes the problem worse. You should run them in a thread pool with a number of threads that's directly proportional, preferably equal, to the number of CPU cores you have.
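
(For illustration, sizing a pool to the CPU count in plain Java could look like this; xmlParsingPool is just an example name:)

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// A pool sized to the number of CPU cores, appropriate for CPU-bound work like XML parsing.
int cores = Runtime.getRuntime().availableProcessors();
ExecutorService xmlParsingPool = Executors.newFixedThreadPool(cores);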

I also do not think that XML parsing is the issue here, given that you only need to process 500 documents per second, unless you're doing something else there, or you're running this on a slow CPU, or the library you're using is really awful.

Are you sure you properly debugged it? Never make uneducated assumptions. Always profile and make assumptions based on actual data provided by your profiler. Use VisualVM, or if you want something more fancy, you could use YourKit Profiler (it has a special offer right now for personal licenses). One thing to look out for is the CPU usage. If you're not using it (i.e. it's less than 70%), then you're doing something wrong.

Another thing you can do is to check the number of outgoing connections that your application creates. You say that you start many requests in parallel, but who knows, maybe you're screwing up somewhere. This can be done by instrumenting your code in-app (like incrementing a counter when you start a request, decrementing it when a request finishes, and remembering the maximum and the average you get each second). You can also measure the connections created by using standard Unix tools such as "lsof".
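
(A rough sketch of that kind of counter; started() would be called just before each outgoing request and finished() in the promise callbacks:)

import java.util.concurrent.atomic.AtomicInteger;

// Minimal in-app instrumentation of in-flight outgoing requests.
public class OutgoingRequests {
    private static final AtomicInteger inFlight = new AtomicInteger();
    private static final AtomicInteger max = new AtomicInteger();

    public static void started() {
        int current = inFlight.incrementAndGet();
        int seenMax;
        // Remember the highest number of concurrent outgoing requests observed.
        do {
            seenMax = max.get();
        } while (current > seenMax && !max.compareAndSet(seenMax, current));
    }

    public static void finished() {
        inFlight.decrementAndGet();
    }

    public static int current() { return inFlight.get(); }
    public static int maximum() { return max.get(); }
}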

Alexandru Nedelcu

May 22, 2013, 12:15:53 PM
to play-fr...@googlegroups.com
Forgot to add. For in-app instrumentation of your code, you can also use pre-made libraries, such as Codahale's Metrics: http://metrics.codahale.com/
--
Alexandru Nedelcu
http://bionicspirit.com

Martin Grotzke

May 22, 2013, 12:21:59 PM
to play-fr...@googlegroups.com

A good thing to do would be to just measure the XML parsing with sample data in a micro-benchmark.
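
(Something like the following rough sketch; sample-response.xml and the //entity expression are placeholders for the real payload and XPath queries:)

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Rough micro-benchmark: parse the same sample response many times and report documents/second.
public class XmlParsingBench {

    public static void main(String[] args) throws Exception {
        File sample = new File("sample-response.xml"); // placeholder sample file
        int iterations = 5000;

        // Warm up the JIT before measuring.
        for (int i = 0; i < 500; i++) parseOnce(sample);

        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) parseOnce(sample);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.println((iterations / seconds) + " documents/second on one thread");
    }

    private static void parseOnce(File sample) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(sample);
        // Replace with the real XPath expressions used by the service.
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//entity", doc, XPathConstants.NODESET);
    }
}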

Cheers,
Martin

Pascal Voitot Dev

May 22, 2013, 12:22:03 PM
to play-fr...@googlegroups.com
If you really want to know where the hotspots are (if there is one in the code and it's not due to an IO problem, for example), use a profiler, it's the best way.
VisualVM, for example.
You run your stuff, and after warm-up you take a snapshot and search for hotspots... you should see them quite quickly!

Pascal

Alexandru Nedelcu

May 22, 2013, 12:57:51 PM
to play-fr...@googlegroups.com
Of course, you can also suffer from "Uniformly Slow Code", but that's just life :-)

http://c2.com/cgi/wiki?UniformlySlowCode
Alexandru Nedelcu
http://bionicspirit.com

Loic Descotte

May 23, 2013, 4:21:15 AM
to play-fr...@googlegroups.com
Hi,

I've done some profiling with VisualVM in the following context:

- calling the external WS with play.libs.WS
- parsing the XML responses with DOM and XPath in a separate thread using a future
- sending a JSON response (with toJson()) with the async() method of controllers
- the default pool is used everywhere (simpler with the Java API) with a max size of 500 threads
- 500 users simulated with Gatling to send 10,000 requests (ramp = 10)

Here are my results:

- A lot of "TimeoutException: No response received after 60000" from Gatling
- CPU is between 80% and 100% (not at 100% all the time)
- CPU sample after 5 minutes:

 scala.concurrent.forkjoin.ForkJoinPool.scan: 66.7% of time
 sun.nio.ch.SelectorImpl.select: 13.3%
 play.libs.XPath.selectNode: 5.8%
 scala.collection.JavaConversion$JEnumerationWrapper.hasNext: 4.2%


Same test with the default number of threads (the number of CPUs I guess, so 2 for me):

- A lot of "TimeoutException: No response received after 60000" from Gatling too
- CPU is between 80% and 100% too
- CPU sample after 5 minutes:

 sun.nio.ch.SelectorImpl.select: 35%
 play.libs.XPath.selectNode: 16.4%
 scala.collection.JavaConversion$JEnumerationWrapper.hasNext: 12%
 scala.concurrent.forkjoin.ForkJoinPool.scan: 10%

Just to be clear: I'm not blocked by this issue, as I get good performance with my synchronous Jersey client. This topic was just to better understand the best way to do this. It seems that since the network is not my bottleneck (I have no blocking with the external web service) and my bottleneck is the XML parsing, async is not the best way to go for my use case.

Thanks for everything

Loïc