Better support for 20+ CPU cores


sri

Mar 1, 2016, 11:26:37 PM
to Mojolicious
This might be of interest to some of you.

https://github.com/kraih/mojo/issues/925

Pretty much anyone with access to a big server can help.

--
sebastian

Helmut Wollmersdorfer

Mar 2, 2016, 4:33:17 AM
to Mojolicious
Cores or threads?

I have an idle cluster; each node has 2 CPUs x 4 cores = 8 cores = 16 threads, with SATA RAID 10. So not the typical hardware for high traffic.

The other machines, with 24 threads, are under load.

John Scoles

Mar 2, 2016, 9:23:02 AM
to mojol...@googlegroups.com
Well, here is my input:

Running 30s test @ http://127.0.0.1:8080
 2 threads and 100 connections
 Thread Stats   Avg      Stdev     Max   +/- Stdev
   Latency     7.47ms    5.09ms  46.01ms   65.63%
   Req/Sec     7.05k   365.44     9.26k    76.33%
 420696 requests in 30.01s, 46.84MB read
Requests/sec:  14018.34
Transfer/sec:      1.56MB
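
For reference, output in this shape comes from a wrk invocation along the lines of the one below; this is a reconstruction from the header line above, not necessarily the exact command that was used:

  wrk -t 2 -c 100 -d 30s http://127.0.0.1:8080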

Only 12 processors today. Might have the big 32-core box up in a day or two.

cheers
John



Juergen Nickelsen

Mar 2, 2016, 9:44:47 AM
to mojol...@googlegroups.com
On 02.03.2016 05:26, 'sri' via Mojolicious wrote:
> https://github.com/kraih/mojo/issues/925

I put a number of results there (2..24 cores, in steps of two) from 4
runs, and the shell script that I used to make the runs. All with
Mojolicious 6.03.

For some reason, the results of the different runs seem rather
inconsistent, but I am pretty sure the reason is not the base load of
the server (which is at about 0.5 to 0.7). In requests/s:

Workers   Run 1   Run 2   Run 3   Run 4
      2    3290    3232    3271    3209
      4    6551    6725    6867    6668
      6    9939   10285   10421    9934
      8    4692    3771    3441    4884
     10    3139    6022    5983    3383
     12   10231    6841    6818   10562
     14    9810   11320   11391   10041
     16    2725       -       -    3062
     18    1831       -       -     807
     20   20403    6283    6259   20857
     22    2265    7044    7025    3141
     24     968   11489   11345      67

(There were connection problems in runs 2 and 3.)

Regards, Juergen.

--
<Juergen....@fu-berlin.de> Tel +49.30.838-50740 Fax -450740
Zentraleinrichtung fuer Datenverarbeitung, Central Systems (Unix)
Freie Universitaet Berlin, Fabeckstrasse 32, 14195 Berlin, DE

sri

Mar 2, 2016, 10:19:42 AM
to Mojolicious
The results from another load test seem to confirm the findings so far, which suggest that the current defaults don't just scale indefinitely.


My first guess is an uneven distribution of connections among the workers, perhaps setting -M 2 or -M 5 would fix that.
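
For anyone following along: -M maps to the prefork command's --multi-accept option, the number of connections a worker accepts at once. A hypothetical invocation, with myapp.pl standing in for the benchmarked app:

  ./myapp.pl prefork -m production -w 20 -M 2 -l http://127.0.0.1:8080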

--
sebastian

Juergen Nickelsen

Mar 2, 2016, 11:02:50 AM
to mojol...@googlegroups.com
On 02.03.2016 16:19, 'sri' via Mojolicious wrote:
> My first guess is an uneven distribution of connections among the
> workers, perhaps setting -M 2 or -M 5 would fix that.

I tried it again with the -M2 and -M5 settings as suggested (see the
file names).

https://github.com/kraih/mojo/issues/925#issuecomment-191302254

With a single-digit number of workers, I can see (using the precision
measurement program top(1)) that at first the load distribution across the
workers is quite even, each at 100% CPU, and after a while it gradually
becomes more and more uneven.

With 10 or more workers, the load of all workers seems to cave in after
a while, sometimes even after very few seconds, with *none* of the
workers bearing a significant load any more. The exception is 20 workers,
which indeed looks like a sweet spot: there the load caves in only later,
and only in some cases, not all.

So, behaviour is quite erratic for whatever reason. Oh, and by the way,
this is on Debian Jessie, with the Debian-provided Perl 5.14.2.

Juergen Nickelsen

Mar 2, 2016, 1:03:33 PM
to mojol...@googlegroups.com
A few more runs, now with Mojolicious-6.51; most of them seem not quite
as erratic, but I have still seen the perl processes stop producing load
after a few (or very few) seconds with 10 workers or more. And sometimes
with fewer.

It is really peculiar that the connection problems only ever happen with
16 and 18 workers.

Workers Run1 Run2 Run3 Run4 Run5 Run6 Run7
2 3263 3264 3324 2792 3323 3297 3116
4 6755 6898 6716 6594 6561 6972 6728
6 10567 9821 9816 10579 10410 10509 10153
8 6406 5901 4548 5899 4745 5197 4764
10 6389 1424 2785 2104 2189 4546 3535
12 3497 14446 9811 11146 9839 9666 11098
14 14334 4082 11642 10435 12228 9964 9641
16 5883 2705 4456 3556
18 14424 10190 1638 10466
20 6658 4647 2903 11885 2426 18637 10773
22 3653 6010 10079 2863 10111 4565 3689
24 14368 10516 11680 10238 12374 1681 10440
(Requests/s each)

For comparison, the same "wrk" run against another page on the same machine,
which is served by Apache with a 301 and the standard Apache "Moved
Permanently" template (a redirect to the HTTPS site), is good for 110361
requests/s.

For the raw data see
https://github.com/kraih/mojo/issues/925#issuecomment-191348484

Regards, Juergen.

--
<Juergen....@fu-berlin.de> Tel +49.30.838-50740 Fax -450740
Zentraleinrichtung fuer Datenverarbeitung, Special Intelligence

sri

Mar 2, 2016, 1:10:51 PM
to Mojolicious
A few more runs, now with Mojolicious-6.51; most of them seem not quite
as erratic, but I have still seen the perl processes stop producing load
after a few (or very few) seconds with 10 workers or more. And sometimes
with fewer.

Yea, we need to find out why that is. I'm still surprised that -M 2 didn't result in better load balancing. Perhaps a less conservative "-M 1 -a 0 -c 10" does better?
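
Spelled out with the prefork command, that would look something like the following, where myapp.pl is a placeholder for the benchmarked app; -a 0 lets workers accept connections indefinitely and -c caps the concurrent connections per worker:

  ./myapp.pl prefork -m production -w 20 -M 1 -a 0 -c 10 -l http://127.0.0.1:8080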

--
sebastian

sri

Mar 2, 2016, 1:14:48 PM
to Mojolicious
 Perhaps a less conservative "-M 1 -a 0 -c 10" does better?

Or just set -c dynamically to "100 / $workers".

--
sebastian

sri

Mar 2, 2016, 1:25:08 PM
to Mojolicious
Or just set -c dynamically to "100 / $workers".

Yea, this is it: you have to limit the number of concurrent connections per worker in these benchmarks to distribute the load evenly. Changing -M might even be counterproductive, while setting "-a 0" should help too.
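
As a sketch, assuming wrk's 100 connections and a placeholder app name:

  WORKERS=24
  CLIENTS=$((100 / WORKERS))   # integer division; 100 matches wrk's -c 100
  ./myapp.pl prefork -m production -w "$WORKERS" -a 0 -c "$CLIENTS" -l http://127.0.0.1:8080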

--
sebastian

sri

Mar 2, 2016, 8:17:10 PM
to Mojolicious
I've just released Mojolicious 6.52 with a few bug fixes that will make scaling for different workloads a lot easier.


--
sebastian

Helmut Wollmersdorfer

Mar 3, 2016, 6:06:31 AM
to Mojolicious


On Wednesday, March 2, 2016 at 17:02:50 UTC+1, Juergen Nickelsen wrote:

So, behaviour is quite erratic for whatever reason. Oh, and by the way,
this is on Debian Jessie, with the Debian-provided Perl 5.14.2.

Are you sure?

My Jessie servers have Perl v5.20.2.

Juergen Nickelsen

Mar 3, 2016, 6:11:31 AM
to mojol...@googlegroups.com
On 03.03.2016 12:06, Helmut Wollmersdorfer wrote:

> On Wednesday, March 2, 2016 at 17:02:50 UTC+1, Juergen Nickelsen wrote:
>
> So, behaviour is quite erratic for whatever reason. Oh, and by the way,
> this is on Debian Jessie, with the Debian-provided Perl 5.14.2.
>
> Are you sure?
> My Jessie servers have Perl v5.20.2.

Oh, you're right. The Perl version I gave is correct, but that machine
is still on wheezy. Thanks for the correction!

Regards, Juergen.

--
<Juergen....@fu-berlin.de> Tel +49.30.838-50740 Fax -450740
Zentraleinrichtung fuer Datenverarbeitung, Security Individual

Juergen Nickelsen

Mar 3, 2016, 11:58:21 AM
to mojol...@googlegroups.com
On 03.03.2016 02:17, 'sri' via Mojolicious wrote:
> I've just released Mojolicious 6.52 with a few bug fixes that will make
> scaling for different workloads a lot easier.

Even if the issue is closed now, I still want to show (off :-) the
summary of the results I got from today's test runs with Mojolicious
6.52 and the updated test setup:

workers   run 1   run 2   run 3   run 4   run 5   run 6   run 7   run 8
      2    3585    3544    3553    3586    3470    3529    3537    3509
      4    7359    7097    6911    6937    7386    7288    6940    7477
      6   11182   11272   11513   11493   11176   11319   10878   11111
      8   15443   15414   15252   15421   15413   15232   15537   15663
     10   18943   19412   19238   19253   19102   19116   19198   18982
     12   21901   22164   22363   22119   21824   22045   21922   21995
     14   23328   23281   23427   23197   23571   23179   23688   23651
     16   24375   24271   24459   24404   24760   24536   24665   24433
     18   25814   25948   25420   25443   25833   24715   25605   25401
     20   26693   26703   26487   26769   26758   26789   26587   26827
     22   28007   27598   27674   27593   27886   27652   27748   27726
     24   28276   27868   28323   28418   28334   28367   28561   28459

In case anyone is interested, I have attached the test driver script and
the summary generator script. Raw data available on request if you are
interested in the details of wrk's output.
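
The attached multicore-benchmark.sh is not reproduced here, but for readers without the attachment, a driver along roughly these lines would produce comparable raw data. This is only a sketch; the app name, port, timings, and file naming are placeholders, and wrk plus the prefork command are assumed to be available:

  #!/bin/sh
  # Start the app with N workers, hit it with wrk, save the output,
  # then shut the server down again.
  for run in 1 2 3 4; do
    for workers in 2 4 6 8 10 12 14 16 18 20 22 24; do
      ./myapp.pl prefork -m production -w "$workers" -a 0 \
        -l http://127.0.0.1:8080 &
      server=$!
      sleep 2   # give the manager time to spawn its workers
      wrk -t 2 -c 100 -d 30s http://127.0.0.1:8080 > "run$run-w$workers.txt"
      kill "$server"
      wait "$server" 2>/dev/null
    done
  done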

Regards, Juergen.

--
<Juergen....@fu-berlin.de> Tel +49.30.838-50740 Fax -450740
Zentraleinrichtung fuer Datenverarbeitung, Supernatural Infrastructure
multicore-benchmark.sh
grabvals.pl

sri

Mar 3, 2016, 12:29:20 PM
to Mojolicious
Even if the issue is closed now, I still want to show (off :-) the
summary of the results I got from today's test runs with Mojolicious
6.52 and the updated test setup

Oh, by all means, please keep posting data. If there are interesting new findings I will immediately reopen the issue. :)

--
sebastian

sri

Mar 3, 2016, 1:50:44 PM
to Mojolicious
Alright, the diagrams suggest that we are in a pretty good spot with scalability; you just have to tune your web server correctly.


Would still be interesting to see results for a server with more than 12 cores.

--
sebastian

sri

Mar 4, 2016, 2:56:24 PM
to Mojolicious
Btw., there's a possibility that epoll (which we support through EV, and which you can enable with LIBEV_FLAGS=4) might improve load balancing too.
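
For example (LIBEV_FLAGS is read by libev/EV at startup, and 4 selects the epoll backend; the rest of the command line is just a placeholder):

  LIBEV_FLAGS=4 ./myapp.pl prefork -m production -w 20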

--
sebastian