[erlang-questions] SMP for IO-bound applications

Erisa Dervishi

Jun 21, 2012, 2:37:44 PM
to erlang-q...@erlang.org
Hi,
As part of my studies, I have recently been doing some performance evaluations of the Erlang SMP improvements for IO-bound applications.

The applications I considered for the evaluation were the Emysql driver (https://github.com/Eonblast/Emysql) and Mnesia.
I have created an Erlang module that reads from and writes to a MySQL database through the Emysql driver, and another Erlang module that does the same with Mnesia.
Both modules have two functions: write(nr_records), which writes nr_records records to the database (MySQL or Mnesia), and read(nr_records), which reads nr_records random records from the database. (Both the reads and the writes are consecutive, not in chunks.)
I have a load generator tool that simulates simultaneous requests to both modules and gathers performance statistics. I generate as much load as needed to stress the applications (near the saturation point).
The metrics I use are the throughput (no. of sessions/sec) and the session duration. A session is one call to write(nr_records) or read(nr_records), depending on the test case (read or write). So if I call write(500), I measure the duration of a session that does 500 inserts into the database.
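As a rough illustration of the two metrics, a session can be timed with timer:tc/3 and the throughput derived from a session count and the elapsed time. This is only a sketch: the module name session_timer and both function names are hypothetical, and the real measurements in these tests come from the load generator, not from code like this.

```erlang
-module(session_timer).
-export([time_session/3, throughput/2]).

%% Run one session, i.e. one call to M:F(A) such as db_module:write(500),
%% and return {DurationInMilliseconds, Result}.
%% (db_module is a made-up name; any module/function/args triple works.)
time_session(M, F, A) ->
    {Micros, Result} = timer:tc(M, F, A),
    {Micros / 1000, Result}.

%% Sessions per second, given the number of completed sessions and the
%% elapsed wall-clock time in microseconds.
throughput(Sessions, ElapsedMicros) when ElapsedMicros > 0 ->
    Sessions * 1000000 / ElapsedMicros.
```

For example, session_timer:throughput(100, 2000000) gives 50.0 sessions/sec for 100 sessions completed in two seconds.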

The tests I run have these parameters:

A) Type of operations: 1- Reads 2- Writes
I have just one table with 20 fields, and the reads and writes are just select and insert operations on that table.

B) I/O applications: 1- Emysql 2- Mnesia

C) SMP parameters:
1- SMP enabled, no. of schedulers = no. of CPU cores
2- SMP enabled, no. of schedulers = 1
3- SMP disabled

D) Erlang OTP versions:
1- Erlang R12B (SMP with a single run queue shared by multiple schedulers)
2- Erlang R15B (improved SMP, one run queue per scheduler)

The tests covered all possible combinations of A, B, C, and D. They were run on Solaris 10 x86 (a 16-core machine).
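To make the size of the test matrix concrete, the combinations of A, B, C, and D can be enumerated with a list comprehension. The atoms below are illustrative labels only, not names used by the actual test setup.

```erlang
-module(test_matrix).
-export([combinations/0]).

%% Every combination of operation type (A), I/O application (B),
%% SMP setting (C) and OTP release (D). The atom names are made up
%% for this sketch.
combinations() ->
    [{Op, App, Smp, Otp}
     || Op  <- [read, write],
        App <- [emysql, mnesia],
        Smp <- [smp_all_cores, smp_one_scheduler, smp_disabled],
        Otp <- [r12b, r15b]].
```

That is 2 x 2 x 3 x 2 = 24 test cases in total.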
In general these were the results I got:

Emysql driver:
1- There is a big difference in performance between SMP enabled and SMP disabled in both Erlang versions (R12 and R15), for both read and write tests. So I can say that SMP rocks! However, you need enough load to see it (at low load I could not see any difference, and sometimes it was even better to disable SMP).

2- I was expecting much better performance with SMP enabled (no. of schedulers = no. of cores) in R15 than in R12, since the scheduler logic has changed a lot from R12 to R15. But the results were more or less the same. My reasoning was that with a multithreaded database like MySQL, a driver with multiple database connections like Emysql, and SMP with multiple schedulers that each have their own run queue, I would get better results than in R12, where there is only one run queue shared by multiple schedulers (more lock contention).
Is it maybe because the processes are just doing IO and nothing CPU-intensive?

3- I noticed that CPU usage in R15 with SMP enabled is twice as high as in R12 with SMP enabled. All the cores have a CPU usage over 90%, and I can see that the beam process is using up to 75-80% of the CPU; the rest is the mysql daemon process. Is it because the scheduling algorithm has become more complex in R15?

Mnesia

I have only one table (as I said before), created with the disc_copies attribute and stored on only one node. The whole table fits in RAM, and I tried to keep it simple by controlling the size of the table during my tests so as not to exceed RAM capacity.

4- For the write tests in Mnesia I see the same behavior as with MySQL. However, since the data are all in RAM, the session duration is shorter, and the performance gain from enabling SMP is not as large as with MySQL.

5- For the read tests, I have two observations. First, dirty reads are 10 times faster than reads that use transactions. Second, I get better results with SMP disabled (about twice as fast, though in both cases the session duration is on the order of milliseconds: 70 msec vs 30 msec).
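For readers unfamiliar with the two read styles compared in point 5, here is a minimal sketch. It assumes a Mnesia table named subscriber keyed by an integer id, as in the exec_read/1 function posted in the follow-up message; the module name read_modes is hypothetical.

```erlang
-module(read_modes).
-export([dirty/1, transactional/1]).

%% Dirty read: goes straight to the table, with no transaction
%% and no locking.
dirty(Id) ->
    mnesia:dirty_read({subscriber, Id}).

%% Transactional read: the same lookup wrapped in mnesia:transaction/1,
%% which starts a transaction and acquires a read lock before reading.
transactional(Id) ->
    {atomic, Rows} =
        mnesia:transaction(fun() -> mnesia:read({subscriber, Id}) end),
    Rows.
```

Both functions return the list of matching records; the transaction overhead is the part that the dirty variant skips.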

6- As with MySQL, there is no difference in performance between R12 and R15 with SMP enabled, for either read or write tests.

If you have read this far and have a comment on my results, you are welcome to share it. I am especially interested in the lack of difference between R12 and R15 and in the high CPU usage in R15 when the applications are IO-bound.

Thanks,
Erisa

Erisa Dervishi

Jun 21, 2012, 3:28:13 PM
to erlang-q...@erlang.org
One additional comment I forgot to mention:

For all Mnesia tests, the CPU usage is low in general. Only for the transactional read test in R15 with SMP enabled do I get a CPU usage of 70%.
From the load generator tool, I call the read(N) function shown below:

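%% Perform N transactional reads of randomly chosen subscriber records.
exec_read(0) -> done_reading;
exec_read(N) ->
    Id = random:uniform(7000000),
    %% io:format("~p~n", [Id]),
    %% Dirty-read variant (commented out):
    %% [_Row] = mnesia:dirty_read({subscriber, Id}),
    Fun = fun() ->
              mnesia:read({subscriber, Id})
          end,
    {atomic, [_Row]} = mnesia:transaction(Fun),
    %% io:format("~p~n", [_Row]),
    exec_read(N - 1).

%% Entry point for one session: seed the random generator, then run N reads.
read(N) ->
    {A1, A2, A3} = now(),
    random:seed(A1, A2, A3),
    exec_read(N).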


I run mpstat while my tests are running. Besides the CPU usage, another difference between this test and the other Mnesia tests is that the syscl (system calls) column of mpstat shows a value about 20 times higher than in the other test cases.


be.d...@gmail.com

Jun 22, 2012, 4:22:37 AM
to eris...@gmail.com, erlang-q...@erlang.org
I think much of the CPU usage is caused by your MySQL driver.
You should have a look at the driver details. The query you send to MySQL may be copied between your benchmarking process and the process that actually owns the socket. The same goes for the result: the response is parsed and the result is sent back to the benchmarking process. There may also be a bottleneck in choosing a MySQL connection from the connection pool if that goes through a supervisor.
Try using etop to see which functions are the bottlenecks.
I mean that your IO-intensive test may just be exposing driver bottlenecks rather than problems in SMP.

Erisa Dervishi

Jun 22, 2012, 7:51:52 AM
to be.d...@gmail.com, erlang-q...@erlang.org
Hey, thank you for your reply, but I use the same MySQL driver for my evaluations in Erlang OTP R12 and do not get the same CPU usage (for R12 it is up to 45%). The tests are exactly the same (same load). I have also checked the message queue of the process that supervises the connections, and there does not seem to be a bottleneck there (the queue length is 0 most of the time and 4 messages at most).
Even for Mnesia, the CPU usage, though low, doubles from R12 to R15 with SMP enabled (from 6% to 12%) for the write tests.
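The kind of message-queue check described above can be done from an Erlang shell with erlang:process_info/2. The following is a sketch, not the exact check that was run; the module name queue_check is hypothetical.

```erlang
-module(queue_check).
-export([top_queues/1]).

%% Return up to N {QueueLength, Pid} pairs for the processes with the
%% longest message queues, longest first. Processes that exit while we
%% iterate make process_info/2 return undefined and are skipped by the
%% pattern in the second generator.
top_queues(N) ->
    Lens = [{Len, Pid}
            || Pid <- erlang:processes(),
               {message_queue_len, Len} <-
                   [erlang:process_info(Pid, message_queue_len)]],
    lists:sublist(lists:reverse(lists:sort(Lens)), N).
```

A process that is a real bottleneck would typically show up here with a queue length that keeps growing under load, rather than staying between 0 and 4.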

Ronny Meeus

Jun 24, 2012, 4:47:01 AM
to Erisa Dervishi, erlang-q...@erlang.org
> _______________________________________________
> erlang-questions mailing list
> erlang-q...@erlang.org
> http://erlang.org/mailman/listinfo/erlang-questions
>

Hello

I'm also very interested in the behavior of Erlang applications in an
SMP environment (see also my post "Strange observation running in an
SMP environment." on this mailing list). Is the code that you have
created available somewhere so that I can also play with it? If
not, are you willing to share it with the community?

Thanks

--
Best regards,
Ronny

Erisa Dervishi

Jun 24, 2012, 2:11:58 PM
to Ronny Meeus, erlang-q...@erlang.org
Yes, sure, although I am using Tsung (a load generator tool written in Erlang) to generate my load and collect the results. I have added a new Tsung plugin for the purpose of my tests. I am a bit busy right now, but if you are interested I will share what I have done so far.

However, I would really like to have an expert's opinion on my comments here.

Björn-Egil Dahlberg

Jun 24, 2012, 2:37:12 PM
to Erisa Dervishi, erlang-q...@erlang.org
Hi Erisa,

In R15 we try to spin schedulers even more in an effort to reduce their sleep. This is especially useful when we have jitter in runnability among processes. If a queue is empty, a scheduler is forced to sleep, and the wake-up latency is something we want to avoid. Hence we spin. The system is at its best when it is fully saturated.

Another feature in R15 is delayed dealloc. Memory that would be deallocated in the wrong thread is instead scheduled for deallocation. This feature lessens lock contention on memory allocators and improves scalability, but will in all likelihood also increase CPU utilization (I have seen this in other measurements). In the future the schedulers will probably take this into account among other constraints and rules.

If you want to investigate this further, I recommend using a gprof-enabled VM to examine where the time is spent.
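Short of a gprof-enabled VM, a rough first check from inside the VM is to sample erlang:statistics(scheduler_wall_time) and compare the result with what mpstat reports. This is a sketch under assumptions: the statistic is only available on releases that support it (R15B01 or later, as far as I know), the module name sched_util is hypothetical, and exactly how spinning is accounted for in the active time should be checked against the erlang:statistics/1 documentation for the release in use.

```erlang
-module(sched_util).
-export([sample/1]).

%% Enable scheduler wall time accounting, take two samples Millis
%% milliseconds apart, and return [{SchedulerId, Utilization}] where
%% Utilization is the active time divided by the total time in between.
sample(Millis) ->
    erlang:system_flag(scheduler_wall_time, true),
    Before = lists:sort(erlang:statistics(scheduler_wall_time)),
    timer:sleep(Millis),
    After = lists:sort(erlang:statistics(scheduler_wall_time)),
    [{Id, utilization(A1 - A0, T1 - T0)}
     || {{Id, A0, T0}, {Id, A1, T1}} <- lists:zip(Before, After)].

%% Guard against a zero-length interval.
utilization(_Active, 0) -> 0.0;
utilization(Active, Total) -> Active / Total.
```

If the operating system shows the cores above 90% while the schedulers report much lower utilization over the same interval, that would point toward time spent outside of running Erlang code.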

Regards,
Björn-Egil

Scott Lystig Fritchie

Jul 2, 2012, 8:39:43 AM
to Erisa Dervishi, erlang-q...@erlang.org
>>> Erisa Dervishi <eris...@gmail.com> wrote:

ed> As part of my studies, I have recently been doing some performance
ed> evaluations on Erlang SMP improvements for IO-bound applications.
ed> [...]
ed> The tests I run have these parameters:

Hi, Erisa. Sorry to jump into this email thread so late. I didn't
see any mention of a couple of parameters in your summary, so I'm
wondering if you've used or studied the effect of these "erl" command
line flags:

+A N (where N is an integer > 0)
+K true

The first flag tells the VM to use the asynchronous I/O thread pool
for local disk I/O. The second uses a different socket ready/activity
mechanism rather than the old-and-frequently-slower select(2) system
call. (See http://www.erlang.org/doc/man/erl.html for summary.)
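To confirm which of these settings a running VM actually picked up, the values can be read back with erlang:system_info/1. A small sketch follows; the module name vm_flags and the keys of the returned list are made up for illustration, while the system_info items themselves are standard.

```erlang
-module(vm_flags).
-export([report/0]).

%% Report the SMP and I/O related settings of the running emulator.
report() ->
    [{smp_support,       erlang:system_info(smp_support)},
     {schedulers_online, erlang:system_info(schedulers_online)},
     %% Size of the async thread pool, as set with +A.
     {async_threads,     erlang:system_info(thread_pool_size)},
     %% Whether the emulator uses kernel poll, as requested with +K true.
     {kernel_poll,       erlang:system_info(kernel_poll)}].
```

Calling vm_flags:report() in each test configuration makes it easy to record the settings next to the measurements.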

-Scott