We are experiencing very high CPU utilization in 3 clustered Erlang VMs running RabbitMQ. We have deployed another cluster in an attempt to reproduce the same behaviour, without much success.

Our goals are:
- Find out where the CPU is being utilized
- Choose the right tools to analyze CPU utilization
Our observations so far:
- The BAD cluster shows excessive CPU utilization, both user and system, as well as higher network traffic.
- The BAD cluster also shows higher Erlang scheduler utilization, especially in the microstates "emulator" and "other". We have yet to understand what "other" could be; according to the Erlang documentation, it is time spent doing unaccounted things.
- The BAD cluster makes a considerably higher number of system calls; we have yet to work out why (and how to find out).
- The BAD cluster does not necessarily run a higher number of reductions. In fact, the GOOD cluster runs more reductions and yet has a lower scheduler utilization.
| METRIC | BAD | GOOD |
|---|---|---|
| user cpu | 46% - 57% | 19% - 40% |
| system cpu | 20% - 37% | 1% - 10% |
| network traffic | 6M - 19M | up to 8M |
| system interrupts | 120k - 196k | 10k - 20k |
| syscalls | 1.6M - 2.1M | 49k - 110k |
| task-clock 10sec | 68255 | 12324 |
| cpu profiling info | | |

We have gathered lots of metrics in an attempt to identify why the BAD cluster uses so much CPU. All the information can be found here https://gist.github.com/MarcialRosales/226716f0cb9e27cd9ab02eac04702841 along with the environment information.
We would greatly appreciate any insights into what could be causing the issue, or suggestions for additional tools we could use.
Hello,

On Wed, Jul 18, 2018 at 5:34 AM Marcial Rosales <mros...@pivotal.io> wrote:

> We are experiencing very high CPU utilization in 3 clustered Erlang VMs running RabbitMQ. We have deployed another cluster in an attempt to reproduce the same behaviour, without much success.
>
> Our goals are:
> - Find out where the CPU is being utilized
> - Choose the right tools to analyze CPU utilization
>
> Our observations so far:
> - The BAD cluster shows excessive CPU utilization, both user and system, as well as higher network traffic.
> - The BAD cluster also shows higher Erlang scheduler utilization, especially in the microstates "emulator" and "other". We have yet to understand what "other" could be; according to the Erlang documentation, it is time spent doing unaccounted things.

If you compile Erlang with "./configure --with-microstate-accounting=extra" (as suggested by Danil), the "other" part will be broken into more granular parts. However, looking at your perf recordings, I would guess that most of the "other" time is spent spinning before going to sleep.

> - The BAD cluster makes a considerably higher number of system calls; we have yet to work out why (and how to find out).
Maybe use strace and then write a small script that counts the different syscalls made?
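As a rough illustration of that suggestion, here is a minimal Python sketch of such a counting script. It assumes the trace was captured to a file with something like `strace -f -o trace.txt -p <pid>` (the file name is a placeholder), and that lines carry at most a PID prefix and a `-t`/`-tt` style timestamp. The function name `count_syscalls` is made up for this example. Note that `strace -c` can also print a per-syscall summary directly, which may already be enough.

```python
import re
import sys
from collections import Counter

# Matches the syscall name at the start of a strace line, optionally preceded
# by a PID ("123 " or "[pid 123] ") and a wall-clock timestamp from -t / -tt.
# "<... name resumed>" lines do not match, so a call split into
# "unfinished" + "resumed" is counted once.
SYSCALL_RE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"          # optional PID prefix
    r"(?:\d{2}:\d{2}:\d{2}(?:\.\d+)?\s+)?"    # optional HH:MM:SS[.usec]
    r"([a-z_][a-z0-9_]*)\("                   # syscall name followed by "("
)


def count_syscalls(lines):
    """Return a Counter mapping syscall name -> number of calls seen."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python count_syscalls.py trace.txt
    with open(sys.argv[1], errors="replace") as trace:
        for name, calls in count_syscalls(trace).most_common(20):
            print(f"{calls:10d}  {name}")
```

Running it against a trace from each cluster and comparing the top entries should show which syscalls account for the difference.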
> We would greatly appreciate any insights into what could be causing the issue, or suggestions for additional tools we could use.

My gut tells me that there is some syscall that is a lot slower on Xen than it is on KVM. In virtualized environments I always tend to suspect the time source first. Different hypervisors have very different performance for getting the time, and the Erlang VM does a lot of time fetching.
Lukas
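One rough way to check the time-source suspicion above is to compare, on a node from each cluster, which clocksource the kernel uses and how long a monotonic clock read takes. The Python sketch below is only an approximation under stated assumptions: the sysfs path is the usual Linux location and may be absent elsewhere, the helper names are invented for this example, and Python's interpreter overhead is included in the per-call figure. Only the relative difference between hosts is meaningful.

```python
import time
from pathlib import Path

# Usual Linux sysfs location of the active clocksource (assumption: Linux guest).
CLOCKSOURCE = Path(
    "/sys/devices/system/clocksource/clocksource0/current_clocksource"
)


def current_clocksource():
    """Return the active kernel clocksource name, or None if unavailable."""
    try:
        return CLOCKSOURCE.read_text().strip()
    except OSError:
        return None


def ns_per_time_call(calls=1_000_000):
    """Average wall time in nanoseconds per monotonic clock read."""
    read_clock = time.monotonic_ns  # a monotonic clock read via the OS
    start = time.perf_counter_ns()
    for _ in range(calls):
        read_clock()
    elapsed = time.perf_counter_ns() - start
    return elapsed / calls


if __name__ == "__main__":
    print("clocksource:", current_clocksource())
    print(f"avg ns per clock read: {ns_per_time_call():.1f}")
```

If one host reports a much higher per-call cost, or a different clocksource, that would be consistent with time reads being expensive under that hypervisor, though it would not prove it is the cause.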