Different CPU consumption by Scylla threads with different Linux kernels after the "nodetool drain" command

Mark Barinstein

<mark.barinstein@gmail.com>

Mar 29, 2023, 7:44:42 AM

to ScyllaDB users

Hi All,

The question is about different CPU consumption by Scylla threads with different Linux kernels after the nodetool drain command.
All the results are from a single-node system, but multi-node systems behave nearly the same way.
ScyllaDB 5.1.5 Open Source.
We suspect that different kernel functions are called depending, at least, on the OS kernel version, and that this explains the different behavior.
The details are below.

Questions:
Is this known behavior?
Does it work as designed?

top [-1] -H -n1 -b -p $(pidof scylla)

Linux kernels 3.x / 4.x
Ubuntu 18.04, Centos 7/8, RHEL 8.1

Threads: 12 total, 1 running, 11 sleeping, 0 stopped, 0 zombie
%Cpu(s): 14.8 us, 9.8 sy, 0.0 ni, 73.8 id, 0.0 wa, 0.0 hi, 1.6 si, 0.0 st
KiB Mem : 3861256 total, 3041724 free, 455896 used, 363636 buff/cache
KiB Swap: 4063228 total, 4063228 free, 0 used. 3171324 avail Mem

  PID USER      PR  NI    VIRT     RES     SHR S  %CPU  %MEM     TIME+ COMMAND
 8993 scylla    20   0   16.0t  201992   32912 R  93.3   5.2   1:46.35 scylla  <--
 8994 scylla    20   0   16.0t  201992   32912 S   0.0   5.2   0:01.36 reactor-1
...

Linux kernels 5.x
Ubuntu 20.04, CentOS 7 (with a manually installed 5.x kernel)

Threads: 8 total, 0 running, 8 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 1.7 sy, 0.0 ni, 98.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 3990136 total, 3286472 free, 406180 used, 297484 buff/cache
KiB Swap: 4063228 total, 4063228 free, 0 used. 3354412 avail Mem

  PID USER      PR  NI    VIRT     RES     SHR S  %CPU  %MEM     TIME+ COMMAND
 1263 scylla    20   0   16.0t  239768   65260 S   0.0   6.0   0:02.38 scylla
 1265 scylla    20   0   16.0t  239768   65260 S   0.0   6.0   0:01.70 reactor-1
...

On distros where top supports the -1 option, we see that the first one or two threads are 100% busy.
The situation is slightly different in a multi-node environment:
on the drained node, the reactor-1 thread also consumes 100% CPU (so two threads are 100% busy in this case).
This is not the case on the other nodes, where only the main scylla thread consumes 100% CPU.
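The per-thread %CPU that `top -H` reports can also be sampled directly from /proc, which may be handier when scripting the same comparison across several kernels or nodes. Below is a minimal Python sketch, assuming a Linux /proc layout as documented in proc(5); `parse_task_stat`, `sample_threads` and `thread_cpu_percent` are hypothetical helpers, not part of ScyllaDB or procps. It reads `utime` and `stime` (fields 14 and 15) of `/proc/<pid>/task/<tid>/stat` twice and reports each thread's CPU share over the interval.

```python
import os
import time


def parse_task_stat(line: str) -> tuple[str, int]:
    """Return (thread name, utime + stime in clock ticks) for one stat line."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain
    # spaces or ')', so locate it by the first '(' and the last ')'.
    lparen = line.index("(")
    rparen = line.rindex(")")
    name = line[lparen + 1:rparen]
    rest = line[rparen + 2:].split()
    # rest[0] is field 3 (state), so field N is rest[N - 3]:
    # utime is field 14 and stime is field 15 (see proc(5)).
    utime, stime = int(rest[11]), int(rest[12])
    return name, utime + stime


def sample_threads(pid: int) -> dict[int, tuple[str, int]]:
    """Read cumulative CPU ticks for every thread (task) of `pid`."""
    samples: dict[int, tuple[str, int]] = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                samples[int(tid)] = parse_task_stat(f.read())
        except OSError:
            # The thread exited between listdir() and reading its stat file.
            continue
    return samples


def thread_cpu_percent(pid: int, interval: float = 1.0) -> list[tuple[int, str, float]]:
    """Return (tid, name, %CPU over `interval` seconds), busiest first.

    100% means one fully used CPU core, as in the default `top -H` view.
    """
    ticks_per_sec = os.sysconf("SC_CLK_TCK")
    before = sample_threads(pid)
    time.sleep(interval)
    after = sample_threads(pid)
    rows = []
    for tid, (name, ticks) in after.items():
        if tid in before:  # skip threads created during the interval
            used_seconds = (ticks - before[tid][1]) / ticks_per_sec
            rows.append((tid, name, 100.0 * used_seconds / interval))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows
```

For example, `thread_cpu_percent(pid)` with the PID printed by `pidof scylla` lists the scylla threads (scylla, reactor-1, ...) busiest first, similar to the top output above.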

strace -p $(pidof scylla) -c

Linux kernels 3.x / 4.x
Ubuntu 18.04, Centos 7/8, RHEL 8.1

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.58    4.569387           3   1359826           epoll_pwait
  0.22    0.010306          20       510           write
  0.16    0.007365           6      1115           timerfd_settime
  0.02    0.000973          10        97           timer_settime
  0.01    0.000638           6        95           rt_sigreturn
  0.00    0.000046           5         8           rt_sigprocmask
------ ----------- ----------- --------- --------- ----------------
100.00    4.588715               1361651           total

Linux kernels 5.x
Ubuntu 20.04, CentOS 7 (with a manually installed 5.x kernel)

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 71.14    0.692684         507      1365           io_pgetevents
  6.17    0.060087          12      4635           timerfd_settime
  5.31    0.051699          27      1859           io_submit
  4.71    0.045870          57       794           write
  3.95    0.038451          20      1901       182 read
  3.46    0.033645          22      1510           membarrier
  2.78    0.027074           9      2886           rt_sigprocmask
  2.48    0.024196          14      1702           timer_settime
------ ----------- ----------- --------- --------- ----------------
100.00    0.973706                 16652       182 total
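The two strace -c summaries can be compared mechanically. Below is a small Python sketch; `parse_strace_summary` and `call_share` are hypothetical helpers, and the parser assumes the default strace -c column layout shown above (calls in the fourth column, an optional errors column, the syscall name last). The post does not say how long each trace ran, so only the relative mix of syscalls, not absolute call rates, is directly comparable.

```python
def parse_strace_summary(text: str) -> dict[str, int]:
    """Map syscall name -> call count from `strace -c` summary output.

    Assumes the default columns: % time, seconds, usecs/call, calls,
    [errors], syscall.
    """
    calls: dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric "% time"; this skips the header
        # and the dashed separator lines.
        if len(fields) < 5 or not fields[0].replace(".", "", 1).isdigit():
            continue
        if fields[-1] == "total":
            continue
        # The calls column is always the fourth field; the errors column,
        # when present, sits between calls and the syscall name.
        calls[fields[-1]] = int(fields[3])
    return calls


def call_share(calls: dict[str, int]) -> list[tuple[str, float]]:
    """Return (syscall, percent of all calls), most frequent first."""
    total = sum(calls.values())
    shares = [(name, 100.0 * n / total) for name, n in calls.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)
```

Applied to the 3.x/4.x table, epoll_pwait accounts for about 99.9% of the 1,361,651 calls; on the 5.x kernel the most frequent syscall, timerfd_settime, accounts for about 28% of 16,652 calls.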

Mark Barinstein

<mark.barinstein@gmail.com>

May 5, 2023, 4:14:38 PM

to ScyllaDB users

For those who are interested:
the answer to this question has been provided here:
https://forum.scylladb.com/t/different-cpu-consumption-by-scylla-threads-with-different-linux-kernels-after-the-nodetool-drain-command