[erlang-questions] Why does an idle erlang process (couchdb, to be precise) call epoll_wait so often?


Jann Horn

May 22, 2013, 7:09:28 AM
to erlang-q...@erlang.org
This is strace output from a totally idle couchdb process:

[pid 18350] 17:40:15.086577 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.086610 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.086643 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.086675 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.086736 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.086771 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.086940 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.087180 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.087219 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.087251 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.087287 epoll_wait(3, {}, 256, 228) = 0
[pid 18350] 17:40:15.315646 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.315718 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.315751 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.315800 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.315833 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.315865 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.315900 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.315953 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.315992 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.316081 epoll_wait(3, {}, 256, 651) = 0
[pid 18350] 17:40:15.967979 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.969211 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.969926 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.970484 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.970616 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.970717 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.970806 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.970895 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.970983 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.971071 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.971158 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.971243 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:15.971343 epoll_wait(3, {}, 256, 345) = 0
[pid 18350] 17:40:16.316908 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:16.317266 epoll_wait(3, {}, 256, 650) = 0
[pid 18350] 17:40:16.969333 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:16.970057 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:16.970580 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:16.970736 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:16.970827 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:16.970917 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:16.971011 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:16.971118 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:16.971206 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:16.971300 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:16.971386 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:16.971485 epoll_wait(3, {}, 256, 115) = 0
[pid 18350] 17:40:17.087042 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:17.087421 accept(10, 0x7fb208dbfba0, [28]) = -1 EAGAIN (Resource temporarily unavailable)
[pid 18350] 17:40:17.087915 epoll_ctl(3, EPOLL_CTL_DEL, 10, {EPOLLIN, {u32=10, u64=73199780460757002}}) = 0
[pid 18350] 17:40:17.088035 epoll_ctl(3, EPOLL_CTL_ADD, 10, {EPOLLIN, {u32=10, u64=73199780460757002}}) = 0
[pid 18350] 17:40:17.088139 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:17.088231 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:17.088327 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:17.088477 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:17.088565 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:17.088660 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:17.088746 epoll_wait(3, {}, 256, 0) = 0
[pid 18350] 17:40:17.088833 epoll_wait(3, {}, 256, 0) = 0

This seems to be, at least partly, intentional – erts/emulator/beam/erl_process.c
contains a constant named "ERTS_SCHED_SYS_SLEEP_SPINCOUNT" which is set to 10.

Can anyone tell me what the rationale behind this excessive busylooping is? A few
dozen syscalls per second for nothing seems a bit weird to me.

Benoit Chesneau

May 22, 2013, 9:46:17 AM
to Jann Horn, erlang-q...@erlang.org
How would it work to receive an event to wake up if it doesn't listen for them?

- benoit
_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions

Jann Horn

May 22, 2013, 9:56:38 AM
to Benoit Chesneau, erlang-q...@erlang.org
On Wed, May 22, 2013 at 03:46:17PM +0200, Benoit Chesneau wrote:
> How would it work to receive an event to wake up if it doesn't listen for them?

Uh... register all fds you're interested in using epoll, then do
epoll_wait(fd, events, maxevents, -1)? Isn't that actually quite
normal? The syscall will return as soon as something happens, but
not earlier.

You don't have to poll different event sources, there
are many facilities that can multiplex those event sources for
you and will wake you up as soon as something interesting happens.
E.g. select, poll, epoll, kqueue and event ports (the last two aren't
available on linux). I can't imagine a reason why you'd have to
poll events like this in any OS.
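
A minimal sketch of the blocking wait described above, written in Python rather than C so it is easy to run. It assumes the standard selectors module (which uses epoll on Linux) and a local socket pair as a stand-in for whatever fds the process is interested in; the helper name wait_for_event is made up for illustration.

```python
import selectors
import socket
import threading
import time


def wait_for_event(sel):
    """Block until at least one registered fd is ready (no timeout, no polling)."""
    # timeout=None plays the role of epoll_wait(fd, events, maxevents, -1).
    return sel.select(timeout=None)


sel = selectors.DefaultSelector()  # epoll-backed on Linux
reader, writer = socket.socketpair()
sel.register(reader, selectors.EVENT_READ)

# Another thread produces an event after roughly 200 ms.
threading.Timer(0.2, writer.send, args=(b"x",)).start()

start = time.monotonic()
events = wait_for_event(sel)  # a single call that sleeps until data arrives
elapsed = time.monotonic() - start
print(f"woke after {elapsed:.3f}s with {len(events)} event(s)")

sel.close()
reader.close()
writer.close()
```

The process makes one call into the kernel and is woken only when the socket becomes readable, which is the behaviour the question expects from an idle server.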

Lukas Larsson

May 22, 2013, 12:42:59 PM
to Jann Horn, erlang-q...@erlang.org
You have to poll like that because fd's are not the only thing which can trigger load on the system. Timeouts for instance are triggered by calling gettimeofday which means you have to break out of epoll_wait before the next timeout happens. Also the spinning is done to make the system respond faster to events by delaying sleeping in the kernel.

Lukas

Lukas Larsson

May 23, 2013, 5:13:15 AM
to Jann Horn, erlang-q...@erlang.org

On Thu, May 23, 2013 at 10:56 AM, Jann Horn <ja...@thejh.net> wrote:
> On Wed, May 22, 2013 at 06:42:59PM +0200, Lukas Larsson wrote:
> > You have to poll like that because fd's are not the only thing which can
> > trigger load on the system. Timeouts for instance are triggered by calling
> > gettimeofday which means you have to break out of epoll_wait before the
> > next timeout happens.
>
> Why not by specifying a timeout in the epoll_wait call?

If you look closely at the strace you can see that every n-th call passes a small timeout to epoll_wait. This timeout is calculated by looking at the next timeout and a number of other factors.
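
As a rough illustration of deriving the wait timeout from the next pending timer, here is a small Python sketch. It is not the emulator's actual calculation: the heap of deadlines, the function name next_poll_timeout_ms and the cap_ms bound (standing in for the "other factors") are all assumptions, and the values 228 and 651 are borrowed from the strace only to have concrete numbers.

```python
import heapq


def next_poll_timeout_ms(timer_heap, now_ms, cap_ms=1000):
    """Return how long the poll call may block before the earliest timer is due.

    timer_heap: heapq of absolute deadlines in integer milliseconds (hypothetical layout).
    cap_ms: hypothetical upper bound used when no timer is pending or the timer is far away.
    """
    if not timer_heap:
        return cap_ms
    remaining_ms = timer_heap[0] - now_ms  # heap root is the earliest deadline
    return max(0, min(remaining_ms, cap_ms))


timers = []
heapq.heappush(timers, 100_651)
heapq.heappush(timers, 100_228)

print(next_poll_timeout_ms(timers, now_ms=100_000))  # earliest deadline is 228 ms away
```

A deadline that has already passed yields 0, which makes the next poll return immediately so the timer can be handled.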



> > Also the spinning is done to make the system respond
> > faster to events by delaying sleeping in the kernel.
>
> Ah, ok... and that brings a performance gain?

It brings a latency gain, which in turn can bring a performance gain. Sleeping in the kernel is a (relatively) expensive thing to do, and by spinning before sleeping the schedulers stay more responsive. You can configure this behaviour through the runtime flags +sbwt, +sws, +swt. See http://www.erlang.org/doc/man/erl.html for some more details.
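
The spin-before-sleep pattern can be sketched in a few lines of Python. This is a simplified model, not the scheduler's real logic: spin_then_block is a made-up helper, the socket pair stands in for real event sources, and the spin count of 10 simply mirrors the ERTS_SCHED_SYS_SLEEP_SPINCOUNT value quoted at the start of the thread.

```python
import selectors
import socket

SPIN_COUNT = 10  # assumption: mirrors ERTS_SCHED_SYS_SLEEP_SPINCOUNT from the first message


def spin_then_block(sel, block_timeout_s, spin_count=SPIN_COUNT):
    """Poll without blocking spin_count times, then fall back to a timed blocking wait.

    Returns (events, calls), where calls is the number of selector calls made.
    """
    calls = 0
    for _ in range(spin_count):
        calls += 1
        events = sel.select(timeout=0)  # like epoll_wait(..., 0)
        if events:
            return events, calls  # event caught without sleeping in the kernel
    calls += 1
    return sel.select(timeout=block_timeout_s), calls  # like epoll_wait(..., N)


sel = selectors.DefaultSelector()
reader, writer = socket.socketpair()
sel.register(reader, selectors.EVENT_READ)

# Idle case: ten zero-timeout polls, then one timed wait, as in the strace.
idle_events, idle_calls = spin_then_block(sel, block_timeout_s=0.05)
print(len(idle_events), idle_calls)

# Busy case: data is already pending, so the first non-blocking poll returns it.
writer.send(b"x")
busy_events, busy_calls = spin_then_block(sel, block_timeout_s=0.05)
print(len(busy_events), busy_calls)

sel.close()
reader.close()
writer.close()
```

The trade-off is visible in the call counts: an idle system pays for the extra non-blocking polls, while a busy one picks up work without ever going to sleep.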

Jann Horn

May 23, 2013, 4:56:56 AM
to Lukas Larsson, erlang-q...@erlang.org
On Wed, May 22, 2013 at 06:42:59PM +0200, Lukas Larsson wrote:
> You have to poll like that because fd's are not the only thing which can
> trigger load on the system. Timeouts for instance are triggered by calling
> gettimeofday which means you have to break out of epoll_wait before the
> next timeout happens.

Why not by specifying a timeout in the epoll_wait call?


> Also the spinning is done to make the system respond
> faster to events by delaying sleeping in the kernel.

Ah, ok... and that brings a performance gain?



Lukas Larsson

May 27, 2013, 4:34:07 AM
to Jann Horn, Erlang Questions

On Fri, May 24, 2013 at 4:41 PM, Jann Horn <ja...@thejh.net> wrote:
> On Thu, May 23, 2013 at 11:13:15AM +0200, Lukas Larsson wrote:
> > On Thu, May 23, 2013 at 10:56 AM, Jann Horn <ja...@thejh.net> wrote:
> >
> > > On Wed, May 22, 2013 at 06:42:59PM +0200, Lukas Larsson wrote:
> > > > You have to poll like that because fd's are not the only thing which can
> > > > trigger load on the system. Timeouts for instance are triggered by calling
> > > > gettimeofday which means you have to break out of epoll_wait before the
> > > > next timeout happens.
> > >
> > > Why not by specifying a timeout in the epoll_wait call?
> > >
> >
> > If you look closely at the strace you can see that every n-th call
> > passes a small timeout to epoll_wait. This timeout is calculated by
> > looking at the next timeout and a number of other factors.
>
> Does that mean that the couchdb code has set some timer that fires with a high frequency?
> Because even when epoll_wait is called with a timeout, the timeout is below one second.

It could mean that there is a timer. There are also a number of other internal emulator events that have to be taken care of, which could cause the low timeout.
 

> > > > Also the spinning is done to make the system respond
> > > > faster to events by delaying sleeping in the kernel.
> > >
> > > Ah, ok... and that brings a performance gain?
> > >
> >
> > It brings a latency gain, which in turn can bring a performance gain.
> > Sleeping in the kernel is a (relatively) expensive thing to do and by
> > spinning before sleeping the schedulers stay more responsive. You can
> > configure this behaviour through the runtime flags +sbwt, +sws, +swt. See
> > http://www.erlang.org/doc/man/erl.html for some more details.
>
> But e.g. on a single-core machine, spinning before sleeping makes no sense, right?
> So for single-core machines (VMs/mobile devices), this should be turned off?

I'd assume so, but I haven't made any measurements for it.


> Btw, what's so expensive about sleeping in the kernel? The context switches?
> Trashing the cache?

It varies depending on the OS and hardware combination, but those are the main things.