Node 3.2-pre process.nextTick might be delayed a few seconds due to event polling

21 views
Skip to first unread message

Garo

unread,
Dec 14, 2010, 6:33:31 AM12/14/10
to nodejs
I've have been testing our software with latests node from git
upstream and I've noticed a strange bug (feature?) where a callback
pushed to process.nextTick is delayed few seconds on situation where
the program has nothing to do.

Examination with gdb shows that the program spends the extra delay
inside the libev event loop polling and my theory is that the event
loop goes into the event poll to wait for new async events to come,
but doesn't check before if it has something to execute in the
process.nextTick() queue. In practive I've observed polling times from
0.7 second to almost four seconds, which I feel is very unacceptable
delay to wait for the process.nextTick queue to be processed in
situations where the program has nothing else to do. The
process.nextTick() queue is emptied after the poll has ended, so this
problem might not appear in practice in production environments where
the process is getting constant stream of new events.

I tried to dig into the libev code to see if there would be any
obvious bugs or hooks where this could be fixed, but I'm not a libev
expert so could just verify my theory on the problem. Is this a bug,
or a feature which hasn't been considered yet?

Example GDB stacktrace on linux.
(gdb) where
#0 0x00002aaaac50c018 in epoll_wait () from /lib64/libc.so.6
#1 0x000000000050a2eb in epoll_poll ()
#2 0x000000000050b96a in ev_run ()
#3 0x00000000004df207 in node::Start ()
#4 0x00002aaaac455994 in __libc_start_main () from /lib64/libc.so.6
#5 0x00000000004dc409 in _start ()
(gdb) quit

Example GDB stacktrace on mac.
0x00007fff8851516a in kevent ()
(gdb) where
#0 0x00007fff8851516a in kevent ()
#1 0x000000010005a065 in kqueue_poll (timeout=3.5926599502563477) at
ev_kqueue.c:101
#2 0x000000010005c4c9 in ev_run (flags=0) at ../deps/libev/ev.c:2414
#3 0x0000000100011bc3 in ev_loop (flags=0) at ev.h:782
#4 0x0000000100016a9f in node::Start (argc=2, argv=0x7fff5fbff9c8)
at ../src/node.cc:2008
#5 0x000000010001127f in main (argc=2, argv=0x7fff5fbff9c8) at ../src/
node_main.cc:6

- Juho Mäkinen

Jorge

unread,
Dec 14, 2010, 10:45:23 AM12/14/10
to nod...@googlegroups.com

Yes. It's a bug. This demonstrates the problem very well: https://gist.github.com/740593

$ node lazyEventLoop.js
-> ctr: 6, t: 1998ms -> 0.00KHz
--
Jorge.

Garo

unread,
Dec 14, 2010, 4:23:24 PM12/14/10
to nodejs
Thanks for good source which shows the bug. I've filled issue to
github: https://github.com/ry/node/issues/issue/511

I've also verified that the bug doesn't exists in 0.2.5 but exists in
nodejs git upstream.

- Garo

Jorge

unread,
Dec 15, 2010, 4:45:29 AM12/15/10
to nod...@googlegroups.com
On 14/12/2010, at 22:23, Garo wrote:
> On Dec 14, 5:45 pm, Jorge <jo...@jorgechamorro.com> wrote:
>> Yes. It's a bug. This demonstrates the problem very well:https://gist.github.com/740593
>>
>> $ node lazyEventLoop.js
>> -> ctr: 6, t: 1998ms -> 0.00KHz
>
> Thanks for good source which shows the bug. I've filled issue to
> github: https://github.com/ry/node/issues/issue/511
>
> I've also verified that the bug doesn't exists in 0.2.5 but exists in
> nodejs git upstream.


I think the problem might be in the timers module.
Does your code have any setTimeouts/setIntervals ?
If so, if you run your code without any timers (just temporarily disabled, by commenting them out, perhaps) does the problem disappear ?
--
Jorge.

Garo

unread,
Dec 15, 2010, 4:53:45 AM12/15/10
to nodejs, r...@tinyclouds.org
It seems that Ryan's commit 241ea7e5954e4a049af85cd8d51b84202cf0ea3a,
named "Simplify next tick logic by looping around ev_loop" caused this
problem.

I removed that patch from my node working copy and it fixed the
problem.

- Garo

Jorge

unread,
Dec 15, 2010, 6:07:18 AM12/15/10
to nod...@googlegroups.com
On 15/12/2010, at 10:53, Garo wrote:
> It seems that Ryan's commit 241ea7e5954e4a049af85cd8d51b84202cf0ea3a,
> named "Simplify next tick logic by looping around ev_loop" caused this
> problem.
>
> I removed that patch from my node working copy and it fixed the
> problem.

Cool. I'd like to remove it too, could you tell me the necessary git trickery :-) ?

TIA,
--
Jorge.

Jorge Chamorro

unread,
Dec 15, 2010, 6:18:40 AM12/15/10
to nod...@googlegroups.com


A git revert 241ea7e5954e4a049af85cd8d51b84202cf0ea3a , is it ?
--
Jorge.

Garo

unread,
Dec 15, 2010, 6:27:04 AM12/15/10
to nodejs
I wish I could. I found it easier to just manually write back the code
which was removed than to dig around git manuals. The patch is simple,
just redo the few lines of code which were altered in src/node.cc.

I think that just "git revert <commit>" doesn't remove a single
commit, but instead it reverts all history from HEAD to that.

- Garo

Jorge

unread,
Dec 15, 2010, 6:48:40 AM12/15/10
to nod...@googlegroups.com
On 15/12/2010, at 12:27, Garo wrote:
> On Dec 15, 1:18 pm, Jorge Chamorro <jo...@jorgechamorro.com> wrote:
>> On 15/12/2010, at 12:07, Jorge wrote:
>>>> I removed that patch from my node working copy and it fixed the
>>>> problem.
>>
>>> Cool. I'd like to remove it too, could you tell me the necessary git trickery :-) ?
>>
>> A git revert 241ea7e5954e4a049af85cd8d51b84202cf0ea3a , is it ?
>
> I wish I could. I found it easier to just manually write back the code
> which was removed than to dig around git manuals. The patch is simple,
> just redo the few lines of code which were altered in src/node.cc.
>
> I think that just "git revert <commit>" doesn't remove a single
> commit, but instead it reverts all history from HEAD to that.

It has created a new commit to undo the given one:
http://www.kernel.org/pub/software/scm/git/docs/git-revert.html

And it works fine, now. Thanks again !
--
Jorge.

Garo

unread,
Dec 15, 2010, 8:21:17 AM12/15/10
to nodejs
We switched our production (yes, we run 3.2-pre in production) to use
this patch and we saw the request latency drop from ~45ms to about
22ms.

- Garo
Reply all
Reply to author
Forward
0 new messages