Promise.wait broken inside "connect" event handler (testcase attached)

248 views

Skip to first unread message

Michael Carter

unread,

Feb 14, 2010, 6:48:15 PM2/14/10

to nod...@googlegroups.com

Hello,

I understand the limitations of Promise.wait, and I've seen previous bug reports. Beyond the fact that using promise.wait doesn't provide true coros/fibers, there is, I think, a more fundamental bug: Calling wait in the context of certain event handlers, such as in the handler of a tcp connection's "connect" event, will cause the handler to refire.

I've attached the test case that should clearly demonstrate this problem.

As a side note, please don't remove the .wait() functionality without adding an alternative blocking file reading api -- we depend on it in the js.io project for our module loader.

Thanks,

Michael Carter

test_promise_wait_within_tcp_connect_event.js

Ryan Dahl

unread,

Feb 18, 2010, 5:13:09 PM2/18/10

to nod...@googlegroups.com

Hi Michael,

Sorry it took so long to get back to you. I will be removing wait() in
the next release of Node. It has already been removed from the
documentation.

When I initially added wait(), I naïvely thought that I had done a
simple implementation of coroutines. You quickly pointed out that this
was not the case -- that the most recently wait()ed promise must
complete before any others could. This is because wait() is
implemented as a call to ev_loop() - that is - it reenters the event
loop without returning from the function. This allows for execution to
halt on that function while the process remains non-blocking. However,
further events start at the top of the existing call stack. This leads
to bugs where a promise waits, then another promise starts waiting
before the first had completed, then yet another promise waits before
the second had completed (even if the first had) and so on - growing
the call stack ever larger.

A proper implementation of wait() necessitates true coroutines (maybe
'fibers' are a better name). Each time wait() is called a new call
stack will be allocated and the ev_loop() called in that stack. Any
new events would be executed on the second stack. When the first
promise completed, the process would context switch back to the first
stack and resume operation. Coroutines are much better than threads
because the user has control of the context switches. They don't need
to worry about non-atomic variable changes or the kernel scheduler
making bad decisions. But courtines do have some costs: context
switching and the memory for the call stacks ; unlike threads they are
not executed concurrently, so they don't make a process more parallel.

Since you pointed out the wait() bug, I've known I'd either eventually
need to remove wait() or re-implement it with coroutines. This week I
ran into a bug which has convinced me to remove it.

I have an AMQP parser which would make callbacks when it encountered
various pieces of data. I store the parser state inside the parser
object -- a natural thing to do. In one of these callbacks, wait() was
being called. During the wait(), the socket would receive more data,
try to execute the parser again, and get screwed up. The reason it got
screwed up was that the state referred to the time when the callback
was executed - which is not where the parser had resumed. The problem
is that by storing state inside my parser object, it broke
coroutine-safety. To solve this, I would need to "lock" the parser
somehow so that further executions could not be made until all its
callbacks had completed.

This sort of mental complication is exactly what I'm trying to avoid in Node.

wait(), even if implemented with coroutines, does not warent users to
need to think about coroutine-safety. In fact, because of the
increased context switches and more memory usage, wait will make
programs perform worse. The conclusion is clear. wait() must go.

> As a side note, please don't remove the .wait() functionality without adding
> an alternative blocking file reading api -- we depend on it in the js.io project
> for our module loader.

Node has a couple of truly synchronous routines for interacting with
the file system. They are in the 'fs' module and have the suffix
'Sync'.