I have a C library and I am writing a non-blocking wrapper for it using
Node.js. I am using uv_queue_work to call the C library functions. The
workflow is as follows:
1 - The first function is called on the main thread. It sets up a
WorkerData instance (a struct used for passing data to the thread) with all
the information required to do the job. This object is stored in the "data"
field of a uv_work_t instance.
2 - It calls uv_queue_work (uv_default_loop(), ... )
3 - Each worker thread acquires a mutex lock, takes a work handle (the C
library requires this handle for every call) from the queue (std::queue),
and removes the handle from the queue. It releases the mutex after all of
this. If no handles are available in the queue, a new one is created.
4 - The C library is called with that handle. Once the work is done, the
worker acquires the mutex lock again and adds the handle back to the queue.
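The pooling described in steps 3 and 4 can be sketched roughly as below. This is only an illustration under assumptions, not the addon's actual code: Handle, HandlePool and RunJobs are hypothetical names, std::mutex and std::thread stand in for the libuv mutex and thread pool, and incrementing a counter stands in for the real library call.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-in for the C library's per-call handle.
struct Handle {
    int uses = 0;
};

// Pool of reusable handles. Every access to the queue happens under mutex_.
class HandlePool {
public:
    ~HandlePool() {
        // Assumes no worker is still running when the pool is destroyed.
        while (!idle_.empty()) {
            delete idle_.front();
            idle_.pop();
        }
    }

    // Step 3: take an idle handle, or create one if none is available.
    Handle* Acquire() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (!idle_.empty()) {
                Handle* h = idle_.front();
                idle_.pop();
                return h;
            }
        }
        return new Handle();  // created outside the lock
    }

    // Step 4: give the handle back once the library call has finished.
    void Release(Handle* h) {
        assert(h != nullptr);
        std::lock_guard<std::mutex> lock(mutex_);
        idle_.push(h);
    }

    // Number of handles currently waiting in the queue.
    std::size_t IdleCount() {
        std::lock_guard<std::mutex> lock(mutex_);
        return idle_.size();
    }

private:
    std::mutex mutex_;
    std::queue<Handle*> idle_;
};

// Starts `threads` workers, each running `jobs_per_thread` jobs on the pool.
void RunJobs(HandlePool& pool, int threads, int jobs_per_thread) {
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&pool, jobs_per_thread] {
            for (int j = 0; j < jobs_per_thread; ++j) {
                Handle* h = pool.Acquire();
                ++h->uses;  // stand-in for the library call; h is held exclusively
                pool.Release(h);
            }
        });
    }
    for (std::thread& w : workers) w.join();
}
```

Calling RunJobs(pool, 4, 100) on a fresh pool should leave at most four handles in the queue, since a new handle is only created when every existing one is in use by another worker.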
I wrote some JS code to test the above functionality. It looks something
like this:
for (var i = 0; i < 100; i++)
{
    mylib.process("test", function (err, result) {
    });
}
Interestingly, this fails with random segmentation faults. I have seen
segfaults when accessing the queue even though all the operations that
modify the queue are inside a critical section. I have also hit assertion
failures inside libuv:
node: ../deps/uv/src/unix/stream.c:489: uv__write: Assertion `n == 0' failed.
Some observations:
* When I reduce the loop iterations to 20, everything works.
* When I add a delay between calls, it works.
I am wondering why this happens. It looks as if, in a tight loop, the
mutexes are not working correctly and are allowing multiple threads into
the critical section. When these segfaults happen, I have seen the queue's
size() return random values, which is a clear indication of the mutexes
failing.
However, it's hard to tell what exactly is happening on your side. Can you
post backtraces or core dumps? (`gdb --args node test.js`, then `run`, and
`bt` once it has segfaulted.)
Also, I've noticed that you aren't destroying the mutex there:
https://github.com/navaneeth/libvarnam-nodejs/blob/master/src/varnamj....
I really recommend that you fix this, because it is possible that your
object gets garbage collected while a worker thread is running. If this is
the case, you can fix it by adding a Ref() call before starting the work
request and an Unref() call once it has completed.
Cheers,
Fedor.
On Sat, Sep 1, 2012 at 10:31 PM, Navaneeth.K.N <navaneet...@gmail.com>wrote:
> I have a C library and I am writing a non-blocking wrapper using Node JS.
> I am using uv_work_queue to call C library function. Workflow will be like,
> 1- First function gets called on main thread. This one sets up a
> WorkerData (a struct used for passing data to thread) instance with all
> necessary information required to do the job. This object is wrapped inside
> uv_work_t instance's "data" field.
> 2 - Calls uv_queue_work (uv_default_loop(), ... )
> 3 - Each worker thread will acquire a mutex lock, get a work handle (C
> library requires this handle for every call) from the queue (std::queue)
> and removes the handle from queue. It releases the mutex after all these.
> If no handles are available in the queue, a new one will be created.
> 4 - C library will be called with the above handle. Once the work is done,
> acquire mutex lock and the handle will be added back to the queue.
> I wrote some JS code to test the above functionality. It will be something
> like,
> for(i = 0; i < 100; i ++)
> {
> mylib.process ("test", function (err, result) {
> });
> }
> Interestingly, this fails with random segmentation faults. I have seen
> some segfaults at accessing the queue even though all the operations that
> modify the queue are in critical section. I have also got assertion
> failures inside libuv. node: ../deps/uv/src/unix/stream.c:489: uv__write:
> Assertion `n == 0' failed.
> Some observations:
> * When I reduce the loop iterations to 20, all works good.
> * When I add a delay in between calls, it works.
> I am wondering why this happens? It looks like when in a tight loop,
> mutexes are not working correctly and allowing multiple threads to get into
> the critical section. When these segfaults happening, I have seen queue's
> size() returns some random values which is a clear indication of mutexes
> failing.
> However, it's hard to tell what exactly is happening on your side. Can you
> post backtraces or core dumps? (`gdb --args node test.js`, then `run`, and
> `bt` once it has segfaulted.)
    at /usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.1/../../../../include/c++/4.7.1/bits/stl_deque.h:1376
#2 0x00007ffff6164587 in std::queue<varnam*, std::deque<varnam*, std::allocator<varnam*> > >::push (this=0xc93e10, __x=@0x7ffff7f0cd00: 0x7fffec0008c0)
    at /usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.1/../../../../include/c++/4.7.1/bits/stl_queue.h:212
#3 0x00007ffff61619ad in Varnam::ReturnHandle (this=0xc93dd0, handle=0x7fffec0008c0) at ../src/varnamjs.cc:152
#4 0x00007ffff61613d2 in perform_transliteration_async (req=0xcda2e0) at ../src/varnamjs.cc:62
#5 0x000000000059980c in ?? ()
#6 0x00007ffff6935e0f in start_thread () from /usr/lib/libpthread.so.0
#7 0x00007ffff666d45d in clone () from /usr/lib/libc.so.6
The one below is from my C library. It is an assertion failure that happens because std::queue::front returned an invalid pointer. In this situation, the queue's size is a random value. I believe this is also caused by multiple threads modifying the queue.
(gdb) bt
#0 0x00007ffff65bbfa5 in raise () from /usr/lib/libc.so.6
#1 0x00007ffff65bd428 in abort () from /usr/lib/libc.so.6
#2 0x00007ffff65b5002 in __assert_fail_base () from /usr/lib/libc.so.6
#3 0x00007ffff65b50b2 in __assert_fail () from /usr/lib/libc.so.6
#4 0x00007ffff5e8dca1 in reset_pool (handle=0xc4b000) at /home/nkn/open_source/varnam/libvarnam/varray.c:284
#5 0x00007ffff5e88f5c in varnam_transliterate (handle=0xc4b000, input=0xcda808 "test 40", output=0x7ffff7f4dd38)
    at /home/nkn/open_source/varnam/libvarnam/transliterate.c:71
#6 0x00007ffff61612f5 in perform_transliteration_async (req=0xcda760) at ../src/varnamjs.cc:49
#7 0x000000000059980c in ?? ()
#8 0x00007ffff6935e0f in start_thread () from /usr/lib/libpthread.so.0
#9 0x00007ffff666d45d in clone () from /usr/lib/libc.so.6
Here is one from libuv:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000597e10 in uv__io_feed ()
(gdb) bt
#0 0x0000000000597e10 in uv__io_feed ()
#1 0x00000000005a63df in ?? ()
#2 0x00000000005a6c0f in uv_write2 ()
#3 0x00000000005674f1 in v8::Handle<v8::Value> node::StreamWrap::WriteStringImpl<(node::WriteEncoding)1>(v8::Arguments const&) ()
#4 0x0000000000566d49 in node::StreamWrap::WriteUtf8String(v8::Arguments const&) ()
#5 0x00001f4e0b5a8cb7 in ?? ()
#6 0x00007fffffffd728 in ?? ()
#7 0x00007fffffffd730 in ?? ()
#8 0x0000000000000001 in ?? ()
#9 0x0000000000000000 in ?? ()
> Also, I've noticed that you aren't destroying the mutex there:
> https://github.com/navaneeth/libvarnam-nodejs/blob/master/src/varnamj....
> I really recommend that you fix this, because it is possible that your
> object gets garbage collected while a worker thread is running. If this is
> the case, you can fix it by adding a Ref() call before starting the work
> request and an Unref() call once it has completed.
Please download it and run it. If you run it without any modification, you should see the problems. Currently the loop iterates up to 1000 and it fails for me. On my friend's computer it worked fine for 1000 iterations; he had to increase the value to get a failure. So you might need to change that number to reproduce the failure.
Here is what the code does:
* Calls the "process" function 1000 times from JS.
* The "process" function unwraps the class instance from V8 and assigns it to the WorkerData instance.
* Calls uv_queue_work.
* The worker thread reads a handle from the queue after acquiring a mutex. This handle is removed from the queue.
* After doing the work, the handle is returned to the queue. This also happens after acquiring a mutex.
I get double free errors and segfaults, but this is random and sometimes the whole thing just works. In "test.js", I have commented out the calls to "sleep". Uncommenting them seems to fix the issue.
I am wondering why this happens. Any help would be great.
> I am using Node v0.8.8 on Arch Linux.
It's a thread safety issue but it's got nothing to do with mutexes:
the JS object gets garbage collected while the work request is
running. You need something like the patch below to make it safe.
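The rule behind that fix (take a reference before queuing the work, drop it only after the work has completed) can be modelled in a small standalone sketch. This is not the patch and not the node or libuv API: it swaps in std::shared_ptr for Ref()/Unref(), and Wrapper, WorkRequest, MakeRequest, DoWork and AfterWork are hypothetical names chosen to mirror the two callbacks passed to uv_queue_work.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for the wrapped object that owns the handle pool.
struct Wrapper {
    std::string Transliterate(const std::string& input) const {
        return "[" + input + "]";
    }
};

// Hypothetical work request. The shared_ptr member plays the role of Ref():
// while the request holds it, the wrapper cannot be destroyed.
struct WorkRequest {
    std::shared_ptr<Wrapper> owner;
    std::string input;
    std::string output;
};

// Main thread, before queuing: take a reference on the wrapper.
std::unique_ptr<WorkRequest> MakeRequest(const std::shared_ptr<Wrapper>& owner,
                                         const std::string& input) {
    std::unique_ptr<WorkRequest> req(new WorkRequest());
    req->owner = owner;  // analogous to Ref()
    req->input = input;
    return req;
}

// Worker side: safe to use the wrapper because the request keeps it alive.
void DoWork(WorkRequest& req) {
    assert(req.owner);  // the reference must still be held here
    req.output = req.owner->Transliterate(req.input);
}

// Main thread, after the work has completed: drop the reference.
void AfterWork(WorkRequest& req) {
    req.owner.reset();  // analogous to Unref()
}
```

In this model, the caller can drop its own reference right after MakeRequest (much as the garbage collector might) and DoWork still sees a live object; the wrapper is destroyed only once AfterWork has run.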
On Sunday, 2 September 2012 16:51:17 UTC+5:30, Ben Noordhuis wrote:
> It's a thread safety issue but it's got nothing to do with mutexes:
> the JS object gets garbage collected while the work request is
> running. You need something like the patch below to make it safe.
On Mon, Sep 3, 2012 at 4:44 AM, Navaneeth KN <navaneet...@gmail.com> wrote:
> This makes perfect sense and it works well. But I am wondering why
> everything was working when I added a delay?
Pure luck. It would've crashed eventually, you were writing to
deallocated memory.