uv_mutex_lock seems to be failing in tight loop

350 views
Skip to first unread message

Navaneeth.K.N

unread,
Sep 1, 2012, 2:31:45 PM9/1/12
to nod...@googlegroups.com
Hello,

I have a C library and I am writing a non-blocking wrapper using Node JS. I am using uv_work_queue to call C library function. Workflow will be like,

1- First function gets called on main thread. This one sets up a WorkerData (a struct used for passing data to thread) instance with all necessary information required to do the job. This object is wrapped inside uv_work_t instance's "data" field.
2 - Calls uv_queue_work (uv_default_loop(), ... )
3 - Each worker thread will acquire a mutex lock, get a work handle (C library requires this handle for every call) from the queue (std::queue) and removes the handle from queue. It releases the mutex after all these. If no handles are available in the queue, a new one will be created.
4 - C library will be called with the above handle. Once the work is done, acquire mutex lock and the handle will be added back to the queue.

I wrote some JS code to test the above functionality. It will be something like,

for(i = 0; i < 100; i ++)
{
    mylib.process ("test", function (err, result)  {
    });
}

actual code is available here - https://github.com/navaneeth/libvarnam-nodejs/blob/master/test.js

Interestingly, this fails with random segmentation faults. I have seen some segfaults at accessing the queue even though all the operations that modify the queue are in critical section. I have also got assertion failures inside libuv. node: ../deps/uv/src/unix/stream.c:489: uv__write: Assertion `n == 0' failed.

Some observations:

* When I reduce the loop iterations to 20, all works good.
* When I add a delay in between calls, it works.

I am wondering why this happens? It looks like when in a tight loop, mutexes are not working correctly and allowing multiple threads to get into the critical section. When these segfaults happening, I have seen queue's size() returns some random values which is a clear indication of mutexes failing.

Code is avilable here - https://github.com/navaneeth/libvarnam-nodejs/blob/master/src/varnamjs.cc. Varnam::Transliterate() is the entry point.

Any help would be great!

--
Thanks
Navaneeth

Shripad K

unread,
Sep 1, 2012, 3:05:31 PM9/1/12
to nod...@googlegroups.com
What version of Node are you on?



--
Thanks
Navaneeth

--
Job Board: http://jobs.nodejs.org/
Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
You received this message because you are subscribed to the Google
Groups "nodejs" group.
To post to this group, send email to nod...@googlegroups.com
To unsubscribe from this group, send email to
nodejs+un...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/nodejs?hl=en?hl=en

Fedor Indutny

unread,
Sep 1, 2012, 3:08:44 PM9/1/12
to nod...@googlegroups.com
Hi,

First of all, on what platform are you seeing this SEGFAULTs?

Libuv is using posix threads for unixes ( https://github.com/joyent/libuv/blob/master/src/unix/thread.c#L70-73 ), and CriticalSection on windows ( https://github.com/joyent/libuv/blob/master/src/win/thread.c#L129-131 ). So if you suppose that failure is related to those routines - you're probably experiencing a kernel bug.

However it's hard to tell what is exactly happen no your side, can you post backtraces or core dumps there? (`gdb --args node test.js` then `run` and `bt` once segfaulted).

Also I've noticed that you ain't destroying mutex there: https://github.com/navaneeth/libvarnam-nodejs/blob/master/src/varnamjs.h#L24 . I really recommend you to fix this, because it may be possible that your object gets garbage collected when a worker thread is running. If this is the case - you can fix it by adding Ref() call before starting work request and Unref() once it was completed.

Cheers,
Fedor.





--
Thanks
Navaneeth

--

Navaneeth KN

unread,
Sep 1, 2012, 11:39:32 PM9/1/12
to nod...@googlegroups.com, fe...@indutny.com
Hello,

Thanks for the help. I am using node v0.8.8.


On Sunday, 2 September 2012 00:39:14 UTC+5:30, Fedor Indutny wrote:
Hi,

First of all, on what platform are you seeing this SEGFAULTs?

I am using Arch Linux and compiling the node module with GCC.
 

Libuv is using posix threads for unixes ( https://github.com/joyent/libuv/blob/master/src/unix/thread.c#L70-73 ), and CriticalSection on windows ( https://github.com/joyent/libuv/blob/master/src/win/thread.c#L129-131 ). So if you suppose that failure is related to those routines - you're probably experiencing a kernel bug.

However it's hard to tell what is exactly happen no your side, can you post backtraces or core dumps there? (`gdb --args node test.js` then `run` and `bt` once segfaulted).

I get different errors for each run.

The below backtrace happens becuase muliple threads are trying to insert into queue at same time. You can see the ReturnHandle function (https://github.com/navaneeth/libvarnam-nodejs/blob/master/src/varnamjs.cc#L148) uses mutexes before modifying std::queue.

(gdb) bt
#0  0x00007ffff61660a9 in __gnu_cxx::new_allocator<varnam*>::construct (this=0xc93e10, __p=0x8, __val=@0x7ffff7f0cd00: 0x7fffec0008c0)
    at /usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.1/../../../../include/c++/4.7.1/ext/new_allocator.h:120
#1  0x00007ffff6165684 in std::deque<varnam*, std::allocator<varnam*> >::push_back (this=0xc93e10, __x=@0x7ffff7f0cd00: 0x7fffec0008c0)
    at /usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.1/../../../../include/c++/4.7.1/bits/stl_deque.h:1376
#2  0x00007ffff6164587 in std::queue<varnam*, std::deque<varnam*, std::allocator<varnam*> > >::push (this=0xc93e10, __x=@0x7ffff7f0cd00: 0x7fffec0008c0)
    at /usr/lib/gcc/x86_64-unknown-linux-gnu/4.7.1/../../../../include/c++/4.7.1/bits/stl_queue.h:212
#3  0x00007ffff61619ad in Varnam::ReturnHandle (this=0xc93dd0, handle=0x7fffec0008c0) at ../src/varnamjs.cc:152
#4  0x00007ffff61613d2 in perform_transliteration_async (req=0xcda2e0) at ../src/varnamjs.cc:62
#5  0x000000000059980c in ?? ()
#6  0x00007ffff6935e0f in start_thread () from /usr/lib/libpthread.so.0
#7  0x00007ffff666d45d in clone () from /usr/lib/libc.so.6

Below one is from my C library. It is an assertion failure and happens because std::queue::front returned me an invalid pointer. In this situation, queue's size will be a random value. I believe, this is also because multiple threads modifying the queue.

(gdb) bt
#0  0x00007ffff65bbfa5 in raise () from /usr/lib/libc.so.6
#1  0x00007ffff65bd428 in abort () from /usr/lib/libc.so.6
#2  0x00007ffff65b5002 in __assert_fail_base () from /usr/lib/libc.so.6
#3  0x00007ffff65b50b2 in __assert_fail () from /usr/lib/libc.so.6
#4  0x00007ffff5e8dca1 in reset_pool (handle=0xc4b000) at /home/nkn/open_source/varnam/libvarnam/varray.c:284
#5  0x00007ffff5e88f5c in varnam_transliterate (handle=0xc4b000, input=0xcda808 "test 40", output=0x7ffff7f4dd38)
    at /home/nkn/open_source/varnam/libvarnam/transliterate.c:71
#6  0x00007ffff61612f5 in perform_transliteration_async (req=0xcda760) at ../src/varnamjs.cc:49
#7  0x000000000059980c in ?? ()
#8  0x00007ffff6935e0f in start_thread () from /usr/lib/libpthread.so.0
#9  0x00007ffff666d45d in clone () from /usr/lib/libc.so.6

Here is one from libuv

Program received signal SIGSEGV, Segmentation fault.
0x0000000000597e10 in uv__io_feed ()
(gdb) bt
#0  0x0000000000597e10 in uv__io_feed ()
#1  0x00000000005a63df in ?? ()
#2  0x00000000005a6c0f in uv_write2 ()
#3  0x00000000005674f1 in v8::Handle<v8::Value> node::StreamWrap::WriteStringImpl<(node::WriteEncoding)1>(v8::Arguments const&) ()
#4  0x0000000000566d49 in node::StreamWrap::WriteUtf8String(v8::Arguments const&) ()
#5  0x00001f4e0b5a8cb7 in ?? ()
#6  0x00007fffffffd728 in ?? ()
#7  0x00007fffffffd730 in ?? ()
#8  0x0000000000000001 in ?? ()
#9  0x0000000000000000 in ?? ()
 

Also I've noticed that you ain't destroying mutex there: https://github.com/navaneeth/libvarnam-nodejs/blob/master/src/varnamjs.h#L24 . I really recommend you to fix this, because it may be possible that your object gets garbage collected when a worker thread is running. If this is the case - you can fix it by adding Ref() call before starting work request and Unref() once it was completed.

Thanks for the suggestion. I am keeping my object in V8 (https://github.com/navaneeth/libvarnam-nodejs/blob/master/src/varnamjs.cc#L234) when New() gets called. All the subsequent function calls will UnWrap() (https://github.com/navaneeth/libvarnam-nodejs/blob/master/src/varnamjs.cc#L266) the object and pass it to worker thread. Is UnWrap() reducing the reference count and making the object ready to collect? Also on what object should I call Ref() and Unref()?

Thanks.
Navaneeth

Navaneeth KN

unread,
Sep 2, 2012, 2:34:44 AM9/2/12
to nod...@googlegroups.com, fe...@indutny.com
Hello,

I have created a sample code that reporduces the problem.

Code - https://gist.github.com/3595299

Please download it and run it. If you run it without any modification, you should see the problems. Currently loop is iterating till 1000 and it is failing for me. In my friend's computer it was working fine for 1000 iterations. He modified it to a larger value to get the failure. So you might need to change that number to get a failure.

Here is what the code does.

* Calls "process" function 1000 times from JS.
* "process" function UnWraps the class instance from V8 and assign that to the workerdata instance.
* Call uv_queue_work.
* Worker thread will read a handle from the queue after acquiring a mutex. This handle will be removed from the queue.
* After doing the work, handle will be returned back to the queue. This will also happen after acquiring a mutex.

I got double free errors and segfault. But this is random and sometimes the whole thing just works. In the "test.js", I have commented out calls to "sleep". Uncommenting this seems to fix the issue.

I am wondering why this happens. Any help would be great.

I am using Node v0.8.8 on Arch Linux.

Thanks
Navaneeth

Ben Noordhuis

unread,
Sep 2, 2012, 7:21:06 AM9/2/12
to nod...@googlegroups.com
It's a thread safety issue but it's got nothing to do with mutexes:
the JS object gets garbage collected while the work request is
running. You need something like the patch below to make it safe.

diff --git a/mutex-test.cc b/mutex-test.cc
index 3953a4d..3605612 100644
--- a/mutex-test.cc
+++ b/mutex-test.cc
@@ -39,6 +39,8 @@ public:

NativeLibrary* GetHandle();
void ReturnHandle(NativeLibrary* handle);
+ void Ref() { node::ObjectWrap::Ref(); }
+ void Unref() { node::ObjectWrap::Unref(); }

private:
Foo()
@@ -142,6 +144,7 @@ void after_work(uv_work_t *req)
data->callback->Call(Context::GetCurrent()->Global(), 2, argv);
}

+ data->clazz->Unref();
data->callback.Dispose();
delete data;
data = NULL;
@@ -184,6 +187,7 @@ Handle<Value> Foo::Process(const Arguments& args)
data->request.data = data;
data->callback = Persistent<Function>::New(Local<Function>::Cast(args[1]));
data->clazz = ObjectWrap::Unwrap<Foo>(args.This());
+ data->clazz->Ref();

uv_queue_work(uv_default_loop(), &data->request,
perform_work_async, after_work);

On a side note, Foo::~Foo() needs to call uv_mutex_destroy().

Fedor Indutny

unread,
Sep 2, 2012, 7:32:04 AM9/2/12
to nod...@googlegroups.com
I told ya! :)

Cheers,
Fedor.



Navaneeth KN

unread,
Sep 2, 2012, 10:44:39 PM9/2/12
to nod...@googlegroups.com
Hello Ben & Fedor,

Thanks. It worked well.

This makes perfect sense and it works well. But I am wondering why everything was working when I add a delay ?

Thanks for the help.

Navaneeth

Ben Noordhuis

unread,
Sep 3, 2012, 6:49:10 AM9/3/12
to nod...@googlegroups.com
On Mon, Sep 3, 2012 at 4:44 AM, Navaneeth KN <navan...@gmail.com> wrote:
> This makes perfect sense and it works well. But I am wondering why
> everything was working when I add a delay ?

Pure luck. It would've crashed eventually, you were writing to
deallocated memory.
Reply all
Reply to author
Forward
0 new messages