Re: RRD 2.4 and SPSC queue: livelock


Dmitriy V'jukov

May 24, 2012, 5:39:09 AM
to Gavin Lambert, rel...@googlegroups.com
+rel...@googlegroups.com

On Thu, May 24, 2012 at 1:12 PM, Gavin Lambert <gavin....@compacsort.com> wrote:

> Hi Dmitry,
>
> I’ve recently been experimenting with lock-free queues in order to improve performance (currently my app has a ring-buffer-with-two-locks “MPMC” queue, which gets in trouble if there’s too much contention on one side or if it wants to resize the buffer).
>
> I came across your website (which, incidentally, is awesome, as is RRD)

Thanks!
 

> and I was having a play with your unbounded SPSC queue, specifically splitting it up into intrusive and non-intrusive variants (the latter using std::allocator). I tried running the latter through RRD and it ran OK on the default settings, but if I change the search_type to sched_full then it reports a livelock, which to my untrained eye looks like the dequeue thread pre-empted the enqueue thread to the point where it never got its value.
>
> I tried the same thing on the spsc_queue example included in RRD and got a similar livelock (the iteration count and execution history are shorter on my queue vs. yours, though: ~2000 vs. ~8000 iterations and ~2000 vs. ~4000 history entries; not sure what that means).
>
> The relevant snippet from the RRD output for my modified queue:

> relacy_2_4\relacy\var.hpp(231) : [19] 0: <0003FE78> load, value=11
> test_queue.cpp(45) : [20] 0: <003196A0> atomic store, value=00000000, (prev value=00000000), order=relaxed
> test_queue.cpp(46) : [21] 0: <00317C88> atomic load, value=00317CA8, order=relaxed
> test_queue.cpp(46) : [22] 0: <00317CA8> atomic store, value=003196A0, (prev value=00000000), order=release
> test_queue.cpp(47) : [23] 0: <00317C88> atomic store, value=003196A0, (prev value=00317CA8), order=relaxed
> relacy_2_4\relacy\context.hpp(457) : [24] 0: [THREAD FINISHED]
> test_queue.cpp(52) : [25] 1: <00317C48> atomic load, value=00317CA8, order=relaxed
> test_queue.cpp(52) : [26] 1: <00317CA8> atomic load, value=00000000 [NOT CURRENT], order=consume
> test_queue.cpp(52) : [27] 1: <00317C48> atomic load, value=00317CA8, order=relaxed
> test_queue.cpp(52) : [28] 1: <00317CA8> atomic load, value=00000000 [NOT CURRENT], order=consume
> test_queue.cpp(52) : [29] 1: <00317C48> atomic load, value=00317CA8, order=relaxed
> test_queue.cpp(52) : [30] 1: <00317CA8> atomic load, value=00000000 [NOT CURRENT], order=consume
>
> (These last two loads just repeat several hundred times; it’s basically the dequeue code spinning getting the tail and tail->next.)

 

> I tried adding a YieldProcessor() call in the spin loop inside the dequeue thread, but it didn’t seem to change anything. (Not sure if it’s supposed to, actually.) But putting a Sleep(0) in there makes it get all the way through without errors.
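For readers without the RRD sources to hand, the pattern under discussion looks roughly like this. This is a hedged sketch, not Gavin's or Dmitriy's actual code: it uses std::atomic in place of Relacy's rl:: wrappers, the names spsc_queue and spin_dequeue are invented, and std::this_thread::yield() stands in for the Sleep(0) call above. The consumer's loop over tail and tail->next is the spin visible in the trace.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Hypothetical sketch of an unbounded linked-list SPSC queue in the style
// discussed here; std::atomic replaces Relacy's rl:: types.
template<typename T>
class spsc_queue {
    struct node {
        std::atomic<node*> next{nullptr};
        T value{};
    };
    node* head_;               // producer side: last enqueued node
    std::atomic<node*> tail_;  // consumer side: points at a consumed stub
public:
    spsc_queue() : head_(new node), tail_(head_) {}
    ~spsc_queue() {
        for (node* n = tail_.load(); n != nullptr; ) {
            node* next = n->next.load();
            delete n;
            n = next;
        }
    }
    void enqueue(T v) {
        node* n = new node;
        n->value = std::move(v);
        // Publish the new node: release pairs with the consumer's acquire.
        head_->next.store(n, std::memory_order_release);
        head_ = n;
    }
    bool dequeue(T& out) {
        node* tail = tail_.load(std::memory_order_relaxed);
        node* next = tail->next.load(std::memory_order_acquire);
        if (next == nullptr)
            return false;      // queue looks empty right now
        out = next->value;
        tail_.store(next, std::memory_order_relaxed);
        delete tail;           // old stub is no longer reachable
        return true;
    }
};

// The spinning consumer from the trace: keep re-reading tail/tail->next
// until a value appears; yield() plays the role of Sleep(0) in the post.
template<typename T>
T spin_dequeue(spsc_queue<T>& q) {
    T v;
    while (!q.dequeue(v))
        std::this_thread::yield();
    return v;
}
```

The "livelock" report comes from the loads marked [NOT CURRENT] above: in Relacy's model a relaxed/consume load may keep returning a stale value forever, so an unyielding spin can livelock even after the producer thread has finished, whereas an intercepted call such as Sleep() gives the tool a point at which to make progress.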

 

> Is this a real problem or is it spurious? Is there another setting I need to tweak? (I’m running it with VS2008.)

 

> (Also, I added some code to RRD to report memory allocations/frees in the history, to make it easier to track down memory issues. Unfortunately it doesn’t get the line context (it reports memory.hpp instead), but at least you can match up the timing to the address. Let me know if you’re interested in a patch, though it was a pretty ugly hack.)

 



Hi Gavin,

The default random scheduler does not detect livelocks (because it is inherently fair).
The fact that the example code livelocks is OK; the examples are not meant to be 100% correct, they are there to demonstrate bug detection.

In Relacy I intercept some synchronization-related functions in order to reason about program behavior, and I think I just intercepted Sleep() but not YieldProcessor(). So if the test passes with Sleep(), there is no problem on your side.

Regarding memory allocations, I am pretty sure that Relacy outputs memory allocations/frees in the history. So I am not sure why you do not see them. Perhaps that's because of std::allocator. Relacy uses quite hacky preprocessor-based interception, so it can miss some things, especially in the std lib. Try to use plain malloc/free/new/delete in tests instead.

Best regards

Gavin Lambert

May 25, 2012, 2:11:21 AM
to Dmitriy V'jukov, rel...@googlegroups.com
Quoth Dmitriy V'jukov:
> In Relacy I intercept some synchronization-related function in order to reason
> about program behavior, so I think I just intercepted Sleep() but not
> YieldProcessor(). So if the test passes with Sleep(), there is no problem on your side.

Cool. It definitely helped me to track down some allocation-related bugs in my code that would have been really hard to see otherwise.

I did have another odd case with a different queue type where it would complain about an uninitialized variable unless I used RL_FORCE_SEQ_CST. Sorted it out eventually by changing a few more memory_order_relaxeds into _consumes, though I'm still not entirely clear on how it managed to be uninitialized without that.
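[A hedged guess at the kind of interplay behind such a report: if a node's payload is written before the node pointer is published, a relaxed load of the pointer does not order the dependent read of the payload, so the tool can legitimately see the payload as uninitialized; a consume (or acquire) load does order it. A minimal illustration with invented names, using acquire in place of consume:]

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Invented example: publish a node whose payload must be visible to the
// consumer. The producer writes the payload, then stores the pointer with
// release; the consumer's acquire (or consume) load orders the dependent
// read of n->value. With memory_order_relaxed on the consumer side that
// ordering is gone, and the payload may be observed uninitialized.
struct node { int value; };

std::atomic<node*> published{nullptr};

void producer(node* n) {
    n->value = 42;                                  // initialize payload first
    published.store(n, std::memory_order_release);  // then publish the pointer
}

int consumer() {
    node* n;
    // acquire (consume in the original code) makes n->value visible
    while ((n = published.load(std::memory_order_acquire)) == nullptr)
        std::this_thread::yield();
    return n->value;
}
```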

And if I try to use the sched_full search type with a test case that dares to enqueue two items instead of just one, it gives me a search space in the millions of millions of iterations. I don't think I have that kind of patience. :) (Still, it's passing with one item with sched_full, and with three items with sched_bound, so I'm guessing it's ok.)

> Regarding memory allocations, I am pretty sure than Relacy output memory
> allocations/frees in history. So I am not sure why you do not see them. Perhaps
> that's because of std::allocator. Relacy uses quite hacky preprocessor-based
> interception, so it can miss some things, especially in std lib. Try to use
> plain malloc/free/new/delete in tests instead.

Oh, I see now, they're over in context.hpp; I was expecting them to be in memory.hpp.

In case anyone else is crazy enough to want to play with std::allocator with RRD, I submit the following:
(It does require using this allocator on your class in the test case, and adding $s into the code under test, but that seems consistent with existing functionality. You could remove the need for the $s by adding extra overloads, but then the traces would be less helpful, although that would allow some instrumentation of things like std::vector<>.)

template<typename T>
struct construct_event
{
    rl::debug_info var_info_;
    T const* var_addr_;
    bool construct_;

    void output(std::ostream& s) const
    {
        s << "<" << std::hex << var_addr_ << std::dec << "> "
          << (construct_ ? "construct" : "destruct") << " " << typeid(T).name();
    }
};

template<typename T>
class relacy_allocator : public std::allocator<T>
{
public:
    // dependent names from the base are not visible unqualified on
    // conforming compilers (MSVC lets this slide), so pull them in:
    typedef typename std::allocator<T>::pointer pointer;
    typedef typename std::allocator<T>::size_type size_type;
    typedef typename std::allocator<T>::const_reference const_reference;

    relacy_allocator() {}
    relacy_allocator(const relacy_allocator&) {}
    template<typename U> relacy_allocator(const relacy_allocator<U>&) {}
    template<typename U> relacy_allocator& operator=(const relacy_allocator<U>&) { return *this; }

    template<class U>
    struct rebind
    {   // convert a relacy_allocator<T> to a relacy_allocator<U>
        typedef relacy_allocator<U> other;
    };

    pointer allocate(size_type count, rl::debug_info_param info)
    {
        return (pointer)rl::rl_malloc(count * sizeof(T), info);
    }

    pointer allocate(size_type count, const void*, rl::debug_info_param info)
    {
        return allocate(count, info);
    }

    void deallocate(pointer p, size_type, rl::debug_info_param info)
    {
        rl::rl_free(p, info);
    }

    void construct(pointer p, const_reference v, rl::debug_info_param info)
    {
        rl::context& c = rl::ctx();
        RL_HIST(construct_event<T>) {RL_INFO, p, true} RL_HIST_END();
        std::allocator<T>::construct(p, v);
    }

    void destroy(pointer p, rl::debug_info_param info)
    {
        rl::context& c = rl::ctx();
        RL_HIST(construct_event<T>) {RL_INFO, p, false} RL_HIST_END();
        std::allocator<T>::destroy(p);
    }
};
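[Editor's aside: the same "record allocator events so you can match addresses against a trace" idea can be sketched without any Relacy dependency. Everything below (logging_allocator, g_alloc_events) is invented for illustration; because it keeps the standard allocate(n) signature, it can instrument std containers such as std::vector directly, as the note above suggests.]

```cpp
#include <cstddef>
#include <memory>
#include <sstream>
#include <string>
#include <typeinfo>
#include <vector>

// Invented, Relacy-free analogue of the allocator above: it records
// allocate/deallocate events in a global log so they can be matched up
// against addresses later, the same idea as the construct_event history
// entries.
static std::vector<std::string> g_alloc_events;

template<typename T>
struct logging_allocator : std::allocator<T> {
    template<class U> struct rebind { typedef logging_allocator<U> other; };
    logging_allocator() {}
    template<class U> logging_allocator(const logging_allocator<U>&) {}

    T* allocate(std::size_t n) {
        std::ostringstream s;
        s << "allocate " << n << " x " << typeid(T).name();
        g_alloc_events.push_back(s.str());
        return std::allocator<T>::allocate(n);
    }
    void deallocate(T* p, std::size_t n) {
        g_alloc_events.push_back("deallocate");
        std::allocator<T>::deallocate(p, n);
    }
};
```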


Dmitriy V'jukov

May 25, 2012, 2:48:51 AM
to Gavin Lambert, rel...@googlegroups.com
On Fri, May 25, 2012 at 10:11 AM, Gavin Lambert <gavin....@compacsort.com> wrote:
> Quoth Dmitriy V'jukov:
>> In Relacy I intercept some synchronization-related function in order to reason
>> about program behavior, so I think I just intercepted Sleep() but not
>> YieldProcessor(). So if the test passes with Sleep(), there is no problem on your side.
>
> Cool.  It definitely helped me to track down some allocation-related bugs in my code that would have been really hard to see otherwise.
>
> I did have another odd case with a different queue type where it would complain about an uninitialized variable unless I used RL_FORCE_SEQ_CST.  Sorted it out eventually by changing a few more memory_order_relaxeds into _consumes, though I'm still not entirely clear on how it managed to be uninitialized without that.

Well, Relacy must provide you a detailed execution history that leads to the bug.


> And if I try to use the sched_full search type with a test case that dares to enqueue two items instead of just one, it gives me a search space in the millions of millions of iterations.  I don't think I have that kind of patience. :)  (Still, it's passing with one item with sched_full, and with three items with sched_bound, so I'm guessing it's ok.)

I am happy it found some bugs for you early in the development cycle.
Yeah, sched_full is infeasible for anything other than tiny tests. I usually verify with the random scheduler for 10^8 or so executions, then switch to sched_bound and start increasing the context switch bound: 1, 2, 3, 4, ...
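[For reference, that workflow maps onto Relacy's test_params roughly as follows. This is a sketch from memory of the RRD 2.x API; the test name is invented and field names may differ slightly in your copy of the headers.]

```
rl::test_params params;

// Stage 1: random scheduler, lots of iterations.
params.search_type = rl::sched_random;
params.iteration_count = 100000000;  // ~10^8
rl::simulate<spsc_queue_test>(params);

// Stage 2: context-bound scheduler, raising the bound 1, 2, 3, 4, ...
params.search_type = rl::sched_bound;
params.context_bound = 1;  // then 2, 3, 4, ...
rl::simulate<spsc_queue_test>(params);
```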

 

> Regarding memory allocations, I am pretty sure than Relacy output memory
> allocations/frees in history. So I am not sure why you do not see them. Perhaps
> that's because of std::allocator. Relacy uses quite hacky preprocessor-based
> interception, so it can miss some things, especially in std lib. Try to use
> plain malloc/free/new/delete in tests instead.


Thanks.
Well, manual source level instrumentation sucks. However that time it allowed me to implement Relacy really quickly.
Currently I am working on a full-fledged data race detector - ThreadSanitizer (http://code.google.com/p/thread-sanitizer), which uses compiler instrumentation. It's possible to re-implement Relacy the same way, but there are no such plans.


Gavin Lambert

Jun 11, 2012, 4:25:24 AM
to Relacy Race Detector
On May 25, 6:48 pm, "Dmitriy V'jukov" <dvyu...@gmail.com> wrote:
> Well, Relacy must provide you a detailed execution history that leads to
> the bug.

It does, but sometimes it can be tricky to work out *why*, especially
since it only lists the atomic and VAR operations. (Things got better
once I'd used the allocator above, but it was still tricky to match up
the events when some of them used the address of an object and others
used the address of the rl::var wrapper around it.) But I did get
there in the end, which is the important thing. :)

> I am happy it found some bugs for you early in development cycle.
> Yeah, sched_full for anything other that tiny. I usually verify with random
> scheduler 10^8 or so executions, then switch to bound and start increasing
> context switch bound 1,2,3,4,...

Sounds like a good method; is that mentioned on the website somewhere?

Dmitriy V'jukov

Jun 11, 2012, 10:31:04 AM
to rel...@googlegroups.com
On Mon, Jun 11, 2012 at 4:25 AM, Gavin Lambert <gavin....@compacsort.com> wrote:
> On May 25, 6:48 pm, "Dmitriy V'jukov" <dvyu...@gmail.com> wrote:
>> Well, Relacy must provide you a detailed execution history that leads to
>> the bug.
>
> It does, but sometimes it can be tricky to work out *why*, especially
> since it only lists the atomic and VAR operations.  (Things got better
> once I'd used the allocator above, but it was still tricky to match up
> the events when some of them used the address of an object and others
> used the address of the rl::var wrapper around it.)

Oh, when does it happen?
 
> But I did get
> there in the end, which is the important thing. :)

I am not sure what would be a good way to explain an execution. Some bugs are really tough. Can it be made simpler? For some bugs I remember sitting for half an hour with the report, trying to understand what Relacy was trying to communicate to me... and those were real bugs.
Btw, AFAIR there is something along the lines of RL_LOG(...) that allows you to add custom messages to the history.

 

>> I am happy it found some bugs for you early in development cycle.
>> Yeah, sched_full for anything other that tiny. I usually verify with random
>> scheduler 10^8 or so executions, then switch to bound and start increasing
>> context switch bound 1,2,3,4,...
>
> Sounds like a good method; is that mentioned on the website somewhere?

It is mentioned right here :) 

All about lockfree/waitfree algorithms, multicore, scalability, parallel computing and related topics:
http://www.1024cores.net

Gavin Lambert

Jun 11, 2012, 8:37:53 PM
to rel...@googlegroups.com
Quoth Dmitriy V'jukov:
> Oh, when does it happen?

The main ones I was having trouble with were "ACCESS TO FREED MEMORY" errors
(because I was playing around with std::allocator and trying to
construct/destruct the values as needed while keeping their memory intact,
similar to std::vector); the idea was to permit a queue to contain objects
that aren't default-constructible. They were, in the end, technically genuine
errors, but probably would have caused no harm in real code (e.g. calling the
"int" destructor multiple times, which is usually a no-op -- but in this
case the value happened to be a rl::atomic or rl::var, so it was actually
doing something). I don't see that as a problem in Relacy, though; it's doing
the right thing. (Though it might be nice if there were a way to tell it to
temporarily ignore a specific instance of an error like this so you can
concentrate on finding more serious issues, but I suspect that could get
technically tricky.)

As to the mismatch in reported pointers, I suspect that's actually my
allocator's fault -- when I call relacy_allocator<rl::var<int>>::destroy()
it will report the address of the rl::var<int>, but if the var<> destructor
raises an error it reports the pointer to the int itself. This could
probably be fixed with a bit more intelligent specialisation.

> I am not sure what would be a good way to explain an execution. Some
> bugs are really tough. Can it be made simpler? For some bugs I
> remember sitting for half an hour with the report trying to
> understand what Relacy tries to communicate to me... that was real bugs.

Yeah, exactly, it's a hard problem, and it's hard thinking of a way it could
be improved. (Although I did end up modifying the debug_info_param
operator<< to still strip_path even for RL_MSVC_OUTPUT; I prefer having the
filename/line at the start since I'm used to MSVC errors but having the
*complete* path to the source file was just cluttering up the output and
making it harder to read. But I know that this wasn't what RL_MSVC_OUTPUT
was intended for so again, probably my fault.)

> Btw AFAIR, there is something along the lines of RL_LOG(...) which allows
> you to add custom messages to the history.

I did discover that a bit later on, and it did help, but...

> It is mentioned right here :) 
>
> All about lockfree/waitfree algorithms, multicore, scalability, parallel
> computing and related topics:
> http://www.1024cores.net

... that's the thing, though, it doesn't seem to be. Unless I'm blind, or
it's just really well hidden. The RRD page mentions in passing that there
are three different schedulers but doesn't talk about the best way to use
each one (or even how to). The docs that are there are great at explaining
what it is and why it is, but don't have much on how to use it beyond a
couple of examples. I know this is a hobby project and docs are always weak
for those (if they exist at all), and that you've moved on to
ThreadSanitizer now, but it would be nice if there were a few more details
about things like the schedulers, RL_LOG, and other such options that lurk
in the depths of the code... :) [Maybe that info was on "the old website",
which appears to no longer exist?]

(Don't get me wrong; as I said before, it's awesome. It's just a bit tricky
to figure out how to use properly at times.)

