Hi Dmitry,
I’ve recently been experimenting with lock-free queues in order to improve performance (currently my app has a ring-buffer-with-two-locks “MPMC” queue, which gets in trouble if there’s too much contention on one side or if it wants to resize the buffer).
I came across your website (which, incidentally, is awesome, as is RRD)
and I was having a play with your unbounded SPSC queue, specifically splitting it up into intrusive and non-intrusive variants (the latter using std::allocator). I tried running the latter through the RRD and it ran ok on the default settings, but if I change the search_type to sched_full then it reports a livelock, which to my untrained eye looks like dequeue pre-empted enqueue to the point where it never got its value.
I tried the same thing on the spsc_queue example included in RRD and got a similar livelock (the iteration count and execution history are shorter on my lock vs. yours though (~2000 vs. ~8000 iterations, ~2000 vs. ~4000 history); not sure what that means).
The relevant snippet from the RRD output for my modified queue:
relacy_2_4\relacy\var.hpp(231) : [19] 0: <0003FE78> load, value=11
test_queue.cpp(45) : [20] 0: <003196A0> atomic store, value=00000000, (prev value=00000000), order=relaxed
test_queue.cpp(46) : [21] 0: <00317C88> atomic load, value=00317CA8, order=relaxed
test_queue.cpp(46) : [22] 0: <00317CA8> atomic store, value=003196A0, (prev value=00000000), order=release
test_queue.cpp(47) : [23] 0: <00317C88> atomic store, value=003196A0, (prev value=00317CA8), order=relaxed
relacy_2_4\relacy\context.hpp(457) : [24] 0: [THREAD FINISHED]
test_queue.cpp(52) : [25] 1: <00317C48> atomic load, value=00317CA8, order=relaxed
test_queue.cpp(52) : [26] 1: <00317CA8> atomic load, value=00000000 [NOT CURRENT], order=consume
test_queue.cpp(52) : [27] 1: <00317C48> atomic load, value=00317CA8, order=relaxed
test_queue.cpp(52) : [28] 1: <00317CA8> atomic load, value=00000000 [NOT CURRENT], order=consume
test_queue.cpp(52) : [29] 1: <00317C48> atomic load, value=00317CA8, order=relaxed
test_queue.cpp(52) : [30] 1: <00317CA8> atomic load, value=00000000 [NOT CURRENT], order=consume
(These last two loads just repeat several hundred times; it’s basically the dequeue code spinning getting the tail and tail->next.)
I tried adding a YieldProcessor() call in the spin loop inside the dequeue thread but it didn’t seem to change anything. (Not sure if it’s supposed to, actually.) But putting a Sleep(0) in there makes it get all the way through without errors.
Is this a real problem or is it spurious? Is there another setting I need to tweak? (I’m running it with VS2008.)
(Also, I added some code to RRD to report memory allocations/frees in the history, to make it easier to track down memory issues. Unfortunately it doesn’t get the line context (it reports memory.hpp instead), but at least you can match up the timing to the address. Let me know if you’re interested in a patch, though it was a pretty ugly hack.)
Quoth Dmitriy V'jukov:
> In Relacy I intercept some synchronization-related function in order to reason
> about program behavior, so I think I just intercepted Sleep() but not
> YieldProcessor(). So if the test passes with Sleep(), there is no problem on your side.
Cool. It definitely helped me to track down some allocation-related bugs in my code that would have been really hard to see otherwise.
I did have another odd case with a different queue type where it would complain about an uninitialized variable unless I used RL_FORCE_SEQ_CST. Sorted it out eventually by changing a few more memory_order_relaxeds into _consumes, though I'm still not entirely clear on how it managed to be uninitialized without that.
And if I try to use the sched_full search type with a test case that dares to enqueue two items instead of just one, it gives me a search space in the millions of millions of iterations. I don't think I have that kind of patience. :) (Still, it's passing with one item with sched_full, and with three items with sched_bound, so I'm guessing it's ok.)
> Regarding memory allocations, I am pretty sure than Relacy output memory
> allocations/frees in history. So I am not sure why you do not see them. Perhaps
> that's because of std::allocator. Relacy uses quite hacky preprocessor-based
> interception, so it can miss some things, especially in std lib. Try to use
> plain malloc/free/new/delete in tests instead.
On May 25, 6:48 pm, "Dmitriy V'jukov" <dvyu...@gmail.com> wrote:It does, but sometimes it can be tricky to work out *why*, especially
> Well, Relacy must provide you a detailed execution history that leads to
> the bug.
since it only lists the atomic and VAR operations. (Things got better
once I'd used the allocator above, but it was still tricky to match up
the events when some of them used the address of an object and others
used the address of the rl::var wrapper around it.)
But I did get
there in the end, which is the important thing. :)
Sounds like a good method; is that mentioned on the website somewhere?
> I am happy it found some bugs for you early in development cycle.
> Yeah, sched_full for anything other that tiny. I usually verify with random
> scheduler 10^8 or so executions, then switch to bound and start increasing
> context switch bound 1,2,3,4,...