Windows IOCP wrappers (and data race)


Gavin Lambert

Dec 9, 2013, 3:43:56 AM
to rel...@googlegroups.com
I'm currently experimenting with ways to perform single-threaded actions from a multi-threaded workqueue (some more details are at https://groups.google.com/d/msg/lock-free/7odffiOjUZg/9qcx5fJqa9AJ, although I've changed the code a bit since then).

The core workqueue is based on Windows IOCP, for which I've written some minimal RRD wrappers (incomplete, but they cover the parts I needed). I've attached these.

As an experiment I tried commenting out the critical section in this code, since it only protects the internal queue. In theory that protection is unnecessary: RRD's internal code executes single-threaded and only context-switches at defined points, so I expected the semaphore alone to provide any necessary cross-thread synchronisation for external code.
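For reference, here's roughly what the lock-free variant looks like. This is a hedged sketch rather than the attached iocp.hpp: the type and member names are mine, and it assumes Relacy's Win32 API emulation is in scope for the semaphore calls (adjust to whatever your tree actually provides).

#include <relacy/relacy_std.hpp>
#include <deque>

// Sketch only (hypothetical names, not the attached code): an IOCP-style
// completion queue with the internal CRITICAL_SECTION removed. The
// std::deque is deliberately unsynchronised, relying on Relacy running
// logical threads one at a time; the modelled semaphore is the only
// cross-thread signal.
struct completion_port
{
    std::deque<void*> queue_;  // plain container; Relacy serialises access
    HANDLE sema_;              // Relacy-modelled Win32 semaphore

    completion_port()
        : sema_(CreateSemaphore(0, 0, 0x7fffffff, 0))
    {
    }

    void post(void* completion)
    {
        queue_.push_back(completion);   // no lock taken here
        ReleaseSemaphore(sema_, 1, 0);  // wake (or credit) one worker
    }

    void* get()
    {
        WaitForSingleObject(sema_, INFINITE);  // one credit per queued item
        void* c = queue_.front();
        queue_.pop_front();
        return c;
    }
};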

This seems to work OK (and is significantly faster) for most of my test cases, but one in particular fails with a data race when the internal lock is removed. I'm not sure whether this means the wrappers need something else added if they don't use the lock, whether the lock was merely hiding a genuine problem in the test case, or whether it's something else entirely.

In summary, the test case runs 4 threads: two each call Send("value"), and the other two are workqueue workers. Send posts the data to an MPSC queue and then posts a function to the workqueue. That function uses an atomic<bool> to check whether it's already running and exits if so; otherwise it dequeues the data and works on it using non-atomic vars (which should be safe, since it should be single-threaded at that point). I'm expecting this to eventually deadlock (I haven't resolved that yet), and with the internal lock in place it does exactly that. Without the lock, though, it instead reports a race on the non-atomic variable.
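The guard looks roughly like this (a reconstruction from the description above, reusing names that appear in the log; the bodies and the DequeueItem signature are hypothetical):

#include <relacy/relacy_std.hpp>

std::atomic<bool> running;      // gates re-entry into the drain loop

bool DequeueItem(void*& item);  // MPSC dequeue, defined elsewhere in the test
void DoWork1(void* item);       // first-half handler; uses non-atomic vars

void process_all()
{
    // If another invocation is already draining the queue, bail out.
    if (running($).exchange(true, rl::mo_acquire))
        return;
    void* item;
    while (DequeueItem(item))   // single consumer from here on
        DoWork1(item);
    running($).store(false, rl::mo_release);
}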

The work itself updates a variable and then posts a second-half function back to the workqueue; that function also uses the variable. It's in the second-half handler that the race occurs. The two functions can (and, in the problem case, do) execute on different threads, but never concurrently, since one queues the other.
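Concretely, the two halves have this shape (names from the log, bodies hypothetical; Post stands in for the wrapper's posting call):

void Post(void (*fn)());  // stand-in for posting a function to the workqueue

rl::var<int> value;       // the plain variable the race is reported on

void DoWork2()
{
    int v = value($);     // second half: reads the same variable; this is
    (void)v;              // where the race is reported on the other thread
}

void DoWork1(void* item)
{
    value($) = 12345678;  // first half: update the plain variable...
    Post(&DoWork2);       // ...then queue the second half
}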

I've added some extra logging to show the acq_rel_order and timestamp values to narrow it down a bit.

An excerpt from the output is below (I've edited it a bit to make it more readable):

test::Post : [587] 0: unblocking thread 3
rl::sync_var<4>::acq_rel : [588] 3: acq_rel_order {[0=415],[1=407],[2=402],[3=398]} local {[0=415],[1=407],[2=402],[3=398]}
test::Post : [589] 0: <006FF844> semaphore: post value=1 new_count=0 unblocked=1
rl::context_impl<struct test,class rl::random_scheduler<4> >::fiber_proc_impl : [590] 0: [THREAD FINISHED]
rl::sync_var<4>::release : [591] 0: acq_rel_order {[0=416],[1=407],[2=402],[3=394]} local {[0=416],[1=407],[2=402],[3=394]}
test::process_all : [592] 2: <007010D0> atomic load, value=00000000, order=relaxed
rl::var_data_impl<4>::load : [593] 2: check acq_rel_order {[0=394],[1=407],[2=405],[3=394]} store {[0=0],[1=400],[2=0],[3=0]}
rl::var_data_impl<4>::load : [594] 2: new order 406
test::process_all : [595] 2: <007010F0> load, value=12345678
rl::var_data_impl<4>::load : [597] 2: check acq_rel_order {[0=394],[1=407],[2=406],[3=394]} store {[0=0],[1=0],[2=0],[3=3]}
rl::var_data_impl<4>::load : [598] 2: new order 407
test::DequeueItem : [599] 2: <00592DC4> load, value=
rl::var_data_impl<4>::store : [600] 2: check acq_rel_order {[0=394],[1=407],[2=407],[3=394]} store {[0=0],[1=0],[2=0],[3=3]} load {[0=0],[1=0],[2=407],[3=0]}
rl::var_data_impl<4>::store : [601] 2: new order 408
test::DequeueItem : [602] 2: <00592DC4> store, value=12345678
rl::var_data_impl<4>::load : [606] 2: check acq_rel_order {[0=394],[1=407],[2=408],[3=394]} store {[0=0],[1=0],[2=408],[3=3]}
rl::var_data_impl<4>::load : [607] 2: new order 409
test::DoWork1 : [608] 2: <00592DC4> load, value=12345678
rl::sync_var<4>::acq_rel : [610] 2: acq_rel_order {[0=415],[1=407],[2=410],[3=398]} local {[0=415],[1=407],[2=410],[3=398]}
test::Post : [611] 2: <006FF844> semaphore: post value=1 new_count=1 unblocked=0
rl::sync_var<4>::acq_rel : [616] 2: acq_rel_order {[0=415],[1=407],[2=413],[3=398]} local {[0=415],[1=407],[2=413],[3=398]}
test::RunOne : [617] 2: <006FF844> semaphore: wait succeeded new_count=0
test::RunOne : [619] 3: <006FF844> semaphore: wait succeeded new_count=0
rl::var_data_impl<4>::load : [621] 3: check acq_rel_order {[0=415],[1=407],[2=402],[3=398]} store {[0=0],[1=0],[2=0],[3=4]}
rl::var_data_impl<4>::load : [622] 3: new order 399
test::DoWork2 : [623] 3: <00592DF0> load, value=
rl::var_data_impl<4>::load : [624] 3: check acq_rel_order {[0=415],[1=407],[2=402],[3=399]} store {[0=0],[1=0],[2=408],[3=3]}
test::DoWork2 : [625] 3: <00592DC4> load, value=
test::DoWork2 : [626] 3: DATA RACE (data race detected)

So, thread 2 does a store at 408 into 00592DC4, and thread 3 has only seen thread 2's changes up to 402, so a race is reported. But these aren't independent operations: it's the Post at [611] that queues DoWork2 for execution in the first place, and that Post occurs on thread 2 after the store. (Thread 3 is already awake only because of an earlier Post of a do-nothing function.)

I *think* this indicates that something is missing in the IOCP wrappers (since I assume the real OS does some internal locking of its own between the two threads), but using a critical section as in the attached code seems heavier (and slower!) than should be necessary. Any suggestions? Or do you think something else is wrong?
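For instance, I wondered whether, if the wrappers are supposed to model that OS-internal ordering themselves, a single modelled atomic could act as the happens-before carrier: a release RMW after queueing in post, and an acquire load after a successful wait. A minimal sketch against the hypothetical wrapper above (not a proven fix; with multiple posters the acquire load only synchronises with the stores whose value it actually observes):

std::atomic<unsigned> order_;  // carries the poster->waiter happens-before edge

void post(void* completion)
{
    queue_.push_back(completion);
    order_($).fetch_add(1, rl::mo_release);  // publish everything done so far
    ReleaseSemaphore(sema_, 1, 0);
}

void* get()
{
    WaitForSingleObject(sema_, INFINITE);
    order_($).load(rl::mo_acquire);  // pull in the poster's vector clock
    void* c = queue_.front();
    queue_.pop_front();
    return c;
}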

iocp.hpp

Orlando Selenu

Dec 11, 2015, 10:27:48 AM
to Relacy Race Detector
I'm very curious whether you had any luck with this IOCP code.

Thanks in advance,
- Orlando

Dmitry Vyukov

Dec 14, 2015, 6:57:56 AM
to rel...@googlegroups.com
Hi Gavin,

Yes, I agree that you don't need a CRITICAL_SECTION in the IOCP emulation. That code runs single-threaded; logical threads are switched only at predefined points.

What I don't understand in the log is the following part:

test::Post : [611] 2: <006FF844> semaphore: post value=1 new_count=1 unblocked=0
rl::sync_var<4>::acq_rel : [616] 2: acq_rel_order {[0=415],[1=407],[2=413],[3=398]} local {[0=415],[1=407],[2=413],[3=398]}
test::RunOne : [617] 2: <006FF844> semaphore: wait succeeded new_count=0
test::RunOne : [619] 3: <006FF844> semaphore: wait succeeded new_count=0

The semaphore count becomes 1, but then two threads get unblocked on it, and both report new_count=0. That looks suspicious. I think it would help to add more debug logging there to figure out exactly how this happens.
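One cheap variant of that (a hypothetical sketch, using an assertion rather than logging): mirror the semaphore's credit in a plain wrapper-internal counter and check it after every successful wait. RL_ASSERT is Relacy's assertion macro, and the counter needs no lock for the same single-threaded-execution reason as above.

int count_;  // wrapper-internal credit mirror; Relacy serialises access

void post_one()
{
    ++count_;
    ReleaseSemaphore(sema_, 1, 0);
}

void wait_one()
{
    WaitForSingleObject(sema_, INFINITE);
    RL_ASSERT(count_ > 0);  // fires if two waiters pass on one credit
    --count_;
}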



--
Dmitry Vyukov

All about lockfree/waitfree algorithms, multicore, scalability,
parallel computing and related topics:
http://www.1024cores.net