This has been brought up in an earlier thread. Sorry for brevity.
Sent from a phone
Hi Eric,
I think it is correct. It is effectively a seqlock:
http://en.wikipedia.org/wiki/Seqlock
You may also consider using SSE instructions to do atomic 16 byte loads.
>> temp; // same integral type as counter
>> counter = pair->counter;
>> do
>> {
>> temp = counter;
>> head = pair->pointer;
>> COMPILER_BARRIER(); // to ensure that pair->pointer is read before pair->counter
>> counter = pair->counter;
>> }
>> while (counter != temp);
>>
>> Correct? Easier way? Bunk?
>
> I think it is correct. It is effectively a seqlock:
> http://en.wikipedia.org/wiki/Seqlock
Agreed.
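For reference, here is a minimal sketch of the quoted read loop using C++11 atomics in place of COMPILER_BARRIER(). The names Pair, Snapshot and read_pair are hypothetical and not from the original post. It assumes the writer updates both fields together in a single atomic 16-byte operation and changes the counter on every update; the writer is not shown.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical layout. The writer (not shown) is assumed to update both
// fields together in one atomic 16-byte operation, e.g. lock cmpxchg16b.
struct Pair {
    std::atomic<void*> pointer{nullptr};
    std::atomic<std::uint64_t> counter{0};
};

struct Snapshot {
    void* pointer;
    std::uint64_t counter;
};

// Re-read until the counter is the same before and after the pointer read.
Snapshot read_pair(const Pair& pair) {
    std::uint64_t temp;
    std::uint64_t counter = pair.counter.load(std::memory_order_acquire);
    void* head;
    do {
        temp = counter;
        // Acquire loads keep these reads in program order:
        // counter, then pointer, then counter again.
        head = pair.pointer.load(std::memory_order_acquire);
        counter = pair.counter.load(std::memory_order_acquire);
    } while (counter != temp);
    return Snapshot{head, counter};
}
```

If the counter is unchanged across the pointer read, the pointer belongs to that counter value, which is the seqlock-style argument made above (ignoring counter wrap-around).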
> You may also consider using SSE instructions to do atomic 16 byte loads.
I believe this is incorrect. The Intel manuals do not guarantee that
16-byte SSE instructions are atomic. They only guarantee that aligned
64-bit (or smaller) accesses and instructions with a LOCK prefix are
atomic, and you cannot apply a LOCK prefix to an SSE instruction.
See IA32 Software Dev. Manual Vol 3A, sections 7.1.1 and 7.1.2.2.
It might be that in practice SSE instructions are atomic, but they are
not guaranteed.
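Since the guarantee applies only to aligned accesses, it may be worth checking alignment explicitly. The following is a rough sketch; is_naturally_aligned and PointerCounter are hypothetical names, and the alignas(16) reflects the 16-byte alignment that cmpxchg16b requires of its memory operand.

```cpp
#include <cstddef>
#include <cstdint>

// True when `address` is a multiple of `size` (size must be a power of two).
inline bool is_naturally_aligned(const void* address, std::size_t size) {
    return (reinterpret_cast<std::uintptr_t>(address) & (size - 1)) == 0;
}

// Hypothetical pointer/counter pair, forced onto a 16-byte boundary.
struct alignas(16) PointerCounter {
    void* pointer;
    std::uint64_t counter;
};
```

A debug build could assert is_naturally_aligned(&pair, 16) before any 16-byte operation on the pair.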
Anthony
--
Author of C++ Concurrency in Action http://www.stdthread.co.uk/book/
just::thread C++11 thread library http://www.stdthread.co.uk
Just Software Solutions Ltd http://www.justsoftwaresolutions.co.uk
15 Carrallack Mews, St Just, Cornwall, TR19 7UL, UK. Company No. 5478976
I must have a brain disorder...
What's your memory model :)
It's way too relaxed :)
Samy, I'm afraid I don't understand your point about the memory model. If the "x86-TSO" memory model (see http://www.cl.cam.ac.uk/~pes20/weakmemory/cacm.pdf) is correct, then the writing thread's execution of the lock cmpxchg16b instruction flushes that thread's store buffer, such that the reading thread always obtains the "current" value of both halves of the pointer/counter pair. No?
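To make the writer side concrete, here is a sketch of the same idea scaled down to 64 bits: a 32-bit index and a 32-bit counter packed into one word and replaced with a single compare-exchange. This is a stand-in for the 16-byte pointer/counter pair, not the code under discussion, and pack, index_of, counter_of and set_index are hypothetical names. On x86, compilers typically emit a LOCK-prefixed cmpxchg for the compare-exchange.

```cpp
#include <atomic>
#include <cstdint>

// Stand-in for the 16-byte pair: index in the high half, counter in the low half.
inline std::uint64_t pack(std::uint32_t index, std::uint32_t counter) {
    return (static_cast<std::uint64_t>(index) << 32) | counter;
}
inline std::uint32_t index_of(std::uint64_t word) {
    return static_cast<std::uint32_t>(word >> 32);
}
inline std::uint32_t counter_of(std::uint64_t word) {
    return static_cast<std::uint32_t>(word);
}

// Install a new index and bump the counter in one atomic read-modify-write.
void set_index(std::atomic<std::uint64_t>& pair, std::uint32_t new_index) {
    std::uint64_t old_word = pair.load(std::memory_order_relaxed);
    std::uint64_t new_word;
    do {
        // On failure, compare_exchange_weak reloads old_word, so the
        // counter is always derived from the latest observed value.
        new_word = pack(new_index, counter_of(old_word) + 1);
    } while (!pair.compare_exchange_weak(old_word, new_word,
                                         std::memory_order_acq_rel,
                                         std::memory_order_relaxed));
}
```

Because both halves change in one instruction, no reader can observe a new index paired with an old counter in the packed word itself.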
Cheers,
Eric
Quoted from the Intel manual (V3, CH8):
"Software should access semaphores (shared memory used for signalling
between multiple processors) using identical addresses and operand
lengths. For example, if one processor accesses a semaphore using a
word access, other processors should not access the semaphore using a
byte access."
This important detail can be easily forgotten and I've seen it break
applications in the wild.
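One way to follow that advice is to route every access through a single atomic object of fixed width, so the address and operand length cannot vary between threads. A minimal sketch, with BinarySemaphore as a hypothetical name:

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical binary semaphore: all accesses use the same 32-bit atomic
// object, so every thread uses an identical address and operand length.
class BinarySemaphore {
public:
    // Returns true if the semaphore was free and is now held by the caller.
    bool try_acquire() {
        std::uint32_t expected = 0;
        return state_.compare_exchange_strong(expected, 1,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed);
    }
    void release() { state_.store(0, std::memory_order_release); }
    bool is_held() const {
        return state_.load(std::memory_order_acquire) != 0;
    }

private:
    std::atomic<std::uint32_t> state_{0};
};
```

The point of the wrapper is that nothing outside the class can read a single byte of the state or overlay a wider access on it.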
--
Samy Al Bahra [http://repnop.org]
Could you please describe exactly how it broke the application? Did the
semaphore cross a cache line boundary? I have always been curious how
that can happen.