questions about 1024cores algorithms


Roman Gershman

Jan 7, 2016, 1:41:09 AM
to lock...@googlegroups.com
Hi,
I have a few (beginner) questions about the lockless algorithms:

1. I saw several times that class data starts with initial padding the size of a cache line (here, for example).
I am guessing it's done to protect against false sharing. Is it there to guard the class data against memory accesses outside the scope of the algorithm?

2. Dmitry specifically mentions XCHG as the slowest operation on the producer side and then says that the most problematic place is right after "mpscq_node_t* prev = XCHG(&self->head, n);".
Is XCHG equivalent to std::atomic_exchange? Is it considered a wait operation?

3. When you say that the consumer is blocked if the producer is blocked at (*), do you mean that if the producer thread is preempted right after the exchange it will block the consumer?
How can the consumer be blocked if it has no loops?

Thank you!

--
Best regards,
     Roman

Dmitry Vyukov

Feb 15, 2016, 3:41:08 PM
to Scalable Synchronization Algorithms
On Thursday, January 7, 2016 at 7:41:09 AM UTC+1, Roman Gershman wrote:
Hi,
I have a few (beginner) questions about the lockless algorithms:

1. I saw several times that class data starts with initial padding the size of a cache line (here, for example).
I am guessing it's done to protect against false sharing. Is it there to guard the class data against memory accesses outside the scope of the algorithm?


Hi Roman,

Yes, you are right.
I guess I did it to be 100% sure there is no false sharing in the benchmarks. If you control what's located around the queue, then the first and last paddings may not be necessary.


 
2. Dmitry specifically mentions XCHG as the slowest operation on the producer side and then says that the most problematic place is right after "mpscq_node_t* prev = XCHG(&self->head, n);".
Is XCHG equivalent to std::atomic_exchange? Is it considered a wait operation?

Yes, XCHG === std::atomic_exchange.
It is not a wait operation in the sense that a semaphore or a condition variable is a blocking operation.
It is just a single processor instruction (or thereabouts), but it can be slow under contention.

 

3. When you say that the consumer is blocked if the producer is blocked at (*), do you mean that if the producer thread is preempted right after the exchange it will block the consumer?
How can the consumer be blocked if it has no loops?



It is expected that there is a loop in the caller (the consumer needs to wait for work somehow). E.g. 

while (!deque()) {}

is effectively a spin-mutex wait.
It is also possible to move this wait loop into the deque function itself; then the blocking will be more explicit.



 