serialize() is actually implemented as:
"In our implementation, we adopt the synchronization technique described by Dice et al. [1], where the slow thread (namely, the stealer) binds directly to the processor on which the fast thread (namely, the consumer) is currently running, preempting it from the processor, and then returns to run on its own processor. Thread displacement serves as a full memory fence, hence, a stealer that invokes the displacement binding right after updating the ownership (before the line 99 in Algorithm 5) observes the updated consumer's index. On the other hand, the steal-free fast path is not affected by this change."
So if I understand correctly, "stop the world" is implemented as forcing a scheduling out of the fast thread so the slow thread has uncontended access to the fast thread's data.
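If it helps make my reading concrete, here is a minimal Linux-specific sketch of how I picture the displacement step, using sched_setaffinity(2). The function name and the way the CPUs are discovered are my assumptions, not code from the paper:

```c
/* Hypothetical sketch of "thread displacement" serialization on Linux.
 * Assumption: the fast thread's current CPU is known to the stealer,
 * e.g. published by the fast thread via sched_getcpu(). */
#define _GNU_SOURCE
#include <sched.h>

/* Bind the calling (slow) thread to target_cpu, which preempts whatever
 * is running there; the resulting context switch on the fast thread's
 * CPU acts as a full memory fence for it. Then migrate back home.
 * Returns 0 on success, -1 on failure. */
static int displace(int target_cpu, int home_cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(target_cpu, &set);
    if (sched_setaffinity(0, sizeof set, &set) != 0)
        return -1;                      /* migrate onto the fast thread's CPU */

    CPU_ZERO(&set);
    CPU_SET(home_cpu, &set);
    return sched_setaffinity(0, sizeof set, &set);   /* return home */
}
```

That is, the fast thread pays nothing on its fast path, and the slow thread pays two forced migrations per serialization.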
In the paper by Dice et al, they state this technique is worth doing when the following inequality holds
(Thread1_iterations * COST(MEMBAR)) > (Thread2_iterations * COST(SERIALIZE))
Since the cost of a membar is far lower than that of a deschedule + reschedule, what is the approximate ratio Thread2_iterations / Thread1_iterations?
i.e., how many steal operations per consumed job before the synchronization cost becomes prohibitive?
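To make the question concrete, rearranging the break-even point of the inequality gives the tolerable ratio directly. A tiny sketch, where the two costs are purely illustrative assumptions on my part, not measurements:

```c
/* Break-even of Thread1 * COST(MEMBAR) == Thread2 * COST(SERIALIZE):
 * the tolerable steal/consume ratio is COST(MEMBAR) / COST(SERIALIZE).
 * The caller supplies the costs; the values used below are assumed. */
static double breakeven_steal_ratio(double membar_ns, double serialize_ns)
{
    return membar_ns / serialize_ns;
}
```

With an assumed ~20 ns membar and ~10 us deschedule + reschedule, breakeven_steal_ratio(20.0, 10000.0) gives 0.002, i.e. roughly one steal per 500 consumed jobs before displacement stops paying off. Is that the right order of magnitude in your measurements?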
Regards,
David