Hello,
I have just read the following PhD paper about NUMA Cohort locks...
http://dspace.mit.edu/handle/1721.1/72670
and I have also read the following Master's research paper about NUMA
cohort locks...
https://cs.brown.edu/research/pubs/theses/masters/2012/ma.pdf
So please follow along with me...
I have read the above papers and I have completely understood their
algorithm. It uses local locks and a global lock, and it looks like a
distributed algorithm that tries to minimize the inter-socket
coherence traffic. I had no problem understanding it, but I am not
satisfied with those papers.
Why? If you read the PhD paper above carefully, you will notice that
their benchmarks say that the cohort lock scales to about 6x compared
with a non-NUMA lock, and they explain this 6x scaling by the fact
that their cohort lock tries to minimize the inter-socket coherence
traffic. But I am not convinced by their explanation, because I think
this 6x scaling comes instead from Amdahl's law.

There is parallelism inside the function that lets us enter the local
locks first; this parallel part is around 6 CPU clocks or so. There is
also a serial part, the cache-line transfer in their other function,
which transfers two integers from the L2 cache to the CPU and takes
around two CPU clocks, and there is a serial part that spins for about
4 ms. So by Amdahl's law this will scale to around 6x. The scaling
does not come from minimizing the inter-socket cache coherence
traffic, as they say, but from Amdahl's law, as I have just explained.

So if you transfer more data from the L2 cache to the CPU inside the
critical section of the cohort lock, their benchmarks with a cohort
lock will scale much less than 6x. What I want to say is that the
cohort lock doesn't bring you much scalability if you are transferring
more than 4 bytes from the L2 cache to the CPU inside the critical
section, because this will scale much less than 6x. So I think the
non-NUMA locks are still useful, and my scalable MLock can also be
used in realtime systems, while the NUMA cohort lock cannot.
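
To make the Amdahl's law argument above concrete, here is a minimal
sketch in Python. The function names and the time values in the loop
are hypothetical and chosen only for illustration; they are not
measurements from the papers or from MLock. It only shows the general
shape of the argument: when the serial time inside the critical
section grows relative to the parallel time, the achievable speedup
drops.

```python
def parallel_fraction(parallel_time: float, serial_time: float) -> float:
    """Fraction of the total work that can run in parallel."""
    if parallel_time < 0 or serial_time < 0:
        raise ValueError("times must be non-negative")
    total = parallel_time + serial_time
    if total == 0:
        raise ValueError("total time must be positive")
    return parallel_time / total


def amdahl_speedup(p: float, n_threads: int) -> float:
    """Amdahl's law: speedup = 1 / ((1 - p) + p / n_threads)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    return 1.0 / ((1.0 - p) + p / n_threads)


if __name__ == "__main__":
    # Hypothetical time units, for illustration only.
    parallel_time = 5.0
    for serial_time in (1.0, 2.0, 4.0):
        p = parallel_fraction(parallel_time, serial_time)
        # Upper bound on speedup as the thread count grows without limit.
        limit = float("inf") if p == 1.0 else 1.0 / (1.0 - p)
        print(f"serial={serial_time} p={p:.3f} "
              f"speedup(32 threads)={amdahl_speedup(p, 32):.2f} "
              f"limit={limit:.2f}")
```

Under Amdahl's law a speedup limit of 6x corresponds to a parallel
fraction of 5/6, so any extra serial work inside the critical section
lowers that limit.
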
You can download my scalable MLock from
https://sites.google.com/site/aminer68/scalable-mlock
Thank you for your time.
Amine Moulay Ramdane.