Hello,

About the scalability of my efficient Threadpool engine with priorities
that scales very well...
I have to explain something important about my efficient Threadpool
engine with priorities that scales very well:
In NUMA-enabled multi-socket systems that keep caches coherent by
snooping, memory operations must snoop across all CPU sockets to keep
cached data coherent. In other words, before data is retrieved from
RAM, a snoop operation checks the caches of the local and remote CPUs
for a copy of that data.
Haswell-EP (Xeon E5 v3) CPUs add a more powerful snooping mode,
Cluster-on-Die (COD), which is ideal for highly NUMA-optimized
workloads. In the two other modes (Early Snoop and Home Snoop), snoops
are simply broadcast, whereas COD first checks the directory cache and
then the home agent.
In Cluster-on-Die mode each socket is split into two NUMA nodes, each
with its own memory controller, so each controller serves only about
half of the memory access requests. This increases memory bandwidth and
reduces memory access latency: two memory controllers working in
parallel can serve up to twice as many memory operations.
This matters for my efficient Threadpool engine with priorities that
scales very well: it shares only a few variables across cores, and only
on the producers' side. Since fewer shared cache lines means less
cross-core coherence (snoop) traffic, I think it will scale well with
Cluster-on-Die (COD) snooping mode.
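To make the idea of sharing variables "only on the producers' side" concrete, here is a minimal C++17 sketch under stated assumptions. It is not the author's engine, and every name in it (`PriorityThreadPool`, `submit`, `WorkerQueue`) is hypothetical. Each worker owns one priority queue padded to its own cache line (assuming 64-byte lines), and the only variable that every producer touches is an atomic round-robin counter, so workers never write to each other's cache lines. Note that in this sketch priorities are honored only within each worker's queue, not globally.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical sketch, NOT the author's engine. Each worker owns one
// priority queue; producers share only the atomic round-robin counter.
class PriorityThreadPool {
public:
    explicit PriorityThreadPool(std::size_t n_workers) : queues_(n_workers) {
        for (std::size_t i = 0; i < n_workers; ++i)
            workers_.emplace_back([this, i] { run(queues_[i]); });
    }

    // Signals every worker to stop; each one drains its queue before exiting.
    ~PriorityThreadPool() {
        for (auto& q : queues_) {
            {
                std::lock_guard<std::mutex> lk(q.m);
                q.stop = true;
            }
            q.cv.notify_one();
        }
        for (auto& t : workers_) t.join();
    }

    // Higher `priority` runs first, but only within the chosen worker's queue.
    void submit(int priority, std::function<void()> fn) {
        // The one variable shared by all producers: a relaxed round-robin counter.
        std::size_t i = next_.fetch_add(1, std::memory_order_relaxed) % queues_.size();
        WorkerQueue& q = queues_[i];
        {
            std::lock_guard<std::mutex> lk(q.m);
            q.tasks.push(Task{priority, q.seq++, std::move(fn)});
        }
        q.cv.notify_one();
    }

private:
    struct Task {
        int priority;
        unsigned long seq;  // keeps FIFO order among equal priorities
        std::function<void()> fn;
    };
    struct ByPriority {
        bool operator()(const Task& a, const Task& b) const {
            if (a.priority != b.priority) return a.priority < b.priority;
            return a.seq > b.seq;  // earlier submission wins a tie
        }
    };
    // alignas(64) assumes 64-byte cache lines, so two workers' queues never
    // share a line (no false sharing between consumers).
    struct alignas(64) WorkerQueue {
        std::mutex m;
        std::condition_variable cv;
        std::priority_queue<Task, std::vector<Task>, ByPriority> tasks;
        unsigned long seq = 0;
        bool stop = false;
    };

    // Worker loop: touches only its own WorkerQueue.
    static void run(WorkerQueue& q) {
        for (;;) {
            std::function<void()> fn;
            {
                std::unique_lock<std::mutex> lk(q.m);
                q.cv.wait(lk, [&q] { return q.stop || !q.tasks.empty(); });
                if (q.tasks.empty()) return;  // stop requested and queue drained
                fn = q.tasks.top().fn;
                q.tasks.pop();
            }
            fn();  // run the task outside the lock
        }
    }

    std::vector<WorkerQueue> queues_;
    std::vector<std::thread> workers_;
    alignas(64) std::atomic<std::size_t> next_{0};
};
```

The design choice here is that per-worker queues keep consumers from contending on one global lock and one hot cache line. The price is that priority ordering is only approximate across the whole pool. A production engine might address that with work stealing or a global priority scheme.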
You can download my efficient Threadpool engine with priorities that
scales very well from:
https://sites.google.com/site/scalable68/an-efficient-threadpool-engine-with-priorities-that-scales-very-well
I have also implemented a powerful, scalable ParallelFor() in it.
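A parallel for loop usually splits an index range into chunks and runs each chunk on a different thread. The following minimal C++17 sketch illustrates that idea under stated assumptions. `parallel_for` is a hypothetical name, and the function is not the author's ParallelFor(). For simplicity it spawns its own `std::thread`s instead of reusing a pool's workers. Each thread gets one contiguous, disjoint chunk, so the loop bookkeeping itself shares no mutable variables between threads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical parallel_for, NOT the author's ParallelFor(): splits the
// half-open range [first, last) into one contiguous chunk per thread and
// calls body(i) for every index in it.
void parallel_for(std::size_t first, std::size_t last,
                  const std::function<void(std::size_t)>& body,
                  unsigned n_threads = std::thread::hardware_concurrency()) {
    if (first >= last) return;
    if (n_threads == 0) n_threads = 1;  // hardware_concurrency() may return 0
    const std::size_t n = last - first;
    const std::size_t chunks = std::min<std::size_t>(n_threads, n);
    const std::size_t base = n / chunks, extra = n % chunks;

    std::vector<std::thread> threads;
    threads.reserve(chunks);
    std::size_t begin = first;
    for (std::size_t c = 0; c < chunks; ++c) {
        // The first `extra` chunks get one more index, so sizes differ by at most 1.
        const std::size_t end = begin + base + (c < extra ? 1 : 0);
        threads.emplace_back([&body, begin, end] {
            for (std::size_t i = begin; i < end; ++i) body(i);
        });
        begin = end;
    }
    for (auto& t : threads) t.join();  // body stays alive until all threads finish
}
```

For example, `parallel_for(0, v.size(), [&v](std::size_t i) { v[i] *= 2; });` doubles every element of a vector `v`, with each thread working on its own slice.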
Thank you,
Amine Moulay Ramdane.