It depends on what your goals are. If your goal is to get your program
absolutely as fast as possible, and you have full control over your
hardware and software, then spinlocks might be the solution. This is
assuming you know exactly what you are doing (but in that case you would
not need help from strangers on the internet).
We lesser mortals need to write software for real-world scenarios where
the customer wants to run other software on the same computer, and
sometimes they even have the audacity to think our piece of software
is not the most important one. Getting a spinlock wrong and needlessly
stalling the whole machine can easily happen, so I would not consider
spinlocks unless the profiler shows that the standard mutex is
becoming a bottleneck. This has never happened to me so far.
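To make the comparison concrete, here is a minimal sketch of what a
polite spinlock might look like next to std::mutex. The names SpinLock
and count_with are made up for illustration, and the yield in the wait
loop is my assumption about how to avoid hogging a core; it is not a
tuned implementation.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical minimal spinlock. It provides lock()/unlock(), so it
// can be used with std::lock_guard just like std::mutex.
class SpinLock {
public:
    void lock() {
        // test_and_set returns the previous value, so keep looping
        // while some other thread already holds the lock.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // Assumption: yield instead of burning the core, so other
            // software on the machine still gets to run.
            std::this_thread::yield();
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Increments a shared counter from several threads, protected by the
// given lock type, and returns the final count.
template <class Lock>
long count_with(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    Lock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<Lock> guard(lock);
                ++counter;
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Because both lock types go through std::lock_guard, swapping
count_with<std::mutex> for count_with<SpinLock> is a one-word change,
which makes it easy to let the profiler decide instead of guessing.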
Also, before micro-optimizing things like locking speed, one should
review the algorithm. I have not found the willingness to delve into
your original post, but I see there is a regular mandatory wait in all
the faster threads for the slowest one, plus the size of a single task
seemed ridiculously small (8 kB). Even if those impediments are
justified, it would be a pretty special scenario, and one would not be
able to draw many general conclusions about multithreading performance
from it.
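As an illustration of what I mean by reviewing the algorithm first,
here is a rough sketch of coarse-grained work splitting: each thread
gets one large contiguous chunk and there is a single join at the end,
instead of many tiny tasks with a wait after each one. This is not the
algorithm from the original post; chunked_sum and the summing workload
are hypothetical stand-ins.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: sums `data` with `threads` workers, giving each
// worker one large contiguous chunk and synchronizing only once.
long long chunked_sum(const std::vector<int>& data, std::size_t threads) {
    if (threads == 0) threads = 1;
    std::vector<long long> partial(threads, 0);
    std::vector<std::thread> pool;
    // Ceiling division, so the chunks together cover the whole input.
    const std::size_t chunk = (data.size() + threads - 1) / threads;
    for (std::size_t t = 0; t < threads; ++t) {
        const std::size_t begin = std::min(t * chunk, data.size());
        const std::size_t end = std::min(begin + chunk, data.size());
        assert(begin <= end && end <= data.size());
        pool.emplace_back([&data, &partial, t, begin, end] {
            // Each worker writes only its own slot, so no lock is needed.
            partial[t] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0LL);
        });
    }
    // The only wait for the slowest thread happens here, once.
    for (auto& th : pool) th.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

With a layout like this the lock disappears from the hot path
entirely, so the question of spinlock versus mutex may not even come
up.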