Hi Dragos,
Shepherd pinning is used fairly often in Qthreads, just not in an obvious way. In particular, all qt_loop constructs use shepherd pinning, both to assign the same loop iterations to the same shepherds every time and as part of the task-spawning logic (so that tasks can be spawned in parallel rather than launched serially). Doing this with shepherd pinning was so consistently better and faster (in particular, because it keeps the loop iterations assigned to each shepherd consistent) that there is no longer a way to do the same thing without shepherd pinning. The closest alternative is a self-scheduled loop (qt_loop_queue), which is intended for loops with unpredictable per-iteration latencies. In practice, the pinned qt_loop is usually faster than the self-scheduled loop, for lots of reasons (e.g. the self-scheduled loop requires synchronization to orchestrate the execution of ranges).
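To make the "consistent iterations per shepherd" idea concrete, here is a rough sketch in plain C of a static block partition. It is only an illustration of the general technique, not the actual Qthreads implementation: block_range and owner_of are hypothetical helper names, and the even-split-with-remainder policy is an assumption rather than something taken from the qt_loop source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (NOT part of the Qthreads API): compute the half-open
 * range [*lo, *hi) of iterations owned by shepherd `id` when [start, stop)
 * is split into `nshep` contiguous blocks of near-equal size. */
static void block_range(size_t start, size_t stop, size_t nshep, size_t id,
                        size_t *lo, size_t *hi)
{
    assert(nshep > 0 && id < nshep && start <= stop);
    size_t total = stop - start;
    size_t base  = total / nshep;  /* every shepherd gets at least this many */
    size_t extra = total % nshep;  /* the first `extra` shepherds get one more */

    *lo = start + id * base + (id < extra ? id : extra);
    *hi = *lo + base + (id < extra ? 1 : 0);
}

/* Hypothetical helper: which shepherd owns iteration `i` under the same
 * split. The answer depends only on (start, stop, nshep, i), so repeated
 * runs of the loop send iteration `i` to the same shepherd. */
static size_t owner_of(size_t start, size_t stop, size_t nshep, size_t i)
{
    assert(nshep > 0 && start <= i && i < stop);
    size_t total  = stop - start;
    size_t base   = total / nshep;
    size_t extra  = total % nshep;
    size_t offset = i - start;
    size_t cutoff = extra * (base + 1);  /* iterations held by the larger blocks */

    if (offset < cutoff) {
        return offset / (base + 1);
    }
    /* base > 0 here: if base were 0, cutoff == total and we returned above */
    return extra + (offset - cutoff) / base;
}
```

With 10 iterations over 4 shepherds, this split gives shepherd 0 the range [0, 3), shepherd 1 [3, 6), shepherd 2 [6, 8), and shepherd 3 [8, 10). Because the mapping is a pure function of the loop bounds and the shepherd count, no synchronization is needed to hand out ranges, which is the contrast with a self-scheduled loop where workers must coordinate to claim the next chunk.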
There are, however, no benchmarks in the Qthreads repository that attempt to highlight this, which I think is what Dylan was getting at.
If you're curious about studying the impact of task pinning on benchmark runtime, I recommend reading Stephen Olivier's paper from a few years ago. It won Best Student Paper at the Supercomputing conference:
http://dl.acm.org/citation.cfm?doid=1988796.1988804
That paper focuses on OpenMP and some extensions to it that he designed for specifying task locality, but those extensions were implemented with Qthreads and his results are all from code that he ran on Qthreads. His results are really well explained, and the paper includes one of the coolest graphs of task behavior I've ever seen (it's one of those "wow, why didn't I think of doing that?" papers).
Hope that helps,
~Kyle