While this is interesting as a general subject, and for kernel or hypervisor programming (something I happen to do), spinlocks in user-mode code have much bigger problems than this. So whenever I see user-mode code with spinlocks in it, I cringe.
Not that you should never-ever-ever-ever use them, but they are one of those things that should have a huge BEWARE sign on them. The same kind that blindfolded knife-throwing demonstrations should come with. (So replace "never-ever-ever-ever" with just "never".) I've seen spinlocks used "responsibly" in libraries that usually provide a non-spinlock execution option, some written by people on this list, but I've also seen the effects of people using those libraries without understanding the implications.
As a general practice, the minute you start using spinlocks in user-mode code, you've left the generally-deployable software realm. If you have such spinlocks anywhere in the code you use (including any libraries), you had better have absolute control over what all CPUs in your system are doing, how they are allocated to user processes and threads, and how many threads can possibly be competing for CPU resources. This control needs to ensure that a spinlock-holding thread will never be in a situation where it may be preempted while holding the spinlock. Without this control, something as small as a 20 msec load spike, or a background cron job perturbing your scheduling assumptions, can completely ruin your day, making you wish (from a performance point of view) you didn't have those spinlocks.
The reason is simple: with user-level spinlocks, things will work well and as-expected ONLY as long as your user threads are NEVER interrupted or suspended for any reason that is outside of your control. Any interruption in the execution of a spinlock-holding user thread tends to have cascading bad behavior that collapses and stalls the performance of otherwise super-speedy code.
Spinlocks, as a general concept, start with the assumption that the critical section of code protected by the spinlock will complete in a short amount of time. Based on that assumption, code using spinlocks is making a critical performance tradeoff: it is willing to burn CPU time in one thread while it waits for another to let go of the spinlock. As long as the base assumption holds, the amount of CPU burned remains small, and the tradeoff is worth it. But think of what happens when a spinlock-holding thread "just happens to be" preempted by another thread that needs and deserves some CPU according to whatever scheduling mechanism controls such things. Imagine that this newly-deserving thread (or some other thread) wants to grab the spinlock that is currently held by a runnable-but-not-running thread. No forward progress will occur for any thread waiting for the spinlock for an entire scheduling quantum (typically 4-10 msec), even though all CPUs will appear to be pegged and busy (with threads wanting the spinlock).
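To make that failure mode concrete, here is a minimal sketch in Python (picked for brevity, not because anyone should build spinlocks in it). The names `SpinLock` and `simulate_preempted_holder` are hypothetical, a non-blocking `threading.Lock.acquire()` stands in for an atomic test-and-set, and `time.sleep()` stands in for the holder being descheduled in the middle of its critical section. Those are assumptions for illustration only; real preemption is not something the code gets to schedule.

```python
import threading
import time


class SpinLock:
    """Illustrative test-and-set spinlock; a sketch, not a production lock."""

    def __init__(self):
        # A non-blocking acquire() on threading.Lock serves as the atomic
        # test-and-set primitive here (an assumption made for the demo).
        self._flag = threading.Lock()

    def lock(self):
        """Busy-wait until the flag is taken; return the number of failed attempts."""
        spins = 0
        while not self._flag.acquire(blocking=False):
            spins += 1  # every failed attempt is CPU time spent making no progress
        return spins

    def unlock(self):
        self._flag.release()


def simulate_preempted_holder(stall_seconds=0.01):
    """Hold the lock across a stall while a second thread spins for it.

    time.sleep() stands in for the holder being preempted while holding the
    lock. Returns (failed_attempts, seconds_waited) as seen by the waiter.
    """
    lock = SpinLock()
    holder_has_lock = threading.Event()
    result = {}

    def holder():
        lock.lock()
        holder_has_lock.set()
        time.sleep(stall_seconds)  # "preempted" while still holding the lock
        lock.unlock()

    def waiter():
        holder_has_lock.wait()  # make sure the holder took the lock first
        start = time.perf_counter()
        result["spins"] = lock.lock()  # spins for the whole stall
        result["waited"] = time.perf_counter() - start
        lock.unlock()

    threads = [threading.Thread(target=holder), threading.Thread(target=waiter)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["spins"], result["waited"]


if __name__ == "__main__":
    spins, waited = simulate_preempted_holder(0.01)
    print(f"waiter burned {spins} failed attempts over {waited * 1000:.1f} ms")
```

The critical section itself does essentially nothing, yet the waiter stays busy for roughly the whole stall. With several waiters on several CPUs, each one would burn its own CPU the same way, which is the "pegged and busy but no forward progress" picture described above.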
The way this is handled in OS kernel code is simple: a spinlock-holding thread will disable preemption while holding the spinlock. It may also disable interrupts, but disabling preemption is critical. Unfortunately, user-mode code (almost by definition) has no ability to do the same.
-- Gil.