Recursive spinlock

Mr Flibble

unread,

May 8, 2020, 8:55:52 PM5/8/20

to

Hi!

Today I created a C++ recursive spinlock (based on the spinlock in Boost) and it seems to work! Spinlocks can be a lot faster than the mutexes supplied by the operating system.

https://github.com/i42output/neolib/blob/master/include/neolib/mutex.hpp

#cpp #gamedev #coding #neoGFX

/Flibble

--
"Snakes didn't evolve, instead talking snakes with legs changed into snakes." - Rick C. Hodgin

“You won’t burn in hell. But be nice anyway.” – Ricky Gervais

“I see Atheists are fighting and killing each other again, over who doesn’t believe in any God the most. Oh, no..wait.. that never happens.” – Ricky Gervais

"Suppose it's all true, and you walk up to the pearly gates, and are confronted by God," Byrne asked on his show The Meaning of Life. "What will Stephen Fry say to him, her, or it?"
"I'd say, bone cancer in children? What's that about?" Fry replied.
"How dare you? How dare you create a world to which there is such misery that is not our fault. It's not right, it's utterly, utterly evil."
"Why should I respect a capricious, mean-minded, stupid God who creates a world that is so full of injustice and pain. That's what I would say."

Bonita Montero

unread,

May 9, 2020, 3:49:43 AM5/9/20

to

> Today I created a C++ recursive spinlock (based on the spinlock in
> Boost) and it seems to work! Spinlocks can be a lot faster than the
> mutexes supplied by the operating system.

Spinlocks don't make sense in userland because you can't disable
the scheduler in userland. The problem here is that a thread holding
a spinlock might get de-scheduled and other treads will spin and con-
sume CPU-time unutil it gets re-scheduled to relinquish the ownership.

Chris M. Thomasson

unread,

May 9, 2020, 5:57:12 PM5/9/20

to

Mostly true. However, an adaptive lock, that spins a couple of times
before it actually blocks can be useful in userland. The PAUSE
instruction can help out here.

Mr Flibble

unread,

May 9, 2020, 6:11:20 PM5/9/20

to

Which is what my recursive spinlock does; Bonita should try looking at the fucking code before commenting.

Chris M. Thomasson

unread,

May 9, 2020, 7:22:41 PM5/9/20

to

On 5/9/2020 3:11 PM, Mr Flibble wrote:
> On 09/05/2020 22:57, Chris M. Thomasson wrote:
>> On 5/9/2020 12:49 AM, Bonita Montero wrote:
>>>> Today I created a C++ recursive spinlock (based on the spinlock in
>>>> Boost) and it seems to work! Spinlocks can be a lot faster than the
>>>> mutexes supplied by the operating system.
>>>
>>> Spinlocks don't make sense in userland because you can't disable
>>> the scheduler in userland. The problem here is that a thread holding
>>> a spinlock might get de-scheduled and other treads will spin and con-
>>> sume CPU-time unutil it gets re-scheduled to relinquish the ownership.
>>>
>>
>> Mostly true. However, an adaptive lock, that spins a couple of times
>> before it actually blocks can be useful in userland. The PAUSE
>> instruction can help out here.
>
> Which is what my recursive spinlock does; Bonita should try looking at
> the fucking code before commenting.

Indeed. Fwiw, there are clever distributed adaptive locks, however I am
not sure if they would be the right fit in your system. There are some
fun ones, I remember a while back about an algorihtm called MLock. I
cannot find the posts right now over on comp.arch, however they can be
useful.

Bonita Montero

unread,

May 10, 2020, 1:37:07 AM5/10/20

to

> Mostly true. However, an adaptive lock, that spins a couple of times
> before it actually blocks can be useful in userland. The PAUSE
> instruction can help out here.

No, this also doesn't make sense. That's because if there's a high
likehood that spinning gives success the locks are usually held only
a very short time. And if they are there's a low likehood of collision
so that you can drop spinning.

Bonita Montero

unread,

May 10, 2020, 1:45:35 AM5/10/20

to

> Which is what my recursive spinlock does; Bonita should try looking at
> the fucking code before commenting.

Then don't talk about a spinlock. Spinlocks aren't composite-locks
that incorporate kernel-waiting but just spinning.

Mr Flibble

unread,

May 10, 2020, 1:51:53 AM5/10/20

to

B U L L S H I T

Bonita Montero

unread,

May 10, 2020, 1:55:39 AM5/10/20

to

>>> Which is what my recursive spinlock does; Bonita should try looking
>>> at the fucking code before commenting.

>> Then don't talk about a spinlock. Spinlocks aren't composite-locks
>> that incorporate kernel-waiting but just spinning.

> B U L L S H I T

Look at the Wikipedia, dude.

Chris M. Thomasson

unread,

May 10, 2020, 7:40:01 AM5/10/20

to

On 5/9/2020 10:36 PM, Bonita Montero wrote:
>> Mostly true. However, an adaptive lock, that spins a couple of times
>> before it actually blocks can be useful in userland. The PAUSE
>> instruction can help out here.
>
> No, this also doesn't make sense.

Adaptive locks make sense. A lot of OS's use them.

https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-initializecriticalsectionandspincount

[...]

Bonita Montero

unread,

May 10, 2020, 7:49:38 AM5/10/20

to

> Adaptive locks make sense. A lot of OS's use them.

Tell me another OS than Windows that uses them. I told you the reason
they don't make sense. Again: if the likehood you get hold of the lock
by spinning is high the lock is only held a very short time. If it is
held a very short time, the likehood of a collision of two threads is
low. So spinning doesn't make sense.

> https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-initializecriticalsectionandspincount

That's just a stupid Microsoft-idea.

Chris M. Thomasson

unread,

May 10, 2020, 8:00:40 AM5/10/20

to

On 5/10/2020 4:49 AM, Bonita Montero wrote:
>> Adaptive locks make sense. A lot of OS's use them.
>
> Tell me another OS than Windows that uses them.

Solaris. And there are others.

> I told you the reason
> they don't make sense. Again: if the likehood you get hold of the lock
> by spinning is high the lock is only held a very short time. If it is
> held a very short time, the likehood of a collision of two threads is
> low. So spinning doesn't make sense.
>
>> https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-initializecriticalsectionandspincount
>
>
> That's just a stupid Microsoft-idea.
>

Why do you think that Microsoft invented adaptive mutexs? lol.

Bonita Montero

unread,

May 10, 2020, 8:07:22 AM5/10/20

to

Mutexes with spinning are just a stupid idea and are not thought
to the end in the sense i told.

Öö Tiib

unread,

May 10, 2020, 8:47:51 AM5/10/20

to

On Sunday, 10 May 2020 15:07:22 UTC+3, Bonita Montero wrote:
> Mutexes with spinning are just a stupid idea and are not thought
> to the end in the sense i told.

So you again repeatedly write a fundamental claim with none
ground provided. The claim is that both POSIX (PTHREAD_MUTEX_ADAPTIVE_NP)
and Win32 (SetCriticalSectionSpinCount) are stupid.

No research ... benchmarks ... theoretical calculations ... links
to such ... just *stupid* and *period*. And that for cascade of posts.
Yawn.

Bonita Montero

unread,

May 10, 2020, 8:55:09 AM5/10/20

to

> No research ... benchmarks ... theoretical calculations ... links
> to such ... just *stupid* and *period*. And that for cascade of posts.
> Yawn.

I told you why. Try to argue against that.

Bo Persson

unread,

May 10, 2020, 10:44:34 AM5/10/20

to

On 2020-05-10 at 13:49, Bonita Montero wrote:
>> Adaptive locks make sense. A lot of OS's use them.
>
> Tell me another OS than Windows that uses them. I told you the reason
> they don't make sense. Again: if the likehood you get hold of the lock
> by spinning is high the lock is only held a very short time. If it is
> held a very short time, the likehood of a collision of two threads is
> low. So spinning doesn't make sense.
>

It could also be that the lock is held for a longer time, but not very
often.

Then it makes sense to make a quick check before doing a more expensive
OS call.

Bo Persson

Bonita Montero

unread,

May 10, 2020, 11:04:42 AM5/10/20

to

>> Tell me another OS than Windows that uses them. I told you the reason
>> they don't make sense. Again: if the likehood you get hold of the lock
>> by spinning is high the lock is only held a very short time. If it is
>> held a very short time, the likehood of a collision of two threads is
>> low. So spinning doesn't make sense.

> It could also be that the lock is held for a longer time, but not very
> often.

Then spinning succeeds only when it is begun shortly before the other
thread releases the mutex, and that's very unlikely.

Chris M. Thomasson

unread,

May 10, 2020, 4:32:40 PM5/10/20

to

On 5/10/2020 5:07 AM, Bonita Montero wrote:
> Mutexes with spinning are just a stupid idea and are not thought
> to the end in the sense i told.
>

An adaptive mutex is a good idea. Any kernel calls that can be avoided
is great! Btw, you should tell the Linux guys that they made a big
mistake. ;^)

Chris M. Thomasson

unread,

May 10, 2020, 4:32:47 PM5/10/20

to

LOL!

Chris M. Thomasson

unread,

May 10, 2020, 4:36:42 PM5/10/20

to

Iirc, they were trying to avoid some futex calls in userspace. Funny
thing about futex... It seems that Microsoft has came up with something
kind of like them:

https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-waitonaddress

I have used futexs, but never WaitOnAddress.

Öö Tiib

unread,

May 10, 2020, 5:48:46 PM5/10/20

to

You said something like that if lock is needed very frequently for
very short time then there is very low likelihood of collision. It
does not make sense at all and even looks rather stupid thing to
say on its own. So it is your usual miss shot from the hip without
anything to back it up?

Chris M. Thomasson

unread,

May 11, 2020, 1:49:00 AM5/11/20

to

Trying to avoid the kernel is usually a good thing. A simple example is
using an atomic to try to avoid calling into a "presumably slow" OS
function. We can break things down into fast-paths, and slow-paths. For
this simple example, a slow-path would be a system call, like a futex or
something. A fast-path avoids expensive sys calls by using atomic
operations. Heck Bonita, would you use a raw semaphore on Windows:

https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-createsemaphorea

I think not? ;^)

Or build a faster one on top of it, processes aside for a moment?
Basically, a version that only calls into the slow-paths when it,
ideally, _really_ needs to. So, imvho, trying to "efficiently" maximize
the fast-paths in sync algorithms can be beneficial, indeed.

For some reason, it kind of sounds like you are arguing against trying
to avoid the slow-path?

An adaptive mutex can be beneficial. Fwiw, a long time ago, 20
years'ish, remember counting how many spins it took for a mutex to hit a
fast-path vs- slow.path. Every slow path meant that the lock spinned in
failure up to its current threshold. I would experiment with dynamically
adjusting said threshold during run time. Fun graphs and data dumped out...

Then there is another way to create an adaptive scheme, one of my old
experiments. Iirc, might of posted about it in comp.programming.threads.
Good ol try_lock can be nice, assuming try lock is not always a
slow-path. For now, assume try lock uses an atomic. Failing to acquire
the lock means the slow paths are rising!

Iirc, this was the first basic crude scheme of the experiment:

while (! try_lock())
{
// okay, the lock is busy...

// try to perform some other work?
// Humm... Other work, can be something
// unrelated to the reason why we are locking.
// humm...

if (! try_perform_other_work_under_contention())
{
// we found that we cannot perform
// other work right now...
// spin on try_lock one more time?
// not yet, lets test this as-is.

// SLOW-PATH!!!!! ALERT!

lock(); // a potentially blocking lock

break;
}
}

/////

// we are locked!

////

unlock();

Instead of spinning doing nothing, a thread tries to perform other work
on each spin, before it finally hits a slow path and blocks. Iirc,
depending on the nature of the "work", this scheme can be made to
work... ;^) A failed try_lock can be a possible opportunity to check for
and even execute other work... If it fails, we actually block on the lock.

The basic concept: is why spin, when other work might be able to be
accomplished?

Bonita Montero

unread,

May 11, 2020, 1:54:42 AM5/11/20

to

> You said something like that if lock is needed very frequently for

^^^^^^^^^^^^^^^^^^^^^^^ ...

> very short time then there is very low likelihood of collision.

Try to really understand it.

Bonita Montero

unread,

May 11, 2020, 1:56:20 AM5/11/20

to

> An adaptive mutex is a good idea. Any kernel calls that can be avoided
> is great! Btw, you should tell the Linux guys that they made a big
> mistake. ;^)

Yes, they made a mistake.
Try to argue against that: if a lock is held a so small time that
spinning might succeed there's such a low likehood of a collision
that spinning isn't necessary.

Chris M. Thomasson

unread,

May 11, 2020, 2:48:49 AM5/11/20

to

Its almost as if you do not understand why they were created in the
first place? To avoid the slow-path!

Chris M. Thomasson

unread,

May 11, 2020, 2:55:08 AM5/11/20

to

On 5/10/2020 10:56 PM, Bonita Montero wrote:

There are situations where a lock can get "hot", not matter what happens
when the lock is held. Its held for a small amount of time, yet can
experience periods of load. Trying to avoid slow-paths is a good thing.

Öö Tiib

unread,

May 11, 2020, 2:58:03 AM5/11/20

to

Try to think yourself instead of shooting nonsense from hip.
Adaptive lock is for very short but aggressively contended sections
and plain futex can't outperform it there.

Chris M. Thomasson

unread,

May 11, 2020, 3:00:13 AM5/11/20

to

On 5/10/2020 10:54 PM, Bonita Montero wrote:
>> You said something like that if lock is needed very frequently for
> ^^^^^^^^^^^^^^^^^^^^^^^ ...
>> very short time then there is very low likelihood of collision.

Why do you assume that? Please keep in mind that the complexity of the
work performed under the protection of the lock is completely different
that how this lock is going to be actually used in a system. Okay, lets
say the critical section is short and sweet, however, the usage of the
lock can experience periods of load, this means contention, and slow
paths are in store... Well, trying to avoid the slow paths is a good thing.

We are talking about general purpose locks, and one cannot necessarily
predict the usage patterns. Having the ability to use an adaptive lock
can be beneficial.

[...]

Chris M. Thomasson

unread,

May 11, 2020, 3:01:31 AM5/11/20

to

You beat me to this! :^)

The usage of the lock, and the actual work performed under its protected
region, are different things.

Bonita Montero

unread,

May 11, 2020, 4:24:19 AM5/11/20

to

>> Yes, they made a mistake.
>> Try to argue against that: if a lock is held a so small time that
>> spinning might succeed there's such a low likehood of a collision
>> that spinning isn't necessary.

> Its almost as if you do not understand why they were created in the
> first place? To avoid the slow-path!

But as I said the slow path is very unlikely to happen if you're
in the situation where locks are only held a very short time.

Bonita Montero

unread,

May 11, 2020, 4:29:21 AM5/11/20

to

>> Try to really understand it.

> Try to think yourself instead of shooting nonsense from hip.

Write me a benchmark to prove your assumptions.

Öö Tiib

unread,

May 11, 2020, 6:44:49 AM5/11/20

to

Why should I write you benchmarks when you post groundless
bullshit? Benchmarks written by me can't cure your stupidity.

Bonita Montero

unread,

May 11, 2020, 6:46:35 AM5/11/20

to

> Why should I write you benchmarks when you post groundless
> bullshit?

You are not even able to argue against this "bullshit".

> Benchmarks written by me can't cure your stupidity.

You also aren't able to prove what you said.
So it's you who has groundless assumptions.

Fred. Zwarts

unread,

May 11, 2020, 6:59:16 AM5/11/20

to

Op 11.mei.2020 om 10:24 schreef Bonita Montero:

But if the slow path is taken always, the locks are held for a long time
always.

--
Paradoxes in the relation between Creator and creature.
<http://www.wirholt.nl/English>.

Bonita Montero

unread,

May 11, 2020, 7:05:38 AM5/11/20

to

>>> Its almost as if you do not understand why they were created in the
>>> first place? To avoid the slow-path!

>> But as I said the slow path is very unlikely to happen if you're
>> in the situation where locks are only held a very short time.

> But if the slow path is taken always, the locks are held for a long
> time always.

Doesn't change what I said.

Öö Tiib

unread,

May 11, 2020, 7:40:04 AM5/11/20

to

On Monday, 11 May 2020 13:46:35 UTC+3, Bonita Montero wrote:
> > Why should I write you benchmarks when you post groundless
> > bullshit?
>
> You are not even able to argue against this "bullshit".

Why should I argue when there is only groundless nonsense?
Nonsense without ground should be dismissed on the ground
of its groundlessness ... nothing to argue there.

>
> > Benchmarks written by me can't cure your stupidity.
>
> You also aren't able to prove what you said.
> So it's you who has groundless assumptions.

I said that you bullshit groundlessly like usual and that your
sentences do not make sense. It is proven by the fact that
you continue to post such sentences.

Bonita Montero

unread,

May 11, 2020, 8:13:44 AM5/11/20

to

> Why should I argue when there is only groundless nonsense?
> Nonsense without ground should be dismissed on the ground
> of its groundlessness ... nothing to argue there.

What I said isn't nonsense.
The "benefits" of adaptive locks have never been verified in practice.

Öö Tiib

unread,

May 11, 2020, 8:49:53 AM5/11/20

to

On Monday, 11 May 2020 15:13:44 UTC+3, Bonita Montero wrote:
> > Why should I argue when there is only groundless nonsense?
> > Nonsense without ground should be dismissed on the ground
> > of its groundlessness ... nothing to argue there.
>
> What I said isn't nonsense.

It is classical groundless nonsense. Like "earth is flat" or
"moon landing was hoax".

> The "benefits" of adaptive locks have never been verified in practice.

Sure, and Elvis is alive.

Bonita Montero

unread,

May 11, 2020, 8:54:58 AM5/11/20

to

> It is classical groundless nonsense. Like "earth is flat" or
> "moon landing was hoax".

Compile this ...

#define _CRT_SECURE_NO_WARNINGS
#include <Windows.h>
#include <iostream>
#include <cstdint>
#include <cstdlib>
#include <vector>
#include <thread>
#include <random>
#include <atomic>
#include <csignal>

#pragma warning(disable: 6031)
#pragma warning(disable: 26495)

using namespace std;

struct spin_mutex
{
spin_mutex( uint32_t spinCount );
void lock();
void unlock();
uint64_t getSpinSuceeds();
uint64_t getSpinFails();
private:
uint32_t spinCount;
atomic<uint32_t> lockCounter;
HANDLE hEvent;
static
atomic<uint64_t> spinSucceeds,
spinFails;
};

atomic<uint64_t> spin_mutex::spinSucceeds = 0;
atomic<uint64_t> spin_mutex::spinFails = 0;

spin_mutex::spin_mutex( uint32_t spinCount ) :
spinCount( spinCount ),
lockCounter( 0 )
{
hEvent = CreateEvent( nullptr, false, false, nullptr );
}

void spin_mutex::lock()
{
uint32_t sc = spinCount;
for( ; sc; --sc )
{
uint32_t cmp = 0;
if( lockCounter.compare_exchange_weak( cmp, 1, memory_order_acquire,
memory_order_relaxed ) )
{
if( --sc ) // only count if it is not the first attempt to lock the mutex
spinSucceeds.fetch_add( 1, memory_order_relaxed );
return;
}
}
if( spinCount ) // only count fails if there's a spin-count
++spinFails;
if( lockCounter.fetch_add( 1, memory_order_acquire ) != 0 )
WaitForSingleObject( hEvent, INFINITE );
}

void spin_mutex::unlock()
{
if( lockCounter.fetch_sub( 1, memory_order_release ) != 1 )
SetEvent( hEvent );
}

inline
uint64_t spin_mutex::getSpinSuceeds()
{
return spinSucceeds;
}

inline
uint64_t spin_mutex::getSpinFails()
{
return spinFails;
}

void spendCycles( uint64_t cycles );

atomic<bool> stop = false;
HANDLE hEvtStop;

int main( int argc, char **argv )
{
if( argc < 7 )
return EXIT_FAILURE;
unsigned nThreads;
unsigned long long minNonLockedCycles, maxNonLockedCycles;
unsigned long long minLockedCycles, maxLockedCycles;
unsigned spinCount;
sscanf( argv[1], "%u", &nThreads );
sscanf( argv[2], "%llu", &minNonLockedCycles );
sscanf( argv[3], "%llu", &maxNonLockedCycles );
sscanf( argv[4], "%llu", &minLockedCycles );
sscanf( argv[5], "%llu", &maxLockedCycles );
sscanf( argv[6], "%u", &spinCount );
auto sigHandler = []( int signal ) -> void
{
::stop = true;
SetEvent( ::hEvtStop );
};
signal( SIGINT, sigHandler );
size_t const N_DELAYS = 1'000;
vector<uint64_t> nonLockedDelays( N_DELAYS );
vector<uint64_t> lockedDelays( N_DELAYS );
random_device rd;
uniform_int_distribution<uint64_t> uidNonLocked( minNonLockedCycles,
maxNonLockedCycles ),
uidLocked( minLockedCycles,
maxLockedCycles );
for( size_t i = 0; i != N_DELAYS; ++i )
nonLockedDelays[i] = uidNonLocked( rd ),
lockedDelays[i] = uidLocked( rd );
spin_mutex sm( spinCount );
auto thr = [&]( size_t iWaitNonLocked, size_t iWaitLocked )
{
auto getNextDelay = []( vector<uint64_t> delayVector, size_t iDelay )
-> size_t
{
size_t delay = delayVector[iDelay++];
if( iDelay == delayVector.size() )
iDelay = 0;
return delay;
};
spendCycles( getNextDelay( nonLockedDelays, iWaitNonLocked ) );
while( !stop.load( memory_order_relaxed ) )
sm.lock(),
spendCycles( getNextDelay( lockedDelays, iWaitLocked ) ),
sm.unlock(),
spendCycles( getNextDelay( nonLockedDelays, iWaitNonLocked ) );
};
vector<thread> threads;
uniform_int_distribution<size_t> uidIWait( 0, 999 );
for( unsigned t = 0; t != nThreads; ++t )
threads.emplace_back( thr, uidIWait( rd ), uidIWait( rd ) );
WaitForSingleObject( ::hEvtStop, INFINITE );
for( thread &t : threads )
t.join();
cout << "succeeds: " << sm.getSpinSuceeds() << endl;
cout << "fails: " << sm.getSpinFails() << endl;
}

... assemble this ...

PUBLIC ?spendCycles@@YAX_K@Z
_TEXT SEGMENT
?spendCycles@@YAX_K@Z PROC
test rcx, rcx
jz byebye
cycleLoop:
dec rcx ; roughly one jump per clock-cycle
jnz cycleLoop ; because there's only one branch-unit
byebye:
ret
?spendCycles@@YAX_K@Z ENDP
_TEXT ENDS
END

... link it.
And run it with the appropriate parameters to prove what you said.

Bonita Montero

unread,

May 11, 2020, 8:56:45 AM5/11/20

to

>> The "benefits" of adaptive locks have never been verified in practice.

> Sure, and Elvis is alive.

Tell me _one_ soure where appropriate spin-counts
are given or a way how they're determined.

Bonita Montero

unread,

May 11, 2020, 9:34:25 AM5/11/20

to

There were three little bugs in the C++-file.
1: hEvtStop wasn't initialized.
2: at the return of the signal handler the program was terminated.
3: the fail-counter wasn't incremented with relaxed memory ordering.

This should be o.k.:

#define _CRT_SECURE_NO_WARNINGS
#include <Windows.h>
#include <iostream>
#include <cstdint>
#include <cstdlib>
#include <vector>
#include <thread>
#include <random>
#include <atomic>
#include <csignal>

#pragma warning(disable: 6031)

#pragma warning(disable: 6387)

spinFails.fetch_add( 1, memory_order_relaxed );

if( lockCounter.fetch_add( 1, memory_order_acquire ) != 0 )
WaitForSingleObject( hEvent, INFINITE );
}

void spin_mutex::unlock()
{
if( lockCounter.fetch_sub( 1, memory_order_release ) != 1 )
SetEvent( hEvent );
}

inline
uint64_t spin_mutex::getSpinSuceeds()
{
return spinSucceeds;
}

inline
uint64_t spin_mutex::getSpinFails()
{
return spinFails;
}

void spendCycles( uint64_t cycles );

atomic<bool> stop = false;
HANDLE hEvtStop;

int main( int argc, char **argv )
{
if( argc < 7 )

{
cout << "number of threads, min non locked cycles, max non locked
cycles, min locked cycles, max cycles, spin-count" << endl;

return EXIT_FAILURE;
}
unsigned nThreads;
unsigned long long minNonLockedCycles, maxNonLockedCycles;
unsigned long long minLockedCycles, maxLockedCycles;
unsigned spinCount;
sscanf( argv[1], "%u", &nThreads );
sscanf( argv[2], "%llu", &minNonLockedCycles );
sscanf( argv[3], "%llu", &maxNonLockedCycles );
sscanf( argv[4], "%llu", &minLockedCycles );
sscanf( argv[5], "%llu", &maxLockedCycles );
sscanf( argv[6], "%u", &spinCount );

auto sigHandler = []( int sig ) -> void

{
::stop = true;
SetEvent( ::hEvtStop );

signal( SIGINT, SIG_IGN );
};
signal( SIGINT, sigHandler );
::hEvtStop = CreateEvent( nullptr, FALSE, FALSE, nullptr );

The asm-file is o.k..
But there are no "reasonable" values that could proof that adapive
locks make sense.

Bonita Montero

unread,

May 11, 2020, 9:50:39 AM5/11/20

to

> auto getNextDelay = []( vector<uint64_t> delayVector, size_t
> iDelay ) -> size_t

OMG, does someone guess what's the mistake here ? ;-)

Öö Tiib

unread,

May 11, 2020, 10:29:41 AM5/11/20

to

For *adaptive* mutex? It has estimator inbuilt that uses few instructions
to do statistics and then does not to spin longer than twice of what
was estiated.

Bonita Montero

unread,

May 11, 2020, 10:46:19 AM5/11/20

to

>> Tell me _one_ soure where appropriate spin-counts
>> are given or a way how they're determined.

> For *adaptive* mutex? It has estimator inbuilt that uses few instructions
> to do statistics and then does not to spin longer than twice of what
> was estiated.

You're dreaming ...
Why have this mutexes _static_ spin-counts at initilzation ?
But maybe you could convince me by naming a source for your wild
assumptions. But it would be unrealistic to exptect you could.

Bonita Montero

unread,

May 11, 2020, 10:53:06 AM5/11/20

to

> Why have this mutexes _static_ spin-counts at initilzation ?

Look at:
https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-initializecriticalsectionandspincount
So theres no adaptive spin-count.

So here's again my code. Now it outputs once a second the number of
failed ans successful spins:

for( uint32_t sc = spinCount; sc; --sc )

{
uint32_t cmp = 0;
if( lockCounter.compare_exchange_weak( cmp, 1, memory_order_acquire,
memory_order_relaxed ) )
{
if( --sc ) // only count if it is not the first attempt to lock the mutex
spinSucceeds.fetch_add( 1, memory_order_relaxed );
return;
}
}
if( spinCount ) // only count fails if there's a spin-count
spinFails.fetch_add( 1, memory_order_relaxed );
if( lockCounter.fetch_add( 1, memory_order_acquire ) != 0 )
WaitForSingleObject( hEvent, INFINITE );
}

void spin_mutex::unlock()
{
if( lockCounter.fetch_sub( 1, memory_order_release ) != 1 )
SetEvent( hEvent );
}

inline
uint64_t spin_mutex::getSpinSuceeds()
{

return spinSucceeds.load( memory_order_relaxed );
}

inline
uint64_t spin_mutex::getSpinFails()
{
return spinFails.load( memory_order_relaxed );

using vui64_cit = vector<uint64_t>::const_iterator;
auto thr = [&]( vui64_cit itWaitNonLocked, vui64_cit itWaitLocked )
{
auto getNextDelay = []( vector<uint64_t> const &delays, vui64_cit it )
-> uint64_t
{
uint64_t delay = *it;
if( ++it == delays.end() )
it = delays.begin();
return delay;

};
while( !stop.load( memory_order_relaxed ) )

spendCycles( getNextDelay( nonLockedDelays, itWaitNonLocked ) ),
sm.lock(),
spendCycles( getNextDelay( lockedDelays, itWaitLocked ) ),
sm.unlock();
};
vector<thread> threads;
uniform_int_distribution<size_t> uidIWait( 0, N_DELAYS - 1 );

for( unsigned t = 0; t != nThreads; ++t )

threads.emplace_back( thr, nonLockedDelays.begin() + uidIWait( rd ),
lockedDelays.begin() + uidIWait( rd ) );

while( !stop.load( memory_order_relaxed ) )

if( WaitForSingleObject( ::hEvtStop, 1000 ) == WAIT_TIMEOUT )
{

cout << "succeeds: " << sm.getSpinSuceeds() << endl;
cout << "fails: " << sm.getSpinFails() << endl;
}
for( thread &t : threads )
t.join();
}

Asm:

PUBLIC ?spendCycles@@YAX_K@Z
_TEXT SEGMENT
?spendCycles@@YAX_K@Z PROC
test rcx, rcx
jz byebye
cycleLoop:
dec rcx ; roughly one jump per clock-cycle
jnz cycleLoop ; because there's only one branch-unit
byebye:
ret
?spendCycles@@YAX_K@Z ENDP
_TEXT ENDS
END

Compile, link - and show me which are the parameters that give an
appropriate spin-count.

Alf P. Steinbach

unread,

May 11, 2020, 12:17:49 PM5/11/20

to

Passing a vector by value, forgetting `const`, using unsigned type for
number.

- Alf

Bonita Montero

unread,

May 11, 2020, 12:26:21 PM5/11/20

to

>>> auto getNextDelay = []( vector<uint64_t> delayVector, size_t
>>> iDelay ) -> size_t
>> OMG, does someone guess what's the mistake here ? ;-)

> Passing a vector by value, ...

This was the issue.

> ... forgetting `const`, ...

That's what I later inserted, but I think it's not really necessary.

> using unsigned type for number.

No, size_t was correct.

Alf P. Steinbach

unread,

May 11, 2020, 2:24:47 PM5/11/20

to

Consider defining

using Size = ptrdiff_t;

... and using that instead of `size_t`.

E.g., by default `std::distance` returns a `ptrdiff_t`.

A main advantage is that you avoid unintended implicit conversions from
signed to unsigned value, with possible wrap-around.

Relevant part of the C++ code guidelines: <url:
https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md#arithmetic>.

- Alf

Bonita Montero

unread,

May 11, 2020, 2:31:55 PM5/11/20

to

>> No, size_t was correct.

> Consider defining
> using Size = ptrdiff_t;
> ... and using that instead of `size_t`.

std::vector::operator [] uses size_t as the index so I use it also.
There's absolutely no need for negative indices here.
But in the latest code I changed to iterators.

Öö Tiib

unread,

May 11, 2020, 2:51:35 PM5/11/20

to

Kaz Kylheku who originally invented that PTHREAD_MUTEX_ADAPTIVE_NP has
said that it has spin count adjusted by such estimator. Why should
he lie?

Chris M. Thomasson

unread,

May 11, 2020, 4:06:59 PM5/11/20

to

Do you think that an OS designer would add them in for no reason
whatsoever?

Most OS's use hybrid adaptive locking for their general purpose locks.

Chris M. Thomasson

unread,

May 11, 2020, 4:12:46 PM5/11/20

to

Bonita is having some trouble understanding what can be done. Iirc, Kaz
used to post on comp.programming.threads. It is now a wasteland!

Btw, Bonita. Please implement your code using a race detector, it will
find your bugs. Try Relacy...

Chris M. Thomasson

unread,

May 11, 2020, 4:24:26 PM5/11/20

to

Fwiw, have you ever heard of asymmetric locking? This can even remove
memory barriers for the fast path, to make it hyper fast... ;^)

Think of an asymmetric Dekker algorihtm. Take the two process version
with a usage pattern where process A takes the lock _much_ more
frequently than process B. Therefore, we can heavily optimize process A.

The fun part is that Windows has an essential function called
FlushProcessWriteBuffers that makes it possible to implement.

https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-flushprocesswritebuffers

This allows for a sort of "remote" memory barrier. The slow path,
process B would execute this instruction. Process A does not need to use
any membars. It makes things faster. The usage pattern means that calls
to FlushProcessWriteBuffers will not be frequent.

Chris M. Thomasson

unread,

May 11, 2020, 4:29:44 PM5/11/20

to

On 5/10/2020 8:04 AM, Bonita Montero wrote:
>>> Tell me another OS than Windows that uses them. I told you the reason
>>> they don't make sense. Again: if the likehood you get hold of the lock
>>> by spinning is high the lock is only held a very short time. If it is
>>> held a very short time, the likehood of a collision of two threads is
>>> low. So spinning doesn't make sense.
>
>> It could also be that the lock is held for a longer time, but not very
>> often.
>
> Then spinning succeeds only when it is begun shortly before the other
> thread releases the mutex, and that's very unlikely.

You need to separate the work performed under the critical section, vs
the usage pattern of the mutex. They are separate things. Even if the
critical section is short, it can still be hammered by a bunch of
threads for sustained periods of time. This will create contention.

Ian Collins

unread,

May 11, 2020, 4:43:56 PM5/11/20

to

There be trolls...

--
Ian.

Chris M. Thomasson

unread,

May 11, 2020, 4:48:38 PM5/11/20

to

Are some trolls smarter than others? Is the size of the bridge or where
its located?

Keith Thompson

unread,

May 11, 2020, 5:40:51 PM5/11/20

to

Öö Tiib <oot...@hot.ee> writes:
> On Monday, 11 May 2020 13:46:35 UTC+3, Bonita Montero wrote:
>> > Why should I write you benchmarks when you post groundless
>> > bullshit?
>>
>> You are not even able to argue against this "bullshit".
>
> Why should I argue when there is only groundless nonsense?
> Nonsense without ground should be dismissed on the ground
> of its groundlessness ... nothing to argue there.

No, you shouldn't argue. But I suggest that a better way not to
argue is simply not to reply.

Please stop feeding the troll.

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

Bonita Montero

unread,

May 12, 2020, 12:54:06 AM5/12/20

to

> Kaz Kylheku who originally invented that PTHREAD_MUTEX_ADAPTIVE_NP has
> said that it has spin count adjusted by such estimator. Why should
> he lie?

Give the sourcecode as well as a benchmark.
And Windows does it with a fixed spin-count. If this should work my
benchmark should do the same given appropriate paramters. So compile
my benchmark, experiment with different parameters and give a proof.
I'll bet there will be no proof from you.

Öö Tiib

unread,

May 12, 2020, 2:44:06 AM5/12/20

to

Huh? You already *lost* your bet. Kaz Kylheku is simply super without
any doubt. Linux is open source, glibc is open source and
PTHREAD_MUTEX_ADAPTIVE_NP is fastest there for most things.

Observe internet. It is full of examples. My first query to google
reveals Firefox guys observing how performance was 480% times worse
on OS-X two years ago and emulated happily Kaz code of glibc there
to fix it. Problem solved.
https://bugzilla.mozilla.org/show_bug.cgi?id=1457882

So ... who is stupid: spin locks or Bonita?

It is because most critical sections in sane software are short.
Few cycles: Lock, move unique_pointer of your job results into shared
container, unlock. Another few cycles: Lock, take your next job from
queue, unlock. Now work hard on your new job in isolation without
any concurrency. However when there is aggressive contention to that
shared container then hundreds of cycles into kernel call of blocking
the thread start suddenly to frequent without spin locks and *bang*
performance degrades into awful.

Bonita Montero

unread,

May 12, 2020, 2:49:46 AM5/12/20

to

> Huh? You already *lost* your bet. Kaz Kylheku is simply super
> without any doubt. Linux is open source, glibc is open source
> and PTHREAD_MUTEX_ADAPTIVE_NP is fastest there for most things.

Write a benchmark.
Or look at Windows. Windows has a fixed spin-count. My benchmark
does the same and counts the number of successful and unsuccessful
spins. Please compile it and give me parameters that prove these
stupid ideas.
That's the latest code:

#define _CRT_SECURE_NO_WARNINGS
#include <Windows.h>
#include <iostream>
#include <cstdint>
#include <cstdlib>
#include <vector>
#include <thread>
#include <random>
#include <atomic>
#include <csignal>

#pragma warning(disable: 6031)
#pragma warning(disable: 6387)
#pragma warning(disable: 26495)

using namespace std;

struct SpinMutex
{
SpinMutex( uint32_t spinCount );
void lock();
void unlock();
uint64_t getAndResetSpinSuceeds();
uint64_t getAndResetSpinFails();

private:
uint32_t spinCount;
atomic<uint32_t> lockCounter;
HANDLE hEvent;
static
atomic<uint64_t> spinSucceeds,
spinFails;

uint64_t getAndZero( atomic<uint64_t> &value );
};

atomic<uint64_t> SpinMutex::spinSucceeds = 0;
atomic<uint64_t> SpinMutex::spinFails = 0;

SpinMutex::SpinMutex( uint32_t spinCount ) :
spinCount( spinCount ),
lockCounter( 0 ),
hEvent( CreateEvent( nullptr, false, false, nullptr ) )
{
}

void SpinMutex::lock()

{
for( uint32_t sc = spinCount; sc; --sc )
{
uint32_t cmp = 0;
if( lockCounter.compare_exchange_weak( cmp, 1, memory_order_acquire,
memory_order_relaxed ) )
{
if( --sc ) // only count if it is not the first attempt to lock the mutex
spinSucceeds.fetch_add( 1, memory_order_relaxed );
return;
}
}
if( spinCount ) // only count fails if there's a spin-count
spinFails.fetch_add( 1, memory_order_relaxed );
if( lockCounter.fetch_add( 1, memory_order_acquire ) != 0 )
WaitForSingleObject( hEvent, INFINITE );
}

void SpinMutex::unlock()

{
if( lockCounter.fetch_sub( 1, memory_order_release ) != 1 )
SetEvent( hEvent );
}

inline

uint64_t SpinMutex::getAndResetSpinSuceeds()
{
return getAndZero( spinSucceeds );
}

inline
uint64_t SpinMutex::getAndResetSpinFails()
{
return getAndZero( spinFails );
}

inline
uint64_t SpinMutex::getAndZero( atomic<uint64_t> &value )
{
uint64_t ref = value;
while( !value.compare_exchange_weak( ref, 0 ) );
return ref;

}

void spendCycles( uint64_t cycles );

atomic<bool> stop;

HANDLE hEvtStop;

int main( int argc, char **argv )
{

char const *errStr = "1. number of threads, 2. min non locked cycles,
3. max non locked cycles, 4. min locked cycles, 5. max locked cycles, 6.
spin-count";
if( argc < 7 )
{
cout << errStr << endl;

return EXIT_FAILURE;
}
unsigned nThreads;
unsigned long long minNonLockedCycles, maxNonLockedCycles;
unsigned long long minLockedCycles, maxLockedCycles;
unsigned spinCount;
sscanf( argv[1], "%u", &nThreads );
sscanf( argv[2], "%llu", &minNonLockedCycles );
sscanf( argv[3], "%llu", &maxNonLockedCycles );
sscanf( argv[4], "%llu", &minLockedCycles );
sscanf( argv[5], "%llu", &maxLockedCycles );
sscanf( argv[6], "%u", &spinCount );

if( nThreads == 0 || minNonLockedCycles > maxNonLockedCycles ||
minLockedCycles > maxLockedCycles )
{
cout << errStr << endl;
return EXIT_FAILURE;

}
auto sigHandler = []( int sig ) -> void
{
::stop = true;
SetEvent( ::hEvtStop );
signal( SIGINT, SIG_IGN );
};

::stop = false;

::hEvtStop = CreateEvent( nullptr, FALSE, FALSE, nullptr );

signal( SIGINT, sigHandler );
SpinMutex sm( spinCount );
auto thr = [&]()
{
random_device rd;
minstd_rand mr( rd() );
uniform_int_distribution<uint64_t> uidNonLocked( minNonLockedCycles,
maxNonLockedCycles );
uniform_int_distribution<uint64_t> uidLocked( minLockedCycles,
maxLockedCycles );

while( !stop.load( memory_order_relaxed ) )

spendCycles( uidNonLocked( mr ) ),
sm.lock(),
spendCycles( uidLocked( mr ) ),
sm.unlock();
};
vector<thread> threads;

for( unsigned t = 0; t != nThreads; ++t )

threads.emplace_back( thr );

while( !stop.load( memory_order_relaxed ) )
if( WaitForSingleObject( ::hEvtStop, 1000 ) == WAIT_TIMEOUT )
{

cout << "succeeds: " << sm.getAndResetSpinSuceeds() << endl;
cout << "fails: " << sm.getAndResetSpinFails() << endl;

}
for( thread &t : threads )
t.join();
}

PUBLIC ?spendCycles@@YAX_K@Z
_TEXT SEGMENT
?spendCycles@@YAX_K@Z PROC
test rcx, rcx
jz byebye
cycleLoop:
dec rcx ; roughly one jump per clock-cycle
jnz cycleLoop ; because there's only one branch-unit
byebye:
ret
?spendCycles@@YAX_K@Z ENDP
_TEXT ENDS
END

I removed the random-table and changed to LCG-random-numbers
for the non-locked-interval as well for the locked-interval.

So give me the six parameters that prove what you said.

> https://bugzilla.mozilla.org/show_bug.cgi?id=1457882

That's not the proof that I asked for.

Bonita Montero

unread,

May 12, 2020, 2:52:05 AM5/12/20

to

So you can download the MSVC-2019-project here:
https://fil.email/ffP0ghTV
Compile it and give me the parameters that prove
your assumptions.

Ian Collins

unread,

May 12, 2020, 2:53:40 AM5/12/20

to

On 12/05/2020 18:43, Öö Tiib wrote:
> On Tuesday, 12 May 2020 07:54:06 UTC+3, Bonita Montero wrote:
>>> Kaz Kylheku who originally invented that PTHREAD_MUTEX_ADAPTIVE_NP has
>>> said that it has spin count adjusted by such estimator. Why should
>>> he lie?
>>
>> Give the sourcecode as well as a benchmark.
>> And Windows does it with a fixed spin-count. If this should work my
>> benchmark should do the same given appropriate paramters. So compile
>> my benchmark, experiment with different parameters and give a proof.
>> I'll bet there will be no proof from you.
>
> Huh? You already *lost* your bet. Kaz Kylheku is simply super without
> any doubt. Linux is open source, glibc is open source and
> PTHREAD_MUTEX_ADAPTIVE_NP is fastest there for most things.
>
> Observe internet.

https://stackoverflow.com/questions/19863734/what-is-pthread-mutex-adaptive-np

The top reply is from Kaz.

--
Ian.

Öö Tiib

unread,

May 12, 2020, 3:00:09 AM5/12/20

to

On Tuesday, 12 May 2020 09:49:46 UTC+3, Bonita Montero wrote:
> > Huh? You already *lost* your bet. Kaz Kylheku is simply super
> > without any doubt. Linux is open source, glibc is open source
> > and PTHREAD_MUTEX_ADAPTIVE_NP is fastest there for most things.
>
> Write a benchmark.

Why should I waste time to write a benchmark for checking that earth
is round?

> Or look at Windows.

Windows is closed source garbage. Microsoft is awful, look what they
did with Skype. It was great, but now Zoom beats it anytime in all
aspects.

>
> > https://bugzilla.mozilla.org/show_bug.cgi?id=1457882
>
> That's not the proof that I asked for.

Sure you did not ask for proof that Bonita is stupid, and not spin
locks ... but these are the only proofs there are to find.

Bonita Montero

unread,

May 12, 2020, 3:03:38 AM5/12/20

to

> Windows is closed source garbage. Microsoft is awful, look what they
> did with Skype. It was great, but now Zoom beats it anytime in all
> aspects.

My code does the same Windows does and it is not closed-souce.
And some other implementations also have a fixed spin-count.

Öö Tiib

unread,

May 12, 2020, 3:56:22 AM5/12/20

to

On Tuesday, 12 May 2020 10:03:38 UTC+3, Bonita Montero wrote:
> > Windows is closed source garbage. Microsoft is awful, look what they
> > did with Skype. It was great, but now Zoom beats it anytime in all
> > aspects.
>
> My code does the same Windows does and it is not closed-souce.

Chris M. Thomasson gave you advice "Please implement your code using

a race detector, it will find your bugs. Try Relacy..."

No one will look into that straw man code before you have fixed it.

> And some other implementations also have a fixed spin-count.

How does it matter? In our industry the real thing, the best of the
breed is against what we should compare. Rest of "implementations"
are performance issues to work around:
https://bugzilla.mozilla.org/show_bug.cgi?id=1457882

Bonita Montero

unread,

May 12, 2020, 3:58:30 AM5/12/20

to

> Chris M. Thomasson gave you advice "Please implement your code
> using a race detector, it will find your bugs. Try Relacy..."

Races are absolutely legal in this situation and, if your theory
is right, should be prevented by an appropriate spin-count.
And don't believe what Chris says; he is an underskilled developer
with a lot of ideas which are not thought to the end.

Öö Tiib

unread,

May 12, 2020, 6:21:22 AM5/12/20

to

Races are undefined behavior in every code however low, and so if
your straw man mutex allows races then it is worthless to compile.
Chris is far more skilled than you.

Most of work of every good program is embarrassingly parallel.
The shared resources are used for to synchronize effort between
threads. Bigger shared resources are protected with locks; smaller
by having atomic access. Accesses of shared resources are designed
to be quick in scalable architectures for not to make other threads
to wait.

Now lets look at your "skilled" benchmark. It is simulation of
bogus design that does not scale anyway. The threads are doomed
to drowse after one that sits and burns random amount of cycles in
critical section. What is the difference how lot of cores we have,
threads we make or what locks we use when the architecture of
it is defective?

Bonita Montero

unread,

May 12, 2020, 6:45:07 AM5/12/20

to

> Races are undefined behavior in every code however low, and so if
> your straw man mutex allows races then it is worthless to compile.

No, the race here determines which thread can aquire the lock.
Race-conditions don't directly lead to invalid code with lockfree
code and in code which realises locks. In these cases race-con-
ditions determine how is the first to acuire a lock or to update
the data in a lockfree strucure. Every mutex depends on race-con-
ditions!

> Chris is far more skilled than you.

Lol.

> Now lets look at your "skilled" benchmark.
> It is simulation of bogus design that does not scale anyway.

It scales depending on the parameters you give. And having a lower
and upper bound for the random values for the locking-interval as
well for the unlocked interval is absolutely realistic.

> The threads are doomed to drowse after one that sits and burns
> random amount of cycles in critical section.

That's realistic. The time spent in a critical section isn't usually
constant. You can set a lower and upper bound for this time

> What is the difference how lot of cores we have, threads we make
> or what locks we use when the architecture of it is defective?

I think it's worthless to ask for the correct parameters as you
even don't understand the code. You're a underskilled developer
also. As you don't understand that usual mutexes have race-con-
ditions every discussion with you is a waste of time.

Öö Tiib

unread,

May 12, 2020, 7:25:29 AM5/12/20

to

On Tuesday, 12 May 2020 13:45:07 UTC+3, Bonita Montero wrote:
> > Races are undefined behavior in every code however low, and so if
> > your straw man mutex allows races then it is worthless to compile.
>
> No, the race here determines which thread can aquire the lock.
> Race-conditions don't directly lead to invalid code with lockfree
> code and in code which realises locks. In these cases race-con-
> ditions determine how is the first to acuire a lock or to update
> the data in a lockfree strucure. Every mutex depends on race-con-
> ditions!

More nonsense. Mutexes and lock free schemas can be made using
implementation specific atomic accesses and memory barriers but
not with race conditions. And you clearly use standard atomics
there.

If your garbage allows race conditions then it is undefined behavior
and labeling it as "mutex" or "lock free" does grant no privileges.

> > Chris is far more skilled than you.
>
> Lol.
>
> > Now lets look at your "skilled" benchmark.
> > It is simulation of bogus design that does not scale anyway.
>
> It scales depending on the parameters you give. And having a lower
> and upper bound for the random values for the locking-interval as
> well for the unlocked interval is absolutely realistic.
>
> > The threads are doomed to drowse after one that sits and burns
> > random amount of cycles in critical section.
>
> That's realistic. The time spent in a critical section isn't usually
> constant. You can set a lower and upper bound for this time
>
> > What is the difference how lot of cores we have, threads we make
> > or what locks we use when the architecture of it is defective?
>
> I think it's worthless to ask for the correct parameters as you
> even don't understand the code.

I won't sure compile it. It is some kind of newbie crap with
non-static getters for static members. You really refuse to test
it claiming like you did there implement mutexes using memory
fences and volatiles and so tools show false positives?

> You're a underskilled developer
> also. As you don't understand that usual mutexes have race-con-
> ditions every discussion with you is a waste of time.

Fine with me. Talk to mirror.

Bonita Montero

unread,

May 12, 2020, 7:32:29 AM5/12/20

to

> More nonsense. Mutexes and lock free schemas can be made using
> implementation specific atomic accesses and memory barriers but
> not with race conditions. And you clearly use standard atomics
> there.

A race-condition is when the data being modified by multiple threads
is dependent on the timely order of the threads. The shared data of
two threads trying to acquire a mutex is the lock-counter. And how
this is modified depends on how the acesses are timed. So mutexes
(and lockfeee data-structures) are an exception from the rule that
race-conditions are always errors.

> I won't sure compile it. It is some kind of newbie crap with
> non-static getters for static members.

I know that but that doesn't matter here. What matters is that
is sufficient to prove what you said - if it could be proved.

Bonita Montero

unread,

May 12, 2020, 7:58:10 AM5/12/20

to

> I won't sure compile it. It is some kind of newbie crap with

> non-static getters for static members. ...

The both counters need to be non-static, not the accessors static.
But that has nothing to do with if the code is suitable to proof
something you assume.

Öö Tiib

unread,

May 12, 2020, 8:10:24 AM5/12/20

to

On Tuesday, 12 May 2020 14:32:29 UTC+3, Bonita Montero wrote:
> > More nonsense. Mutexes and lock free schemas can be made using
> > implementation specific atomic accesses and memory barriers but
> > not with race conditions. And you clearly use standard atomics
> > there.
>
> A race-condition is when the data being modified by multiple threads
> is dependent on the timely order of the threads. The shared data of
> two threads trying to acquire a mutex is the lock-counter. And how
> this is modified depends on how the acesses are timed. So mutexes
> (and lockfeee data-structures) are an exception from the rule that
> race-conditions are always errors.

That is not what standard says. Standard defines race like
that:
"The execution of a program contains a /data race/ if it contains two
potentially concurrent conflicting actions, at least one of which is
not atomic, and neither happens before the other, except for the
special case for signal handlers described below. Any such data
race results in undefined behavior."

Bonita Montero

unread,

May 12, 2020, 8:17:42 AM5/12/20

to

> That is not what standard says. Standard defines race like
> that:
> "The execution of a program contains a /data race/ if it contains two
> potentially concurrent conflicting actions, at least one of which is
> not atomic, and neither happens before the other, except for the
> special case for signal handlers described below. Any such data
> race results in undefined behavior."

The C-standard isn't defining what we're talking about here.
The Wikipedia should be more definitive.

Öö Tiib

unread,

May 12, 2020, 8:32:09 AM5/12/20

to

It is C++17 standard and we were talking about your C++ code in
comp.lang.c++ so why Wikipedia's matters?
Ironically Wikipedia quotes same what I did verbatim:
https://en.wikipedia.org/wiki/Race_condition#Example_definitions_of_data_races_in_particular_concurrency_models

Öö Tiib

unread,

May 12, 2020, 9:34:44 AM5/12/20

to

How should anyone know if you wanted static or non-static members
when you wrote static? No one can know what you wanted to write.
Non-static accessories of static members is lousy. Lousiness
indicates that it was shallowly considered. Too casual code
usually contains serious logic errors too. About buggy software
however no one cares how fast it behaves incorrectly and gives
bogus answers. So test it yourself.

Or go join "Genial" "poetical" "philosopher" of "scalable" and
"lock free" stuff in comp.programming if you think that skill comes
from nonsense posted not hard work done.

Bonita Montero

unread,

May 12, 2020, 10:12:20 AM5/12/20

to

> It is C++17 standard and we were talking about your C++ code in
> comp.lang.c++ so why Wikipedia's matters?

You are pettifogging. My definition applies to every language.
Think my implmenation would be written in any other language.

Bonita Montero

unread,

May 12, 2020, 10:14:22 AM5/12/20

to

> How should anyone know if you wanted static or non-static members
> when you wrote static?

You said that the accessors need to be static so you alleged
to know it.

> Lousiness
> indicates that it was shallowly considered. Too casual code
> usually contains serious logic errors too. About buggy software
> however no one cares how fast it behaves incorrectly and gives
> bogus answers. So test it yourself.

My code does everything what it is needed to prove what
you assume - if it can be proven. So give me the numbers !

Öö Tiib

unread,

May 12, 2020, 11:12:05 AM5/12/20

to

It is you who are wiggling away from testing. I do not
care. Test that your mutex does not contain data races what C++ standard
defines to be undefined behavior and then test that using your mutex
the more strict race like your Wikipedia defines can be avoided.
Otherwise it is just random code not worth to compile.

Öö Tiib

unread,

May 12, 2020, 11:18:08 AM5/12/20

to

On Tuesday, 12 May 2020 17:14:22 UTC+3, Bonita Montero wrote:
> > How should anyone know if you wanted static or non-static members
> > when you wrote static?
>
> You said that the accessors need to be static so you alleged
> to know it.

You lie. I wrote: "It is some kind of newbie crap with
non-static getters for static members." It does say
nothing what you try to misrepresent me claiming.

Bonita Montero

unread,

May 12, 2020, 11:39:08 AM5/12/20

to

> It is you who are wiggling away from testing.

I give you the code, you shoud prove what you say if you know
what you're talking about; but you don't-

> do not care. Test that your mutex does not contain data races what C++

> standard defines to be undefined behavior ...

Aside from the getter-issue and that the mutex doesn't handle resource
-errors the code is correct. That you think that there are races which
make it not appropriate for the issue we're talking about again shows
that you're an underskilled "developer.

> Otherwise it is just random code not worth to compile.

You're not able to understand it. Aside from the issues above every
mutex with a static spin-count is structured like mine. I said that
data-races are absolutely normal for how mutexes work, you didn't
understood why I said this and took this as a bug as you lack the
simplest knowledge about how a mutex works.
And when we're talking about the data-races beyond the usual working
of a mutex but with spinning: these are also absolutely intended and
not a bug (aside that spinning doesn't make sense).

Bonita Montero

unread,

May 12, 2020, 11:41:05 AM5/12/20

to

> You lie. I wrote: "It is some kind of newbie crap with
> non-static getters for static members." It does say
> nothing what you try to misrepresent me claiming.

Now you're trying to whitewash yourself.
The above exactly means what I said.

Chris M. Thomasson

unread,

May 12, 2020, 3:50:44 PM5/12/20

to

On 5/11/2020 9:53 PM, Bonita Montero wrote:
>> Kaz Kylheku who originally invented that PTHREAD_MUTEX_ADAPTIVE_NP has
>> said that it has spin count adjusted by such estimator. Why should
>> he lie?
>
> Give the sourcecode as well as a benchmark.
> And Windows does it with a fixed spin-count.

Why do you say that? Humm...

https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-setcriticalsectionspincount

This allows the spin count to be modified.

Chris M. Thomasson

unread,

May 12, 2020, 3:57:31 PM5/12/20

to

On 5/12/2020 12:50 PM, Chris M. Thomasson wrote:
> On 5/11/2020 9:53 PM, Bonita Montero wrote:
>>> Kaz Kylheku who originally invented that PTHREAD_MUTEX_ADAPTIVE_NP has
>>> said that it has spin count adjusted by such estimator. Why should
>>> he lie?
>>
>> Give the sourcecode as well as a benchmark.
>> And Windows does it with a fixed spin-count.
>
> Why do you say that? Humm...
>
> https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-setcriticalsectionspincount
>
>
> This allows the spin count to be modified.

[...]

Bonita, calm down and realize that you are having some real trouble with
understanding adaptive mutexs. You don't "seem" to have any experience
with them.

Btw, please run your code through a race detector to kill off the bugs.

Chris M. Thomasson

unread,

May 12, 2020, 3:59:31 PM5/12/20

to

On 5/11/2020 11:49 PM, Bonita Montero wrote:
>> Huh? You already *lost* your bet. Kaz Kylheku is simply super
>> without any doubt. Linux is open source, glibc is open source
>> and PTHREAD_MUTEX_ADAPTIVE_NP is fastest there for most things.
>
> Write a benchmark.

> Or look at Windows. Windows has a fixed spin-count. [...]

https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-setcriticalsectionspincount

[...]

Chris M. Thomasson

unread,

May 12, 2020, 4:00:21 PM5/12/20

to

On 5/11/2020 4:05 AM, Bonita Montero wrote:
>>>> Its almost as if you do not understand why they were created in the
>>>> first place? To avoid the slow-path!
>
>>> But as I said the slow path is very unlikely to happen if you're
>>> in the situation where locks are only held a very short time.
>
>> But if the slow path is taken always, the locks are held for a long
>> time always.
>
> Doesn't change what I said.
>

lol.

Bonita Montero

unread,

May 12, 2020, 4:07:03 PM5/12/20

to

> Why do you say that? Humm...
> https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-setcriticalsectionspincount
> This allows the spin count to be modified.

You're so stupid, dude.
That's the same what my code allows.

Bonita Montero

unread,

May 12, 2020, 4:08:15 PM5/12/20

to

> Bonita, calm down and realize that you are having some real trouble with
> understanding adaptive mutexs. You don't "seem" to have any experience
> with them.

My mutex exactly does the same what the Windows mutex does.

> Btw, please run your code through a race detector to kill off the bugs.

Tell me what the bugs are. There aren't any.

Bonita Montero

unread,

May 12, 2020, 4:09:39 PM5/12/20

to

>> Write a benchmark.
>> Or look at Windows. Windows has a fixed spin-count. [...]

> https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-setcriticalsectionspincount

Take my code _which_does_the_same_like_the_Windows_critical_section_
_with_spin_lock_does_ and show wich are the appropriate parameters
which show the benefit of spinning !

Chris M. Thomasson

unread,

May 12, 2020, 4:10:03 PM5/12/20

to

On 5/12/2020 12:58 AM, Bonita Montero wrote:
>> Chris M. Thomasson gave you advice "Please implement your code
>> using a race detector, it will find your bugs. Try Relacy..."
>
> Races are absolutely legal in this situation and, if your theory
> is right, should be prevented by an appropriate spin-count.

Huh? What does the spin-count have to do with a nasty race condition?
You do not seem to know what you are talking about. Think of a race
condition where the wrong memory barrier was used.

> And don't believe what Chris says; he is an underskilled developer
> with a lot of ideas which are not thought to the end.
>

lol.

Chris M. Thomasson

unread,

May 12, 2020, 4:12:07 PM5/12/20

to

On 5/12/2020 1:08 PM, Bonita Montero wrote:
>> Bonita, calm down and realize that you are having some real trouble
>> with understanding adaptive mutexs. You don't "seem" to have any
>> experience with them.
>
> My mutex exactly does the same what the Windows mutex does.

You just tried to tell us that Windows has a fixed spin count. ;^o

>> Btw, please run your code through a race detector to kill off the bugs.
>
> Tell me what the bugs are. There aren't any.
>

Did you finish correcting them all? You made several of them. Btw,
Relacy can be found here:

https://github.com/dvyukov/relacy

Trust me, it helps.

Chris M. Thomasson

unread,

May 12, 2020, 4:12:59 PM5/12/20

to

You are having memory loss. You said that windows has a fixed spin
count! In reality, it can be adjusted.

Bonita Montero

unread,

May 12, 2020, 4:14:13 PM5/12/20

to

>> My mutex exactly does the same what the Windows mutex does.

> You just tried to tell us that Windows has a fixed spin count. ;^o

You don't understand what I mean with "fixed spin-count". I don't mean
that it isn't adjustable after creation of the mutex. I mean that it
isn't adaptive like with Linux.

>>> Btw, please run your code through a race detector to kill off the bugs.

>> Tell me what the bugs are. There aren't any.

> Did you finish correcting them all? You made several of them. Btw,
> Relacy can be found here:
> https://github.com/dvyukov/relacy
> Trust me, it helps.

I'm pretty sure you haven't read my code.

Bonita Montero

unread,

May 12, 2020, 4:14:38 PM5/12/20

to

>> You're so stupid, dude.
>> That's the same what my code allows.

> You are having memory loss. You said that windows has a fixed spin
> count! In reality, it can be adjusted.

Read my other posting !

Bonita Montero

unread,

May 12, 2020, 4:17:15 PM5/12/20

to

>> Races are absolutely legal in this situation and, if your theory
>> is right, should be prevented by an appropriate spin-count.

> Huh? What does the spin-count have to do with a nasty race condition?

The race-condition on the lock-counter determines which thread acquires
the mutex first.

> You do not seem to know what you are talking about.
> Think of a race condition where the wrong memory barrier was used.

I've used correct memory-barriers: acquire on successful locks, relase
on successful unlocks, otherewise releaxed when I repeat because the
lock-counter has changed meanwhile.
Sorry, dude, read the code firs before you tell nonsense.

Chris M. Thomasson

unread,

May 12, 2020, 4:17:30 PM5/12/20

to

No. I read where you made some corrections.

Chris M. Thomasson

unread,

May 12, 2020, 4:18:31 PM5/12/20

to

You said Windows uses a fixed spin count. It can be dynamically altered.

Bonita Montero

unread,

May 12, 2020, 4:19:28 PM5/12/20

to

>> I'm pretty sure you haven't read my code.

> No. I read where you made some corrections.

You make assumptions on my code without reading it.

Bonita Montero

unread,

May 12, 2020, 4:20:47 PM5/12/20

to

>> Take my code _which_does_the_same_like_the_Windows_critical_section_
>> _with_spin_lock_does_ and show wich are the appropriate parameters
>> which show the benefit of spinning !

> You said Windows uses a fixed spin count. It can be dynamically altered.

Yes, it does. But you don't know what this means. This doesn't mean
that the spin-count isn't adjustable. This does mean that it isn't
adaptive, i.e. self-adjusted, like under Linux.

Bonita Montero

unread,

May 12, 2020, 4:22:26 PM5/12/20

to

>> You said Windows uses a fixed spin count. It can be dynamically altered.

> Yes, it does. But you don't know what this means. This doesn't mean
> that the spin-count isn't adjustable. This does mean that it isn't
> adaptive, i.e. self-adjusted, like under Linux.

If you would have really read the thead, especially what Öö Tiib
said, there would have been no doubt what we were talking about.

Bonita Montero

unread,

May 12, 2020, 4:28:46 PM5/12/20

to

> You make assumptions on my code without reading it.

Here it is again. Now with some POSIX-compatibility. Unfortunately
I'm not familiar how to write inline-assembly for g++. So I've just
written a spendCycles-routine which should spend 8 clock-cycles for
each iteration on my CPU. On other CPUs this might be different. So
mayme someone here could supply some inline-assembly to make the
code of spendCycles timing-equivalent for Li

#if defined(_MSC_VER)
#define _CRT_SECURE_NO_WARNINGS
#include <Windows.h>
#elif defined(__unix__)
#include <semaphore.h>
#endif
#include <iostream>
#include <cstdint>
#include <cstdlib>
#include <vector>
#include <thread>
#include <random>
#include <atomic>
#include <csignal>

#pragma warning(disable: 6031)
#pragma warning(disable: 6387)
#pragma warning(disable: 26495)

using namespace std;

struct Semaphore
{
Semaphore();
void release();
void wait();
private:
#if defined(_MSC_VER)
HANDLE hSem;
#elif defined(__unix__)
sem_t sem;
#endif
};

inline
Semaphore::Semaphore()
#if defined(_MSC_VER)
: hSem( CreateSemaphore( nullptr, 0, 0x7FFFFFFF, nullptr ) )
#endif
{
#if defined(__unix__)
sem_init( &sem, 0, 0 );
#endif
}

inline
void Semaphore::release()
{
#if defined(_MSC_VER)
ReleaseSemaphore( hSem, 1, nullptr );
#elif defined(__unix__)
sem_post( &sem );
#endif
}

inline
void Semaphore::wait()
{
#if defined(_MSC_VER)
WaitForSingleObject( hSem, INFINITE );
#elif defined(__unix__)
sem_wait( &sem );
#endif
}

struct SpinMutex
{
SpinMutex( uint32_t spinCount );
void lock();
void unlock();
uint64_t getAndResetSpinSuceeds();
uint64_t getAndResetSpinFails();
private:
uint32_t spinCount;
atomic<uint32_t> lockCounter;
Semaphore sem;
atomic<uint64_t> spinSucceeds,
spinFails;
uint64_t getAndZero( atomic<uint64_t> &value );
};

SpinMutex::SpinMutex( uint32_t spinCount ) :
spinCount( spinCount ),
lockCounter( 0 ),
spinSucceeds( 0 ),
spinFails( 0 )
{
}

void SpinMutex::lock()
{
for( uint32_t sc = spinCount; sc; --sc )
{
uint32_t cmp = 0;
if( lockCounter.compare_exchange_weak( cmp, 1, memory_order_acquire,
memory_order_relaxed ) )
{
if( --sc ) // only count if it is not the first attempt to lock the mutex
spinSucceeds.fetch_add( 1, memory_order_relaxed );
return;
}
}
if( spinCount ) // only count fails if there's a spin-count
spinFails.fetch_add( 1, memory_order_relaxed );
if( lockCounter.fetch_add( 1, memory_order_acquire ) != 0 )
sem.wait();
}

void SpinMutex::unlock()
{
if( lockCounter.fetch_sub( 1, memory_order_release ) != 1 )
sem.release();
}

inline
uint64_t SpinMutex::getAndResetSpinSuceeds()
{
return getAndZero( spinSucceeds );
}

inline
uint64_t SpinMutex::getAndResetSpinFails()
{
return getAndZero( spinFails );
}

inline
uint64_t SpinMutex::getAndZero( atomic<uint64_t> &value )
{
uint64_t ref = value;
while( !value.compare_exchange_weak( ref, 0 ) );
return ref;
}

void spendCycles( uint64_t cycles );

#if defined(__unix__)
void spendCycles( uint64_t cycles )
{
uint64_t volatile c = cycles;
while( c-- );
}
#endif

atomic<bool> stop;

int main( int argc, char **argv )
{
char const *errStr = "1. number of threads, 2. min non locked cycles,
3. max non locked cycles, 4. min locked cycles, 5. max locked cycles, 6.
spin-count";
if( argc < 7 )
{
cout << errStr << endl;
return EXIT_FAILURE;
}
unsigned nThreads;
unsigned long long minNonLockedCycles, maxNonLockedCycles;
unsigned long long minLockedCycles, maxLockedCycles;
unsigned spinCount;
sscanf( argv[1], "%u", &nThreads );
sscanf( argv[2], "%llu", &minNonLockedCycles );
sscanf( argv[3], "%llu", &maxNonLockedCycles );
sscanf( argv[4], "%llu", &minLockedCycles );
sscanf( argv[5], "%llu", &maxLockedCycles );
sscanf( argv[6], "%u", &spinCount );
if( nThreads == 0 || minNonLockedCycles > maxNonLockedCycles ||
minLockedCycles > maxLockedCycles )
{
cout << errStr << endl;
return EXIT_FAILURE;
}
auto sigHandler = []( int sig ) -> void
{
::stop = true;
signal( SIGINT, SIG_IGN );
};
::stop = false;
signal( SIGINT, sigHandler );
SpinMutex sm( spinCount );
auto thr = [&]()
{
random_device rd;
minstd_rand mr( rd() );
uniform_int_distribution<uint64_t> uidNonLocked( minNonLockedCycles,
maxNonLockedCycles );
uniform_int_distribution<uint64_t> uidLocked( minLockedCycles,
maxLockedCycles );
while( !stop.load( memory_order_relaxed ) )
spendCycles( uidNonLocked( mr ) ),
sm.lock(),
spendCycles( uidLocked( mr ) ),
sm.unlock();
};
vector<thread> threads;
for( unsigned t = 0; t != nThreads; ++t )
threads.emplace_back( thr );
while( !stop.load( memory_order_relaxed ) )
{
this_thread::sleep_for( 1s );
cout << "succeeds: " << sm.getAndResetSpinSuceeds() << endl;
cout << "fails: " << sm.getAndResetSpinFails() << endl;
}
for( thread &t : threads )
t.join();
}