Bonita Montero
Jul 21, 2023, 5:15:35 AM
Years ago I wrote a mutex that allows either multiple concurrent
read accesses or one exclusive write access. Rather than blocking
right away, it first spins for a limited, configurable number of
rounds. Each spin round executes an x86 PAUSE instruction to
conserve power and to avoid starving other threads on the core.
This got me wondering how long a PAUSE instruction takes on
different x86 CPUs, so I wrote a small C++20 test program that
you can try out. Here it is:
#include <iostream>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <utility>
#if defined(_MSC_VER)
    #include <intrin.h>
#else
    #include <x86intrin.h>
#endif

using namespace std;
using namespace chrono;

// Executes count PAUSE instructions, unrolled ten at a time so that
// the loop overhead stays small relative to the PAUSEs themselves.
void pauseLoop( size_t count )
{
    // Calls fn.operator ()<I>() once for each index I in the sequence.
    auto unroll = []<size_t ...Indices>( index_sequence<Indices ...>, auto fn )
    {
        (fn.template operator ()<Indices>(), ...);
    };
    for( ; count >= 10; count -= 10 )
        unroll( make_index_sequence<10>(), [&]<size_t I>() { _mm_pause(); } );
    // Handle the remaining count % 10 iterations.
    while( count-- )
        _mm_pause();
}

int main()
{
    using hrc_tp = time_point<high_resolution_clock>;
    static size_t const TURNS = 100'000'000;
    hrc_tp start = high_resolution_clock::now();
    pauseLoop( TURNS );
    // Average wall-clock time per PAUSE in nanoseconds.
    double ns = (int64_t)duration_cast<nanoseconds>(
        high_resolution_clock::now() - start ).count() / (double)TURNS;
    cout << ns << "ns" << endl;
}
Agner Fog's instruction tables do not give values for all CPUs.
My Skylake Linux PC has a latency of about 42ns per PAUSE
instruction, my Zen4 Windows PC a latency of about 12ns.
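
Agner's tables list timings in core clock cycles, not nanoseconds,
so comparing a measurement against them needs a conversion. Below
is a minimal sketch of that conversion. The helper name pauseCycles
and the clockGHz parameter are made up for illustration, and the
sketch assumes the core ran at a fixed, known frequency during the
test, which turbo and power management usually make only roughly
true.

```cpp
#include <cassert>

// Converts a measured time per PAUSE (in nanoseconds) to core clock
// cycles. clockGHz is an ASSUMED constant core frequency; at 1 GHz
// one cycle takes exactly one nanosecond, so cycles = ns * GHz.
inline double pauseCycles( double nsPerPause, double clockGHz )
{
    assert( nsPerPause >= 0.0 && clockGHz > 0.0 );
    return nsPerPause * clockGHz;
}
```

As a usage example with purely hypothetical clock rates (I did not
record the actual ones): pauseCycles( 42.0, 3.5 ) gives 147 cycles
and pauseCycles( 12.0, 5.0 ) gives 60 cycles. To get a meaningful
figure, plug in the frequency your own CPU actually sustains while
the test runs.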