Why is that usually faster than a normal string

Bonita Montero

unread,

Nov 24, 2022, 2:12:41 AM11/24/22

to

bool svCmp( string_view const &left, string_view const &right )
{
if( left.size() != right.size() )
return false;
#if defined(_WIN32)
__try
{
return memcmp( left.data(), right.data(), left.size() ) == 0;
}
__except( EXCEPTION_EXECUTE_HANDLER )
{
return left == right;
}
#else
static size_t pageMask = 0;
if( !pageMask ) [[unlikely]]
pageMask = ~((size_t)sysconf( _SC_PAGESIZE ) - 1);
auto inSamePage = []( string_view const &sv ) -> bool
{
return ((size_t)to_address( sv.begin() ) & pageMask) ==
((size_t)to_address( sv.end() ) & pageMask);
};
if( inSamePage( left ) && inSamePage( right ) ) [[likely]]
return memcmp( left.data(), right.data(), left.size() ) == 0;
return left == right;
#endif
}

Alf P. Steinbach

unread,

Nov 24, 2022, 5:02:44 AM11/24/22

to

Probably not what your're asking (why it's apparently faster than your
standard library's operator== for string_view) but

* Can make thread safe by using initialization for `pageMask` instead of
`if`.
* Can make pointer-size-safe by using `uintptr_t` instead of `size_t`.
* Can make more clear by s/inSamePage/isWithinOnePage/g

The thread safety introduces a little nano-seconds overhead.

I'm surprised that you defer to the standard library's == in the case
where at least one object spans two or more pages; does `memcmp`
introduce overhead then, and if so how does == handle that?

- Alf

Bonita Montero

unread,

Nov 24, 2022, 10:23:44 AM11/24/22

to

Am 24.11.2022 um 11:02 schrieb Alf P. Steinbach:

> * Can make thread safe by using initialization for `pageMask` instead of
> `if`.

On which of today's architecutre writing a general purpose register
isn't atomic ?

> * Can make pointer-size-safe by using `uintptr_t` instead of `size_t`.

You can silently rely on the fact that a size_t has the width of
a pointe for at least 20 years. uintptr_t is compusive nonsense.

> * Can make more clear by s/inSamePage/isWithinOnePage/g

LOL

> I'm surprised that you defer to the standard library's == in the case
> where at least one object spans two or more pages; does `memcmp`
> introduce overhead then, and if so how does == handle that?

memcmp() requires two completely addressable ranges.

Alf P. Steinbach

unread,

Nov 24, 2022, 11:33:01 AM11/24/22

to

On 24 Nov 2022 16:23, Bonita Montero wrote:
> Am 24.11.2022 um 11:02 schrieb Alf P. Steinbach:

> [snip]

>
>> I'm surprised that you defer to the standard library's == in the case
>> where at least one object spans two or more pages; does `memcmp`
>> introduce overhead then, and if so how does == handle that?
>
> memcmp() requires two completely addressable ranges.

You mean contiguous addresses. And they are. Pages are invisible to C++
memory addressing, except for timing.

- Alf

Richard Damon

unread,

Nov 24, 2022, 11:40:26 AM11/24/22

to

On 11/24/22 10:23 AM, Bonita Montero wrote:
> Am 24.11.2022 um 11:02 schrieb Alf P. Steinbach:
>
>> * Can make thread safe by using initialization for `pageMask` instead
>> of `if`.
>
> On which of today's architecutre writing a general purpose register
> isn't atomic ?
>
>> * Can make pointer-size-safe by using `uintptr_t` instead of `size_t`.
>
> You can silently rely on the fact that a size_t has the width of
> a pointe for at least 20 years. uintptr_t is compusive nonsense.

But it isn't actually True.

It fails for segmented memory archetectures.

Yes, such archetectures are not common now, but not totally impossible.

Why use the WRONG type, when the right type exists.

Bonita Montero

unread,

Nov 24, 2022, 12:17:03 PM11/24/22

to

Of course but C++ implicitly has the notion of memory which
isn't safely addressable, thereby allowing sth. like paging.

Bonita Montero

unread,

Nov 24, 2022, 12:18:40 PM11/24/22

to

Am 24.11.2022 um 17:35 schrieb Richard Damon:

> It fails for segmented memory archetectures.

I don't care for these architecturs since I've not targeted such
architecutres for at least 20 years and I will never again and I
assume that nearly everyone doesn't do that also.

> Why use the WRONG type, when the right type exists.

size_t fits here and I will never run into troubles with that

Andrey Tarasevich

unread,

Nov 24, 2022, 1:40:44 PM11/24/22

to

On 11/23/2022 11:12 PM, Bonita Montero wrote:
> ...

It isn't. "Normal string_view == comparison" normally just compares
lengths and then calls `memcmp`. So, adding a bunch of checks on top of
doing the same thing manually (why?) is not going to make it faster. And
it doesn't.

--
Best regards,
Andrey

Richard Damon

unread,

Nov 24, 2022, 6:49:54 PM11/24/22

to

Your choice.

Note, That also means you need to be careful about ports to DSPs that
decide to use extended pointers to handle character addressing.

Bonita Montero

unread,

Nov 24, 2022, 10:32:31 PM11/24/22

to

Am 25.11.2022 um 00:49 schrieb Richard Damon:

> Note, That also means you need to be careful about ports to DSPs
> that decide to use extended pointers to handle character addressing.

These DSPs don't need portable code anyway.

Richard Damon

unread,

Nov 24, 2022, 10:45:24 PM11/24/22

to

SOME of the code will be specific for them, but other, typically
non-time critical code could well want to be portable.

You never really know what code might want to be ported to what
architectures in the future. Intentionally making it non-portable
without reason just adds to the technical debt in the code.

Öö Tiib

unread,

Nov 25, 2022, 2:46:36 AM11/25/22

to

Comparing texts sounds like some kind of text configuration reading
or receiving requests with text-based protocol. So if it is time-critical
then it should be replaced with some more compact binary like
flatbuffers or such anyway.

Bonita Montero

unread,

Dec 1, 2022, 1:23:40 PM12/1/22

to

At last I only managed to have a faster comparison
of a char-array against a string_view on Windows:

template<typename CharType, typename TraitsType>
#if defined(__cpp_concepts)
requires std::same_as<make_unsigned_t<CharType>, unsigned char>
|| std::same_as<make_unsigned_t<CharType>, wchar_t>
|| std::same_as<make_unsigned_t<CharType>, char8_t>
|| std::same_as<make_unsigned_t<CharType>, char16_t>
|| std::same_as<make_unsigned_t<CharType>, char32_t>
#endif
bool svCmp( CharType const *str, std::basic_string_view<CharType,
TraitsType> const &sv )
{
#if defined(_WIN32)
__try
{
return !str[sv.length()] && memcmp( str, sv.data(), sv.length() *
sizeof(CharType) ) == 0;
}
__except( EXCEPTION_EXECUTE_HANDLER )
{
using sv_cit = std::string_view::const_iterator;
for( sv_cit itSv = sv.cbegin(), itSvEnd = sv.cend(); itSv != itSvEnd;
++itSv, ++str )
if( *str ) [[likely]]
if( *str != *itSv ) [[unlikely]]
return false;
else;
else
return false;
return true;
}
#elif defined(__unix__)
return str == sv;
#endif
}

On Linux this doesn't work since I can't trap a page fault that
easily. If I wanted to check if the C-string is fully within a
single page I'd have to count its length before; but if I do
that I could also do a byte-by-byte comparison.
So on Windows the compiler could optimize the comparison itself
in a way that it uses AVX inside memcmp() for example to have a
comparison of 32 byte chunks.

Bonita Montero

unread,

Dec 1, 2022, 3:54:56 PM12/1/22

to

I wrote a little benchmark that reported that my comparison is
nearly exactly 15 times faster than a bare strcmp() on my Zen2
computer.

#if defined(_WIN32)
#include <Windows.h>
#endif
#include <iostream>
#include <string_view>
#include <cstring>
#include <memory>
#include <chrono>
#include <type_traits>
#include <atomic>

template<typename CharType, typename TraitsType>

requires std::same_as<std::make_unsigned_t<CharType>, unsigned char>
|| std::same_as<CharType, wchar_t>
|| std::same_as<CharType, char8_t>
|| std::same_as<CharType, char16_t>
|| std::same_as<CharType, char32_t>

bool svCmp( CharType const *str, std::basic_string_view<CharType,
TraitsType> const &sv )
{
#if defined(_WIN32)
__try
{
return !str[sv.length()] && memcmp( str, sv.data(), sv.length() *
sizeof(CharType) ) == 0;
}
__except( EXCEPTION_EXECUTE_HANDLER )
{
using sv_cit = std::string_view::const_iterator;
for( sv_cit itSv = sv.cbegin(), itSvEnd = sv.cend(); itSv != itSvEnd;
++itSv, ++str )
if( *str ) [[likely]]
if( *str != *itSv ) [[unlikely]]
return false;
else;
else
return false;
return true;
}

#else

return str == sv;
#endif
}

using namespace std;
using namespace chrono;

atomic<size_t> aSum( 0 );

int main()
{
constexpr size_t N = 0x1000;
string
cStr( N, '*' ),
svStr( cStr );
auto cmpStrcmp = []( char const *str, string_view const &sv ) -> bool {
return strcmp( str, sv.data() ) == 0; };
auto cmpSvCmp = []( char const *str, string_view const &sv ) -> bool {
return svCmp( str, sv ); };
using cmp_sig_t = bool (*)( char const *str, string_view const &sv );
auto bench = [&]( cmp_sig_t cmp ) -> double
{
cmp_sig_t volatile vCmp = cmp;
size_t sum = 0;
(void)svStr.c_str();
auto start = high_resolution_clock::now();
for( size_t r = 1'000'000; r--; )
sum += vCmp( cStr.c_str(), svStr );
double dur = (double)duration_cast<nanoseconds>(
high_resolution_clock::now() - start ).count();
::aSum = sum;
return dur;
};
cout << (bench( cmpStrcmp ) / bench( cmpSvCmp )) << endl;
}

Unfortunately memcmpy actually isn't SSE or AVX-optimized with
Visual C++'s runtime as I expected. But nevertheless memcmp()
does its comparisons in 8 byte chunks with a x64-compile.
I'm using a volatile pointer to the comparison function to
prevent any optimizations on my own function to have a fair
comparison against memcmp().

Mr Flibble

unread,

Dec 1, 2022, 6:36:22 PM12/1/22

to

Your code is terrible, egregiously terrible: buffer overflows are NEVER a
good idea even in example code; and "using namespace std;" is bad too.

/Flibble

Bonita Montero

unread,

Dec 1, 2022, 11:30:47 PM12/1/22

to

Am 02.12.2022 um 00:36 schrieb Mr Flibble:

> Your code is terrible, egregiously terrible: buffer overflows are NEVER a
> good idea even in example code; and "using namespace std;" is bad too.

There's nothing wrong with my code and I gues you
don't even understand it; you're just a bad programmer.

Mut...@dastardlyhq.com

unread,

Dec 2, 2022, 5:09:34 AM12/2/22

to

You think everyone is a bad programmer if they don't recognise your coding
genuise. I don't agree with Flibble about much but he's right about this.
Your code is almost always an overcomplicated mess. I don't know whether its
because you're showing off or you simply can't think clearly but I'm glad I'll
never have to maintain it.

Bonita Montero

unread,

Dec 2, 2022, 9:22:41 AM12/2/22

to

Am 02.12.2022 um 11:09 schrieb Mut...@dastardlyhq.com:

> You think everyone is a bad programmer if they don't recognise your coding

> genuise. ...

The code is reliable and 15 times faster than stramp on my machine.

> Your code is almost always an overcomplicated mess. ...

It's rather simple for this kind of improvement.

Mr Flibble

unread,

Dec 2, 2022, 9:48:43 AM12/2/22

to

Of course I understand the mess that you think is good code: you are
relying on undefined behaviour (buffer overruns and page faults) in the
fucktarded belief you can create a faster string comparison function; if
we ignore the undefined behaviour there is still an issue: most strings
are shorter than a typical page size which is a problem if the length of
"str" is less than the length of "sv"; if you cannot see what that problem
is then it is YOU who is "just a bad programmer".

/Flibble

Bonita Montero

unread,

Dec 2, 2022, 12:39:51 PM12/2/22

to

Am 02.12.2022 um 15:48 schrieb Mr Flibble:

> Of course I understand the mess that you think is good code: you are
> relying on undefined behaviour (buffer overruns and page faults) in the

> fucktarded belief you can create a faster string comparison function; ...

On Windows this style of programming is safe.
The speedup I measured is factor 15.

> most strings are shorter than a typical page size ...

Of course, but I would still get a speedup by comparing eight chars
in each chunk.

> which is a problem if the length of "str" is less than the length of "sv";

The mistake is the opposite, i.e. if str is longer than sv.
But I've fixed that by returning !*str.

Mr Flibble

unread,

Dec 2, 2022, 1:17:54 PM12/2/22

to

No it isn't and you haven't fixed anything: you are a bad programmer
indeed.

/Flibble

Bonita Montero

unread,

Dec 2, 2022, 1:25:50 PM12/2/22

to

Am 02.12.2022 um 19:17 schrieb Mr Flibble:

> No it isn't and you haven't fixed anything: you are a bad programmer
> indeed.

You don't understand your bug assumptions. If the length of str is less
than that of sv this is detected inside the loop. If it is larger this
is detected by looking if the charater inside str at the end-index of
sv is zero. Who's the bad programmer here ?

Mr Flibble

unread,

Dec 2, 2022, 1:51:07 PM12/2/22

to

No it isn't. You are the bad programmer here.

/Flibble

Why is that usually faster than a normal string_view == comparison ?

Bonita Montero

Alf P. Steinbach

Bonita Montero

Alf P. Steinbach

Richard Damon

Bonita Montero

Bonita Montero

Andrey Tarasevich

Richard Damon

Bonita Montero

Richard Damon

Öö Tiib

Bonita Montero

Bonita Montero

Mr Flibble

Bonita Montero

Mut...@dastardlyhq.com

Bonita Montero

Mr Flibble

Bonita Montero

Mr Flibble

Bonita Montero

Mr Flibble