Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

qsort() vs. std::sort

258 views
Skip to first unread message

Bonita Montero

unread,
Apr 23, 2022, 3:15:38 AM4/23/22
to
I just measured how superior C++'s std::sort is over C's qsort:

#include <iostream>
#include <vector>
#include <random>
#include <chrono>
#include <cstdlib>
#include <algorithm>
#include <memory>
#include <atomic>

using namespace std;
using namespace chrono;

atomic_uint auNonOptimize;

int main()
{
constexpr size_t
V_MAX = 0x1000,
V_MAX_ROUNDS = 0x4000;
vector<unsigned>
vu( V_MAX, 0 ),
vuSort( V_MAX, 0 );
mt19937_64 mt;
uniform_int_distribution<unsigned> uid( 0, -1 );
for( unsigned &u : vu )
u = uid( mt );
auto timedSort = [&]<typename SortFn>( SortFn sortFn, size_t n, size_t
rounds ) -> double
requires requires( SortFn sortFn, unsigned *p ) { { sortFn( p, p ) }; }
{
auto start = high_resolution_clock::now();
unsigned sum = 0;
while( rounds-- )
copy( vu.begin(), vu.begin() + n, vuSort.begin() ),
sortFn( to_address( vuSort.begin() ), to_address( vuSort.begin() ) + n ),
sum += vu[0];
::auNonOptimize = sum;
return (int64_t)duration_cast<nanoseconds>(
high_resolution_clock::now() - start ).count() / 1.0e9;
};
auto cSort = []( unsigned *begin, unsigned *end )
{
qsort( begin, end - begin, sizeof(unsigned),
[]( void const *left, void const *right ) -> int
{
unsigned
&l = *(unsigned *)left,
&r = *(unsigned *)right;
return l < r ? -1 : l > r ? 1 : 0;
} );
};
auto cppSort = []( unsigned *begin, unsigned *end )
{
sort( begin, end );
};
double tC, tCpp, rel;
for( size_t s = 2, rounds; s <= V_MAX; s *= 2 )
rounds = (V_MAX_ROUNDS / s) * V_MAX_ROUNDS,
tC = timedSort( cSort, s, rounds ),
tCpp = timedSort( cppSort, s, rounds ),
rel = tC / tCpp * 100.0,
cout << s << ": " << (int)(rel + 0.5) << "%" << endl;
}

These are the results on my computer (Ryzen Threadripper 3990X):

2: 279%
4: 303%
8: 531%
16: 611%
32: 543%
64: 627%
128: 536%
256: 481%
512: 402%
1024: 316%
2048: 260%
4096: 230%

Up to 6,3 times faster with less coding !

Bonita Montero

unread,
Apr 23, 2022, 3:31:08 AM4/23/22
to
sum += vusort[0];

fir

unread,
Apr 23, 2022, 6:35:57 AM4/23/22
to
try radix sort, i once measured that things (on 10-yo old machine) and testing sorting 1MB of integers and remember c quicksort (like that from wikipedia) have 150 ms , c++ std:sort 111 ms,
c insertsort 111 ms also or 2 ms worse something like that, c-radix sort (taken from stack overflow) 17 ms beating c++:stdsort more than 3 times
for more details search "quicksort" in this group (the search in google groups work)..i only remember vagually that some version of quicksort i tested had some error which i made by trials of revritting but i dont know which one - the conclusions i remember were btw that quicksort is not so good people belive, insertsort was better afair and especially radix sort smashes c++ sort in pieces
(hovever this research then was tiring and i dont want to go beck into it..untill i would really needet it again)

fir

unread,
Apr 23, 2022, 6:38:49 AM4/23/22
to
> >
> > Up to 6,3 times faster with less coding !
> try radix sort, i once measured that things (on 10-yo old machine) and testing sorting 1MB of integers and remember c quicksort (like that from wikipedia) have 150 ms , c++ std:sort 111 ms,
> c insertsort 111 ms also or 2 ms worse something like that, c-radix sort (taken from stack overflow) 17 ms beating c++:stdsort more than 3 times

sorry mnistype i wanted to type 27 ms (not 17 ms)

Malcolm McLean

unread,
Apr 23, 2022, 7:00:33 AM4/23/22
to
But the advantage is tailing off as N gets larger. Probably because the runtime starts to be
dominated by cache misses rather than by overhead in the comparison function.

fir

unread,
Apr 23, 2022, 7:08:09 AM4/23/22
to
nobody should compare sorting to the qsort that probably comes from 80-ties or something, this
passing this 'predicates' as callbacs (if this predicate goiod word ) is obvious this is not effective

more surprising for me was that c++ ponys clain std::sort is ofc best, thousands od c++ pane care for that and simple routine you cpy and paste beats it 3-4 times (which is very notable)

but i could treat it like old-school anecdote as participating in that pony vars is generally not the best thing man can do imo (though defending C its ok though)

Bonita Montero

unread,
Apr 23, 2022, 7:44:07 AM4/23/22
to
> dominated by cache misses ...

Maybe you're right.

Bonita Montero

unread,
Apr 23, 2022, 8:32:24 AM4/23/22
to
Radix-sort is significantly slower because of the memory-allocations:

#include <iostream>
#include <vector>
#include <random>
#include <chrono>
#include <cstdlib>
#include <algorithm>
#include <memory>
#include <atomic>
#include "radix_sort.h"

using namespace std;
using namespace chrono;

atomic_uint auNonOptimize;

int main()
{
constexpr size_t
V_MAX = 0x1000,
V_MAX_ROUNDS = 0x1000;
using vu_t = vector<unsigned>;
using vu_it = vu_t::iterator;
vu_t
vu( V_MAX, 0 ),
vuSort( V_MAX, 0 );
mt19937_64 mt;
uniform_int_distribution<unsigned> uid( 0, -1 );
for( unsigned &u : vu )
u = uid( mt );
auto timedSort = [&]<typename SortFn>( SortFn sortFn, size_t n, size_t
rounds ) -> double
requires requires( SortFn sortFn, vu_it it ) { { sortFn( it, it ) }; }
{
auto start = high_resolution_clock::now();
unsigned sum = 0;
while( rounds-- )
copy( vu.begin(), vu.begin() + n, vuSort.begin() ),
sortFn( vuSort.begin(), vuSort.begin() + n ),
sum += vuSort[0];
::auNonOptimize = sum;
return (int64_t)duration_cast<nanoseconds>(
high_resolution_clock::now() - start ).count() / 1.0e9;
};
auto cSort = []( vu_it begin, vu_it end )
{
qsort( to_address( begin ), end - begin, sizeof(unsigned),
[]( void const *left, void const *right ) -> int
{
unsigned
&l = *(unsigned *)left,
&r = *(unsigned *)right;
return l < r ? -1 : l > r ? 1 : 0;
} );
};
auto cppSort = []( vu_it begin, vu_it end )
{
sort( begin, end );
};
auto rdxSort = []( vu_it begin, vu_it end )
{
radix_sort( begin, end );
};
auto rel = []( double tC, double tRel ) -> int { return (int)(tC /
tRel * 100.0 + 0.5); };
double tC, tCpp, tRdx;
for( size_t s = 2, rounds; s <= V_MAX; s *= 2 )
rounds = (V_MAX_ROUNDS / s) * V_MAX_ROUNDS,
tC = timedSort( cSort, s, rounds ),
tCpp = timedSort( cppSort, s, rounds ),
tRdx = timedSort( rdxSort, s, rounds ),
cout << s << ": " << rel( tC, tCpp ) << "%, " << rel( tC, tRdx ) <<
"%" << endl;
}

// radix_sort.h:

#pragma once
#include <iterator>
#include <vector>
#include <array>
#include <climits>
#include <algorithm>
#include <type_traits>

template<std::forward_iterator ForwardIt>
requires std::is_integral_v<typename
std::iterator_traits<ForwardIt>::value_type>
void radix_sort( ForwardIt begin, ForwardIt end )
{
using namespace std;
if( begin == end )
return;
using elem_t = typename iterator_traits<ForwardIt>::value_type;
array<vector<elem_t>, 2> partitions;
for( unsigned bit = 0; bit != sizeof(elem_t) * CHAR_BIT; ++bit )
{
for( ForwardIt it = begin; it != end; ++it )
partitions[(*it >> bit) & 1].push_back( *it );
auto end0 = copy( partitions[0].begin(), partitions[0].end(), begin );
copy( partitions[1].begin(), partitions[1].end(), end0 );
partitions[0].clear();
partitions[1].clear();
}
}

2: 120%, 1%
4: 210%, 2%
8: 498%, 5%
16: 561%, 10%
32: 433%, 13%
64: 535%, 26%
128: 511%, 37%
256: 448%, 39%
512: 400%, 47%
1024: 321%, 55%
2048: 258%, 61%
4096: 225%, 68%

The first number is std::sort vs. qsort, the second number
is radix_sort vs. qsort. I think I could speed up radix_sort
significantly by allocating the buffer on the stack with alloca().

fir

unread,
Apr 23, 2022, 8:41:17 AM4/23/22
to
sobota, 23 kwietnia 2022 o 14:32:24 UTC+2 Bonita Montero napisał(a):
> Radix-sort is significantly slower because of the memory-allocations:
> #include <iostream>
> #include <vector>
> }
>
> 2: 120%, 1%
> 4: 210%, 2%
> 8: 498%, 5%
> 16: 561%, 10%
> 32: 433%, 13%
> 64: 535%, 26%
> 128: 511%, 37%
> 256: 448%, 39%
> 512: 400%, 47%
> 1024: 321%, 55%
> 2048: 258%, 61%
> 4096: 225%, 68%
>
> The first number is std::sort vs. qsort, the second number
> is radix_sort vs. qsort. I think I could speed up radix_sort
> significantly by allocating the buffer on the stack with alloca().

you better give miliseconds - like those numbers i gave
the radix sort i took is here in this thread

https://groups.google.com/g/comp.lang.c/c/I4lJSh45EJ0/m/ezGwgnvWDgAJ

hovever i dont want to get back to this now, this is low lewel of coding and optimisation
and this is wearing, note any time im ready to get down and involved, esp as those algorithms are hard to understand to the bone and being able to do it own way

Bonita Montero

unread,
Apr 23, 2022, 8:52:23 AM4/23/22
to
I output the relative speed vs. the C's qsort.
There's nothing to optimize further with my radix_sort
except from the memory-allocations.

fir

unread,
Apr 23, 2022, 8:59:14 AM4/23/22
to
qsort is nothing to compare as it calls those predicates by pointers (if im not wrong pasing its arguments back and forth on stack, maybe in case of x64 its not but this is overally stupid, besides quicksort si probably not the good alghoritm.. as far as i remember needed to improve it by insertion sort for short partitons etc... i dont want top get back to that)

you should giove miliseconds as it is reall life information which is kinda 'physical thus most interesting and most infromative.. if also could compare this result with rest of the world not some % relative to one of your code to other...giving miliseconds (and soem information on machine and settings) gave much more sense

if you want go pro on sorting handmade solutions will always be fastests imo becouse you may use soem assumptions on what your data and what you need which generall cal cant use... goint this specific way is most fun though its most hardcore too

Bonita Montero

unread,
Apr 23, 2022, 9:05:07 AM4/23/22
to
> qsort is nothing to compare as it calls those predicates by pointers ...


It doesn't matter to what I compare, even if I'd compare to bubblesort
since you can compre the relative numbers among each other.

> you should giove miliseconds ...

Irrelevant since I have relative numbers.

fir

unread,
Apr 23, 2022, 9:37:45 AM4/23/22
to
sobota, 23 kwietnia 2022 o 15:05:07 UTC+2 Bonita Montero napisał(a):
> Am 23.04.2022 um 14:59 schrieb fir:
>
> > you should giove miliseconds ...
>
> Irrelevant since I have relative numbers.

if you say so....

Juha Nieminen

unread,
Apr 25, 2022, 1:56:28 AM4/25/22
to
In comp.lang.c++ Bonita Montero <Bonita....@gmail.com> wrote:
> Up to 6,3 times faster with less coding !

std::sort() benefits a lot from being templated and not requiring callback
functions for comparing elements. (Many of the other operations are also
likely to benefit from directly seeing and handling the element type.)

I am curious to know if qsort() in the major standard library
implementations use the same algorithm as std::sort().

Bonita Montero

unread,
Apr 25, 2022, 2:27:32 AM4/25/22
to
> std::sort() benefits a lot from being templated and not requiring callback
> functions for comparing elements. ...

Obvious, but you can't guess the actual difference from that.

> I am curious to know if qsort() in the major standard library
> implementations use the same algorithm as std::sort().

Basically yes, bt there are minor improvements over quicksorts which
give a slight improved performance.

Malcolm McLean

unread,
Apr 25, 2022, 2:52:30 AM4/25/22
to
On Monday, 25 April 2022 at 06:56:28 UTC+1, Juha Nieminen wrote:
> In comp.lang.c++ Bonita Montero <Bonita....@gmail.com> wrote:
> > Up to 6,3 times faster with less coding !
> std::sort() benefits a lot from being templated and not requiring callback
> functions for comparing elements. (Many of the other operations are also
> likely to benefit from directly seeing and handling the element type.)
>
It's a general problem with C. Most elements are aligned on hardware
word boundaries. However it is awkward to test for or require that. So you
end up writing inefficient byte-copying loops for portability and convenience.

The other issue is that many compilers won't inline an indirect call, even if
it appears only once and is passed in directly.

Thiago Adams

unread,
Apr 27, 2022, 7:21:25 AM4/27/22
to
If a C programmer has concerns about qsort performance he just
make it inline removing the function comparison pointer and using
the function directly.


Bonita Montero

unread,
Apr 27, 2022, 12:36:29 PM4/27/22
to
1. You can't inline a function with arbitrary recursion.
2. The comparison-function for qsort() is never inlined.

William Ahern

unread,
Apr 27, 2022, 8:15:21 PM4/27/22
to
Bonita Montero <Bonita....@gmail.com> wrote:
> I just measured how superior C++'s std::sort is over C's qsort:
<snip>
> These are the results on my computer (Ryzen Threadripper 3990X):
>
> 2: 279%
> 4: 303%
> 8: 531%
> 16: 611%
> 32: 543%
> 64: 627%
> 128: 536%
> 256: 481%
> 512: 402%
> 1024: 316%
> 2048: 260%
> 4096: 230%
>
> Up to 6,3 times faster with less coding !

Unsurprising that inlining can improve performance. Your raw numbers are
useless, though, as you're comparing different algorithms (C++ header
template vs libc), which shall be demonstrated below. Also, technically
speaking your gripe is largely with toolchains (particularly wrt the
performance of standard library interfaces like qsort) as there's nothing
preventing toolchains from inlining qsort and the provided callback; for
historical reasons they simply don't. Newer languages and their toolchains,
like Rust, statically compile everything, including the standard library,
which is something C toolchain and runtime environment providers could just
as well do also. Rust Vector's sort API implicitly relies on static
compilation, otherwise it would suffer the same runtime indirect call issue
as in typical C environments.

I'm unable to compile your C++ code on OpenBSD 7.0; it appears to use
features that are too new. I ran the following LTO inlining experiment on
Alpine Linux 3.15 but using the qsort implementation from OpenBSD 7.0, which
was easiest to compile outside its source tree with minimal modifications
(changing qsort and heapsort to openbsd_qsort and openbsd_heapsort,
respectively, and adding a forward declaration for openbsd_heapsort). As you
can see, OpenBSD's qsort is implemented using heapsort. Not sure which
algorithm Alpine Linux's C++ std::sort is using, but it's either a different
algorithm entirely or a significantly different implementation.

alpine-3-15:/tmp$ make qsort-inline qsort-extern
cc -flto -O2 -march=native -Wall -c -o openbsd-qsort.o openbsd-qsort.c
cc -flto -O2 -march=native -Wall -c -o openbsd-heapsort.o openbsd-heapsort.c
g++ -o qsort-inline qsort.cc openbsd-qsort.o openbsd-heapsort.o -flto -O2 -march=native -Wall -std=c++2a -fvisibility=hidden
cc -shared -o libopenbsd-qsort.so openbsd-qsort.o openbsd-heapsort.o -flto -O2 -march=native -Wall
g++ -o qsort-extern qsort.cc -flto -O2 -march=native -Wall -std=c++2a -fvisibility=hidden -L. -Wl,-rpath="." -lopenbsd-qsort
alpine-3-15:/tmp$ ./qsort-inline
2: 54%
4: 59%
8: 39%
16: 75%
32: 86%
64: 131%
128: 160%
256: 186%
512: 186%
1024: 180%
2048: 151%
4096: 144%
alpine-3-15:/tmp$ ./qsort-extern
2: 75%
4: 95%
8: 84%
16: 176%
32: 195%
64: 285%
128: 362%
256: 490%
512: 588%
1024: 302%
2048: 246%
4096: 230%

William Ahern

unread,
Apr 27, 2022, 8:15:21 PM4/27/22
to
Bonita Montero <Bonita....@gmail.com> wrote:
> Am 27.04.2022 um 13:21 schrieb Thiago Adams:
<snip>
>> If a C programmer has concerns about qsort performance he just
>> make it inline removing the function comparison pointer and using
>> the function directly.
>
> 1. You can't inline a function with arbitrary recursion.

The same is true in C++.

> 2. The comparison-function for qsort() is never inlined.

That's a toolchain issue. Nothing prevents C environments from inlining
qsort and it's callback; most simply haven't done so historically. Other
language environments like Rust implicitly rely on static compilation to
avoid the cost of an indirect function call on every comparison.

Bonita Montero

unread,
Apr 28, 2022, 12:58:14 AM4/28/22
to
> Unsurprising that inlining can improve performance. Your raw numbers are
> useless, though, as you're comparing different algorithms (C++ header
> template vs libc), ...

No, the numbers aren't useless, they compare the C- and the C++-way
to sort things.

> Also, technically
> speaking your gripe is largely with toolchains (particularly wrt the
> performance of standard library interfaces like qsort) as there's nothing
> preventing toolchains from inlining qsort ...

LOL, there's no C compiler which will do that. The qsort-function is
just a static piece of code and nothing is inlined.

Bonita Montero

unread,
Apr 28, 2022, 12:59:23 AM4/28/22
to
Am 28.04.2022 um 02:10 schrieb William Ahern:
> Bonita Montero <Bonita....@gmail.com> wrote:
>> Am 27.04.2022 um 13:21 schrieb Thiago Adams:
> <snip>
>>> If a C programmer has concerns about qsort performance he just
>>> make it inline removing the function comparison pointer and using
>>> the function directly.
>>
>> 1. You can't inline a function with arbitrary recursion.
>
> The same is true in C++.
>
>> 2. The comparison-function for qsort() is never inlined.
>
> That's a toolchain issue. Nothing prevents C environments from inlining
> qsort and it's callback; ...

The qsort-function is a static piece of code and the comparison
-function is never inlined.

Juha Nieminen

unread,
Apr 28, 2022, 2:34:05 AM4/28/22
to
In comp.lang.c++ William Ahern <wil...@25thandclement.com> wrote:
> Unsurprising that inlining can improve performance.

The performance boost doesn't come from inlining. One single function call
wouldn't make a difference when it comes to sorting. It doesn't matter if
the std::sort() function gets inlined into the calling code or whether
the compiler decides to make it a separate function. That would only be
a difference of a few clock cycles at most.

The difference comes from what could perhaps be called "reverse inlining"
(which happens thanks to std::sort() being a template). In other words,
rather than std::sort() being embedded into the calling code, data from
the calling code gets embedded into the std::sort() implementation.
(In other words, we are doing the "inlining" kind of in reverse.)

This means that when std::sort() needs to, for example, compare two
elements, it knows exactly the type of the element and can do the
comparison "inlined" in its own code, rather than having to call some
external function provided by the caller.

> Your raw numbers are
> useless, though, as you're comparing different algorithms (C++ header
> template vs libc)

That's not a difference in algorithm. That's a difference in implementation.
For all we know std::sort() and qsort() could both use the exact same
sorting algorithm.

Note that making std::sort() a template isn't done primarily because it
makes it more efficient (it's a really nice bonus, but it's not the primary
reason why it's done). The main reason why it's a template is because
swapping elements in C++ may not always be possible by merely swapping
the underlying bytes of those elements. In other words, the swapping
is type-dependent. Achieving this is easiest with a template.

William Ahern

unread,
Apr 28, 2022, 6:00:21 AM4/28/22
to
Juha Nieminen <nos...@thanks.invalid> wrote:
> In comp.lang.c++ William Ahern <wil...@25thandclement.com> wrote:
>> Unsurprising that inlining can improve performance.
>
> The performance boost doesn't come from inlining. One single function call
> wouldn't make a difference when it comes to sorting. It doesn't matter if
> the std::sort() function gets inlined into the calling code or whether
> the compiler decides to make it a separate function. That would only be
> a difference of a few clock cycles at most.

I posted two sets of benchmarks, one for qsort-inline and another for
qsort-extern. qsort-inline clearly has a substantially different performance
profile, and that could only happen if the comparison callback itself was
inlined into the sort routine. See below for the disassembly.

> The difference comes from what could perhaps be called "reverse inlining"
> (which happens thanks to std::sort() being a template). In other words,
> rather than std::sort() being embedded into the calling code, data from
> the calling code gets embedded into the std::sort() implementation.
> (In other words, we are doing the "inlining" kind of in reverse.)
>
> This means that when std::sort() needs to, for example, compare two
> elements, it knows exactly the type of the element and can do the
> comparison "inlined" in its own code, rather than having to call some
> external function provided by the caller.

This is exactly what is happening in the qsort-inline trial I posted: the
comparison callback function is inlined into the qsort implementation. The
reason I couldn't just use -flto with the C runtime qsort is that -flto in
clang and GCC require the intermediate code for both the comparison callback
*and* the qsort implementation itself, and that information isn't included
in Alpine's C runtime--neither the dynamic libc.so nor static libc.a. In
theory the intermediate code *could* be included, just like a system C
runtime can ship with debug symbols.

Below is the dissassembly of some relevant portions of qsort-inline proving
inlining. After recompiling qsort-inline with debug symbols (-g), the
fragment was generated using `objdump -drwC -S -l`, which interleaves
assembly blocks with the originating source block.

This first fragment begins inside qsort (at openbsd-qsort.c line 147, inside
the inner introsort routine) and ends with code from the comparator lambda
in the post C++ benchmark code (at qsort.cc line 51, +/- a couple lines from
the original posted code), without any interveaning calls (only jumps),
showing that the comparator function is inlined into the sorting
implementation itself:

/tmp/comp.lang.c-qsort/openbsd-qsort.c:147
if (n > 7) {
1e40: 48 83 fb 07 cmp $0x7,%rbx
1e44: 74 2c je 1e72 <introsort.constprop.0+0x72>
introsort.constprop.0():
/tmp/comp.lang.c-qsort/qsort.cc:51
return l < r ? -1 : l > r ? 1 : 0;
1e46: 8b 08 mov (%rax),%ecx
1e48: 8b 32 mov (%rdx),%esi

This second fragment begins inside the timedSort lambda, showing that
qsort itself is inlined into the lambda:

operator()<main()::<lambda(unsigned int*, unsigned int*)> >():
/tmp/comp.lang.c-qsort/qsort.cc:35
while( rounds-- )
1315: 4c 8d 78 ff lea -0x1(%rax),%r15
1319: 4c 89 fd mov %r15,%rbp
131c: 0f 1f 40 00 nopl 0x0(%rax)
unsigned int* std::__copy_move<false, true, std::random_access_iterator_tag>::__copy_m<unsigned int>(unsigned int const*, unsigned int const*, unsigned int*):
/usr/include/c++/10.3.1/bits/stl_algobase.h:426
__builtin_memmove(__result, __first, sizeof(_Tp) * _Num);
1320: 48 8b 7c 24 30 mov 0x30(%rsp),%rdi
1325: 48 89 da mov %rbx,%rdx
1328: e8 03 fd ff ff call 1030 <memmove@plt>
__gnu_cxx::__normal_iterator<unsigned int*, std::vector<unsigned int, std::allocator<unsigned int> > >::__normal_iterator(unsigned int* const&):
/usr/include/c++/10.3.1/bits/stl_iterator.h:979
: _M_current(__i) { }
132d: 48 8b 7c 24 30 mov 0x30(%rsp),%rdi
openbsd_qsort():
/tmp/comp.lang.c-qsort/openbsd-qsort.c:225
{
size_t i, maxdepth = 0;
int swaptype;

/* Approximate 2*ceil(lg(n + 1)) */
for (i = n; i > 0; i >>= 1)
1332: 4c 89 e8 mov %r13,%rax

The OpenBSD source code used can be found at

https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/libc/stdlib/qsort.c?annotate=1.18
https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/libc/stdlib/heapsort.c?annotate=1.11

William Ahern

unread,
Apr 28, 2022, 6:30:20 AM4/28/22
to
See disassembly posted elsethread. I've given all the information needed to
quickly reproduce this for yourself, assuming some minimal familiarity with
GCC.

Alpine Linux isn't even needed, it's just what I started with out of
caution. I just now reproduced the distinctive benchmark profiles between
qsort-inline and qsort-extern on macOS 12.3.1 after removing -Wl,-rpath from
CFLAGS. (The macOS/mach-O linker doesn't recognize the -rpath option, but
it's linker embeds and uses the original location by default, anyhow.) I
don't know off-hand how to get a proper disassembly on macOS (-flto
complicates things considerably), but there's no reason to believe that the
comparator wasn't inlined like it was on Linux.

As another poster mentioned, templates are good for much more than
performance inlining. But I've always had a gripe with the qsort vs
std::qsort meme in particular: it confuses C toolchains and historical
linking semantics with the language itself. It should be clear, now, that C
language semantics don't prevent inlining of a comparator passed to a
sorting function; now even one defined externally to the invoking source
unit. (Unless somehow I've missed that a standard routine can't be inlined
without violating the standard--e.g. that the symbol has to have a unique
address.)

Malcolm McLean

unread,
Apr 28, 2022, 6:47:32 AM4/28/22
to
That an interesting one. The comparison function can have side-effects,
so it could do something with its own address, like printing it out. Does
that address have to match the address of the fucntion when taken globally?
I think it does.
That would make inlining a lot more complicated, just to catch one almost
useless exceptional case.

Thiago Adams

unread,
Apr 28, 2022, 7:30:21 AM4/28/22
to
What I mean by " removing the function comparison pointer and using
the function directly." is to create a copy of the code replacing the
function pointer. Like instantiating the template function manually.
The idea C++ is faster than C is always very superficial. C is not
a language to expect a big framework to compare against something else
(samples like qsort x std::sort are rare), instead C is a language what you
do whatever you want given you fine control about performance.

Dolores Filandro

unread,
May 8, 2022, 12:45:53 AM5/8/22
to

> {
> unsigned
> &l = *(unsigned *)left,
> &r = *(unsigned *)right;
> return l < r ? -1 : l > r ? 1 : 0;
> } );


(l > r ? 1 : 0) means exactly the same thing as (l > r)

Andrey Tarasevich

unread,
May 8, 2022, 4:56:21 AM5/8/22
to
... which is why one usually expresses the above as the idiomatic

return (l > r) - (l < r);

--
Best regards,
Andrey

Kenny McCormack

unread,
May 8, 2022, 11:06:50 AM5/8/22
to
In article <t580j5$hlc$1...@dont-email.me>,
Which is why I've always thought it was dumb that the standard library
(both in C and in some "C-like" languages) doesn't provide the sgn()
function. It means that everybody has to re-invent the above idiom.

Incidentally, is any of this necessary? I have not been following this
discussion, so I may have missed something, but if we are talking about the
qsort() function, then it is not necessary to return exactly one of -1, 0,
or 1. Any negative number will do, as will any positive one. So, all you
need is (l-r).

--
The single most important statistic in the US today - the one that explains all the
others - is this: 63 million people thought it was a good idea to vote for this clown
(and will probably do so again). Everything else is secondary to that. Everything else
could be fixed if we can revert this one statistic. Nothing can be fixed until we do.

Andrey Tarasevich

unread,
May 8, 2022, 1:08:13 PM5/8/22
to
On 5/8/2022 8:06 AM, Kenny McCormack wrote:
> In article <t580j5$hlc$1...@dont-email.me>,
> Andrey Tarasevich <andreyta...@hotmail.com> wrote:
>> On 5/7/2022 9:45 PM, Dolores Filandro wrote:
>>>
>>>> {
>>>> unsigned
>>>> &l = *(unsigned *)left,
>>>> &r = *(unsigned *)right;
>>>> return l < r ? -1 : l > r ? 1 : 0;
>>>> } );
>>>
>>>
>>> (l > r ? 1 : 0) means exactly the same thing as (l > r)
>>
>> ... which is why one usually expresses the above as the idiomatic
>>
>> return (l > r) - (l < r);
>
> Which is why I've always thought it was dumb that the standard library
> (both in C and in some "C-like" languages) doesn't provide the sgn()
> function. It means that everybody has to re-invent the above idiom.
>
> Incidentally, is any of this necessary? I have not been following this
> discussion, so I may have missed something, but if we are talking about the
> qsort() function, then it is not necessary to return exactly one of -1, 0,
> or 1. Any negative number will do, as will any positive one. So, all you
> need is (l-r).

No. `l - r` is a major anti-pattern and is often a bug.

Firstly, this will not work at all for unsigned types as in the original
example. Secondly, in general this will not work for signed types since
it can/will overflow.

The only case when you can get away with `l - r` is smaller-than-int
types (which are promoted to `int` and can't overflow), and situations
where "you are absolutely sure" the values are so small that overflow
can't happen.

But it is not really worth it, when one has `(l > r) - (l < r)`.

--
Best regards,
Andrey

Andrey Tarasevich

unread,
May 8, 2022, 1:13:58 PM5/8/22
to
Of course, one can always argue that `(l > r) - (l < r)` is also a
variation of `l - r` where the values are pre-normalized to ensure they
"are so small that overflow can't happen".

:O)

--
Best regards,
Andrey

Dolores Filandro

unread,
May 9, 2022, 1:00:51 PM5/9/22
to
But what I was getting at, is that
for OP representing any relational or equality operator, ==, !=, >, <, et cetera,
and with any text substitutions for A and B
the expression ((A) OP (B)) means *Exactly* the same thing as the expression ((A) OP (B) ? 1 : 0)
Same type
Same value
Same side effects
to the extent that a compiler can always render the exact same op code for either expression.

Malcolm McLean

unread,
May 9, 2022, 2:47:35 PM5/9/22
to
The relational operators don't have to be overloaded to return a boolean.
It's normal to do so, but not required.
I suppose a use case would be a fuzzy logic type system whereby you could have
full or partial equality, and == returned a real on 0.0 - 1.0.

Keith Thompson

unread,
May 9, 2022, 2:59:55 PM5/9/22
to
Malcolm McLean <malcolm.ar...@gmail.com> writes:
[...]
> The relational operators don't have to be overloaded to return a boolean.
> It's normal to do so, but not required.
> I suppose a use case would be a fuzzy logic type system whereby you could have
> full or partial equality, and == returned a real on 0.0 - 1.0.

C doesn't have operator overloading. The equality and relational
operators return a result of type int with the value 0 or 1.

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
Working, but not speaking, for Philips
void Void(void) { Void(); } /* The recursive call of the void */

Bonita Montero

unread,
May 10, 2022, 12:20:39 PM5/10/22
to
The result of the comparison-callback of qsort() is -1, 0 or 1.

Scott Lurndal

unread,
May 10, 2022, 12:38:01 PM5/10/22
to
Neither of the relevant standards agree with you.

C99 (n1256.pdf)

The comparison function pointed to by compar is called with two arguments that point
to the key object and to an array element, in that order. The function shall return an
integer less than, equal to, or greater than zero if the key object is considered,
respectively, to be less than, to match, or to be greater than the array element. The array
shall consist of: all the elements that compare less than, all the elements that compare
equal to, and all the elements that compare greater than the key object, in that order.264)

POSIX:

The contents of the array shall be sorted in ascending order according to a comparison
function. The compar argument is a pointer to the comparison function, which is
called with two arguments that point to the elements being compared. The
application shall ensure that the function returns an integer less than,
equal to, or greater than 0, if the first argument is considered respectively
less than, equal to, or greater than the second. If two members compare
as equal, their order in the sorted array is unspecified.

https://pubs.opengroup.org/onlinepubs/9699919799/functions/qsort.html

Andrey Tarasevich

unread,
May 10, 2022, 1:23:11 PM5/10/22
to
What point are you trying to make? The result of `(l > r ? 1 : 0)` is 1
or 0. The result of `(l > r)` is also 1 or 0. So?

P.S. Notwithstanding that callback result does not have to be "-1, 0 or 1"

--
Best regards,
Andrey

Bonita Montero

unread,
May 11, 2022, 1:16:22 AM5/11/22
to
Am 10.05.2022 um 18:37 schrieb Scott Lurndal:
> Bonita Montero <Bonita....@gmail.com> writes:
>> Am 08.05.2022 um 06:45 schrieb Dolores Filandro:
>>>
>>>> {
>>>> unsigned
>>>> &l = *(unsigned *)left,
>>>> &r = *(unsigned *)right;
>>>> return l < r ? -1 : l > r ? 1 : 0;
>>>> } );
>>>
>>>
>>> (l > r ? 1 : 0) means exactly the same thing as (l > r)
>>
>> The result of the comparison-callback of qsort() is -1, 0 or 1.
>>
>
> Neither of the relevant standards agree with you.

I'm not referring to the standard but what I implement.

Bonita Montero

unread,
May 11, 2022, 1:17:02 AM5/11/22
to
Am 10.05.2022 um 19:22 schrieb Andrey Tarasevich:
> On 5/10/2022 9:21 AM, Bonita Montero wrote:
>> Am 08.05.2022 um 06:45 schrieb Dolores Filandro:
>>>
>>>> {
>>>> unsigned
>>>> &l = *(unsigned *)left,
>>>> &r = *(unsigned *)right;
>>>> return l < r ? -1 : l > r ? 1 : 0;
>>>> } );
>>>
>>>
>>> (l > r ? 1 : 0)  means exactly the same thing as  (l > r)
>>
>> The result of the comparison-callback of qsort() is -1, 0 or 1.
>>
>
> What point are you trying to make? The result of `(l > r ? 1 : 0)` is 1
> or 0. The result of `(l > r)` is also 1 or 0. So?

Of course, but isn't a sufficient result for the qsort-callback.

Öö Tiib

unread,
May 11, 2022, 5:15:01 AM5/11/22
to
The qsort does not require you to write redundant code in callback.
The "l > r ? 1 : 0" you are discussing is subexpression that is exactly
equivalent with "l > r" so the "? 1 : 0" part of it is superfluous.

james...@alumni.caltech.edu

unread,
May 11, 2022, 10:26:32 AM5/11/22
to
No one suggested that it was. The original code was

return l < r ? -1 : l > r ? 1 : 0;

Doloros was pointing out that the sub-expression l > r ? 1 : 0 was
exactly equivalent to l > r, implying that the code should be changed to

return l < r ? -1 : l > r;

No on was suggesting that it be changed to

return l > r;

Bonita Montero

unread,
May 11, 2022, 10:52:51 AM5/11/22
to
> The qsort does not require you to write redundant code in callback.
> The "l > r ? 1 : 0" you are discussing is subexpression that is exactly
> equivalent with "l > r" so the "? 1 : 0" part of it is superfluous.

Right, but that's not what qsort needs. qsort doesn't need a less
-predicate like std::sort but a ternary result which is either
negative, zero or positive.

james...@alumni.caltech.edu

unread,
May 11, 2022, 12:13:54 PM5/11/22
to
Which is precisely what you get with either

return l < r ? -1 : l > r ? 1 : 0;,

as you originally wrote it, or with l > r ? 1 : 0 simplified to l > r, giving

return l < r ? -1 : l > r;

as was implicitly by Dolores' comment.

Bonita Montero

unread,
May 11, 2022, 12:44:23 PM5/11/22
to
The optimizer generates the same code and mine is more readable.

james...@alumni.caltech.edu

unread,
May 11, 2022, 10:40:39 PM5/11/22
to
On Wednesday, May 11, 2022 at 12:44:23 PM UTC-4, Bonita Montero wrote:
> Am 11.05.2022 um 18:13 schrieb james...@alumni.caltech.edu:
...
> > Which is precisely what you get with either
> >
> > return l < r ? -1 : l > r ? 1 : 0;,
> >
> > as you originally wrote it, or with l > r ? 1 : 0 simplified to l > r, giving
> >
> > return l < r ? -1 : l > r;
> >
> > as was implicitly by Dolores' comment.
> The optimizer generates the same code and mine is more readable.

Readability is a judgment call, and I've learned there's not much point in
arguing about judgment calls. Dolores' is simpler, and to me, at least, more
readable. Code like yours would make me wonder whether the author was
so unfamiliar with C that she didn't realize the extra complication was
unnecessary. It wouldn't occur to me to think she might have written it
because she found the unnecessarily complicated code more readable.

Bonita Montero

unread,
May 12, 2022, 12:21:31 AM5/12/22
to
> Readability is a judgment call, ...

I think most people generally understanding code would argue like that.

> Dolores' is simpler, ...

Less code, not simpler.

> Code like yours would make me wonder whether the author was
> so unfamiliar with C that she didn't realize the extra complication

It's less complication.

0 new messages