
binary_search according to its name


Bonita Montero

Nov 27, 2021, 9:39:12 AM
This is a somewhat more efficient binary_search, but it requires
that all keys in the array are unique. It's more efficient because
it doesn't need two predicate comparisons to test for equality,
since it uses a predicate that returns a strong_ordering object.
In debug builds it checks whether the elements to the left and
right of the result compare less and greater, to ensure the key
value is unique.

#include <cassert>
#include <compare>
#include <concepts>
#include <cstddef>
#include <iterator>

template<typename RandomIt, typename T, typename Pred>
inline RandomIt xbinary_search( RandomIt begin, RandomIt end, T const &key, Pred pred )
    requires std::random_access_iterator<RandomIt>
        && requires( Pred pred, typename std::iterator_traits<RandomIt>::value_type &elem, T const &key )
        {
            { pred( elem, key ) } -> std::convertible_to<std::strong_ordering>;
        }
{
    using namespace std;
    size_t lower = 0, upper = end - begin, mid;
    strong_ordering so;
    while( lower != upper )
    {
        mid = lower + (upper - lower) / 2;
        so = pred( begin[mid], key );
        if( so == 0 )
        {
            // debug-only uniqueness checks: the neighbours must compare
            // strictly less / greater than the key
            assert(!mid || pred( begin[mid - 1], key ) < 0);
            assert(begin + mid + 1 == end || pred( begin[mid + 1], key ) > 0);
            return begin + mid;
        }
        if( so > 0 )
            upper = mid;
        else
            lower = mid + 1;
    }
    return end;
}

Another advantage is that the supplied key can have a different type
than the elements in the array; so you can, e.g., compare against a
member of an element.

Marcel Mueller

Nov 28, 2021, 6:28:02 AM
On 27.11.21 at 15:38, Bonita Montero wrote:
> This is a somewhat more efficient binary_search, but it requires
> that all keys in the array are unique. It's more efficient because
> it doesn't need two predicate comparisons to test for equality,
> since it uses a predicate that returns a strong_ordering object.

This is well known. But it only improves the best case, when a key in a
unique list happens to match early. In general the algorithm is still
O(log n); amortized you save at most one comparison, even for very large
containers. This comes at the expense that for non-unique keys the
behavior is no longer well defined.

Assuming that a three-way comparison is often a bit more expensive, it
might even be less efficient for some collections.

Of course, there might be use cases where this is good advice, but not
in general.


> Another advantage is that the supplied key can have a different type
> than the elements in the array; so you can, e.g., compare against a
> member of an element.

Asymmetric comparers are another feature, and they can be quite useful,
typically for associative containers of objects that have an intrinsic
immutable key. Normally I write small container classes exactly for
this purpose, something like keyed_set<T,KEY_EXTRACTOR>.


Marcel

Bonita Montero

Nov 28, 2021, 11:03:20 AM
On 28.11.2021 at 12:27, Marcel Mueller wrote:

> Assuming that a three-way comparison is often a bit more expensive,
> it might even be less efficient for some collections.

The result is often stored in the CPU flags.

Juha Nieminen

Nov 29, 2021, 12:57:18 AM
Marcel Mueller <news.5...@spamgourmet.org> wrote:
> Assuming that a three-way comparison is often a bit more expensive, it
> might even be less efficient for some collections.

And not only does it require a slower three-way comparison, it also uses
two conditionals per iteration instead of the one that
std::binary_search/std::lower_bound uses. Depending on the situation the
additional conditional may actually make the code slightly slower
(especially since the CPU likely won't predict it correctly in about
half the iterations, on average). Unpredictable conditionals are
performance killers.
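This concern about unpredictable branches is the motivation behind the well-known branchless formulation of lower_bound, sketched below. It is not code from this thread; the lbIndex helper is invented for testing, and the claim that the ternary compiles to a conditional move is compiler-dependent:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Branchless binary search: the loop always runs the same number of steps
// for a given n, and the data-dependent decision is typically compiled to a
// conditional move rather than a branch.
template<typename It, typename T>
It branchless_lower_bound( It first, It last, T const &key )
{
    std::size_t n = last - first;
    while( n > 1 )
    {
        std::size_t half = n / 2;
        // invariant: the lower bound lies in [first, first + n];
        // if first[half - 1] < key it lies in the upper part
        first += ( first[half - 1] < key ) ? half : 0;
        n -= half;
    }
    // at most one element left to check
    return ( n == 1 && *first < key ) ? first + 1 : first;
}

// Hypothetical helper for testing: index of the lower bound in a vector.
inline std::size_t lbIndex( std::vector<int> const &v, int key )
{
    return (std::size_t)( branchless_lower_bound( v.begin(), v.end(), key ) - v.begin() );
}
```

The trade-off is the opposite of the one discussed above: it always performs the full log2(n) comparisons, but none of them is a mispredictable branch.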

Bonita Montero

Nov 29, 2021, 2:26:36 AM
On 29.11.2021 at 06:57, Juha Nieminen wrote:

> And not only does it require a slower three-way comparison, ...

Check the generated code: the result of the three-way comparison is
in the flags, depending on the type.

> ..., it also uses two conditionals instead of the one that
> std::binary_search/std::lower_bound uses.


The ==-conditional is predictable almost every time, since it matches
only once in the loop. But you're right: in most cases this code is
slower than with lower_bound.
However, I've developed a lower_bound alternative which is slightly
faster than std::lower_bound:

template<typename RandomIt, typename T, typename Pred>
inline RandomIt xlower_bound( RandomIt begin, RandomIt end, T const &key, Pred pred )
    requires std::random_access_iterator<RandomIt>
        && requires( Pred pred, typename std::iterator_traits<RandomIt>::value_type &elem, T const &key )
        {
            { pred( elem, key ) } -> std::convertible_to<std::strong_ordering>;
        }
{
    using namespace std;
    assert(end >= begin);
    size_t hit = end - begin, lower = 0, upper = hit, mid;
    while( lower != upper )
    {
        mid = lower + (upper - lower) / 2;
        if( pred( begin[mid], key ) >= 0 )
        {
            hit = mid;
            upper = mid;
        }
        else
            lower = mid + 1;
    }
    return begin + hit;
}

On a vector of 1.4 million string_view objects I get
about 3% more performance on my computer with clang 13.

Bonita Montero

Nov 29, 2021, 8:03:01 AM
This is a small benchmark:

auto bench = [&]<typename SearchFn>( SearchFn searchFn ) -> double
    requires requires( SearchFn searchFn, vsv_it iterator, string_view const &key )
    {
        { searchFn( iterator, iterator, key ) } -> convertible_to<vsv_it>;
    }
{
    size_t sumExist = 0;
    auto start = high_resolution_clock::now();
    for( size_t it = ITERATIONS; it; --it )
        sumExist += searchFn( strings.begin(), strings.end(),
            strings[uidIndex( mt )] ) != strings.end();
    double ns = (double)(int64_t)duration_cast<nanoseconds>(
        high_resolution_clock::now() - start ).count() / ITERATIONS;
    aSumExist += sumExist;
    return ns;
};
double lb = bench( []( vsv_it begin, vsv_it end, string_view const &key ) -> vsv_it
{
    vsv_it it = lower_bound( begin, end, key );
    if( it == end || *it != key )
        return end;
    return it;
} );
cout << "lb: " << lb << endl;
double xlb = bench( []( vsv_it begin, vsv_it end, string_view const &key ) -> vsv_it
{
    vsv_it it = xlower_bound( begin, end, key,
        []( string_view &elem, string_view const &key ) -> strong_ordering
        { return elem <=> key; } );
    if( it == end || *it != key )
        return end;
    return it;
} );
cout << "xlb: " << xlb << endl;
double xbs = bench( []( vsv_it begin, vsv_it end, string_view const &key )
{
    return xbinary_search( begin, end, key,
        []( string_view &elem, string_view const &key ) -> strong_ordering
        { return elem <=> key; } );
} );
cout << "xbs: " << xbs << endl;

vsv_it is vector<string_view>::iterator

144,420,73 sorted string_view items, average length: 8.68825 characters
lower_bound: 1835.47ns
xlower_bound: 1741.33ns
xbinary_search: 1845.06ns

I think with such a large dataset the runtime isn't determined by the
algorithm but by the random memory accesses.