
calls to hash function in unordered_set


Frank Tetzel

Jun 11, 2018, 8:49:10 AM
Hi,

Can anybody give me a reasonable explanation why the hash function for
a simple unordered_set is called twice as often as I would expect?

Example:


#include <iostream>
#include <unordered_set>

struct MyHash{
    static int numberOfCalls;

    size_t operator()(int i) const noexcept{
        ++numberOfCalls;
        //return std::hash<int>{}(i);
        return i;
    }
};
int MyHash::numberOfCalls = 0;

int main(){
    std::unordered_set<int,MyHash> set;

    std::cout << MyHash::numberOfCalls << '\n';
    set.reserve(1024);
    std::cout << MyHash::numberOfCalls << '\n';
    for(int i=0; i<1024; ++i){
        set.emplace(i);
    }
    std::cout << MyHash::numberOfCalls << '\n';

    return 0;
}




output:
2047

It seems to always be (inserted_elements * 2) - 1.
When adding 2 elements, it hashes 3 times. When adding 1 element, it
only hashes once. What is happening here? Why is the hash calculated
twice? What is so special about the last or first element?

Btw, I'm running gcc 8.1.1.

Best regards,
Frank

Ike Naar

Jun 11, 2018, 1:17:45 PM
Here, with g++ 4.8.4 it also hashes 2047 times,
but if 'noexcept' is removed from MyHash::operator() it hashes 1024 times.
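
For illustration, a minimal sketch of that variation, assuming the rest of
Frank's program stays unchanged (the name MyThrowingHash is made up here):
without noexcept, libstdc++ treats the hasher as potentially throwing and
caches the hash code in each node, so every element is hashed exactly once
on insertion.

struct MyThrowingHash{
    static int numberOfCalls;

    // Same call operator as MyHash, but *not* noexcept: libstdc++ then
    // stores the computed hash code in each node instead of recomputing it.
    size_t operator()(int i) const{
        ++numberOfCalls;
        return i;
    }
};
int MyThrowingHash::numberOfCalls = 0;

// Used as std::unordered_set<int,MyThrowingHash> in the same main(),
// the final counter reads 1024 instead of 2047.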

Öö Tiib

Jun 11, 2018, 4:13:23 PM
On Monday, 11 June 2018 15:49:10 UTC+3, Frank Tetzel wrote:
>
> It seems to always be (inserted_elements * 2) - 1.
> When adding 2 elements, it hashes 3 times. When adding 1 element, it
> only hashes once. What is happening here? Why is the hash calculated
> twice? What is so special about the last or first element?
>
> Btw, I'm running gcc 8.1.1.

The standard library caches potentially slow and throwing hashes.
For example, in the first standard library implementation I happened
to have at hand right now (and it seems at least 4 years old), I see
in its bits/hashtable.h:

template<typename _Tp, typename _Hash>
  using __cache_default
    = __not_<__and_<// Do not cache for fast hasher.
                    __is_fast_hash<_Hash>,
                    // Mandatory to make local_iterator default
                    // constructible and assignable.
                    is_default_constructible<_Hash>,
                    is_copy_assignable<_Hash>,
                    // Mandatory to have erase not throwing.
                    __detail::__is_noexcept_hash<_Tp, _Hash>>>;


So it does not cache your hashes and instead calls the hash function
whenever it needs a hash code. It needs to do so relatively few times
because you did reserve; otherwise it would likely need to call it even
more times, since every rehash has to recompute the hashes.
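
For illustration, a hedged sketch of that last point: the same program as
Frank's, just without the reserve() call. Since nothing is cached, every
rehash has to call the hash function again for each element it moves, so
the counter ends up higher than the 2047 seen with reserve() (the exact
number depends on libstdc++'s rehash policy).

#include <iostream>
#include <unordered_set>

struct MyHash{
    static int numberOfCalls;

    size_t operator()(int i) const noexcept{
        ++numberOfCalls;
        return i;
    }
};
int MyHash::numberOfCalls = 0;

int main(){
    std::unordered_set<int,MyHash> set;
    // No reserve(): the table grows as it fills, and each rehash
    // recomputes the hash of every element already stored.
    for(int i=0; i<1024; ++i){
        set.emplace(i);
    }
    std::cout << MyHash::numberOfCalls << '\n';  // higher than 2047
    return 0;
}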

Also, if you want your MyHash to be treated as "slow" but noexcept, then
in the standard library that I am looking at it can be done by disabling
its copy assignment:

MyHash& operator=(const MyHash&) = delete;

Good to know when you happen to have a relatively slow, non-throwing
hasher. ;)
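
A sketch of that trick applied to Frank's hasher (MyUncopyableHash is a
made-up name): deleting copy assignment makes is_copy_assignable false, so
__cache_default flips to true and the hash values are cached even though
operator() stays noexcept. Note that copy-assigning the container itself
may then no longer compile, since that can require an assignable hasher.

struct MyUncopyableHash{
    static int numberOfCalls;

    MyUncopyableHash() = default;
    MyUncopyableHash(const MyUncopyableHash&) = default;
    // Deleting copy assignment defeats is_copy_assignable, which turns
    // libstdc++'s hash-code caching on despite the noexcept call operator.
    MyUncopyableHash& operator=(const MyUncopyableHash&) = delete;

    size_t operator()(int i) const noexcept{
        ++numberOfCalls;
        return i;
    }
};
int MyUncopyableHash::numberOfCalls = 0;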

Frank Tetzel

Jun 12, 2018, 3:53:50 AM
> > It seems to always be (inserted_elements * 2) - 1.
> > When adding 2 elements, it hashes 3 times. When adding 1 element, it
> > only hashes once. What is happening here? Why is the hash calculated
> > twice? What is so special about the last or first element?
> >
> > Btw, I'm running gcc 8.1.1.
>
> Here, with g++ 4.8.4 it also hashes 2047 times,
> but if 'noexcept' is removed from MyHash::operator() it hashes 1024
> times.

Interesting. I also asked on the gcc-help mailing list and got pointed
to the following documentation:
https://gcc.gnu.org/onlinedocs/libstdc++/manual/unordered_associative.html#containers.unordered.cache

It is caching the hash values because erase() and swap() are not allowed
to throw by the standard. So it cannot call a potentially throwing hash
function there and has to cache the hash values instead.

Why it needs to compute the hash value of one element multiple times
during insertion is still unclear to me. I guess I just have to step
through the code with a debugger and try to understand it.

Frank Tetzel

Jun 12, 2018, 3:58:20 AM
> The standard library caches potentially slow and throwing hashes.
> For example, in the first standard library implementation I happened
> to have at hand right now (and it seems at least 4 years old), I see
> in its bits/hashtable.h:
>
> template<typename _Tp, typename _Hash>
> using __cache_default
> = __not_<__and_<// Do not cache for fast hasher.
> __is_fast_hash<_Hash>,
> // Mandatory to make local_iterator default
> // constructible and assignable.
> is_default_constructible<_Hash>,
> is_copy_assignable<_Hash>,
> // Mandatory to have erase not throwing.
> __detail::__is_noexcept_hash<_Tp, _Hash>>>;
>
>
> So it does not cache your hashes and instead calls the hash function
> when it needs to. It needs to do so so few times because you did
> reserve, otherwise it would likely need to call it more times.
>
> Also if you want your MyHash to be "slow" but noexcept then in the
> standard library that I see it can be done by disabling its
> copy-assignment:
>
> MyHash& operator=(const MyHash&) = delete;
>
> Good to know when you happen to have relatively slow non-throwing
> hasher. ;)


GCC's libstdc++ has a type trait as an extension to mark a hash
function as fast or slow, but your way might also work for other
standard library implementations.

https://gcc.gnu.org/onlinedocs/libstdc++/manual/unordered_associative.html#containers.unordered.cache

Öö Tiib

Jun 12, 2018, 4:26:22 PM
On Tuesday, 12 June 2018 10:58:20 UTC+3, Frank Tetzel wrote:
>
> libstdc++ of gcc has a type trait as an extension to mark a hash
> function as fast or slow, but your way might also work for other STL
> implementations.
>
> https://gcc.gnu.org/onlinedocs/libstdc++/manual/unordered_associative.html#containers.unordered.cache

Exactly! The libstdc++ code that I posted explains nicely in its comments
that, since it can't easily make local iterators default constructible and
assignable with a non-assignable hasher, it chooses to cache the hashes.

Sure, we can use (when everything else fails) an implementation-specific
thing like the specialization in your link:

namespace std
{
template<> struct __is_fast_hash<MyHash> : std::false_type { };
}

That works when libstdc++ is the *only* standard library used. Otherwise
we have to compile it conditionally, side by side with similar hacks for
other library implementations, to avoid cryptic errors about
"__is_fast_hash" being a specialization of an unknown template.
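
One hedged sketch of what "conditionally compiled" could look like: guard
the specialization behind __GLIBCXX__, the macro that libstdc++ defines,
so other standard libraries never see it (this assumes MyHash from the
example above is already defined).

#include <unordered_set>  // any standard header brings in the library's version macros

#ifdef __GLIBCXX__  // only libstdc++ provides __is_fast_hash
namespace std
{
    template<> struct __is_fast_hash<MyHash> : std::false_type { };
}
#endif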

OTOH, making the hasher non-assignable is totally portable and will pose
the same challenge to every standard library implementation with a
similar issue. ;)
