Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

hash table and map

30 views
Skip to first unread message

Christopher J. Pisz

unread,
Jan 23, 2017, 5:16:47 PM1/23/17
to
phone screen today asked about hastables, what they are, how they work, etc.

I answered a kind of "well I have never had to really create one from
scratch in C++, I just use the built in data structures like map if I
want an associative container"

I know there is a hash that is defined by some has function on some but
of data, that provides fast lookup...

I told them that I was under the impression that map used a hash table
under the hood, and then blabbed about it being related to a
tree..failsauce.

so, std::map is implemented as a tree and std::unordered_map is
implemented as a hastable?

What are the differences, advantages, etc?

Adam C. Emerson

unread,
Jan 23, 2017, 7:03:18 PM1/23/17
to
On 2017-01-23, Christopher J. Pisz <cp...@austin.rr.com> wrote:
[snip]
> so, std::map is implemented as a tree and std::unordered_map is
> implemented as a hastable?
>
> What are the differences, advantages, etc?

Well! The main answer is implicit in the name. std::map, as a tree,
can be traversed in order. What do I mean by order?

One of the parameters to std::map is a functor class that, when called
with two Keys, will return true if the first is less than the
second. By default this functor is std::less specialized on the key
type.

When you iterate over a std::map, you traverse the elements in order
from Least to Greatest. std::map inserts or looks up elements in
logarithmic time. It's implemented as a tree. A Red-Black tree
specifically in GNU libstdc++ and CLang libc++.

std::unordered_map is a hash table. Instead of a functor telling
whether one key is less than another, it takes one functor that hashes
objects (taking an object it returns a value convertible to
std::size_t) and another defining equivalence that returns true if two
supplied objects are equal.

By default it uses std::hash and std::equal_to specialized on the Key
type. std::unordered_map does lookups and insertions, on average, in
constant time. If your hash function is really bad or your object
distribution is really pathological, it degrades to linear. it does
/not/ traverse objects from least to greatest and there's basically
nothing useful you can say about the order of traversal except that
you do traverse all objects.

If you don't need ordered traversal, you likely want a
std::unordered_map. As always, if performance is critical, benchmark
your use-case with your library.

If you wish to avoid allocations and you can insert your items in
order, you may want boost::container::flat_map.

If the entire set of keys will be known at compile time, consider
using a Perfect Hash Function like those generated by gperf.

Öö Tiib

unread,
Jan 24, 2017, 10:48:21 AM1/24/17
to
On Tuesday, 24 January 2017 02:03:18 UTC+2, Adam C. Emerson wrote:
> On 2017-01-23, Christopher J. Pisz <cp...@austin.rr.com> wrote:
>
> By default it uses std::hash and std::equal_to specialized on the Key
> type. std::unordered_map does lookups and insertions, on average, in
> constant time. If your hash function is really bad or your object
> distribution is really pathological, it degrades to linear. it does
> /not/ traverse objects from least to greatest and there's basically
> nothing useful you can say about the order of traversal except that
> you do traverse all objects.
>
> If you don't need ordered traversal, you likely want a
> std::unordered_map. As always, if performance is critical, benchmark
> your use-case with your library.

One other point worth mentioning here is the automatic rehashing.
The rebalance of map is so cheap that it is hard to notice.
Rehashing of unordered_map is more pricy. So unordered_map
is likely faster in average but single insert now or then can
take noticeable lag. One has to consider that when testing
performance since sometimes uniform performance is more
important than average efficiency.

Ben Bacarisse

unread,
Jan 24, 2017, 11:05:18 AM1/24/17
to
嘱 Tiib <oot...@hot.ee> writes:

> On Tuesday, 24 January 2017 02:03:18 UTC+2, Adam C. Emerson wrote:
>> On 2017-01-23, Christopher J. Pisz <cp...@austin.rr.com> wrote:
>>
>> By default it uses std::hash and std::equal_to specialized on the Key
>> type. std::unordered_map does lookups and insertions, on average, in
>> constant time. If your hash function is really bad or your object
>> distribution is really pathological, it degrades to linear. it does
>> /not/ traverse objects from least to greatest and there's basically
>> nothing useful you can say about the order of traversal except that
>> you do traverse all objects.
>>
>> If you don't need ordered traversal, you likely want a
>> std::unordered_map. As always, if performance is critical, benchmark
>> your use-case with your library.
>
> One other point worth mentioning here is the automatic rehashing.

You don't necessarily have to rehash when the table size changes if you
store the hash in the table. You have to /re-key/ the items, but that
can be as cheap as a mask operation (and of course they move to a larger
table).

<snip>
--
Ben.
0 new messages