Comparing emplace() for absl::flat_hash_map vs std::unordered_map


Mean Dog

Aug 21, 2020, 1:19:00 PM8/21/20
to Abseil.io
I've been playing around with absl::flat_hash_map, trying to compare it with std::unordered_map.
I wrote a small program, included below. It adds 10 entries to a map, then enters a large loop that erases the oldest entry and inserts a new one, incrementing the key by 1 each time. At no point during the loop do I expect a rehash to occur. I am timing with std::chrono and invoking the program with "taskset -c 2".

This was compiled with g++-8 (version 8.2.0) on Ubuntu Linux on an 8-core machine.
I ran it many times with both std::unordered_map and absl::flat_hash_map.
The results show std::unordered_map winning by 4 to 1 for emplace().

What am I doing wrong?

#include <iostream>
#include <type_traits>
#include <memory>
#include <algorithm>
#include <vector>    // was missing; needed for std::vector below
#include <unordered_map>
#include <chrono>
#include <limits>

#include "absl/container/flat_hash_map.h"
#include "absl/container/node_hash_map.h"

template <typename KeyT, typename ValueT>
using UnorderedMap = absl::flat_hash_map<KeyT, ValueT>;
//using UnorderedMap = std::unordered_map<KeyT, ValueT>;
//using UnorderedMap = std::conditional_t< (sizeof(ValueT) <= sizeof(std::unique_ptr<int>)),
//                          absl::flat_hash_map<KeyT, ValueT>, absl::node_hash_map<KeyT, ValueT> >;

// This is a marker to help locate positions in the assembly, should I want to do that.
void __attribute__((noinline)) bookmark() { __asm__(""); }

constexpr int InitialSize = 1024;

int main() {
    UnorderedMap<unsigned int, unsigned int> myMap(InitialSize * .25);  // bucket-count hint of 256

    unsigned int startingPoint(10000000);  // some large arbitrary number so I don't start at 0
    unsigned int index(startingPoint);

    std::cout << "my Bucket Count: " << myMap.bucket_count() << std::endl;

    for (; index < startingPoint + 10; ++index)  // Add 10 entries
    {
        myMap.emplace(index, index + 1);
    }

    // Verify that the bucket count has not changed after inserting a few elements
    std::cout << "my Bucket Count: " << myMap.bucket_count() << std::endl;

    long int totalTime(std::chrono::duration<int>::zero().count());
    long int highTime(std::chrono::duration<int>::zero().count());
    long int lowTime(std::chrono::duration<int>::max().count());
    std::vector<long int> times;

    std::cout << "Total Time: " << totalTime << std::endl;

    // Here is the bulk of the work. We are going to remove an item and insert a new item so that
    // a rehash shouldn't occur.  And we are going to do it a large number of times.
    // Along the way, let's clock how long an insert into the table takes.
    for (; index < startingPoint + 200000; ++index)
    {
        myMap.erase(index - 10);  // remove an old entry
        bookmark();
        auto start = std::chrono::high_resolution_clock::now();
        myMap.emplace(index, index - 10);
        auto stop = std::chrono::high_resolution_clock::now();
        bookmark();

        // Now calc the time and store it, before repeating
        auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
        auto nanos = duration.count();
        times.push_back(nanos);
        totalTime += nanos;
        if (nanos < lowTime)
            lowTime = nanos;
        if (nanos > highTime)  // independent "if": the first sample may set both bounds
            highTime = nanos;
    }

    std::sort(times.begin(), times.end());
    long int median(0);
    if (times.size() % 2 == 0)
    {
        median = (times[times.size() / 2] + times[times.size() / 2 - 1]) / 2;
    }
    else
        median = times[times.size() / 2];  // middle element; (size()+1)/2 would index one past it

    std::cout << "my Bucket Count: " << myMap.bucket_count() << std::endl
              << "highTime " << highTime
              << " lowTime " << lowTime
              << " average " << totalTime / times.size()
              << " median " << median << std::endl;

    std::cout << "size of myMap: " << myMap.size() << std::endl;
    return 0;
}

Cherry Nguyễn

Aug 21, 2020, 1:37:28 PM8/21/20
to Mean Dog, Abseil.io


On Sat, Aug 22, 2020 at 00:19 Mean Dog <meand...@gmail.com> wrote:


--
Nguyen thi Ngoc hue

Samuel Benzaquen

Aug 21, 2020, 2:03:20 PM8/21/20
to Cherry Nguyễn, Mean Dog, Abseil.io
On Fri, Aug 21, 2020 at 1:37 PM Cherry Nguyễn <cryng...@gmail.com> wrote:


On Sat, Aug 22, 2020 at 00:19 Mean Dog <meand...@gmail.com> wrote:
I've been playing around with absl::flat_hash_map, trying to compare it with std::unordered_map.
I wrote a small program, included below. It adds 10 entries to a map, then enters a large loop that erases the oldest entry and inserts a new one, incrementing the key by 1 each time. At no point during the loop do I expect a rehash to occur.

flat_hash_map uses open addressing with tombstones.
An erase-then-insert pattern like this one can trigger rehashes if the number of tombstones grows too large.
I don't know whether that applies in this case, though.
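One quick way to check (a sketch, reusing myMap, index, and startingPoint from the program above): watch bucket_count() inside the loop. This catches rehashes that grow the table; note that Abseil can also rehash in place at the same capacity to drop tombstones, which this will not catch.

    size_t lastBuckets = myMap.bucket_count();
    for (; index < startingPoint + 200000; ++index)
    {
        myMap.erase(index - 10);
        myMap.emplace(index, index - 10);
        if (myMap.bucket_count() != lastBuckets)  // the table grew, i.e. a resizing rehash happened
        {
            std::cout << "rehash at index " << index << ": " << lastBuckets
                      << " -> " << myMap.bucket_count() << std::endl;
            lastBuckets = myMap.bucket_count();
        }
    }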
 
I am timing with std::chrono and invoking the program with "taskset -c 2".

Timing this way is not recommended. Just reading the current time can be expensive enough to skew the measurements.
Prefer a framework optimized for micro-benchmarking, such as https://github.com/google/benchmark.
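For illustration, here is roughly the same loop written as a google/benchmark micro-benchmark (a sketch; the map type, size hint, and key range are carried over from the program above):

    #include <benchmark/benchmark.h>
    #include "absl/container/flat_hash_map.h"

    // The framework chooses the iteration count and accounts for its own timing
    // overhead, instead of reading the clock around every single emplace().
    static void BM_EraseThenEmplace(benchmark::State& state) {
        absl::flat_hash_map<unsigned int, unsigned int> map(256);
        unsigned int index = 10000000;
        for (; index < 10000010; ++index) map.emplace(index, index + 1);
        for (auto _ : state) {
            map.erase(index - 10);   // drop the oldest entry
            map.emplace(index, index - 10);
            ++index;
        }
    }
    BENCHMARK(BM_EraseThenEmplace);
    BENCHMARK_MAIN();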
Separately, this benchmark does not necessarily represent real-life behavior. It is really hard to write micro-benchmarks that show what will happen in real production code.

In particular, it is using a single small table repeatedly, which ensures the table is always hot in cache. One of the problems with unordered_map is the extra cache lines it has to touch, and that will not affect this benchmark. Real-life code usually has tables that are not always hot in the L1 cache.
One way to solve this in the benchmark is to use many tables and iterate through them, to guarantee you are always reading cold memory.
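A sketch of that idea, meant to replace the timing loop in the program above (kNumTables is an illustrative value; pick it large enough that the combined working set exceeds the caches you care about):

    constexpr size_t kNumTables = 1024;
    struct Slot {
        absl::flat_hash_map<unsigned int, unsigned int> map;
        unsigned int next = 10000010;  // each table keeps its own sliding key window
    };
    std::vector<Slot> slots(kNumTables);
    for (auto& s : slots)
        for (unsigned int k = 10000000; k < 10000010; ++k) s.map.emplace(k, k + 1);

    for (size_t i = 0; i < 200000; ++i) {
        Slot& s = slots[i % kNumTables];  // round-robin, so this table is likely cold by now
        s.map.erase(s.next - 10);
        s.map.emplace(s.next, s.next - 10);
        ++s.next;
    }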

Another non-representative aspect here is that the keys are sequential. Sequential keys generate zero collisions in common unordered_map implementations.
This skews the results in favor of unordered_map. With a more realistic key distribution, lookups in unordered_map hit significant collisions.
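One way to get a realistic distribution (a sketch; std::mt19937 with a fixed seed keeps runs comparable):

    #include <random>

    // Pre-generate pseudo-random keys outside the timed region, then drive the
    // erase/emplace loop from `keys` instead of a sequentially incremented index.
    std::mt19937 rng(12345);
    std::vector<unsigned int> keys(200000);
    for (auto& k : keys) k = rng();
    // In the loop: myMap.erase(keys[i - 10]); myMap.emplace(keys[i], i);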

You can look at https://github.com/google/hashtable-benchmarks for a very comprehensive suite of benchmarks covering the most common hash table implementations, including folly's F14.

Cherry Nguyễn

Aug 21, 2020, 2:23:22 PM8/21/20
to Samuel Benzaquen, Abseil.io, Mean Dog
Thanks!

On Sat, Aug 22, 2020 at 01:03 Samuel Benzaquen <sbe...@google.com> wrote: