unordered_map hash_table memory allocator

saka...@me.com

unread,

May 23, 2017, 6:43:53 AM5/23/17

to ISO C++ Standard - Discussion

TL;DR:

unordered's reserve() function and size constructor are useless because every new node allocates stuff
two systems, hash and buckets, under impossible to scope allocator

Hello! FP on the group. I encountered an issue trying to optimize my allocators (xcode LLVM library, but issue is also on GCC). I have ≥30k objects to initialize and find through an unordered_map container, where I initialize elements and if I already encountered them, I just use that version, or else I create a new member. Thanks for try_emplace on C++17 by the way!

Now there is a reserve() command and a size-aware constructor, which should work awesomely, where I reserve the number of elements I might have, it creates all the bucket RAM it needs at once.

The problem is there is also a node allocation made for every single node, even when all the buckets are created.

Now I might be mistaken, I'm not a guru of the scoped allocators, but I didn't find a single portable way to make sure I initialize everything at once. Even worse, there's no portable way I found to make sure I can use a normal allocator for the buckets and a pool allocator for the stray node allocators.

The workaround I found is to abuse heavily the scoped allocators through implementation-dependent overrides (code below). I know, this code could be shortened, but the gist should stay the same. Went through the gauntlet for this.

I think there should be a way to both bucket allocate and node allocate through different allocators, or modify the current algorithm to remove the node allocators, or at least expose cleanly some of the implementation details required to node-allocate and bucket-allocate so nutcases like me can figure ways to implement different allocators on both through portable means.

Any thoughts?

Thanks,

Michel

My implementation for llvm:

namespace unordered_map_allocimpl
{
   template <typename T>
   using stdAlloc = std::allocator<T>;

   template <typename T>
   using fastAlloc = fast_pool_allocator<T>;

   // This is our data type and its allocator. It's the simplest one
   template <typename T, typename U>
   using data_type = std::pair<const T, U>;

   template <typename T, typename U>
   using data_type_allocator = stdAlloc<data_type<T, U>>;

   // We must specifically instantiate every single rebind so we don't miss any
   template <typename T, typename U, typename Up> struct rebind_selector : public std::false_type {};

   template <typename T, typename U, typename Up> struct rebind_selector_checked : public rebind_selector<T, U, Up>
   {
      static_assert(rebind_selector<T, U, Up>::value, "Rebind not valid. Specialize rebind_selector");
   };

   // Our allocator, it sets itself as the data_type_allocator by default, and rebinds everything underneath
   template <typename T, typename U, typename AsA = data_type_allocator<T, U> >
   class allocator : public AsA
   {
   public:
      template<typename Up>
      using rebind = rebind_selector_checked<T, U, Up>;
   };

   // These are all the overrides required for the rebind of the current version of std::unordered_map. Fun, isn't it?
   template<typename T, typename U>
   struct rebind_selector<T, U, data_type<T, U>> : public std::true_type {using other = allocator<T, U, data_type_allocator<T,U>>;};


   template <typename T, typename U>
   using hash_value_type = typename std::__hash_value_type<T, U>;

   template <typename T, typename U>
   using hash_value_type_allocator = fastAlloc<hash_value_type<T, U>>;

   template<typename T, typename U>
   struct rebind_selector<T, U, hash_value_type<T, U>> : public std::true_type {using other = allocator<T, U, hash_value_type_allocator<T,U>>;};


   template <typename T, typename U>
   using hash_node_type = typename std::__make_hash_node_types<U, typename std::allocator_traits<data_type_allocator<T, U>>::void_pointer>::type::__node_type;

   template <typename T, typename U>
   using hash_node_type_allocator = fastAlloc<hash_node_type<T, U>>;

   template<typename T, typename U>
   struct rebind_selector<T, U, hash_node_type<T, U>> : public std::true_type {using other = allocator<T, U, hash_node_type_allocator<T,U>>;};


   template <typename T, typename U>
   using hash_node = typename std::__hash_node<hash_value_type<T, U>, typename std::allocator_traits<data_type_allocator<T, U>>::void_pointer>;

   template <typename T, typename U>
   using hash_node_allocator = fastAlloc<hash_node<T, U>>;

   template<typename T, typename U>
   struct rebind_selector<T, U, hash_node<T, U>> : public std::true_type {using other = allocator<T, U, hash_node_allocator<T,U>>;};


   template <typename T, typename U>
   using hash_node_base = typename std::__hash_node_base<hash_node<T, U>*>;

   template <typename T, typename U>
   using hash_node_base_allocator = fastAlloc<hash_node_base<T, U>>;

   template <typename T, typename U>
   using hash_node_base_allocator_ptr = stdAlloc<hash_node_base<T, U>*>;   // This one gets reserved for the buckets. Keep it as normal allocator

   template<typename T, typename U>
   struct rebind_selector<T, U, hash_node_base<T, U>> : public std::true_type {using other = allocator<T, U, hash_node_base_allocator<T, U>>;};

   template<typename T, typename U>
   struct rebind_selector<T, U, hash_node_base<T, U>*> : public std::true_type {using other = allocator<T, U, hash_node_base_allocator_ptr<T, U>>;};
}

template <typename T, typename U>
using unordered_map = std::unordered_map<T, U, std::hash<T>, std::equal_to<T>, unordered_map_allocimpl::allocator<T, U>>;

Nicol Bolas

unread,

May 23, 2017, 10:33:03 AM5/23/17

to ISO C++ Standard - Discussion

On Tuesday, May 23, 2017 at 6:43:53 AM UTC-4, Michel Donais wrote:

TL;DR:
unordered's reserve() function and size constructor are useless because every new node allocates stuff
two systems, hash and buckets, under impossible to scope allocator

Hello! FP on the group. I encountered an issue trying to optimize my allocators (xcode LLVM library, but issue is also on GCC). I have ≥30k objects to initialize and find through an unordered_map container, where I initialize elements and if I already encountered them, I just use that version, or else I create a new member. Thanks for try_emplace on C++17 by the way!

Now there is a reserve() command and a size-aware constructor, which should work awesomely, where I reserve the number of elements I might have, it creates all the bucket RAM it needs at once.

The problem is there is also a node allocation made for every single node, even when all the buckets are created.

Now I might be mistaken, I'm not a guru of the scoped allocators, but I didn't find a single portable way to make sure I initialize everything at once.

I'm not 100% up on scoped allocators, but I'm not sure that's their purpose.

As I understood it, scoped allocators are what allow you to have a container of containers, such that the inner containers use the same allocator as the outer ones. Whether this allows you to allocate all memory used by the container of containers at once is a separate issue, defined entirely by the container behavior and your use of them.

Even worse, there's no portable way I found to make sure I can use a normal allocator for the buckets and a pool allocator for the stray node allocators.

Yes, that's true. Unfortunately, the standard does not recognize a way for allocators to determine the purpose of an allocation: nodes, arrays, arrays of some type other than the `T` the original allocator is meant for, etc. The allocator is only told to allocate some number of `U`s, not the meaning of the `U` itself.

Michel Donais

unread,

May 23, 2017, 11:18:00 AM5/23/17

to ISO C++ Standard - Discussion

> Yes, that's true. Unfortunately, the standard does not
> recognize a way for allocators to determine the purpose
> of an allocation: nodes, arrays, arrays of some type
> other than the `T` the original allocator is meant for, etc.
> The allocator is only told to allocate some number of
> `U`s, not the meaning of the `U` itself.

True. Hence the broad questioning. There have been other similar topics in the past, however, I got onto a case where:
- unordered_map is mostly made of another private subpart (hash table)
- hash_table is made from two other parts, the buckets and the allocation chained list, both private
- then, there's our data itself.

As code is getting into more of a building blocks system, it then makes sense to question whether this is the exception (and then, why is it an exception) or if will eventually become a rule.

Also, all the other containers are relatively self-explanatory and react simply to allocators but these two are of a totally different architecture. IMHO again, since unordered_* are meant mostly for hard core optimizers, as there's little incentive for most people to understand anything else than regular set,map,multi*. So it beats the purpose.

Nicol Bolas

unread,

May 23, 2017, 12:33:21 PM5/23/17

to ISO C++ Standard - Discussion

On Tuesday, May 23, 2017 at 11:18:00 AM UTC-4, Michel Donais wrote:

> Yes, that's true. Unfortunately, the standard does not
> recognize a way for allocators to determine the purpose
> of an allocation: nodes, arrays, arrays of some type
> other than the `T` the original allocator is meant for, etc.
> The allocator is only told to allocate some number of
> `U`s, not the meaning of the `U` itself.
True. Hence the broad questioning. There have been other similar topics in the past, however, I got onto a case where:
- unordered_map is mostly made of another private subpart (hash table)
- hash_table is made from two other parts, the buckets and the allocation chained list, both private
- then, there's our data itself.
As code is getting into more of a building blocks system, it then makes sense to question whether this is the exception (and then, why is it an exception) or if will eventually become a rule.
Also, all the other containers are relatively self-explanatory and react simply to allocators but these two are of a totally different architecture.

It's not as different as you might think. The unordered container implementations use an allocation pattern very similar to many `deque` implementations. You have an array whose type and size are independent of the `T`. This array can grow or shrink. And you have some nodes whose size is dependent on the `T`, and as you insert/remove more items, you create or destroy these nodes.

The same general allocation structure could handle both kinds of containers.

IMHO again, since unordered_* are meant mostly for hard core optimizers,

I don't agree with that. There are plenty of times when you can predict beforehand that your particular needs require a hash table. If you have a large data set, and you need fast access, and you're not interested in an in-order traversal, there's no reason you shouldn't use `unordered_` containers. The allocation patterns may not be your primary performance issues.

Hard core optimizers need tight control over allocations. But there are plenty of people who just need a fast hash table, and `unordered_*` provides that.

Michel Donais

unread,

May 23, 2017, 1:56:13 PM5/23/17

to ISO C++ Standard - Discussion

Le mardi 23 mai 2017 12:33:21 UTC-4, Nicol Bolas a écrit :

It's not as different as you might think. The unordered container implementations use an allocation pattern very similar to many `deque` implementations. You have an array whose type and size are independent of the `T`. This array can grow or shrink. And you have some nodes whose size is dependent on the `T`, and as you insert/remove more items, you create or destroy these nodes.

The same general allocation structure could handle both kinds of containers.

Exactly! And this wouldn't be much of a problem if the building blocks were exposed and if I could actually pass them as parameters, such as all the other containers, where I can decide to use a deque or a vector or whatnot as a building block from them. So this would allow me to actually provide implementations with different allocators and I'd be merry.

IMHO again, since unordered_* are meant mostly for hard core optimizers,

I don't agree with that. There are plenty of times when you can predict beforehand that your particular needs require a hash table. If you have a large data set, and you need fast access, and you're not interested in an in-order traversal, there's no reason you shouldn't use `unordered_` containers. The allocation patterns may not be your primary performance issues.

Hard core optimizers need tight control over allocations. But there are plenty of people who just need a fast hash table, and `unordered_*` provides that.

I live in a world where most people use a vector to do deque jobs. So I'd say most stl is actually hardcore for most people. But even with more knowledgeable people, the unordered_* containers are later additions and are only starting to get recognition. It's awesome as they only require hash and equal operators, and no less operator, which can be debatable (what colour would be < another one in a RGBA...). But they require knowledge of hashes, of containers, of why it's faster/better/different than map/set, ... -- So I stay on my idea it's something to optimize algorithms. But then who cares, really :) You bring out awesome points too, let's agree to disagree, but agree unordered_* should be used more often than they currently are.