--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
The implementation and performance necessitate it. std::map is
implemented as a binary tree of some kind (generally a red-black
tree), so existing elements never move in memory when you add a new
element. You just add a new node and update some links.
unordered_map is implemented as a hash map: effectively, there's a
big array of lists of elements. To maintain good performance, the hash
map must grow this array of lists when a certain load-factor
threshold is reached. Thus inserting an element can cause a
reallocation, meaning all of the old iterators now point to memory
which is no longer being used by the hash map.
Yes, but the iterators are supposed to point to nodes of the
individual lists. These nodes do not have to move in memory when the
big array gets resized (nodes only get unlinked from their former
lists and spliced into new ones). Moreover, one can additionally
splice all nodes onto a global list of all elements in the map, so
that stepping from one element to the next during enumeration (calling
hash_map::iterator::operator++()) would be a constant-time operation,
as in std::map.
Just for your information, both SGI and Dinkumware provide hash_map as
an extension to the standard library. Both implementations follow the
same iterator-invalidation rules as std::map. In Dinkumware's
implementation, though, hash_map::iterator::operator++() is not
guaranteed to be O(1). I do not know how iteration is implemented in
SGI's hash_map.
-Marek
To do as you want, each node would have to store extra information.
Each node would have a pointer to the hash map, know which bucket it's
in, and know its offset into the bucket (or an equivalent): probably
three words' worth of memory. This increases the memory footprint by
O(number of items) (and, by extension, increases cache misses, etc.).
It also adds an amortized constant runtime cost to every insert (both
to set this information on insert, and to reset all of it on each
grow).
I guess they decided this extra cost wasn't worth the iterator
guarantee.
Sorry, no. You do not need this information. I have implemented this
before.
-Marek
I apologize. It was just an off-the-cuff comment. It's late now, so
I'm sorry if I'm missing the obvious quick solution; if I am, please
enlighten me. Here's a partial list, off the top of my head, of the
ways it could be done.
An iterator which becomes invalid after an insert could be
- pointer to node. Nodes point to adjacent nodes. [Hash map updates
links on insert.]
- pointer to hash map, an int for which bucket, and an int for the
offset into the bucket.
An iterator which stays valid after an insert could be
- pointer to node. Nodes point to adjacent nodes. [Hash map updates
links on insert and grow.]
- pointer to node. Nodes point to their containing hash map.
- pointer to node. Nodes point to their containing hash map. Nodes
cache their hash value.
- pointer to node. Nodes point to their containing hash map. Iterators
cache the hash value.
- pointer to node and pointer to hash map.
- pointer to node and pointer to hash map. Nodes cache their hash
value.
- pointer to node and pointer to hash map. Iterators cache the hash
value.
Some are obviously cheaper than others. Most involve balancing the
cost to insert, the cost to iterate, and the memory footprint.
This definitely appears to be an area where one should measure rather
than make a snap judgment. So why does the TR1 hash map invalidate its
iterators? I don't know. I'm sorry I answered; I should not have.
However, by forcing me to think about it, you have made me start to
share your curiosity as to why it was done with iterators that are
invalidated on insert.