When a large number of elements are deleted and a few more elements
inserted, the sparsehash can decide to "grow" to remove the deleted
elements, but when it calculates how many buckets it needs for the new
table without the deleted elements, it actually ends up with fewer
buckets than before.
In my case, I'm expecting a bunch of insertions followed by a bunch of
deletions followed by a bunch of insertions and so on, and the number
of buckets is constantly shrinking and growing. I presume that means
the hash function must be called for each element to find its new
bucket whenever the number of buckets changes.
Would it be possible for resize_delta to clean up deleted elements
without changing the number of buckets?
} Would it be possible for resize_delta to clean up deleted elements } without changing the number of buckets?
I think the right way to do this is to call max_load_factor(xxx) and min_load_factor(xxx)
You can do min_load_factor(0.0) so we never resize smaller. That means we'll never clear out deleted elements. I don't know if there's a simple way to do that without also resizing the hashtable. I guess you could do something like: sparse_hash_map<...> new_map(mymap.size()); for (it = mymap.begin(); it != mymap.end(); ++it) new_map->insert(*it); mymap.swap(new_map);
This is more or less what the code does internally on a resize.
But I don't know how necessary such behavior is. My guess is that will be slower than just calling min_load_factor(0.0) and just keeping doing your inserts and deletes. If you find you're getting too clogged with deleted keys, you can call mymap.resize(0) after you've done a bunch of insertions (so you're at the high end of the spectrum), to keep the hashtable from getting smaller.
On Nov 3, 7:05 pm, Craig Silverstein <csilv...@google.com> wrote:
> } Would it be possible for resize_delta to clean up deleted elements
> } without changing the number of buckets?
> I think the right way to do this is to call
> max_load_factor(xxx)
> and
> min_load_factor(xxx)
> You can do min_load_factor(0.0) so we never resize smaller. That
> means we'll never clear out deleted elements.
...
Hi Craig,
sparse_hashtable can shrink even with min_load_factor(0); I've seen
this in action. If the number of deleted items make up more than half
the number of elements in the table, the logic of resize_delta can in
fact choose a smaller number of buckets than were allocated
previously. It's this behaviour that I'm trying to prevent.
Here's a patch that prevents sparse_hashtable from reducing the number
of buckets if it's resizing `just to get rid of all the "deleted"
buckets that are clogging up the hashtable.' Currently, inserting an
element can cause the number of buckets to shrink.
> On Nov 3, 7:05 pm, Craig Silverstein <csilv...@google.com> wrote:> } Would it be possible for resize_delta to clean up deleted elements
> > } without changing the number of buckets?
> > I think the right way to do this is to call
> > max_load_factor(xxx)
> > and
> > min_load_factor(xxx)
> > You can do min_load_factor(0.0) so we never resize smaller. That
> > means we'll never clear out deleted elements.
> ...
> Hi Craig,
> sparse_hashtable can shrink even with min_load_factor(0); I've seen
> this in action. If the number of deleted items make up more than half
> the number of elements in the table, the logic of resize_delta can in
> fact choose a smaller number of buckets than were allocated
> previously. It's this behaviour that I'm trying to prevent.
} Here's a patch that prevents sparse_hashtable from reducing the } number of buckets if it's resizing `just to get rid of all the } "deleted" buckets that are clogging up the hashtable.' Currently, } inserting an element can cause the number of buckets to shrink.
Insert causing a shrink is desired behavior: it's because we never resize a hashtable on delete (in order to preserve iterators an pointers while deleting).
So I think your patch would mean, in effect, we never shrink the hashtable regardless of the min_load_factor, which isn't ideal.
I think, though, it makes sense to add a check there that never resizes lower when min_load_factor is 0. I'll try to do something like that for the next release.
} So I think your patch would mean, in effect, we never shrink the } hashtable regardless of the min_load_factor, which isn't ideal.
Never mind, I looked at the code more closely and see that your fix is right, and my objection is wrong. I'll do a bit of testing, and look to get this into the next release.