sparsehash can shrink even with min_load_factor(0)

35 views
Skip to first unread message

ShaunJ

unread,
Nov 3, 2009, 9:19:45 PM11/3/09
to google-sparsehash
When a large number of elements are deleted and a few more elements
inserted, the sparsehash can decide to "grow" to remove the deleted
elements, but when it calculates how many buckets it needs for the new
table without the deleted elements, it actually ends up with fewer
buckets than before.

In my case, I'm expecting a bunch of insertions followed by a bunch of
deletions followed by a bunch of insertions and so on, and the number
of buckets is constantly shrinking and growing. I presume that means
the hash function must be called for each element to find its new
bucket whenever the number of buckets changes.

Would it be possible for resize_delta to clean up deleted elements
without changing the number of buckets?

Cheers,
Shaun

Craig Silverstein

unread,
Nov 3, 2009, 10:05:26 PM11/3/09
to google-s...@googlegroups.com
} Would it be possible for resize_delta to clean up deleted elements
} without changing the number of buckets?

I think the right way to do this is to call
max_load_factor(xxx)
and
min_load_factor(xxx)

You can do min_load_factor(0.0) so we never resize smaller. That
means we'll never clear out deleted elements. I don't know if there's
a simple way to do that without also resizing the hashtable. I guess
you could do something like:
sparse_hash_map<...> new_map(mymap.size());
for (it = mymap.begin(); it != mymap.end(); ++it)
new_map->insert(*it);
mymap.swap(new_map);

This is more or less what the code does internally on a resize.

But I don't know how necessary such behavior is. My guess is that
will be slower than just calling min_load_factor(0.0) and just keeping
doing your inserts and deletes. If you find you're getting too
clogged with deleted keys, you can call
mymap.resize(0)
after you've done a bunch of insertions (so you're at the high end of
the spectrum), to keep the hashtable from getting smaller.

craig

ShaunJ

unread,
Nov 4, 2009, 12:41:47 PM11/4/09
to google-sparsehash
On Nov 3, 7:05 pm, Craig Silverstein <csilv...@google.com> wrote:
> } Would it be possible for resize_delta to clean up deleted elements
> } without changing the number of buckets?
>
> I think the right way to do this is to call
>    max_load_factor(xxx)
> and
>    min_load_factor(xxx)
>
> You can do min_load_factor(0.0) so we never resize smaller.  That
> means we'll never clear out deleted elements.
...

Hi Craig,

sparse_hashtable can shrink even with min_load_factor(0); I've seen
this in action. If the number of deleted items make up more than half
the number of elements in the table, the logic of resize_delta can in
fact choose a smaller number of buckets than were allocated
previously. It's this behaviour that I'm trying to prevent.

Cheers,
Shaun

ShaunJ

unread,
Nov 4, 2009, 1:20:44 PM11/4/09
to google-sparsehash
Hi Craig,

Here's a patch that prevents sparse_hashtable from reducing the number
of buckets if it's resizing `just to get rid of all the "deleted"
buckets that are clogging up the hashtable.' Currently, inserting an
element can cause the number of buckets to shrink.

Cheers,
Shaun

2009-11-04 Shaun Jackman <sjac...@gmail.com>

* google/sparsehash/sparsehashtable.h (resize_delta): Do not
shrink the number of buckets when inserting an element.

--- sparsehash-1.5.2/src/google/sparsehash/sparsehashtable.h.orig
2009-05-12 14:22:48.000000000 -0700
+++ sparsehash-1.5.2/src/google/sparsehash/sparsehashtable.h
2009-11-04 10:17:21.058413000 -0800
@@ -555,7 +555,7 @@
const size_type needed_size = min_size(table.num_nonempty() +
delta, 0);
if ( needed_size > bucket_count() ) { // we don't have
enough buckets
const size_type resize_to = min_size(table.num_nonempty() -
num_deleted
- + delta, 0);
+ + delta, bucket_count());
sparse_hashtable tmp(MoveDontCopy, *this, resize_to);
swap(tmp); // now we are tmp

Craig Silverstein

unread,
Nov 4, 2009, 1:39:26 PM11/4/09
to google-s...@googlegroups.com
} Here's a patch that prevents sparse_hashtable from reducing the
} number of buckets if it's resizing `just to get rid of all the
} "deleted" buckets that are clogging up the hashtable.' Currently,
} inserting an element can cause the number of buckets to shrink.

Insert causing a shrink is desired behavior: it's because we never
resize a hashtable on delete (in order to preserve iterators an
pointers while deleting).

So I think your patch would mean, in effect, we never shrink the
hashtable regardless of the min_load_factor, which isn't ideal.

I think, though, it makes sense to add a check there that never
resizes lower when min_load_factor is 0. I'll try to do something
like that for the next release.

craig

Craig Silverstein

unread,
Nov 4, 2009, 1:44:29 PM11/4/09
to google-s...@googlegroups.com
} So I think your patch would mean, in effect, we never shrink the
} hashtable regardless of the min_load_factor, which isn't ideal.

Never mind, I looked at the code more closely and see that your fix is
right, and my objection is wrong. I'll do a bit of testing, and look
to get this into the next release.

craig

ShaunJ

unread,
Nov 4, 2009, 2:45:04 PM11/4/09
to google-sparsehash
Great. Thanks, Craig.
Reply all
Reply to author
Forward
0 new messages