Insert and delete near the high watermark

6 views
Skip to first unread message

ShaunJ

unread,
Nov 4, 2009, 4:58:08 PM11/4/09
to google-sparsehash
When the number of elements in the table is exactly at the high water
mark (num_elements - num_deleted == bucket_count() *
enlarge_resize_percent) and there's one deleted element (num_deleted
== 1), inserting a new element will cause the table to clean itself
of deleted items but not enlarge itself, because

num_elements + delta > enlarge_threshold
but
num_elements - num_deleted + delta <= enlarge_threshold

After removing the deleted element, an expensive operation that
required rehashing every element in the table, the new element is
added. In a subsequent operation, an element is deleted, and a new
element is inserted, causing the table to again sweep deleted
elements.

Any thoughts on how this situation could be prevented? It seems as
though the table should allocate not just enough buckets for the
elements in the table, but also a reasonable number of deleted
elements, perhaps based on a high water mark of num_deleted.

Cheers,
Shaun

Craig Silverstein

unread,
Nov 4, 2009, 5:16:17 PM11/4/09
to google-s...@googlegroups.com
} When the number of elements in the table is exactly at the high
} water mark (num_elements - num_deleted == bucket_count() *
} enlarge_resize_percent) and there's one deleted element (num_deleted
} == 1), inserting a new element will cause the table to clean itself
} of deleted items but not enlarge itself, because

Interestingly, someone else had this exact same problem recently.
I've checked in a fix, but not yet pushed it to the svn. I will try
to get to that sometime this week.

In the meantime, here's the fix, which I'm afraid you'll have to apply
manually: add this code after the 'const size_type resize_to' line you
just changed, in resize_delta:

if (resize_to < needed_size) {
// This situation means that we have enough deleted elements,
// that once we purge them, we won't actually have needed to
// grow. But we may want to grow anyway: if we just purge one
// element, say, we'll have to grow anyway next time we
// insert. Might as well grow now, since we're already going
// through the trouble of copying (in order to purge the
// deleted elements).
if (table.num_nonempty() - num_deleted + delta >=
static_cast<size_type>(resize_to*2 * shrink_resize_percent)) {
// Good, we won't be below the shrink threshhold even if we double.
resize_to *= 2;
}
}

As part of this change, you'll need to make resize_to non-const.

Let me know how this works for you.

craig

Reply all
Reply to author
Forward
0 new messages