erase doesn't return an iterator (thankfully)

5 views
Skip to first unread message

ShaunJ

unread,
Nov 9, 2009, 2:38:14 PM11/9/09
to google-sparsehash
The C++ standard says that unordered_set::erase returns an iterator to
the next element:
http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-closed.html#579

erase should be O(1) and in the worst case O(size()), whereas, for
most hash table implementations ++it is O(bucket_count()). In
particular, if the has table only has one element, removing it
requires checking every single bucket in the data structure! Ick.

I've posted a bug on GCC's bugzilla on this topic:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41975

It seems the best suggestion is for erase to return a proxy to an
iterator that only performs the ++it operation if necessary.

Since the interface of sparse_hash_set largely mirrors unordered_set,
I was curious if you had any opinion on the topic.

Cheers,
Shaun

Craig Silverstein

unread,
Nov 9, 2009, 4:10:52 PM11/9/09
to google-s...@googlegroups.com
} Since the interface of sparse_hash_set largely mirrors
} unordered_set, I was curious if you had any opinion on the topic.

I should probably mirror unordered_set, shouldn't I? In
sparse_hash_set, ++ is constant-time, so there's no reason not to
support it, even if the spec doesn't require it. For dense_hash_set,
it's a different story.

I agree with the "Toronto consensus" that if needed, the iterator
could be its own proxy. I may do that for dense_hash_map; it adds a
bit of code complexity, but not that much.

craig

Craig Silverstein

unread,
Nov 9, 2009, 6:29:40 PM11/9/09
to google-s...@googlegroups.com
btw, I saw your post about this at
http://archives.free.net.ph/message/20091106.155438.7022db19.en.html

In the situation you describe, you can get around the cost of it++ by
calling erase(*it) instead of erase(it). It means an extra hash, but
if that's faster than the ++it it may be a better choice.

craig

ShaunJ

unread,
Nov 10, 2009, 1:48:35 PM11/10/09
to google-sparsehash
On Nov 9, 1:10 pm, Craig Silverstein <csilv...@google.com> wrote:
> } Since the interface of sparse_hash_set largely mirrors
> } unordered_set, I was curious if you had any opinion on the topic.
>
> I should probably mirror unordered_set, shouldn't I?  In
> sparse_hash_set, ++ is constant-time, so there's no reason not to
> support it, even if the spec doesn't require it.  For dense_hash_set,
> it's a different story.

++it is not constant time for sparse_hash_set in the worst case, where
it is
O(bucket_count() - size())

For my application, to prevent resizing the hash table, I set the
number of buckets to an appropriate size for the data set before
starting the algorithm. However, if the data set is sorted, it inserts
one element then removes one, then inserts one, and so forth, which
takes the worst case for every erase!

Cheers,
Shaun

ShaunJ

unread,
Nov 10, 2009, 1:54:09 PM11/10/09
to google-sparsehash
Thanks for the tip. It is a usable workaround, but it's not ideal. I
dislike having to hurt the runtime of the typical case to prevent
disaster in the worst case.

Cheers,
Shaun

ShaunJ

unread,
Dec 31, 2009, 2:44:49 PM12/31/09
to google-sparsehash

------- Comment #10 From Paolo Carlini 2009-11-29 08:34 [reply]
-------

Stefan is right. The issue, in full generality, isn't trivial at all,
there is
now a new discussion on the library reflector. I'm under the
impression that
for C++0x we are not going to standardize the minimum load factor
suggested by
Matt, seems much more likely that erase will be just changed to return
void,
there is a growing consensus about that.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41975#c10

Craig Silverstein

unread,
Jan 4, 2010, 4:22:51 PM1/4/10
to google-s...@googlegroups.com
OK, it sounds like it's not settled if erase will return an iterator
or not. For now, I'll leave it returning void, and come back to visit
it later when things are more settled.

Thanks for the update.

craig

ShaunJ

unread,
Feb 17, 2010, 1:04:26 PM2/17/10
to google-sparsehash
The erase function of boost's unordered_set returned an iterator.
They've added erase_return_void as a temporary work-around.
http://beta.boost.org/doc/libs/1_42_0/doc/html/boost/unordered_set.html#id1341161-bb

> This is a temporary workaround for the inefficient erase method. Hopefully, in a future version the signature of erase will be changed and this will be deprecated.

Cheers,
Shaun

ShaunJ

unread,
Mar 10, 2010, 12:55:30 PM3/10/10
to google-sparsehash
GCC has reverted the patch that changes the return type of erase to
void, so GCC once again returns an iterator:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41975

It looks as though GCC will follow the lead of boost, adding a
function erase_return_void. Perhaps Google sparsehash could follow
suit.

Cheers,
Shaun

Craig Silverstein

unread,
Mar 10, 2010, 12:56:51 PM3/10/10
to google-s...@googlegroups.com
} It looks as though GCC will follow the lead of boost, adding a
} function erase_return_void. Perhaps Google sparsehash could follow
} suit.

It sounds like things are still in flux. I'm going to wait for
everything to settle down a bit before changing sparsehash.

craig

Reply all
Reply to author
Forward
0 new messages