Iteration over a Set where elements may be deleted in the process -- bug?

1,046 views
Skip to first unread message

David P. Sanders

unread,
Feb 27, 2014, 10:54:58 AM2/27/14
to julia...@googlegroups.com
I need to iterate over a Set whose elements will be deleted in the process.
However, some deleted elements are included in the iteration, e.g. the element 4 in the code below.

Is this a bug, or am I misunderstanding?

Thanks.

julia> s = Set(5, 3, 4, 6)
Set{Int64}({5,4,6,3})

julia> for i in s
       println("i=$i  s=$s")
       delete!(s, i-1)
       println("now s=$s\n")
       end
i=5  s=Set{Int64}({5,4,6,3})
now s=Set{Int64}({5,6,3})

i=4  s=Set{Int64}({5,6,3})
now s=Set{Int64}({5,6})

i=6  s=Set{Int64}({5,6})
now s=Set{Int64}({6})



David P. Sanders

unread,
Feb 27, 2014, 11:02:15 AM2/27/14
to julia...@googlegroups.com


El jueves, 27 de febrero de 2014 09:54:58 UTC-6, David P. Sanders escribió:
I need to iterate over a Set whose elements will be deleted in the process.
However, some deleted elements are included in the iteration, e.g. the element 4 in the code below.

According to the following, it is enough to check if the element is `in` the Set.
It seems wrong that this should be necessary, though!
 

julia> for i in s
       println("i=$i  s=$s  $(i in s)")
       delete!(s, i-1)
       println("now s=$s\n")
       end
i=5  s=Set{Int64}({5,4,6,3})  true
now s=Set{Int64}({5,6,3})

i=4  s=Set{Int64}({5,6,3})  false
now s=Set{Int64}({5,6})

i=6  s=Set{Int64}({5,6})  true
now s=Set{Int64}({6})

Stefan Karpinski

unread,
Feb 27, 2014, 11:23:45 AM2/27/14
to Julia Users
Modifying a collection while iterating over it is likely to cause strange things to happen. I have worried about this from time to time but I'm not sure how to fix it. One approach is to advance the state in next before returning the element, then always return an element that you know must be in the collection currently (rather than blindly assuming that the state value points to a valid element that's in the collection and then maintaining that invariant afterwards). But that still doesn't help if so many elements are added or removed from a hash (which is what Sets are based on) that it gets grown or shrunk and rehashed. In that case all hell can break loose.

One approach is to just say that this is how it is and you shouldn't modify a collection that you are iterating. That seems like asking for trouble. People *are* going to modify collections that they are iterating over. The open question is how it should behave and how it should be implemented.

Mauro

unread,
Feb 27, 2014, 11:43:56 AM2/27/14
to julia...@googlegroups.com
In Python it's bad to insert/delete items as well. And people do get
bitten. That doesn't mean it shouldn't be fixed though.
--

Kevin Squire

unread,
Feb 27, 2014, 12:15:19 PM2/27/14
to julia...@googlegroups.com
One fix would be to use the immutable Set type in the FunctionalCollections.jl package. 

Cheers, Kevin

Jameson Nash

unread,
Feb 27, 2014, 1:05:23 PM2/27/14
to julia...@googlegroups.com
Some interactive protocols allow deleting the last selected item. Although this can have a side effect of invalidating all other iterators on the same object so it is not pure. 

andrew cooke

unread,
Feb 27, 2014, 6:59:47 PM2/27/14
to julia...@googlegroups.com

true; but in python you get an exception.  julia is silently trashing the data...

andrew

Kevin Squire

unread,
Feb 27, 2014, 7:45:35 PM2/27/14
to julia...@googlegroups.com
So, no one has really explained why this is happening.

Sets are built on top of hash tables.  The hash table itself is just an array, many entries of which are not being used.

As with all iteration in Julia, for iteration over a Set, start, next, and done functions are defined (see http://docs.julialang.org/en/latest/stdlib/base/#iteration).

start(s::Set) iterates over the hash table and returns the index of the first used entry (no values are returned yet from the Set).

next(s::Set, i) returns (s.hash_table[i], i_next), where s.hash_table[i] is the value for the current iteration, and i_next is the location of the following entry in the hash table.

done(s::Set, i) returns true if we're done iterating (which will specifically be true when i > length(s.hash_table))

(Most of this is actually defined in Dict, upon which Set is defined, but the essentials are correct and easier to understand above, IMHO.)

The problem with modifying the table is two-fold:
  1. next(s::Set, i) has already returned the location of the next entry in the hash table; if you delete that entry, the next location is actually invalid
  2. adding values to the hash table may trigger a resize (removing does not)

Problem 1. could be fixed by having next() check the next value before returning it, which would allow removing elements from the Set.  Any change would likely make iteration over sets slightly less efficient.  Do we want to support this?

Adding elements to the Set will never be a good idea.

For now, I'll take back my recommendation about PersistentSets in the FunctionalCollections.jl package, as they seem to have other issues, and I'm not sure now if they'll actually work.

Kevin

Kevin Squire

unread,
Feb 27, 2014, 7:49:34 PM2/27/14
to julia...@googlegroups.com
true; but in python you get an exception.  julia is silently trashing the data...

Actually, no; in this case it's just not aware that the element has been removed. 


Kevin

andrew cooke

unread,
Feb 27, 2014, 8:18:49 PM2/27/14
to julia...@googlegroups.com


eh?  the code tries to delete all entries and, instead of an exception, data remains.

are you saying that's not "trashing" because the remaining value was there initially?

[big lebowski springs to mind]

i guess what i meant was something more along the lines of - the output from the routine is unpredictable and inconsistent with the expected semantics.  it would be safer if an exception was raised.

is that ok?

andrew cooke

unread,
Feb 27, 2014, 8:21:28 PM2/27/14
to julia...@googlegroups.com

oh, i'm sorry.  no, i;m wrong.  i missed the "-1".  ive completely misunderstood the thread.  :o(

sorry again,
andrew

David P. Sanders

unread,
Feb 28, 2014, 8:28:36 AM2/28/14
to julia...@googlegroups.com


El jueves, 27 de febrero de 2014 18:45:35 UTC-6, Kevin Squire escribió:
So, no one has really explained why this is happening.

Sets are built on top of hash tables.  The hash table itself is just an array, many entries of which are not being used.

As with all iteration in Julia, for iteration over a Set, start, next, and done functions are defined (see http://docs.julialang.org/en/latest/stdlib/base/#iteration).

start(s::Set) iterates over the hash table and returns the index of the first used entry (no values are returned yet from the Set).

next(s::Set, i) returns (s.hash_table[i], i_next), where s.hash_table[i] is the value for the current iteration, and i_next is the location of the following entry in the hash table.

done(s::Set, i) returns true if we're done iterating (which will specifically be true when i > length(s.hash_table))

(Most of this is actually defined in Dict, upon which Set is defined, but the essentials are correct and easier to understand above, IMHO.)

The problem with modifying the table is two-fold:
  1. next(s::Set, i) has already returned the location of the next entry in the hash table; if you delete that entry, the next location is actually invalid
  2. adding values to the hash table may trigger a resize (removing does not)

Problem 1. could be fixed by having next() check the next value before returning it, which would allow removing elements from the Set.  Any change would likely make iteration over sets slightly less efficient.  Do we want to support this?

Adding elements to the Set will never be a good idea.

For now, I'll take back my recommendation about PersistentSets in the FunctionalCollections.jl package, as they seem to have other issues, and I'm not sure now if they'll actually work.

OK, this seems to be a much deeper issue than I thought.

I am coming from C++, where this seems (?) to be a solved problem, in the sense that the standard specifies that iterators are not affected by deleting other elements.
(Though now I am doubting this.)

But this depends crucially on the underlying implementation apparently.

A solution, I guess, is to copy the Set into an array and iterate over the array, checking membership of each element.
But this seems to be very expensive.

David.

Stefan Karpinski

unread,
Feb 28, 2014, 12:39:21 PM2/28/14
to Julia Users
I've avoided this exact problem by copying and sometimes deep-copying dicts in https://github.com/JuliaLang/julia/blob/master/base/pkg/query.jl. Bugs caused by this were subtle and I can imagine this stumping someone for a long time if they don't understand how both dicts and iteration work. It would be nice if we could at least raise an error when someone attempts this. One possibility is to keep the some record of the initial dict state in the state of the iterator (allocated table size + number of used slots?) and raise an error in done if that has changed. Actually, maybe having an ever-increasing state counter in the dict structure that gets bumped every time the dict is modified a good way to go. Otherwise there are sequences of operations that can leave both the table size and the number of used slots the same – e.g. delete a key and add a different one.

Note that this problem is not restricted to dicts although it is worse there. If you modify the contents of an array while iterating it, unexpected things may happen. Of course, for arrays it's pretty straightforward – the state is the index into the array, so regardless of how you modify the array, you're at that index. I'm not sure what else one could expect this to do with a mutable data structure.

Toivo Henningsson

unread,
Feb 28, 2014, 1:41:59 PM2/28/14
to julia...@googlegroups.com
+1 for giving an error when trying to use an iterator where the underlying dict has changed.

Kevin Squire

unread,
Feb 28, 2014, 5:00:42 PM2/28/14
to julia...@googlegroups.com
On Thu, Feb 27, 2014 at 5:21 PM, andrew cooke <and...@acooke.org> wrote:

oh, i'm sorry.  no, i;m wrong.  i missed the "-1".  ive completely misunderstood the thread.  :o(

No worries at all!

Cheers,

   Kevin

Pierre-Yves Gérardy

unread,
Feb 28, 2014, 5:41:20 PM2/28/14
to julia...@googlegroups.com
To realize what you want to do, you can add the elements to be removed to another set.

On each iteration, verify that the current element isn't already in the set of elements to be removed, and after the loop, clean the first set using the elements of the second one.

—Pierre-Yves

David P. Sanders

unread,
Feb 28, 2014, 5:47:29 PM2/28/14
to julia...@googlegroups.com


El viernes, 28 de febrero de 2014 16:41:20 UTC-6, Pierre-Yves Gérardy escribió:
To realize what you want to do, you can add the elements to be removed to another set.

On each iteration, verify that the current element isn't already in the set of elements to be removed, and after the loop, clean the first set using the elements of the second one.


Ah, that's a neat solution, thanks! 

David.
Reply all
Reply to author
Forward
0 new messages