Hi all,
I ran into some performance degradation in one of my packages yesterday moving from vectors to sets and after some experimentation and research I think I've figured it out. I'd appreciate an extra set of eyes on this; it might also be a good candidate for the docs.
(Possibly flawed) assertions:
A) vectors: O(1) insertion, O(n) search (though this can be optimized to O(logN) in some cases?), O(1) delete (by index), O(n) traversal/iteration
B) sets: O(logN) insertion, O(1) search, O(1) delete (by value), O(?) traversal/iteration [q1]
C) dicts: O(1) insertion, O(n) search [q2], O(n) delete (by value), O(1) delete by key, O(?) traversal/iteration [q3]
D) Sets are just dicts where the key is some (automatic) hash of the value.
[q1] - This is where my original performance issues cropped up, in iterating over a set. We spent time in dict.jl, in skip_deleted().
[q2] - not sure whether this is correct.
[q3] - iteration over a dict seems expensive relative to a vector. This is related to [q1] as well because of assertion D).
Thank you for any guidance/feedback.