Hi,
I have noticed a performance anomaly with ObjectOpenHashSet, when adding elements to a new ObjectOpenHashSet in the order we get them back from a different ObjectOpenHashSet. If the new ObjectOpenHashSet is pre-sized to be large enough, there is not a problem, but if it isn't, performance can get very bad (many orders of magnitude worse than using the original insertion order).
In the attached test, things work great up to a collection size of about 8,300,000 elements. We can add the generated Strings to an ObjectOpenHashSet in 2-3 seconds, regardless of the order of insertion. But when we try 8,400,000 elements, things go bad - we can add the generated Strings in about 3.6 seconds, but if we try to insert them into another ObjectOpenHashSet in the order the iterator returns them to us, insertion time goes up to 542 seconds!
If I'm patient enough to wait for the 8,500,000 element test to complete, we see that insertion time goes up to 2,472 seconds (41 minutes); for 8,600,000 elements, it goes up to 3,914 seconds (65 minutes) . I understand that I "should" pre-size the Set appropriately to start with; in my particular use case, I'm loading data from an external data source, and I can't know ahead of time how many entries there will be. I expect the performance not to depend too heavily on the order of insertion.
I suspect the poor performance may be related to hash collisions, but I am failing to understand why the insertion order makes THAT much of a difference.
I am running this test under Java 17, with a 4 GB heap. I don't see heap usage go higher than about 1.2 GB. I'm using fastutil-8.5.11.
Here is the output from the test:
Machine OS: Mac OS X 12.6.3 (x86_64)
Java JDK: Azul Systems, Inc. 17.0.6+10-LTS
Adding 1000000 Strings to the set...
Added to new default-sized set in 363 ms
Iterated the set in 18 ms
Created a new set via constructor in 47 ms
Added in iteration order to a new pre-sized set in 46 ms
Added in iteration order to a new default-sized set in 147 ms
Adding 2000000 Strings to the set...
Added to new default-sized set in 628 ms
Iterated the set in 32 ms
Created a new set via constructor in 132 ms
Added in iteration order to a new pre-sized set in 135 ms
Added in iteration order to a new default-sized set in 446 ms
Adding 4000000 Strings to the set...
Added to new default-sized set in 1386 ms
Iterated the set in 61 ms
Created a new set via constructor in 170 ms
Added in iteration order to a new pre-sized set in 157 ms
Added in iteration order to a new default-sized set in 538 ms
Adding 8000000 Strings to the set...
Added to new default-sized set in 3088 ms
Iterated the set in 188 ms
Created a new set via constructor in 445 ms
Added in iteration order to a new pre-sized set in 350 ms
Added in iteration order to a new default-sized set in 1198 ms
Adding 8100000 Strings to the set...
Added to new default-sized set in 3055 ms
Iterated the set in 131 ms
Created a new set via constructor in 412 ms
Added in iteration order to a new pre-sized set in 310 ms
Added in iteration order to a new default-sized set in 1211 ms
Adding 8200000 Strings to the set...
Added to new default-sized set in 2784 ms
Iterated the set in 131 ms
Created a new set via constructor in 344 ms
Added in iteration order to a new pre-sized set in 340 ms
Added in iteration order to a new default-sized set in 1484 ms
Adding 8300000 Strings to the set...
Added to new default-sized set in 3098 ms
Iterated the set in 140 ms
Created a new set via constructor in 497 ms
Added in iteration order to a new pre-sized set in 462 ms
Added in iteration order to a new default-sized set in 2274 ms
Adding 8400000 Strings to the set...
Added to new default-sized set in 3324 ms
Iterated the set in 152 ms
Created a new set via constructor in 509 ms
Added in iteration order to a new pre-sized set in 491 ms
Added in iteration order to a new default-sized set in 542879 ms
Adding 8500000 Strings to the set...
Added to new default-sized set in 3635 ms
Iterated the set in 145 ms
Created a new set via constructor in 413 ms
Added in iteration order to a new pre-sized set in 348 ms
Added in iteration order to a new default-sized set in 2472647 ms
Adding 8600000 Strings to the set...
Added to new default-sized set in 4200 ms
Iterated the set in 154 ms
Created a new set via constructor in 621 ms
Added in iteration order to a new pre-sized set in 565 ms
Added in iteration order to a new default-sized set in 3914421 ms
--
You received this message because you are subscribed to the Google Groups "fastutil" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fastutil+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/fastutil/78543EBF-7398-4E7D-8482-FA33D19093EA%40comcast.net.
--
You received this message because you are subscribed to the Google Groups "fastutil" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fastutil+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/fastutil/78543EBF-7398-4E7D-8482-FA33D19093EA%40comcast.net.
--
You received this message because you are subscribed to the Google Groups "fastutil" group.
To unsubscribe from this group and stop receiving emails from it, send an email to fastutil+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/fastutil/3F4BCDFA-F792-4312-8E3A-36D2C4FB25FD%40unimi.it.
To view this discussion on the web visit https://groups.google.com/d/msgid/fastutil/67F43F57-545A-4B44-825C-353E5C1FAF7F%40comcast.net.
>With regard to HPPC, and if you're curious, this is what's currently
>implemented (with due thanks to Vincent Sonnier and Bruno Roustant):
>https://issues.carrot2.org/browse/HPPC-186
OMG NOBODY TOLD ME ABOUT THIS 😭
because it guards against the collision avalanche (great term, by the way!) regardless of where the input was coming from (the result of iterating another HashSet, or some other mechanism that accidentally put things in a "bad" insertion order).