Hi all,
If you are using Guava 11.0 or 11.0.1 and are using BloomFilter, or considering it, be warned: it is badly broken! It will be fixed in the upcoming 11.0.2 release, so stay tuned. Moreover, we won't support deserializing BloomFilter instances that were serialized with those versions of Guava, but we will support instances from 11.0.2 onward.
The problem is a very serious performance bug: the internal hash function generated O(N) bits, instead of O(log N) bits, for a BloomFilter of size N. That means it is exponentially slower than it should have been. In extreme cases this number can overflow, and construction of the BloomFilter fails with an exception.
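To give a feel for the size of that gap, here is a rough sketch of how many hash bits are needed to pick k positions in a table of N bits. This is not the Guava code: the class and method names are made up for illustration, and k = 5 is just an assumed number of hash functions.

```java
// Illustrative sketch only, not Guava's implementation. All names are hypothetical.
final class HashBitsSketch {

    /** Bits needed to address one slot in a table of numBits slots: ceil(log2(numBits)), at least 1. */
    static int bitsPerIndex(long numBits) {
        if (numBits < 1) {
            throw new IllegalArgumentException("numBits must be positive");
        }
        return Math.max(1, 64 - Long.numberOfLeadingZeros(numBits - 1));
    }

    /** Total hash bits needed to derive numHashFunctions indexes into the table. */
    static long hashBitsNeeded(long numBits, int numHashFunctions) {
        return (long) numHashFunctions * bitsPerIndex(numBits);
    }

    public static void main(String[] args) {
        int k = 5; // assumed number of hash functions
        for (long n = 1L << 10; n <= 1L << 30; n <<= 10) {
            System.out.printf("N=%d: O(log N) needs %d hash bits, O(N) would produce %d%n",
                    n, hashBitsNeeded(n, k), n);
        }
    }
}
```

Under these assumptions, a table of 2^30 bits needs about 150 hash bits per element, whereas hashing proportional to N would generate over a billion.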
Unfortunately, since the BloomFilter technically kept working, and since I wasn't testing with huge BloomFilters, this error went unnoticed. It didn't help that we had no performance tests and had not yet tried it in production. In hindsight, even a naive Caliper benchmark, just to see how the structure's performance scales, would easily have caught this.
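For illustration, a naive scaling check could look roughly like the following. This is plain Java timing rather than Caliper, and ToyBloomFilter is a hypothetical stand-in built on java.util.BitSet, not the Guava class. The numbers it prints are noisy (no warmup, no repetition), but the per-insert cost should stay roughly flat as the filter grows; a cost that climbs with the size is the kind of signal that would have exposed this bug.

```java
import java.util.BitSet;

// Naive scaling sketch: plain Java timing, not Caliper. All names are hypothetical.
final class ScalingSketch {

    /** Toy Bloom filter used as a stand-in; swap in the real structure under test. */
    static final class ToyBloomFilter {
        private final BitSet bits;
        private final int numBits;
        private final int numHashFunctions;

        ToyBloomFilter(int numBits, int numHashFunctions) {
            this.bits = new BitSet(numBits);
            this.numBits = numBits;
            this.numHashFunctions = numHashFunctions;
        }

        void put(long value) {
            long h = mix(value);
            int h1 = (int) h;
            int h2 = (int) (h >>> 32);
            // Derive each index from two 32-bit halves of one 64-bit hash.
            for (int i = 1; i <= numHashFunctions; i++) {
                bits.set(Math.floorMod(h1 + i * h2, numBits));
            }
        }

        boolean mightContain(long value) {
            long h = mix(value);
            int h1 = (int) h;
            int h2 = (int) (h >>> 32);
            for (int i = 1; i <= numHashFunctions; i++) {
                if (!bits.get(Math.floorMod(h1 + i * h2, numBits))) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Simple 64-bit mixing function (SplitMix64-style finalizer). */
    static long mix(long x) {
        x ^= x >>> 30;
        x *= 0xbf58476d1ce4e5b9L;
        x ^= x >>> 27;
        x *= 0x94d049bb133111ebL;
        x ^= x >>> 31;
        return x;
    }

    /** Average wall-clock nanoseconds per put() for a filter of the given size. */
    static long nanosPerPut(int numBits, int insertions) {
        ToyBloomFilter filter = new ToyBloomFilter(numBits, 5);
        long start = System.nanoTime();
        for (int i = 0; i < insertions; i++) {
            filter.put(i);
        }
        return (System.nanoTime() - start) / insertions;
    }

    public static void main(String[] args) {
        for (int numBits = 1 << 12; numBits <= 1 << 24; numBits <<= 4) {
            System.out.printf("numBits=%d: ~%d ns per put%n", numBits, nanosPerPut(numBits, 100_000));
        }
    }
}
```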
We, and I personally, are very sorry for pushing this code out early and for the mess this caused. We didn't realize the risk. In fact, this bug was introduced in my very last change to the file, just before it was released in Guava, while I was rushing in a feature that I considered central to BloomFilters: support for serialization. Ironically, I got the serialization part subtly wrong too (it was functional, but definitely not something we would want to live with).
Looking forward to the 11.0.2 release!
Thanks,
Dimitris