BloomFilter - *broken* in guava 11.0 and 11.0.1

315 views
Skip to first unread message

Dimitris Andreou

unread,
Feb 2, 2012, 4:49:37 PM2/2/12
to guava-discuss
Hi all,

If you are using Guava 11.0 or 11.0.1and using BloomFilter, or considering to, be warned: it is very much broken! It will be fixed in an upcoming 11.0.2, stay tuned. Moreover, we won't be supporting deserializing serialized BloomFilter instances of those versions of Guava, but we will be supporting instances from 11.0.2 and beyond.

The problem is a very serious performance bug: the internal hash function generated O(N) bits, instead of O(logN) bits, for a BloomFilter of size N. That means, it's exponentially slower than what it should have been. In extreme cases, this number can overflow, and the construction of the BF will fail with an exception.

Unfortunately, since technically the BloomFilter continued working, and since I wasn't testing with huge BloomFilters, this error got unnoticed. And it didn't help that we had no performance tests, or not tried it yet in production. In hindsight, a naive caliper benchmark just to see how the structure performance scales would easily have caught this.

We, and I personally, are very sorry for pushing this code out early, and the mess this caused. We didn't realize the risk. In fact, this bug was introduced at my very last touch in the file, just before having it released in Guava, while trying to rush a feature in, that I thought central in BloomFilters: support for serialization. Ironically, I got the serialization part subtly wrong too (it was functional, but definitely not something we would want to live with).

Looking forward for the 11.0.2 release!

Thanks,
Dimitris

Kevin Bourrillion

unread,
Feb 2, 2012, 4:58:52 PM2/2/12
to Dimitris Andreou, guava-discuss
Echoed apologies... please help us spread the word that anyone using BF really must upgrade to the imminent 11.0.2.

When Guava was young, I had a clear notion that we would hold things back from it until they were widely adopted and battle-tested inside the company first.  I figured that would give the public users the highest possible confidence.  Obviously I moved away from that idea... getting our stuff out where you all can use it is just, I suppose, too exciting to resist.

In this case, uptake for such a specialized thing as this doesn't happen overnight, and the users of our ancient legacy BloomFilter (you don't want to see it) were in no particular rush to migrate. So no one tripped the bug.

Thanks to the SO user who reported it:



--
Kevin Bourrillion @ Google
Java Core Libraries Team
http://guava-libraries.googlecode.com

Reply all
Reply to author
Forward
0 new messages