] I've written an implementation of Bloom filters, using the BitSet package.
] You can find it at
] As usual, comments welcome.
Hello, a while ago I made something similar:
I guess the main differences are that the hash algorithm is
configurable when creating the filter and that instead of providing
the number of hashes up front and estimating the false positive
probability, you provide the probability and it calculates the number
of hashes it needs. I think users of the package have an easier time
getting a handle on p than k.
Apart from that, BitSet is nice if it had existed a few months ago
I could have avoided a lot of messy arithmetic :)
] Actually, I do neither :) I just allow a way to estimate the FP rate, given
] n (cardinality of the set), m (size of the bloom filter in bits) and k (the
] number of hashes).
Yes, I see. It's just that (m, k) which need to be provided to the
intialiser are not what I think the average user of a bloom filter
will have in hand - they're more likely to be able to give reasonable
values for (n, p). Just a bit of algebra to convert them, but
worthwhile because it saves users of the library having to look up and
implement the formula.
] Are you using your Bloom filter for anything in production?
Nah. I have some back-burner plans to try to speed up some graph
traversal things, but they remain on the back burner. Feel free to
take any parts of my implementation you might find useful.
Hello, I seem to be unable to Unmarshal a bitset over size 32..
I've written an implementation of Bloom filters, using the BitSet package. You can find it at