Stable Bloom Filter

Anthony May

unread,

May 18, 2016, 5:17:26 PM5/18/16

to algebird

Greetings,

For a particular streaming use case I require a Stable Bloom Filter, i.e. a Bloom Filter that has a maximum memory footprint that, once the limit is reached, can still be added to. Does algebird support this?

Anthony

P. Oscar Boykin

unread,

May 18, 2016, 11:12:38 PM5/18/16

to Anthony May, algebird

Algebird's bloomfilter by default uses javaewah for compressed bit sets.

It is going to grow as you add more elements, but there is some upper bound for any given set of parameters.

Do you want something that takes the number of bytes as a parameter? We don't have that.

--
You received this message because you are subscribed to the Google Groups "algebird" group.
To unsubscribe from this group and stop receiving emails from it, send an email to algebird+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
P. Oscar Boykin, Ph.D. | http://twitter.com/posco | http://pobox.com/~boykin

Anthony May

unread,

May 19, 2016, 5:04:43 PM5/19/16

to P. Oscar Boykin, algebird

Thanks for your reply Oscar. The use case is deduplicating messages from kafka in a spark streaming context. Taking a number of bytes as a parameter is not a requirement but being able to fit in memory indefinitely is. Obviously space and therefore time is a limitation for this deduplicating process, but if it were true that the Algebird bloom filter size was always in proportion to the dataset then it would grow without bound in a streaming dataset. It's good to know that isn't the case.

So now the question becomes do we have any idea of what the upper bounds are and what the probabilities are for various sets of parameters?

Anthony May

unread,

May 19, 2016, 8:21:33 PM5/19/16

to P. Oscar Boykin, algebird

Also Stable Bloom Filters avoid ballooning error rates on streamed data by randomly evicting data from the filter, in our case it would be older data which would require a time component. Does Algebird support this scenario?

Reply all

Reply to author

Forward