So I did some research.
There are a lot of interesting implementations of Bloom filters (hadoop, breeze, and bloom-filter-scala). There is also an interesting blog post comparing different implementations by size and speed (including the algebird version).
Since version 2.0, Apache Spark also has its own Bloom filter (similar to guava/bloomFilter).
But they all suffer from a huge memory footprint.
For example, a Spark BloomFilter for 150 million elements weighs more than 1 GB.
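For reference, here is a rough sizing sketch based on the standard Bloom filter formulas, m = -n * ln(p) / (ln 2)^2 bits and k = (m / n) * ln 2 hash functions. The function names and the false-positive rates (1% and 0.1%) are my own assumptions for illustration, not values taken from Spark or any of the libraries above.

```python
import math

def bloom_size_bits(n, p):
    """Theoretical optimal bit count for n items at false-positive rate p."""
    return math.ceil(-n * math.log(p) / (math.log(2) ** 2))

def bloom_num_hashes(n, m):
    """Optimal number of hash functions for n items in m bits."""
    return max(1, round(m / n * math.log(2)))

n = 150_000_000  # element count from the example above
for p in (0.01, 0.001):  # assumed target false-positive rates
    m = bloom_size_bits(n, p)
    k = bloom_num_hashes(n, m)
    # Report the size of the bit array only, in megabytes (10^6 bytes).
    print(f"p={p}: {m / 8 / 1e6:.0f} MB, {k} hash functions")
```

This gives roughly 180 MB at 1% and 270 MB at 0.1% for the bit array alone. These are theoretical figures; a real implementation may use more memory depending on its parameters and serialization format.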
And now I am confused, since I have a Spark Streaming application that needs to check a condition in real time.
Oscar, can you please tell me more about using a distributed implementation?
Or maybe I need to choose a different solution instead of a Bloom filter?
On Monday, October 31, 2016 at 20:39:14 UTC+3, P. Oscar Boykin wrote: