--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To view this discussion on the web visit https://groups.google.com/d/msg/redis-db/-/GyZ6XOlKDMkJ.
To post to this group, send email to redi...@googlegroups.com.
To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.
> What would the interface (the command set) look like?
I think an hypothetical interface could be able to do this with just
two commands one doing SETBIT in the specified bit range (always using
SHA1 or alike against the given string), and one for mass-testing bits
given a string and just return a bool.
So it's like:
BLOOM SET keyname <filter size> "whatever string"
BLOOM TEST keyname <filter size> "whatever string"
The first command will hash the string and set the bits accordingly.
The second command will test all the bits returning 0 or 1.
Also a "BLOOM FLUSH" could be useful as well.
Basically no new data type is added, this just operates on strings.
This also means you can get your filter using just GET <keyname>.
The implementation should be trivial. However there is always the
"problem" that scripting would solve this very well, we even have
Redis.sha1() in Lua.
Worth opening an issue to reason about it.
Salvatore
--
Salvatore 'antirez' Sanfilippo
open source developer - VMware
http://invece.org
"We are what we repeatedly do. Excellence, therefore, is not an act,
but a habit." -- Aristotele
The thing is, bloom filters are sufficiently nuanced in the way they
work (in particular how many bits to set, what will be the FP rate
given a total number of insertions, etc.) that just having the
commands may be more confusing than not.
At least with leaving it to 3rd party libraries (which can use
pipelining now, scripting later), they can include all of the useful
utility functions for calculating proper bloom filter size and number
of bits to set.
Regards,
- Josiah
Right, the interface should have an additional argument indeed.
> The thing is, bloom filters are sufficiently nuanced in the way they
> work (in particular how many bits to set, what will be the FP rate
> given a total number of insertions, etc.) that just having the
> commands may be more confusing than not.
>
> At least with leaving it to 3rd party libraries (which can use
> pipelining now, scripting later), they can include all of the useful
> utility functions for calculating proper bloom filter size and number
> of bits to set.
Usually I tend to think exactly along on this lines, however I think
that in this specific case it is worth testing the performances we can
achieve using Lua scripting, since testing for, for instance, 128
bits, can be pretty slow without scripting, can be much faster with
scripting, but still may be one or even two order of magnitudes slower
than if done inside Redis.
I really think the only way to know this is by testing it.
However I guess that the commands footprint should be like:
BLOOM SET key <bits> <size> <string>
BLOOM TEST key <bits> <size> <string>
BLOOM CLEAR key
It does not seems like a too complex interface to me.
SET/TEST would use the hash function in "counter mode" so that it's
not a problem to have more bits than the output size.
No new type is needed so this is completely backward / forward
compatible from the point of view of RDB files.
Sounds like a very easy task... I'm sure not more than 100 lines of
code, so if the difference in performances is as extreme as I think
may be worth it. Also I've the feeling that providing direct support
would boost the adoption of such a great data structure.
Salvatore
If we're in the subject of new data structures, I'd like to re-raise my request for a simple random access vector structure.sure, the functionality can be achieved with a string, a list or a sorted sets, but each have their drawbacks in terms of memory, flexibility and performance.I know I'd personally use it.
nessDB's bloom filter implementation(c code) https://github.com/shuttler/nessDB/blob/v1.7/src/bloom.c
BohuTANG
> nessDB's bloom filter implementation(c code)
> https://github.com/shuttler/nessDB/blob/v1.7/src/bloom.c
Writing the code is the simplest step ;)
The hard part is understanding if it can be avoided at all, and
designing the interface well if it can not be avoided.
Salvatore
--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To post to this group, send email to redi...@googlegroups.com.
To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.