BloomFilters

Naveen Koorakula

May 1, 2008, 5:14:24 AM

to hyperta...@googlegroups.com

Hello Doug, Luke,

I just wrote up a design spec for using BloomFilters in CellStores to reduce disk accesses when the key(s) being looked up are specified by the query. Could you please take a look and send me any comments or suggestions?
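To illustrate the idea, here is a minimal Python sketch of a Bloom filter as it might sit in front of a CellStore read. It is not Hypertable's implementation; the class and method names (`BloomFilter`, `add`, `might_contain`) are hypothetical, and deriving the k bit positions from one SHA-256 digest via double hashing is an assumption chosen for brevity. A `False` answer means the key is definitely not in the store, so the disk access can be skipped; a `True` answer only means "maybe present".

```python
import hashlib


class BloomFilter:
    """Hypothetical minimal Bloom filter (illustration only, not Hypertable's code)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # Bit array packed into bytes, all bits initially clear.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Double hashing (an assumption): h_i(x) = h1(x) + i * h2(x) mod m,
        # with h1 and h2 taken from two halves of a single SHA-256 digest.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # keep the step odd
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: bytes) -> None:
        # Set every bit position derived from the key.
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # False: key definitely absent, so the disk read can be skipped.
        # True: key may be present (false positives are possible).
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


if __name__ == "__main__":
    bf = BloomFilter(num_bits=1024, num_hashes=7)
    for row in (b"row1", b"row2", b"row3"):
        bf.add(row)
    print(bf.might_contain(b"row2"))  # always True for an inserted key
```

Because a Bloom filter never produces false negatives, every key that was added is always reported as possibly present; only absent keys can be misreported.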

http://code.google.com/p/hypertable/wiki/BloomFilters

One decision I left open is whether the use of bloom filters should be configurable at the schema level. Any opinions?

Thanks,

--Naveen

Doug Judd

May 1, 2008, 12:19:14 PM

to hyperta...@googlegroups.com

Hi Naveen,

This looks fantastic! I think the best thing to do is to make the size of the bloom filter vary with the number of keys it covers. An access group that stores columns of 4-byte integers, for example, is going to have many more keys than an access group that contains crawl data. Sizing the filter per key count would optimize storage and allow for a consistent error rate.
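A sketch of what per-key-count sizing could look like, using the standard Bloom filter sizing formulas: for n keys and a target false-positive rate p, the optimal bit count is m = -n ln(p) / (ln 2)^2 and the optimal number of hash functions is k = (m / n) ln 2. The function name `bloom_parameters` is hypothetical; this is not taken from the design spec.

```python
import math
from typing import Tuple


def bloom_parameters(num_keys: int, error_rate: float) -> Tuple[int, int]:
    """Hypothetical helper: return (num_bits, num_hashes) for a target error rate."""
    if num_keys <= 0:
        raise ValueError("num_keys must be positive")
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must be in (0, 1)")
    # Optimal filter size: m = -n * ln(p) / (ln 2)^2, rounded up to whole bits.
    num_bits = math.ceil(-num_keys * math.log(error_rate) / (math.log(2) ** 2))
    # Optimal hash count: k = (m / n) * ln 2, with at least one hash function.
    num_hashes = max(1, round(num_bits / num_keys * math.log(2)))
    return num_bits, num_hashes


if __name__ == "__main__":
    for n in (10_000, 1_000_000):
        m, k = bloom_parameters(n, 0.01)
        print(f"{n} keys -> {m} bits ({m / n:.1f} bits/key), {k} hashes")
```

Since m grows linearly with n for a fixed p, an access group with twice as many keys gets a filter roughly twice as large at the same error rate. For one million keys at a 1% target this works out to roughly 9.6 bits per key and 7 hash functions.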

You might want to make the bloom filter configurable at the schema level. I would opt for having them on by default, but there may be a need to disable them in certain scenarios. An error-rate or bits-per-key tuning option might be useful as well.
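If the tuning option were expressed as bits per key, the resulting error rate could be estimated with the standard approximation p = (1 - e^(-k/b))^k, where b is bits per key and k is the number of hash functions. The sketch below assumes k is chosen as round(b ln 2); the function name `expected_error_rate` is hypothetical.

```python
import math


def expected_error_rate(bits_per_key: float) -> float:
    """Hypothetical helper: approximate false-positive rate for a bits/key budget.

    Assumes k = round(bits_per_key * ln 2) hash functions and the standard
    approximation p = (1 - e^(-k / b))^k.
    """
    if bits_per_key <= 0:
        raise ValueError("bits_per_key must be positive")
    k = max(1, round(bits_per_key * math.log(2)))
    return (1.0 - math.exp(-k / bits_per_key)) ** k


if __name__ == "__main__":
    for b in (8, 10, 16):
        print(f"{b:2d} bits/key -> ~{expected_error_rate(b):.4%} false positives")
```

Under these assumptions 10 bits per key comes out to just under 1%, and each additional bit per key lowers the error rate further, which is the trade-off a schema-level knob would expose.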

- Doug

Gordon

May 1, 2008, 12:56:31 PM

to hyperta...@googlegroups.com

Naveen,

Well done, this is an exceptionally nice piece of engineering work ...

Gordon