Well done, this is an exceptionally nice piece of engineering work ...
On Thu, May 1, 2008 at 9:19 AM, Doug Judd <d...@zvents.com> wrote:
> Hi Naveen,
> This looks fantastic! I think the best thing to do is to make the size of
> the bloom filter variable depending on the number of keys that it covers.
> You can imagine that an access group that stores columns of 4-byte integers
> is going to have many more keys than an access group that contains crawl
> data, for example. That would optimize storage and allow for a consistent
> error rate.
> You might want to make the bloom filter configurable at the schema level.
> I would opt for having them on by default, but there might be a need
> to disable them in certain scenarios. An error rate or bits/key
> tuning option might be useful as well.
> - Doug
> On Thu, May 1, 2008 at 2:14 AM, Naveen Koorakula <nave...@gmail.com>
> wrote:
> > Hello Doug, Luke,
> > I just wrote up a design spec for using BloomFilters in CellStores to
> > reduce disk accesses when the key(s) being looked up are specified by the
> > query. Could you please take a look and send me any comments / suggestions?
> > http://code.google.com/p/hypertable/wiki/BloomFilters
> > One decision I left open is whether the usage of bloom filters should be
> > configurable at a schema level. Any opinions?
> > Thanks,
> > --Naveen
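
To make the sizing suggestion above concrete, here is a minimal sketch of how a filter could be sized from the number of keys it covers and a target error rate. It uses the standard textbook Bloom filter formulas, not anything taken from the design spec; the function name bloom_parameters and its arguments are hypothetical and are not part of Hypertable.

```python
import math


def bloom_parameters(num_keys, false_positive_rate):
    """Return (num_bits, num_hashes) for a standard Bloom filter.

    Hypothetical helper for illustration only. Uses the usual formulas:
        bits per key = -ln(p) / (ln 2)^2
        hash count   = (bits per key) * ln 2
    """
    if num_keys <= 0:
        raise ValueError("num_keys must be positive")
    if not 0.0 < false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must be between 0 and 1")

    # Bits per key depends only on the target error rate, so the total
    # filter size grows linearly with the number of keys covered.
    bits_per_key = -math.log(false_positive_rate) / (math.log(2) ** 2)
    num_bits = math.ceil(num_keys * bits_per_key)

    # Number of hash functions that minimizes the false positive rate
    # for this bits-per-key ratio; always use at least one.
    num_hashes = max(1, round(bits_per_key * math.log(2)))
    return num_bits, num_hashes


if __name__ == "__main__":
    # A store with few large values versus one with many small values,
    # both held to the same 1% target error rate.
    for keys in (10_000, 10_000_000):
        bits, hashes = bloom_parameters(keys, 0.01)
        print(f"{keys} keys -> {bits} bits, {hashes} hash functions")
```

Because the bits-per-key ratio is fixed by the error rate alone, exposing either an error rate or a bits/key setting would amount to the same tuning knob under these formulas.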