Hi Naveen,
This looks fantastic! I think the best approach is to size the Bloom filter according to the number of keys it covers. For example, an access group that stores columns of 4-byte integers is going to have many more keys than an access group that contains crawl data. Sizing the filter per key would optimize storage and allow for a consistent error rate.
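Sizing by key count follows from the standard Bloom filter formulas. For n keys and a target false-positive rate p, the optimal number of bits is m = -n ln p / (ln 2)^2 and the optimal number of hash functions is k = (m/n) ln 2. So the bit count grows linearly with the key count while the error rate stays fixed. Below is a minimal Python sketch, assuming a classic (non-blocked) Bloom filter. The function name is hypothetical and is not part of any existing Hypertable API.

```python
import math


def bloom_filter_size(num_keys: int, false_positive_rate: float) -> tuple[int, int]:
    """Hypothetical helper: return (num_bits, num_hashes) for a classic
    Bloom filter holding num_keys entries at the requested error rate."""
    if num_keys <= 0:
        raise ValueError("num_keys must be positive")
    if not 0.0 < false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must be in (0, 1)")
    # Optimal bit count: m = -n * ln(p) / (ln 2)^2, rounded up to whole bits.
    num_bits = math.ceil(-num_keys * math.log(false_positive_rate) / (math.log(2) ** 2))
    # Optimal hash count: k = (m / n) * ln 2, with at least one hash.
    num_hashes = max(1, round(num_bits / num_keys * math.log(2)))
    return num_bits, num_hashes


# An access group with many small cells gets a proportionally larger filter
# than one with few large cells, at the same 1% error rate.
small_cells = bloom_filter_size(1_000_000, 0.01)
large_cells = bloom_filter_size(10_000, 0.01)
```

With 1,000,000 keys at a 1% target this works out to roughly 9.6 million bits (about 1.2 MB) and 7 hash functions, while 10,000 keys need only about 1% of that space.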
You might want to make the Bloom filter configurable at the schema level. I would opt for having it on by default, but there may be a need to disable it in certain scenarios. An error rate or bits-per-key tuning option might be useful as well.
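A per-access-group option along these lines could be modeled as a small settings object that is enabled by default and accepts either a target error rate or a bits-per-key budget. This is a hypothetical sketch: `BloomFilterOptions`, its fields, and the 1% default are illustrative assumptions, not an existing Hypertable schema syntax. The expected error rate for a given bits-per-key value b uses the standard approximation p ≈ (1 - e^(-k/b))^k with k = round(b ln 2).

```python
import math
from dataclasses import dataclass
from typing import Optional

# Assumed default when neither tuning knob is set (illustrative only).
DEFAULT_FALSE_POSITIVE_RATE = 0.01


@dataclass
class BloomFilterOptions:
    """Hypothetical per-access-group Bloom filter settings."""
    enabled: bool = True                        # on by default
    false_positive_rate: Optional[float] = None  # tune by target error rate...
    bits_per_key: Optional[float] = None         # ...or by a fixed space budget

    def resolved_bits_per_key(self) -> float:
        """Bits of filter per key; 0.0 when the filter is disabled."""
        if not self.enabled:
            return 0.0
        if self.false_positive_rate is not None and self.bits_per_key is not None:
            raise ValueError("set false_positive_rate or bits_per_key, not both")
        if self.bits_per_key is not None:
            if self.bits_per_key <= 0:
                raise ValueError("bits_per_key must be positive")
            return self.bits_per_key
        p = self.false_positive_rate
        if p is None:
            p = DEFAULT_FALSE_POSITIVE_RATE
        if not 0.0 < p < 1.0:
            raise ValueError("false_positive_rate must be in (0, 1)")
        # Bits per key for rate p: b = -ln(p) / (ln 2)^2
        return -math.log(p) / (math.log(2) ** 2)

    def expected_false_positive_rate(self) -> float:
        """Approximate error rate; 1.0 when disabled (no lookups are filtered)."""
        b = self.resolved_bits_per_key()
        if b == 0.0:
            return 1.0
        k = max(1, round(b * math.log(2)))
        return (1.0 - math.exp(-k / b)) ** k
```

Under these assumptions, `BloomFilterOptions(bits_per_key=10)` yields an expected error rate a little under 1%, and `BloomFilterOptions(enabled=False)` turns the filter off for access groups where it isn't worth the space.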
- Doug