On May 2, 8:29 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
> Steve Howell <showel...@yahoo.com> writes:
> > Thanks. That's definitely in the spirit of what I'm looking for,
> > although the non-64 bit version is obviously geared toward a slightly
> > smaller data set. My reading of cdb is that it has essentially 64k
> > hash buckets, so for 3 million keys, you're still scanning through an
> > average of 45 records per read, which is about 90k of data for my
> > record size. That seems actually inferior to a btree-based file
> > system, unless I'm missing something.
> 1) presumably you can use more buckets in a 64 bit version; 2) scanning

Doesn't cdb do at least one disk seek as well? In the diagram on this
page, it seems you would need to do a seek based on the value of the
initial pointer (from the 256 possible values):
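As a sanity check on the numbers being traded above, here is a quick back-of-the-envelope sketch in Python (the thread's language, given cdb.py). The ~2 KB record size is inferred from the "45 records"/"90k of data" figures quoted above; the hash and table/slot computation follows Bernstein's published cdb format (first-level table = hash mod 256, slot = (hash / 256) mod table length); the key name is made up:

```python
# Figures quoted above: ~3 million keys over ~64k hash buckets.
keys = 3_000_000
buckets = 64 * 1024
per_bucket = keys / buckets          # average records scanned per lookup (~45.8)
record_size = 90_000 // 45           # ~2 KB, inferred from "90k of data"
print(per_bucket, per_bucket * record_size)

def cdb_hash(key: bytes) -> int:
    """Bernstein's cdb hash: h = 5381; then h = (h * 33) ^ c, mod 2**32."""
    h = 5381
    for c in key:
        h = ((h * 33) ^ c) & 0xFFFFFFFF
    return h

# A cdb lookup needs at least two positioned reads: one into the
# 256-entry first-level pointer table at the start of the file, then
# one into the second-level hash table that pointer names.
h = cdb_hash(b"some_key")            # hypothetical key
table = h % 256                      # which first-level pointer (seek #1)
slot_hint = h >> 8                   # (h >> 8) % table_length picks the slot (seek #2)
print(table, slot_hint)
```

If that reading of the format is right, the 256 initial pointers only select a second-level table, so the scan cost per lookup is governed by how full that table is, not by the first-level fan-out.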
> > http://thomas.mangin.com/data/source/cdb.py
> > Unfortunately, it looks like you have to first build the whole thing
> > in memory.
> It's probably fixable, but I'd guess you could just use Bernstein's
> Alternatively maybe you could use one of the *dbm libraries,

Yup, I don't think I want to incur the extra overhead. Do you have
any first hand experience pushing dbm to the scale of 6Gb or so? My
take on dbm is that its niche is more in the 10,000-record range.
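One way to answer that question empirically is to bulk-load whichever *dbm backend the platform provides and watch file size and timing as the record count grows. This is a hypothetical test harness, not a benchmark from the thread; the 2 KB value size mirrors the record size discussed above, and the path and function names are made up:

```python
import dbm
import os
import tempfile

def fill_dbm(path: str, n: int, value_size: int = 2048) -> int:
    """Write n fixed-size records into a dbm file; return on-disk bytes used."""
    value = b"x" * value_size
    with dbm.open(path, "c") as db:
        for i in range(n):
            db[b"key-%010d" % i] = value
    # Some backends (e.g. dbm.dumb) split the database across several
    # files with the same base name, so sum everything that matches.
    dirname, base = os.path.split(path)
    return sum(
        os.path.getsize(os.path.join(dirname, f))
        for f in os.listdir(dirname)
        if f.startswith(base)
    )

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.db")
    size = fill_dbm(path, 1000)        # scale n toward millions to find the knee
    with dbm.open(path) as db:
        assert db[b"key-0000000042"] == b"x" * 2048
    print(size)                        # footprint for 1000 x 2 KB records
```

Which backend `dbm.open` picks (gdbm, ndbm, or the pure-Python dbm.dumb) varies by platform, and their behavior at the multi-gigabyte scale asked about above differs sharply, so any such measurement only speaks for the backend actually loaded.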