Newsgroups: comp.lang.python
From: Steve Howell <showel...@yahoo.com>
Date: Wed, 2 May 2012 21:08:35 -0700 (PDT)
Local: Thurs, May 3 2012 12:08 am
Subject: Re: key/value store optimized for disk storage
On May 2, 8:29 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
> Steve Howell <showel...@yahoo.com> writes:
> > Thanks. That's definitely in the spirit of what I'm looking for,
> > although the non-64 bit version is obviously geared toward a slightly
> > smaller data set. My reading of cdb is that it has essentially 64k
> > hash buckets, so for 3 million keys, you're still scanning through an
> > average of 45 records per read, which is about 90k of data for my
> > record size. That seems actually inferior to a btree-based file
> > system, unless I'm missing something.
>
> 1) presumably you can use more buckets in a 64 bit version; 2) scanning

Doesn't cdb do at least one disk seek as well? In the diagram on this
page, it seems you would need to do a seek based on the value of the
initial pointer (from the 256 possible values):

> >http://thomas.mangin.com/data/source/cdb.py
> > Unfortunately, it looks like you have to first build the whole thing
> > in memory.
>
> It's probably fixable, but I'd guess you could just use Bernstein's
>
> Alternatively maybe you could use one of the *dbm libraries,

Yup, I don't think I want to incur the extra overhead. Do you have
any first-hand experience pushing dbm to the scale of 6 GB or so? My
take on dbm is that its niche is more in the 10,000-record range.
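
For reference, the bucket estimate quoted in this message (64k buckets, 3 million keys, roughly 45 records and 90k of data per read) can be checked with a few lines of Python. This is only a back-of-envelope sketch: the bucket count is the post's own reading of cdb, and the 2 KB record size is an assumption inferred from dividing 90k by 45, not a figure stated in the thread.

```python
# Back-of-envelope check of the bucket-scan estimate quoted in the post.
NUM_KEYS = 3000000          # "3 million keys" (from the post)
NUM_BUCKETS = 64 * 1024     # "essentially 64k hash buckets" (the post's reading of cdb)
RECORD_SIZE = 2 * 1024      # ASSUMPTION: about 2 KB per record, inferred from 90k / 45


def avg_records_per_bucket(num_keys, num_buckets):
    """Average number of records sharing one bucket, assuming a uniform hash."""
    return num_keys / num_buckets


def avg_bytes_scanned(num_keys, num_buckets, record_size):
    """Average bytes read when a lookup scans one whole bucket."""
    return avg_records_per_bucket(num_keys, num_buckets) * record_size


records = avg_records_per_bucket(NUM_KEYS, NUM_BUCKETS)
scanned = avg_bytes_scanned(NUM_KEYS, NUM_BUCKETS, RECORD_SIZE)
print("%.1f records per bucket, about %.0f KB scanned per read"
      % (records, scanned / 1024))
```

Under those assumptions the numbers land close to the figures in the quoted text; changing NUM_BUCKETS shows how quickly the scan cost drops as the bucket count grows.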