Blog post detailing efforts on improving BDB JE for Voldemort
From: Vinoth Chandar <mail.vinoth.chan...@gmail.com>
Date: Fri, 14 Sep 2012 19:28:57 -0700 (PDT)
Subject: Blog post detailing efforts on improving BDB JE for Voldemort

I have tried to cover all the issues in detail here:

http://distributeddreams.blogspot.com/2012/09/improving-bdb-je-storag...

Thanks
Vinoth


 
From: Francois <francois.vai...@ezcgroup.net>
Date: Sat, 15 Sep 2012 00:32:39 -0700 (PDT)
Subject: Re: Blog post detailing efforts on improving BDB JE for Voldemort

Hello Vinoth, as I said, diskmap is a bit tweaked to our needs, but it would
not take much work to make it generic. Maybe it could be a starting point
for a generic, purely key-value, hash-based storage engine? It implements a
forward-only log of needles with a cleaning process, a bucket table with
separate chaining, a needle cache, and on-the-fly bucket commit (slow
commit) using a shuttle mechanism. We are not much used to GitHub (sorry,
svn fans), but if you want to give it a try, we can organize a quick phone
call plus a zipped copy by mail.

François



 
From: Vinoth Chandar <mail.vinoth.chan...@gmail.com>
Date: Mon, 17 Sep 2012 10:17:50 -0700 (PDT)
Subject: Re: Blog post detailing efforts on improving BDB JE for Voldemort

Hi Francois,

It would be great if you could send some pointers to architecture documents
or presentations. Is this cachestore (http://code.google.com/p/cachestore/)?


 
From: Francois <francois.vai...@ezcgroup.net>
Date: Fri, 21 Sep 2012 05:23:06 -0700 (PDT)
Subject: Re: Blog post detailing efforts on improving BDB JE for Voldemort

No, this is different. Cachestore uses a hashmap; we use a hash function and
a bucket table, chaining records on disk.

So the memory footprint is lower for a large number of keys: we couldn't
live with either BDB or cachestore for RAM reasons.

The bucket table is committed from time to time, but a cold start can be
done by rolling back through all the log files (data is only ever appended
to the log). We also implement cost-benefit cleaning of log files. Needles
behind the same bucket are simply chained on disk, and pointer caching
gives faster access to the recent ones.

There are three basic types of objects: the bucket table, needle files, and
needle pointers, which are needles without the data. Needles and needle
pointers are cached with Guava (typically you cache as many pointers as you
can, and few needles, because of the Linux buffer cache).
While a commit is running, the bucket table is backed by a concurrent
hashmap. The commit is done by a shuttle that walks down the bucket table
in two passes whenever the table is partially dirty; this can be very slow
on a hard drive for 4G keys, but is fast on an SSD.
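
To illustrate the "many pointers, few needles" caching split described above, here is a minimal sketch. It is not diskmap's code: the original uses Guava in Java, while this uses a small Python LRU class as a stand-in, and the class name and both capacities are assumptions chosen for illustration.

```python
from collections import OrderedDict


class LruCache:
    """Tiny LRU cache, standing in for the Guava cache mentioned above."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used

    def __len__(self):
        return len(self._items)


# Hypothetical sizing: pointers are small, so keep many of them;
# full needles carry data the OS may already buffer, so keep few.
pointer_cache = LruCache(capacity=1_000_000)
needle_cache = LruCache(capacity=1_000)
```

The two caches share one implementation and differ only in capacity, which is the tuning knob the message describes.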

No doc so far (there will be a post on our blog when we find the time), but
basically this is it.

Francois



 
From: Vinoth Chandar <mail.vinoth.chan...@gmail.com>
Date: Mon, 24 Sep 2012 20:04:33 -0700 (PDT)
Subject: Re: Blog post detailing efforts on improving BDB JE for Voldemort

Sounds very interesting. Actually, the idea of chaining on disk sounds
familiar; see SkimpyStash <http://dl.acm.org/citation.cfm?id=1989327>.
I did not get a clear picture of the disk layout, though.
Looking forward to your blog post!

Thanks
Vinoth


 
From: Francois <francois.vai...@ezcgroup.net>
Date: Thu, 27 Sep 2012 05:10:07 -0700 (PDT)
Subject: Re: Blog post detailing efforts on improving BDB JE for Voldemort

Basically (in our case), data is stored in numbered files (the numbers
always increase), in needles, each containing:

Control data, MD5 and magic tags
Key and version
Pointer to the next needle in the same bucket: file number + needle
offset (the offset is counted in chunks of 256/512/1024, according to
your taste, to make the pointer smaller)
Data
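
A minimal sketch of such a needle layout may help. Everything specific here is an assumption rather than diskmap's actual format: the `NDL1` magic tag, the field order and widths, the integer version (the thread later mentions a vector clock), and the 256-byte chunk size. The point it shows is that storing the offset in chunks lets a 32-bit field address far more than 4 GiB per file.

```python
import hashlib
import struct

MAGIC = b"NDL1"   # hypothetical magic tag
CHUNK = 256       # needles are assumed to start on 256-byte boundaries
# magic, next file number, next offset in chunks, version, key length, data length
HEADER = struct.Struct(">4sIIQHI")


def pack_needle(key, version, data, next_file, next_offset):
    if next_offset % CHUNK:
        raise ValueError("needle offsets must be chunk-aligned")
    header = HEADER.pack(MAGIC, next_file, next_offset // CHUNK,
                         version, len(key), len(data))
    body = header + key + data
    record = body + hashlib.md5(body).digest()
    padding = -len(record) % CHUNK  # pad so the next needle is aligned too
    return record + b"\x00" * padding


def unpack_needle(buf):
    magic, next_file, next_chunk, version, klen, dlen = HEADER.unpack_from(buf)
    if magic != MAGIC:
        raise ValueError("bad magic tag")
    end = HEADER.size + klen + dlen
    if hashlib.md5(buf[:end]).digest() != buf[end:end + 16]:
        raise ValueError("checksum mismatch")
    key = buf[HEADER.size:HEADER.size + klen]
    data = buf[HEADER.size + klen:end]
    return key, version, data, next_file, next_chunk * CHUNK
```

With these assumed sizes, a 32-bit chunk count and 256-byte chunks can address up to 1 TiB within one file, at the price of padding every needle to a chunk boundary.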

To find data, you walk down the chain from the bucket table, which only
contains a file number and a needle offset per bucket; this goes fast
thanks to a cache of needle pointers.
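
The lookup path can be sketched as follows. This is an in-memory model under stated assumptions, not diskmap itself: a Python list stands in for the needle files, list indexes stand in for (file number, offset) pointers, the class and attribute names are invented, and new needles are simply put at the head of the chain.

```python
NUM_BUCKETS = 8  # tiny, for illustration only


class DiskMapSketch:
    def __init__(self):
        self.log = []                        # stand-in for append-only needle files
        self.buckets = [None] * NUM_BUCKETS  # per bucket: position of the newest needle
        self.pointer_cache = {}              # position -> (key, next position)

    def _bucket(self, key):
        return hash(key) % NUM_BUCKETS

    def put(self, key, data):
        b = self._bucket(key)
        # The new needle points at the previous head of the chain.
        self.log.append((key, data, self.buckets[b]))
        self.buckets[b] = len(self.log) - 1

    def get(self, key):
        pos = self.buckets[self._bucket(key)]
        while pos is not None:
            cached = self.pointer_cache.get(pos)
            if cached is None:
                k, _, nxt = self.log[pos]    # a "disk read" of the needle header
                cached = self.pointer_cache[pos] = (k, nxt)
            k, nxt = cached
            if k == key:
                return self.log[pos][1]      # fetch data only for the matching needle
            pos = nxt
        return None
```

Because needles are never modified in place in this model, cached pointers never go stale, so the chain walk touches the "disk" only for needles it has not seen before.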

To add or delete, you find your way back down the chain, break the chain,
insert the new data, and rebuild the smaller side of the chain.

Bucket table commit is the tricky part, because you have to save the dirty
writes that arrive during the commit in a more classic map.
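
One way to picture this is a side map that absorbs writes while the table is being written out. The sketch below is single-threaded and all names in it are hypothetical; a real implementation would need a concurrent map and proper synchronization, and the thread does not say how diskmap folds the writes back in.

```python
class BucketTable:
    def __init__(self, size):
        self.slots = [None] * size   # the table that gets committed
        self.dirty = {}              # writes that arrive during a commit
        self.committing = False

    def set(self, bucket, pointer):
        if self.committing:
            self.dirty[bucket] = pointer   # keep the snapshot stable
        else:
            self.slots[bucket] = pointer

    def get(self, bucket):
        # Writes parked in the side map are newer than the table.
        if bucket in self.dirty:
            return self.dirty[bucket]
        return self.slots[bucket]

    def begin_commit(self):
        self.committing = True

    def end_commit(self):
        # Fold the parked writes back in once the snapshot is on disk.
        for bucket, pointer in self.dirty.items():
            self.slots[bucket] = pointer
        self.dirty.clear()
        self.committing = False


def commit(table, write_slot):
    """Write every slot through the caller-supplied write_slot(index, pointer)."""
    table.begin_commit()
    for index, pointer in enumerate(table.slots):
        write_slot(index, pointer)
    table.end_commit()
```

Readers always see the newest pointer through `get`, while the slow walk over `slots` sees a consistent table.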

That's it.

You can play with the needle pointer cache size, the full needle cache
size, the caching strategy, second-level caching on SSD (one of our
plans), ... to take best advantage of the system.
You can then hack Voldemort's hashing function to get related data (data
that you mostly need together) onto the same node and, when relevant,
into the same diskmap bucket.

Concurrent versions could be stored in the same bucket; in our case, a
write whose version is known to be concurrent is rejected as obsolete at
write time to save space (we have few versioned keys, and many single
ones, so we can manage post-read resync).

Cleaning means reading files and rewriting their contents, then removing
each file once that is done. You do it when the cost/benefit ratio
(= rewrites/dirty) is good.
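
As a sketch of that decision, assume "rewrites" means the live bytes that must be copied out of a file and "dirty" means the obsolete bytes that cleaning would reclaim. That reading, the function names, and the 0.5 threshold are all assumptions; the thread gives only the ratio.

```python
def cleaning_ratio(live_bytes, dirty_bytes):
    """Bytes that must be rewritten per byte of space reclaimed."""
    if dirty_bytes == 0:
        return float("inf")  # nothing to gain from cleaning this file
    return live_bytes / dirty_bytes


def files_to_clean(stats, max_ratio=0.5):
    """stats maps file number -> (live_bytes, dirty_bytes).

    Returns the file numbers worth cleaning, cheapest first.
    """
    candidates = [
        (cleaning_ratio(live, dirty), file_nbr)
        for file_nbr, (live, dirty) in stats.items()
    ]
    return [file_nbr for ratio, file_nbr in sorted(candidates) if ratio <= max_ratio]
```

A file that is mostly dirty is cheap to clean and frees a lot of space, so it sorts first; a file with no dirty bytes is never selected.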

Francois



 
From: Vinoth Chandar <mail.vinoth.chan...@gmail.com>
Date: Mon, 8 Oct 2012 12:43:32 -0700 (PDT)
Subject: Re: Blog post detailing efforts on improving BDB JE for Voldemort

I am still not quite grasping the needle part of it. By needle, do you
just mean a pointer to the other key-value blocks that hashed to the same
hash bucket?

I guess I will wait for your blog post to be out and ask more questions
then. That seems easier than this.

Thanks
Vinoth


 
From: Francois <francois.vai...@ezcgroup.net>
Date: Fri, 12 Oct 2012 01:13:16 -0700 (PDT)
Subject: Re: Blog post detailing efforts on improving BDB JE for Voldemort

Yes, needle = block of data, with key, vector clock, data, MD5, magic
numbers, and a link to the previous KV block that hashed to the same
bucket. It is hard to find the time to write a decent article (and we
have others to write first to support the launch of the platform, which
goes way beyond just file sharing), so feel free to mail or phone if
you need more info. We don't even have time to push to Git!



 
From: Vinoth Chandar <mail.vinoth.chan...@gmail.com>
Date: Tue, 16 Oct 2012 11:27:49 -0700 (PDT)
Subject: Re: Blog post detailing efforts on improving BDB JE for Voldemort

I will be on vacation for a while. Will sync up once back. Thanks!


 