Basically (in our case), data is stored in numbered files (numbers
always go forward), in needles, each containing:
- Control data & MD5, magic tags
- Key, version
- Pointer to the next needle in the same bucket: file number + needle
  offset (reduced to chunks of 256/512/1024, according to your taste,
  to make the pointer smaller)
- Data
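
A rough sketch of how such a reduced pointer could be packed, not the actual implementation. It assumes the chunk sizes are in bytes and that needles are aligned on chunk boundaries; the names `CHUNK`, `OFFSET_BITS`, `pack_pointer` and `unpack_pointer` and the 24-bit width are made up for illustration.

```python
# Hypothetical pointer packing: names and bit widths are assumptions.
CHUNK = 256        # needle alignment in bytes (256/512/1024, to taste)
OFFSET_BITS = 24   # bits kept for the chunk index inside one file


def pack_pointer(file_nbr, byte_offset):
    """Pack (file number, chunk-aligned byte offset) into one integer."""
    if byte_offset % CHUNK != 0:
        raise ValueError("needle offsets must be chunk-aligned")
    chunk_index = byte_offset // CHUNK
    if chunk_index >= 1 << OFFSET_BITS:
        raise ValueError("offset does not fit in the pointer")
    return (file_nbr << OFFSET_BITS) | chunk_index


def unpack_pointer(pointer):
    """Return (file number, byte offset) from a packed pointer."""
    file_nbr = pointer >> OFFSET_BITS
    chunk_index = pointer & ((1 << OFFSET_BITS) - 1)
    return file_nbr, chunk_index * CHUNK
```

With a bigger chunk size the same number of bits addresses a bigger file, at the price of more padding between needles.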
To find data, you walk down the chain starting from the bucket table,
which only contains a file number and needle offset per bucket. This
goes fast thanks to a cache of needle pointers.
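
The lookup could be sketched like this, with an in-memory dict standing in for the disk files and the pointer cache. This is a simplified illustration, not the real code: `Needle`, `find`, the field names and the use of `zlib.crc32` as bucket hash are all assumptions.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Pointer = Tuple[int, int]  # (file number, needle offset)


@dataclass
class Needle:
    key: bytes
    version: int
    next: Optional[Pointer]  # next needle in the same bucket, or None
    data: bytes


def find(bucket_table: List[Optional[Pointer]],
         needles: Dict[Pointer, Needle],
         key: bytes) -> Optional[Needle]:
    """Walk the chain of the key's bucket until the key is found."""
    pointer = bucket_table[zlib.crc32(key) % len(bucket_table)]
    while pointer is not None:
        needle = needles[pointer]  # stands in for a disk read or cache hit
        if needle.key == key:
            return needle
        pointer = needle.next
    return None
```

The shorter the chains (more buckets), the fewer reads per lookup; the pointer cache mainly saves the reads for the intermediate needles.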
To add/delete, you find your way back to the needle, break the chain,
insert the new data, and rebuild the smallest side of the chain.
Bucket table commit is the tricky part, because you have to keep the
dirty writes that happen during the commit in a more classic map.
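
One possible reading of that commit scheme, as a minimal sketch: while the table is being saved, updates go to a side map and are merged back afterwards. The class and method names are hypothetical, and real code would also need locking and the actual disk write.

```python
class BucketTable:
    """Hypothetical bucket table with a side map for writes during commit."""

    def __init__(self, size):
        self.table = [None] * size  # bucket -> pointer, the part saved to disk
        self.dirty = {}             # bucket -> pointer written during a commit
        self.committing = False

    def get(self, bucket):
        # Writes made during a commit are newer than the table entry.
        if bucket in self.dirty:
            return self.dirty[bucket]
        return self.table[bucket]

    def set(self, bucket, pointer):
        if self.committing:
            self.dirty[bucket] = pointer  # keep the table stable while saving
        else:
            self.table[bucket] = pointer

    def begin_commit(self):
        self.committing = True
        return self.table  # stable until end_commit(), safe to write out

    def end_commit(self):
        # Merge the writes that arrived during the commit into the table.
        for bucket, pointer in self.dirty.items():
            self.table[bucket] = pointer
        self.dirty.clear()
        self.committing = False
```

Readers always see the latest pointer through `get`, while the saved table stays consistent with the moment the commit started.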
That's it.
You can play with needle pointer cache size, full needle cache size,
caching strategy, second-level caching on SSD (one of our plans), ...
to take best advantage of the system.
Then you can play with hacking the hashing function of Voldemort to get
related data (that you mostly need together) behind the same node, and
behind the same diskmap bucket when that's relevant.
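
To illustrate the idea only: this is not Voldemort's routing API, just a sketch of hashing on a key prefix so that related keys land together. The `group:rest` key layout, the separator and the function names are assumptions.

```python
import zlib


def placement_hash(key, separator=b":"):
    """Hash only the part of the key before the separator (the 'group')."""
    group = key.split(separator, 1)[0]
    return zlib.crc32(group)


def bucket_for(key, nbr_buckets):
    """Keys sharing a group prefix always map to the same bucket."""
    return placement_hash(key) % nbr_buckets
```

For example, `bucket_for(b"user42:profile", 1024)` and `bucket_for(b"user42:settings", 1024)` give the same bucket, so both needles sit on the same chain. The same trick applied to the node-level hash keeps them on the same node.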
Concurrent versions could be stored in the same bucket. In our case,
versions known to be concurrent are rejected as obsolete at write time
to optimize space (we have few versioned keys and many single ones, so
we can manage post-read resync).
Cleaning means reading files and rewriting them, removing each file
after the operation. You do it when the cost/benefit ratio is good
(= rewrites/dirty).
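
A minimal sketch of that decision, assuming "rewrites" means the live bytes that must be copied and "dirty" means the dead bytes that get reclaimed. The function names and the threshold are hypothetical.

```python
def should_clean(live_bytes, dirty_bytes, max_ratio=1.0):
    """Clean a file when rewrites/dirty is at or below max_ratio."""
    if dirty_bytes == 0:
        return False  # nothing to reclaim, rewriting would be pure cost
    return live_bytes / dirty_bytes <= max_ratio


def files_to_clean(stats, max_ratio=1.0):
    """stats maps file number -> (live_bytes, dirty_bytes)."""
    return sorted(
        file_nbr
        for file_nbr, (live, dirty) in stats.items()
        if should_clean(live, dirty, max_ratio)
    )
```

A lower `max_ratio` waits until files are mostly dead before cleaning them, which costs less I/O but keeps more wasted space on disk.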
Francois
On Sep 25, 5:04 am, Vinoth Chandar <mail.vinoth.chan...@gmail.com> wrote:
> Sounds very interesting. Actually, the idea of chaining on disk sounds
> familiar. See SkimpyStash <http://dl.acm.org/citation.cfm?id=1989327>.
> >>>> http://distributeddreams.blogspot.com/2012/09/improving-bdb-je-storag...
>
> >>>> Thanks
> >>>> Vinoth