Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
New store / Build in concurrency resolver
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  8 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Francois  
View profile  
 More options Mar 17 2012, 1:02 pm
From: Francois <francois.vai...@ezcgroup.net>
Date: Sat, 17 Mar 2012 10:02:53 -0700 (PDT)
Local: Sat, Mar 17 2012 1:02 pm
Subject: New store / Build in concurrency resolver
After looking at cache store (a nice implementation, that we like
because hash makes sense), we are building a new type of store called
DiskMap , for  1k to 5k/s Ops/s in write and > 10 k Ops/s in read on
cheap machines with a very low ratio RAM/Disk. The use case is tons of
data, but only 5% live ones.

Our need is not the fastest performance, but a very good, consistent
performance even with 1 to 10 GKeys. This is based on a logged needle
structure (vaguely inspired from haystack)+ a disk mapped hash table
with a dirty cache and an elevator pattern for committing the index.
Unit tests are promising with results on a test workstation with a
very slow disk being  > 10 times BDB. In fact as fast as BDB or cache
store when all index fits in memory, which is impossible in our case
with a target of 160TB drive arrays on a 24GB RAM machine (like the
Backblaze storage pod).
With the hash file residing on a low cost ( < 256GB ) SSD, we hope to
get a crazy write performance.

Our app does not need to keep versions, this is done at a higher level
using different keys.

So to optimize this storage strictly dedicated to Voldemort, would it
create problems to clean versions on the fly during puts ?
- Eliminating obsolete versions at write time, like in the bdb
implementation
- Eliminating concurrent versions based on timestamp, unlike the bdb
implementation

Our goal is to keep the hash chain as short as possible to minimize
disk seeks. Any danger doing this ?

Francois


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Francois  
View profile  
 More options May 10 2012, 8:42 am
From: Francois <francois.vai...@ezcgroup.net>
Date: Thu, 10 May 2012 05:42:52 -0700 (PDT)
Local: Thurs, May 10 2012 8:42 am
Subject: Re: New store / Build in concurrency resolver
Our diskmap storage demonstrated a 5000r/w ops/s on a single Core VM
running on a host with entry level disks in RAID 1 (so not the optimal
situation).

Unlike BdB, the minimum memory requirement is between 8 and 16 bytes /
key depending on the hashmap loading factor (10 runs OK). The rest is
dedicated to needles and needlepointer cache.

Performance can be better with one small SSD for the bucket table, but
this runs fine on 2 HDD (you just need two different axis).

It has been designed to be modified to be partition aware (sub-
partitionning), and if 100% dedicated to Voldemort.

Use case is good performance for very large data sets with 10% live
data (so that last access based caching is efficient). We need a small
effort to make it generic, but if someone has the same use case feel
free to contact me.

François

On 17 mar, 19:02, Francois <francois.vai...@ezcgroup.net> wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Greg Moulliet  
View profile  
 More options May 10 2012, 12:04 pm
From: Greg Moulliet <gr...@conducivetech.com>
Date: Thu, 10 May 2012 09:04:34 -0700
Local: Thurs, May 10 2012 12:04 pm
Subject: Re: [project-voldemort] Re: New store / Build in concurrency resolver
Sounds neat Francois.  Can you share the code via github?

Thanks,

Greg


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Francois  
View profile  
 More options May 11 2012, 4:35 am
From: Francois <francois.vai...@ezcgroup.net>
Date: Fri, 11 May 2012 01:35:12 -0700 (PDT)
Local: Fri, May 11 2012 4:35 am
Subject: Re: New store / Build in concurrency resolver
Until it's shared, you can directly call me.

On 10 mai, 18:04, Greg Moulliet <gr...@conducivetech.com> wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Francois  
View profile  
 More options May 12 2012, 3:20 am
From: Francois <francois.vai...@ezcgroup.net>
Date: Sat, 12 May 2012 00:20:49 -0700 (PDT)
Local: Sat, May 12 2012 3:20 am
Subject: Re: New store / Build in concurrency resolver
BTW we're walking away from the Storage Pod model, and go for blades
powered by ATOM processors. With the low cost of ATOM motherboards and
2 X 3 or 4GB disks, the motherboard/box cost overhead is not much and
the form factor us smaller (better for maintenance, + no RAID to
setup). And 4GB RAM is more than enough to drive 8TB storage (at least
with our access pattern).

It's also a good candidate for large blob storage slitted in chunks:
by modifying the Voldemort hashing function to introduce a sub key
which is transparent for the ring and for the DiskMap bucket, you end
up chaining chunks on the same node which are contiguous on disk.

Ultimate goal would be with a sequencial number per sub-partition (=
one disk) to facilitate automatic single disk rebuild without Merkle
trees, but we're not yet there. This would allow to increase the
number of disks per CPU without any RAID overhead (because RAID0 would
generate too large disks sets, with an unmanageable  rebuild time).

We're running a bit after time for and that's not yet on GitHub, but
we're looking for partners with similar needs to further developp and
optimize this cost driven concept as a sub-project.

If some are interested with also have a retry manager, which replaces
the updateAction loop on obsolete writes by adding a increasing and
random wait so that processes stop fighting in case of high
concurrency: this is a must if you have some long transactions (e.g.:
20-30ms). Same principle here as CSMA/CD network retries.

On 11 mai, 10:35, Francois <francois.vai...@ezcgroup.net> wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Francois  
View profile  
 More options May 12 2012, 3:23 am
From: Francois <francois.vai...@ezcgroup.net>
Date: Sat, 12 May 2012 00:23:08 -0700 (PDT)
Local: Sat, May 12 2012 3:23 am
Subject: Re: New store / Build in concurrency resolver
And sorry for typos, I didn't find the message edit function !

On 12 mai, 09:20, Francois <francois.vai...@ezcgroup.net> wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Jeff Clites  
View profile  
 More options May 12 2012, 4:50 pm
From: Jeff Clites <jeffs...@gmail.com>
Date: Sat, 12 May 2012 13:50:12 -0700
Local: Sat, May 12 2012 4:50 pm
Subject: Re: [project-voldemort] Re: New store / Build in concurrency resolver
To your original question about resolving concurrent versions at put
time: I think that part of the reason Voldemort doesn't currently do
this is to allow custom conflict-resolution strategies which operate
client-side, where there may be more information available. (For
instance, a custom conflict resolver might consult some other data
source or use business-logic code not conveniently available
server-side.)

So I think that's the point of the current implementation. But, I
don't know how common it is to actually take advantage of that;
probably most people just use the default conflict resolver and fall
back to timestamps as you are planning to do, so it may not matter.

On the other hand, if your implementation is able to support
duplicates and you are just trying to minimize the data stored, then
you may want to go ahead and keep the concurrent versions as is done
today, on the theory that it's rare to actually have such concurrent
writes, so it would amount to only a tiny amount of data anyway. (Not
sure if that's actually true, but it seems likely.)

JEff


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Francois  
View profile  
 More options May 13 2012, 3:14 am
From: Francois <francois.vai...@ezcgroup.net>
Date: Sun, 13 May 2012 00:14:43 -0700 (PDT)
Local: Sun, May 13 2012 3:14 am
Subject: Re: New store / Build in concurrency resolver
@Jeff: we made the same conclusion and  finally did a timestamp based
conflict resolution directly in DiskMap, because in our use case this
is fine. We could keep duplicates, but this would have a negative
impact on performance ( we chain N needles behind the same hash
bucket, so allowing duplicates means you can't stop walking the chain
at the first one, and even with chains of 1 to 4 needles there's a
performance impact when you try to kill disk seeks ). That's what I
meant about "we need to make it more generic". Today this storage is
really designed for our use case, and performance ( to be seen as a
cost factor ) and ease of operations ( log forward structure ) where
the main criteria. If fact in case we can't resolve the conflict, an
exception is thrown and the client deals with it.

Francois

On 12 mai, 22:50, Jeff Clites <jeffs...@gmail.com> wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions Older topic »