Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
Message from discussion Pull or push? (was RE: [project-voldemort] Hinted Handoff proposal, initial implementation and testplan)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Alex Feinberg  
View profile  
 More options Aug 10 2010, 4:01 pm
From: Alex Feinberg <feinb...@gmail.com>
Date: Tue, 10 Aug 2010 13:01:00 -0700
Subject: Re: Pull or push? (was RE: [project-voldemort] Hinted Handoff proposal, initial implementation and testplan)
Hi Mark,

> What are the objectives of this HH implementation?  It appears as if
> the main goal is to provide a background mechanism of synchronization
> without requiring a read repair.  Are there other goals this
> implementation hopes to accomplish?
> These two statements are true today without the addition of new code
> for HH, so I am unclear as to what HH made better.  The situation that
> exists today still exists after this implementation -- a PUT operation
> returns an error that states it failed when in fact it may have
> succeeded without the application being able to tell what the state of
> the request is.

This is correct, the main goal is to provide a way to do
synchronization without a read repair. A read repair requires a quorum
read, which is going to be expensive in a multi-datacenter scenario.
The situation where the network is partitioned off and one partition
is behind another is also much more likely to happen in a
multi-datacenter environment.

What this also accomplishes is when a required-writes succeeds for W
out of N nodes (return success to the application), but some of the
background asynchronous writes fail the put requests for those nodes
are queued up and replayed when the nodes are back online.

> From the wiki, it says that "if required-writes aren’t met by a strict
> quorum, the request is still considered failed (even if hinted handoff
> succeeds".  This statement appears to state that:
>  - If too few copies were written from the primaries, PUT will return
> an error even though the data was written and replicated to the slop
> stores;
>  - Since the handoff nodes are not queried, a GET is unreliable after
> a partial PUT operation.
> I am not sure that HH works for DELETE operations.  Imagine a scenario
> with 3 replica/2 required writes store.  The DELETE operation
> completes successfully on nodes A and B but not on node C, meaning the
> key still exists there but is now marked as DELETED in a slop store.
> What prevents that key from being read-repaired from C to A and B
> before the Slop Store kicks in to update C?

A delete is done with a version. This way, the delete will eventually
be processed. That being said, in a fully distributed system like
Voldemort deletes are "best-effort", meant only for space reclamation.
Applications should instead use soft deletes, by marking the data as
deleted e.g., a boolean flag in the definition of a value.

> How does this HH implementation work with rebalancing?  Are the Slop
> Stores updated with the new node to rebalance to?

I believe this should be handled by the existing metadata invalidation
mechanism, if it isn't, it shouldn't be difficult to fix. I'll make
sure to test this, although I am not sure if that will be a common
scenario (adding nodes to one side of a long standing partition).

> I believe to make an effective HH implementation, nodes must be able
> to determine who their handoff replicas are.  There needs to be
> periodic chatter (at both startup and run-time) between the nodes and
> their HH partners to determine what keys and versions the partner is
> storing for the given node.  If a node determines that it is out-of-
> date for a given key through this chatter, it does not respond
> directly to the request but instead proxies the request from its HH
> partner.

Problem with assigning peer replicas for a node is that this may lead
to cascading failures / degradation when the original node fails. I do
like the idea of chatter i.e., having nodes periodically pull and find
*who* their partners are and then requests the versions from them
rather than wait for a push.

Thanks,
- Alex


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.