Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
Message from discussion Do deletes use tombstones? (was RE: Pull or push? (was RE: [project-voldemort] Hinted Handoff proposal, initial implementation and testplan))

Received: by 10.231.17.193 with SMTP id t1mr2496148iba.14.1281485234760;
        Tue, 10 Aug 2010 17:07:14 -0700 (PDT)
X-BeenThere: project-voldemort@googlegroups.com
Received: by 10.231.125.77 with SMTP id x13ls2728670ibr.1.p; Tue, 10 Aug 2010 
	17:07:13 -0700 (PDT)
Received: by 10.231.161.78 with SMTP id q14mr2585911ibx.10.1281485233902;
        Tue, 10 Aug 2010 17:07:13 -0700 (PDT)
Received: by 10.231.161.78 with SMTP id q14mr2585910ibx.10.1281485233874;
        Tue, 10 Aug 2010 17:07:13 -0700 (PDT)
Return-Path: <tim.free...@hp.com>
Received: from g6t0186.atlanta.hp.com (g6t0186.atlanta.hp.com [15.193.32.63])
        by gmr-mx.google.com with ESMTP id cn5si4322984ibb.6.2010.08.10.17.07.13;
        Tue, 10 Aug 2010 17:07:13 -0700 (PDT)
Received-SPF: pass (google.com: best guess record for domain of tim.free...@hp.com designates 15.193.32.63 as permitted sender) client-ip=15.193.32.63;
Authentication-Results: gmr-mx.google.com; spf=pass (google.com: best guess record for domain of tim.free...@hp.com designates 15.193.32.63 as permitted sender) smtp.mail=tim.free...@hp.com
Received: from G3W0631.americas.hpqcorp.net (g3w0631.americas.hpqcorp.net [16.233.59.15])
	(using TLSv1 with cipher RC4-MD5 (128/128 bits))
	(No client certificate requested)
	by g6t0186.atlanta.hp.com (Postfix) with ESMTPS id 234412D069
	for <project-voldemort@googlegroups.com>; Wed, 11 Aug 2010 00:07:13 +0000 (UTC)
Received: from G3W0058.americas.hpqcorp.net (16.232.1.3) by
 G3W0631.americas.hpqcorp.net (16.233.59.15) with Microsoft SMTP Server (TLS)
 id 8.2.176.0; Wed, 11 Aug 2010 00:04:09 +0000
Received: from GVW0432EXB.americas.hpqcorp.net ([16.234.32.145]) by
 G3W0058.americas.hpqcorp.net ([16.232.1.3]) with mapi; Wed, 11 Aug 2010
 00:04:09 +0000
From: "Freeman, Tim" <tim.free...@hp.com>
To: "project-voldemort@googlegroups.com" <project-voldemort@googlegroups.com>
Date: Wed, 11 Aug 2010 00:04:08 +0000
Subject: Do deletes use tombstones? (was RE: Pull or push? (was RE:
 [project-voldemort] Hinted Handoff 	proposal, initial implementation and
 testplan))
Thread-Topic: Do deletes use tombstones? (was RE: Pull or push? (was RE:
 [project-voldemort] Hinted Handoff 	proposal, initial implementation and
 testplan))
Thread-Index: Acs4xs6cgyetxRuWRYW1wfgVChEy0wAH+Fwg
Message-ID: <59DD1BA8FD3C0F4C90771C18F2B5B53A63AA475...@GVW0432EXB.americas.hpqcorp.net>
References: <AANLkTiniWMiMK_jTxbo5RBs8ru8kHOVExSLwJhWcB...@mail.gmail.com>
	<59DD1BA8FD3C0F4C90771C18F2B5B53A63A761C...@GVW0432EXB.americas.hpqcorp.net>
	<AANLkTikn=M8JFspxSGpMOF2R_GGZowAu8pHNMbSA7...@mail.gmail.com>
	<bab79fc9-a98d-4a64-8ddf-24ee12ffe...@l32g2000prn.googlegroups.com>
 <AANLkTikFhqxfYpkmJwWc61pi1GL4-id=GdtD7WGE+...@mail.gmail.com>
In-Reply-To: <AANLkTikFhqxfYpkmJwWc61pi1GL4-id=GdtD7WGE+...@mail.gmail.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
acceptlanguage: en-US
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0

From: Alex Feinberg
>A delete is done with a version. This way, the delete will eventually
>be processed. That being said, in a fully distributed system like
>Voldemort deletes are "best-effort", meant only for space reclamation.
>Applications should instead use soft deletes, by marking the data as
>deleted e.g., a boolean flag in the definition of a value.

The javadoc at http://project-voldemort.com/javadoc/all/voldemort/client/St=
oreClient.html#delete%28K%29 says that delete will "Delete any version of t=
he given key which [is] equal to or less than the current versions", which =
is different from and more useful for the application than what you said ab=
ove.

You could implement "delete" to do what the javadoc says.  Voldemort could =
manage the boolean flag for the application by implementing delete so it wr=
ites a tombstone.  Then "delete" would then really delete the record from t=
he application's point of view, and then some batch process comes by occasi=
onally and removes tombstones from the index if they are old enough that th=
ey should have propagated to all of the nodes.

I don't immediately see a good way to get reliable results where delete mak=
es a best-effort attempt to immediately reclaim all storage for the record,=
 which is what I think you're saying.  What if someone reads the record nea=
r the time it's deleted?  Suppose N=3D3, R=3DW=3D2, and deletion has delete=
d one of three copies at the time we read.  Is read going to see that one c=
opy is missing and "repair" it, thus leaving it around forever when the del=
etion completes?  If not, how is this avoided?

Tim Freeman
Email: tim.free...@hp.com
Desk in Palo Alto: (650) 857-2581
Home: (408) 774-1298
Cell: (408) 348-7536


-----Original Message-----
From: project-voldemort@googlegroups.com [mailto:project-voldemort@googlegr=
oups.com] On Behalf Of Alex Feinberg
Sent: Tuesday, August 10, 2010 1:01 PM
To: project-voldemort@googlegroups.com
Subject: Re: Pull or push? (was RE: [project-voldemort] Hinted Handoff prop=
osal, initial implementation and testplan)

Hi Mark,

> What are the objectives of this HH implementation? =A0It appears as if
> the main goal is to provide a background mechanism of synchronization
> without requiring a read repair. =A0Are there other goals this
> implementation hopes to accomplish?

> These two statements are true today without the addition of new code
> for HH, so I am unclear as to what HH made better.  The situation that
> exists today still exists after this implementation -- a PUT operation
> returns an error that states it failed when in fact it may have
> succeeded without the application being able to tell what the state of
> the request is.

This is correct, the main goal is to provide a way to do
synchronization without a read repair. A read repair requires a quorum
read, which is going to be expensive in a multi-datacenter scenario.
The situation where the network is partitioned off and one partition
is behind another is also much more likely to happen in a
multi-datacenter environment.

What this also accomplishes is when a required-writes succeeds for W
out of N nodes (return success to the application), but some of the
background asynchronous writes fail the put requests for those nodes
are queued up and replayed when the nodes are back online.

> From the wiki, it says that "if required-writes aren't met by a strict
> quorum, the request is still considered failed (even if hinted handoff
> succeeds". =A0This statement appears to state that:
> =A0- If too few copies were written from the primaries, PUT will return
> an error even though the data was written and replicated to the slop
> stores;
> =A0- Since the handoff nodes are not queried, a GET is unreliable after
> a partial PUT operation.
> I am not sure that HH works for DELETE operations. =A0Imagine a scenario
> with 3 replica/2 required writes store. =A0The DELETE operation
> completes successfully on nodes A and B but not on node C, meaning the
> key still exists there but is now marked as DELETED in a slop store.
> What prevents that key from being read-repaired from C to A and B
> before the Slop Store kicks in to update C?

A delete is done with a version. This way, the delete will eventually
be processed. That being said, in a fully distributed system like
Voldemort deletes are "best-effort", meant only for space reclamation.
Applications should instead use soft deletes, by marking the data as
deleted e.g., a boolean flag in the definition of a value.

> How does this HH implementation work with rebalancing? =A0Are the Slop
> Stores updated with the new node to rebalance to?

I believe this should be handled by the existing metadata invalidation
mechanism, if it isn't, it shouldn't be difficult to fix. I'll make
sure to test this, although I am not sure if that will be a common
scenario (adding nodes to one side of a long standing partition).
> I believe to make an effective HH implementation, nodes must be able
> to determine who their handoff replicas are. =A0There needs to be
> periodic chatter (at both startup and run-time) between the nodes and
> their HH partners to determine what keys and versions the partner is
> storing for the given node. =A0If a node determines that it is out-of-
> date for a given key through this chatter, it does not respond
> directly to the request but instead proxies the request from its HH
> partner.

Problem with assigning peer replicas for a node is that this may lead
to cascading failures / degradation when the original node fails. I do
like the idea of chatter i.e., having nodes periodically pull and find
*who* their partners are and then requests the versions from them
rather than wait for a push.

Thanks,
- Alex

--=20
You received this message because you are subscribed to the Google Groups "=
project-voldemort" group.
To post to this group, send email to project-voldemort@googlegroups.com.
To unsubscribe from this group, send email to project-voldemort+unsubscribe=
@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/projec=
t-voldemort?hl=3Den.