Message from discussion
Pull or push? (was RE: [project-voldemort] Hinted Handoff proposal, initial implementation and testplan)
Received: by 10.231.161.82 with SMTP id q18mr999737ibx.11.1280767300647;
Mon, 02 Aug 2010 09:41:40 -0700 (PDT)
X-BeenThere: project-voldemort@googlegroups.com
Received: by 10.231.193.98 with SMTP id dt34ls476446ibb.0.p; Mon, 02 Aug 2010
09:41:38 -0700 (PDT)
Received: by 10.231.149.83 with SMTP id s19mr988308ibv.2.1280767298661;
Mon, 02 Aug 2010 09:41:38 -0700 (PDT)
Received: by 10.231.149.83 with SMTP id s19mr988307ibv.2.1280767298578;
Mon, 02 Aug 2010 09:41:38 -0700 (PDT)
Return-Path: <tim.free...@hp.com>
Received: from g5t0009.atlanta.hp.com (g5t0009.atlanta.hp.com [15.192.0.46])
by gmr-mx.google.com with ESMTP id cn5si3765835ibb.6.2010.08.02.09.41.38;
Mon, 02 Aug 2010 09:41:38 -0700 (PDT)
Received-SPF: pass (google.com: best guess record for domain of tim.free...@hp.com designates 15.192.0.46 as permitted sender) client-ip=15.192.0.46;
Authentication-Results: gmr-mx.google.com; spf=pass (google.com: best guess record for domain of tim.free...@hp.com designates 15.192.0.46 as permitted sender) smtp.mail=tim.free...@hp.com
Received: from G1W0401.americas.hpqcorp.net (g1w0401.americas.hpqcorp.net [16.236.31.6])
(using TLSv1 with cipher RC4-MD5 (128/128 bits))
(No client certificate requested)
by g5t0009.atlanta.hp.com (Postfix) with ESMTPS id 0E06B30423
for <project-voldemort@googlegroups.com>; Mon, 2 Aug 2010 16:41:37 +0000 (UTC)
Received: from G4W1853.americas.hpqcorp.net (16.234.97.231) by
G1W0401.americas.hpqcorp.net (16.236.31.6) with Microsoft SMTP Server (TLS)
id 8.2.176.0; Mon, 2 Aug 2010 16:41:00 +0000
Received: from GVW0432EXB.americas.hpqcorp.net ([16.234.32.146]) by
G4W1853.americas.hpqcorp.net ([16.234.97.231]) with mapi; Mon, 2 Aug 2010
16:41:01 +0000
From: "Freeman, Tim" <tim.free...@hp.com>
To: "project-voldemort@googlegroups.com" <project-voldemort@googlegroups.com>
Date: Mon, 2 Aug 2010 16:40:59 +0000
Subject: RE: Pull or push? (was RE: [project-voldemort] Hinted Handoff
proposal, initial implementation and testplan)
Thread-Topic: Pull or push? (was RE: [project-voldemort] Hinted Handoff
proposal, initial implementation and testplan)
Thread-Index: AcswytWyggo324IOTR6660Z/dEOV4wBk2mqw
Message-ID: <59DD1BA8FD3C0F4C90771C18F2B5B53A63A761C...@GVW0432EXB.americas.hpqcorp.net>
References: <AANLkTiniWMiMK_jTxbo5RBs8ru8kHOVExSLwJhWcB...@mail.gmail.com>
<59DD1BA8FD3C0F4C90771C18F2B5B53A63A761C...@GVW0432EXB.americas.hpqcorp.net>
<AANLkTikn=M8JFspxSGpMOF2R_GGZowAu8pHNMbSA7...@mail.gmail.com>
In-Reply-To: <AANLkTikn=M8JFspxSGpMOF2R_GGZowAu8pHNMbSA7...@mail.gmail.com>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
acceptlanguage: en-US
Content-Type: multipart/alternative;
boundary="_000_59DD1BA8FD3C0F4C90771C18F2B5B53A63A761C575GVW0432EXBame_"
MIME-Version: 1.0
--_000_59DD1BA8FD3C0F4C90771C18F2B5B53A63A761C575GVW0432EXBame_
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
>Perhaps measuring the response time on a push and using that as an indicat=
or. It's really cheap and tune-able.
How exactly would we tune this?
The consequences of tuning it wrong seem bad. If nodes are too willing to =
postpone pushing because of a slow response time from the recipient, then t=
he push never happens. If they are not willing enough to postpone pushing =
because of a slow response time from the recipient, then the recipient is u=
nusable. The right tuning parameters would seem to depend on the cluster s=
ize, since if the cluster is really big, a recently rebooted recipient migh=
t have crashed by the time the sender's experiment to determine the respons=
e time has completed, since all the other nodes are doing the same experime=
nt at the same time.
Not knowing how to tune the parameter is exactly why I was suggesting a pul=
l model.
Hmm. The problem with the pull model is the n**2 behavior, and that seems =
fixable. Suppose the node containing a partition P is down. The proposal=
at
http://wiki.github.com/voldemort/voldemort/hinted-handoff
says to use a random node to store slop when we observe that a node is dow=
n. Suppose that instead, we rehash P a bunch of times, getting a sequence =
of partition numbers P1, P2, P3, and so forth. Truncate the list after som=
e reasonable number of probes T, which might be 10 or so. We try storing t=
he slop on the node containing P1, and if that node is down we try the node=
containing P2, and continue on down the list until we succeed or we get to=
the end. When we move partitions between nodes during rebalancing, we mov=
e the associated pending slop with them. When a node comes back up, it can=
enumerate its partitions and then for each partition derive the same list =
and query the nodes with those partitions for slop. This way, if our nodes=
don't have a model of network partitions, we don't have N nodes looking at=
N nodes, we just have N nodes looking at R*T nodes, where R is the number =
of partitions per node and R<<N and T is around 10.
Tim Freeman
Email: tim.free...@hp.com<mailto:tim.free...@hp.com>
Desk in Palo Alto: (650) 857-2581
Home: (408) 774-1298
Cell: (408) 348-7536
From: project-voldemort@googlegroups.com [mailto:project-voldemort@googlegr=
oups.com] On Behalf Of spence
Sent: Saturday, July 31, 2010 8:59 AM
To: project-voldemort@googlegroups.com
Subject: Re: Pull or push? (was RE: [project-voldemort] Hinted Handoff prop=
osal, initial implementation and testplan)
Perhaps measuring the response time on a push and using that as an indicato=
r. It's really cheap and tune-able.
On Jul 30, 2010 7:42 PM, "Freeman, Tim" <tim.free...@hp.com<mailto:tim.free=
m...@hp.com>> wrote:
I assume one issue is ensuring that the broken node doesn't get flooded rec=
eiving hinted handoffs when it comes back up.
To deal with that, you could have the newly recovered node request data fro=
m the slop store of each node that might have some data, rather than having=
all nodes with potentially useful data trying to push it to the newly reco=
vered node.
If we assume that nodes never have sure knowledge of a network partition, t=
his approach would imply that every node periodically asks every other node=
whether there's data for it in the slop store. In the normal case, there'=
s no data there and this ought to be quick, so maybe the n**2 traffic is ok=
ay. Or maybe we do have a good model of when network partitions start and =
end so the n**2 behavior is not required. I haven't dug through the code e=
nough to know.
Tim Freeman
Email: tim.free...@hp.com<mailto:tim.free...@hp.com>
Desk in Palo Alto: (650) 857-2581
Home: (408) 774-1298
Cell: (408) 348-7536
-----Original Message-----
From: project-voldemort@googlegroups.com<mailto:project-voldemort@googlegro=
ups.com> [mailto:project-voldemort@googlegroups.com<mailto:project-voldemor=
t@googlegroups.com>] On Behalf Of Alex Feinberg
Sent: Thursday, July 29, 2010 2:53 PM
To: project-voldemort@googlegroups.com<mailto:project-voldemort@googlegroup=
s.com>
Subject: [project-voldemort] Hinted Handoff proposal, initial implementatio=
n and testplan
Hey guys,
I have a working initial implementation of Hinted Handoff. It is
presently undergoing testing. I wrote up a wiki on the implementation
and the test plan, please take a look.
http://wiki.github.com/voldemort/voldemort/hinted-handoff
Thanks,
- Alex
--
You received this message because you are subscribed to the Google Groups "=
project-voldemort" group.
To post to this group, send email to project-voldemort@googlegroups.com<mai=
lto:project-voldemort@googlegroups.com>.
To unsubscribe from this group, send email to project-voldemort+unsubscribe=
@googlegroups.com<mailto:project-voldemort%2Bunsubscribe@googlegroups.com>.
For more options, visit this group at http://groups.google.com/group/projec=
t-voldemort?hl=3Den.
--
You received this message because you are subscribed to the Google Groups "=
project-voldemort" group.
To post to this group, send email to project-voldemort@googlegroups.com<mai=
lto:project-voldemort@googlegroups.com>.
To unsubscribe from this group, send email to project-voldemort+unsubscribe=
@googlegroups.com<mailto:project-voldemort%2Bunsubscribe@googlegroups.com>.
For more options, visit this group at http://groups.google.com/group/projec=
t-voldemort?hl=3Den.
--
You received this message because you are subscribed to the Google Groups "=
project-voldemort" group.
To post to this group, send email to project-voldemort@googlegroups.com.
To unsubscribe from this group, send email to project-voldemort+unsubscribe=
@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/projec=
t-voldemort?hl=3Den.
--_000_59DD1BA8FD3C0F4C90771C18F2B5B53A63A761C575GVW0432EXBame_
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
<html xmlns:v=3D"urn:schemas-microsoft-com:vml" xmlns:o=3D"urn:schemas-micr=
osoft-com:office:office" xmlns:w=3D"urn:schemas-microsoft-com:office:word" =
xmlns:m=3D"http://schemas.microsoft.com/office/2004/12/omml" xmlns=3D"http:=
//www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=3DContent-Type content=3D"text/html; charset=3Dus-ascii">
<meta name=3DGenerator content=3D"Microsoft Word 12 (filtered medium)">
<style>
<!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p
{mso-style-priority:99;
mso-margin-top-alt:auto;
margin-right:0in;
mso-margin-bottom-alt:auto;
margin-left:0in;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
span.EmailStyle18
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
-->
</style>
<!--[if gte mso 9]><xml>
<o:shapedefaults v:ext=3D"edit" spidmax=3D"1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext=3D"edit">
<o:idmap v:ext=3D"edit" data=3D"1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang=3DEN-US link=3Dblue vlink=3Dpurple>
<div class=3DWordSection1>
<p>>Perhaps measuring the response time on a push and using that as an
indicator. It's really cheap and tune-able.<o:p></o:p></p>
<p class=3DMsoNormal><span style=3D'font-size:11.0pt;font-family:"Calibri",=
"sans-serif";
color:#1F497D'>How exactly would we tune this?<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-size:11.0pt;font-family:"Calibri",=
"sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-size:11.0pt;font-family:"Calibri",=
"sans-serif";
color:#1F497D'>The consequences of tuning it wrong seem bad. If nodes=
are
too willing to postpone pushing because of a slow response time from the
recipient, then the push never happens. If they are not willing enoug=
h to
postpone pushing because of a slow response time from the recipient, then t=
he
recipient is unusable. The right tuning parameters would seem to depe=
nd
on the cluster size, since if the cluster is really big, a recently reboote=
d
recipient might have crashed by the time the sender's experiment to determi=
ne
the response time has completed, since all the other nodes are doing the sa=
me
experiment at the same time.<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-size:11.0pt;font-family:"Calibri",=
"sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-size:11.0pt;font-family:"Calibri",=
"sans-serif";
color:#1F497D'>Not knowing how to tune the parameter is exactly why I was
suggesting a pull model.<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-size:11.0pt;font-family:"Calibri",=
"sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-size:11.0pt;font-family:"Calibri",=
"sans-serif";
color:#1F497D'>Hmm. The problem with the pull model is the n**2 behav=
ior,
and that seems fixable. Suppose the node containing a partition=
P
is down. The proposal at <o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-size:11.0pt;font-family:"Calibri",=
"sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<p class=3DMsoNormal> <a
href=3D"http://wiki.github.com/voldemort/voldemort/hinted-handoff" target=
=3D"_blank">http://wiki.github.com/voldemort/voldemort/hinted-handoff</a><b=
r>
<br>
<span style=3D'font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1=
F497D'><o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-size:11.0pt;font-family:"Calibri",=
"sans-serif";
color:#1F497D'>says to use a random node to store slop when we observ=
e
that a node is down. Suppose that instead, we rehash P a bunch of tim=
es,
getting a sequence of partition numbers P1, P2, P3, and so forth. Tru=
ncate
the list after some reasonable number of probes T, which might be 10 or so.=
We try storing the slop on the node containing P1, and if that node is down=
we
try the node containing P2, and continue on down the list until we succeed =
or we
get to the end. When we move partitions between nodes during rebalanc=
ing,
we move the associated pending slop with them. When a node comes back=
up,
it can enumerate its partitions and then for each partition derive the same
list and query the nodes with those partitions for slop. This way, if=
our
nodes don't have a model of network partitions, we don't have N nodes looki=
ng
at N nodes, we just have N nodes looking at R*T nodes, where R is the numbe=
r of
partitions per node and R<<N and T is around 10.<o:p></o:p></span></p=
>
<p class=3DMsoNormal><span style=3D'font-size:11.0pt;font-family:"Calibri",=
"sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-size:10.0pt;color:#1F497D'>Tim Fre=
eman<br>
Email: <a href=3D"mailto:tim.free...@hp.com">tim.free...@hp.com</a><br>
Desk in Palo Alto: (650) 857-2581<br>
Home: (408) 774-1298<br>
Cell: (408) 348-7536<o:p></o:p></span></p>
<p class=3DMsoNormal><span style=3D'font-size:11.0pt;font-family:"Calibri",=
"sans-serif";
color:#1F497D'><o:p> </o:p></span></p>
<div style=3D'border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in =
0in 0in'>
<p class=3DMsoNormal><b><span style=3D'font-size:10.0pt;font-family:"Tahoma=
","sans-serif"'>From:</span></b><span
style=3D'font-size:10.0pt;font-family:"Tahoma","sans-serif"'> project-volde=
mort@googlegroups.com
[mailto:project-voldemort@googlegroups.com] <b>On Behalf Of </b>spence<br>
<b>Sent:</b> Saturday, July 31, 2010 8:59 AM<br>
<b>To:</b> project-voldemort@googlegroups.com<br>
<b>Subject:</b> Re: Pull or push? (was RE: [project-voldemort] Hinted Hando=
ff
proposal, initial implementation and testplan)<o:p></o:p></span></p>
</div>
<p class=3DMsoNormal><o:p> </o:p></p>
<p>Perhaps measuring the response time on a push and using that as an
indicator. It's really cheap and tune-able.<o:p></o:p></p>
<blockquote style=3D'margin-top:5.0pt;margin-bottom:5.0pt'>
<p class=3DMsoNormal style=3D'margin-bottom:12.0pt'>On Jul 30, 2010 7:42 PM=
,
"Freeman, Tim" <<a href=3D"mailto:tim.free...@hp.com">tim.free=
m...@hp.com</a>>
wrote:<br>
<br>
I assume one issue is ensuring that the broken node doesn't get flooded
receiving hinted handoffs when it comes back up.<br>
<br>
To deal with that, you could have the newly recovered node request data fro=
m
the slop store of each node that might have some data, rather than having a=
ll
nodes with potentially useful data trying to push it to the newly recovered
node.<br>
<br>
If we assume that nodes never have sure knowledge of a network partition, t=
his
approach would imply that every node periodically asks every other node whe=
ther
there's data for it in the slop store. In the normal case, there's no
data there and this ought to be quick, so maybe the n**2 traffic is okay.
Or maybe we do have a good model of when network partitions start and=
end
so the n**2 behavior is not required. I haven't dug through the code
enough to know.<br>
<br>
Tim Freeman<br>
Email: <a href=3D"mailto:tim.free...@hp.com">tim.free...@hp.com</a><br>
Desk in Palo Alto: (650) 857-2581<br>
Home: (408) 774-1298<br>
Cell: (408) 348-7536<br>
<br>
<br>
-----Original Message-----<br>
From: <a href=3D"mailto:project-voldemort@googlegroups.com">project-voldemo=
rt@googlegroups.com</a>
[mailto:<a href=3D"mailto:project-voldemort@googlegroups.com">project-volde=
mort@googlegroups.com</a>]
On Behalf Of Alex Feinberg<br>
Sent: Thursday, July 29, 2010 2:53 PM<br>
To: <a href=3D"mailto:project-voldemort@googlegroups.com">project-voldemort=
@googlegroups.com</a><br>
Subject: [project-voldemort] Hinted Handoff proposal, initial implementatio=
n
and testplan<br>
<br>
Hey guys,<br>
<br>
I have a working initial implementation of Hinted Handoff. It is<br>
presently undergoing testing. I wrote up a wiki on the implementation<br>
and the test plan, please take a look.<br>
<br>
<a href=3D"http://wiki.github.com/voldemort/voldemort/hinted-handoff"
target=3D"_blank">http://wiki.github.com/voldemort/voldemort/hinted-handoff=
</a><br>
<br>
Thanks,<br>
- Alex<br>
<br>
--<br>
You received this message because you are subscribed to the Google Groups
"project-voldemort" group.<br>
To post to this group, send email to <a
href=3D"mailto:project-voldemort@googlegroups.com">project-voldemort@google=
groups.com</a>.<br>
To unsubscribe from this group, send email to <a
href=3D"mailto:project-voldemort%2Bunsubscribe@googlegroups.com">project-vo=
ldemort+unsubscribe@googlegroups.com</a>.<br>
For more options, visit this group at <a
href=3D"http://groups.google.com/group/project-voldemort?hl=3Den" target=3D=
"_blank">http://groups.google.com/group/project-voldemort?hl=3Den</a>.<br>
<span style=3D'color:#888888'><br>
--<br>
You received this message because you are subscribed to the Google Groups
"project-voldemort" group.<br>
To post to this group, send email to <a
href=3D"mailto:project-voldemort@googlegroups.com">project-voldemort@google=
groups.com</a>.<br>
To unsubscribe from this group, send email to <a
href=3D"mailto:project-voldemort%2Bunsubscribe@googlegroups.com">project-vo=
ldemort+unsubscribe@googlegroups.com</a>.<br>
For more options, visit this group at <a
href=3D"http://groups.google.com/group/project-voldemort?hl=3Den" target=3D=
"_blank">http://groups.google.com/group/project-voldemort?hl=3Den</a>.</spa=
n><o:p></o:p></p>
</blockquote>
<p class=3DMsoNormal>-- <br>
You received this message because you are subscribed to the Google Groups
"project-voldemort" group.<br>
To post to this group, send email to project-voldemort@googlegroups.com.<br=
>
To unsubscribe from this group, send email to
project-voldemort+unsubscribe@googlegroups.com.<br>
For more options, visit this group at
http://groups.google.com/group/project-voldemort?hl=3Den.<o:p></o:p></p>
</div>
</body>
</html>
--_000_59DD1BA8FD3C0F4C90771C18F2B5B53A63A761C575GVW0432EXBame_--