3 node cluster, seeing very heavy loads on one of the nodes???
Date: Mon, 9 Apr 2012 20:25:11 -0500
Subject: Re: 3 node cluster, seeing very heavy loads on one of the nodes???
From: Tyler Hobbs <ty...@datastax.com>
To: pycassa-discuss@googlegroups.com
What parameters are you using when creating the connection pool?
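The thread does not say which parameters were used, so the following is only a hypothesis about why they could matter: if the `server_list` given to pycassa's `ConnectionPool` names only the seed node, that node coordinates every request no matter how evenly the data is spread. The sketch below is a deliberately simplified model, not pycassa's real pool logic: it assumes the server list is shuffled once and that `pool_size` connections are then assigned to servers round-robin. `simulate_coordinators` is a hypothetical helper, and treating 10.0.1.2 as the seed is an illustrative assumption.

```python
import random
from collections import Counter


def simulate_coordinators(server_list, pool_size, requests, seed=0):
    """Count how many requests each node would coordinate in a toy pool model.

    Simplifying assumptions (NOT pycassa's exact algorithm): the server list
    is shuffled once, pool_size connections are assigned to servers
    round-robin, and requests are spread evenly over those connections.
    """
    rng = random.Random(seed)
    servers = list(server_list)
    rng.shuffle(servers)  # randomize which server gets the first connection
    # Each pooled connection is pinned to a single server.
    connections = [servers[i % len(servers)] for i in range(pool_size)]
    counts = Counter()
    for n in range(requests):
        counts[connections[n % pool_size]] += 1
    return counts


# Addresses from the nodetool ring output; 9160 is Cassandra's default Thrift port.
nodes = ["10.0.1.2:9160", "10.0.1.3:9160", "10.0.1.4:9160"]
# Hypothetical: pretend 10.0.1.2 is the seed and is the only server listed.
seed_only = simulate_coordinators(nodes[:1], pool_size=6, requests=3000)
all_nodes = simulate_coordinators(nodes, pool_size=6, requests=3000)
print(seed_only)  # Counter({'10.0.1.2:9160': 3000})
print(all_nodes)  # each of the three nodes coordinates 1000 requests
```

In this model, listing every node spreads coordinator work evenly, while a seed-only list sends all of it to one node. Checking the actual `server_list` passed to `ConnectionPool` would confirm or rule this out.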
On Mon, Apr 9, 2012 at 12:45 PM, stantonk <stant...@gmail.com> wrote:
> I have a 3-node cluster set up with RandomPartitioner. We run a
> weekly job that produces heavy load for several hours while it reads data
> for all our users and produces summary data. While it runs, I see
> a CPU load of ~10 on one of our nodes, but only loads of 2-3 on the other
> two.
>
> We are using PyCassa as our client to read all of the data for the
> reports, and I'm wondering whether PyCassa could somehow be the
> culprit. I looked at the source and saw the random.shuffle() call, so it
> doesn't seem like that would be the issue.
>
> Provided PyCassa (or the way we're using it) is not the culprit,
> that leaves me with a couple of other ideas:
>
> 1) The data isn't evenly distributed across the nodes?
>
> Running "nodetool ring" produces the following output:
>
> Address   DC           Rack   Status  State   Load      Owns    Token
>                                                                 113427455640312821154458202477256070485
> 10.0.1.2  datacenter1  rack1  Up      Normal  25.77 GB  33.33%  0
> 10.0.1.3  datacenter1  rack1  Up      Normal  24.76 GB  33.33%  56713727820156410577229101238628035242
> 10.0.1.4  datacenter1  rack1  Up      Normal  23.74 GB  33.33%  113427455640312821154458202477256070485
>
> So, at a high level at least, this doesn't seem to be the issue. The data
> is written using user_id as the row key, and the data set pulled back
> for each user_id should be about the same size.
>
> 2) The seed node in a cluster naturally sees more load?
>
> I'm not entirely sure why it would, but the node that is experiencing a
> load of 10 is the seed node.
>
> Can anyone shed some light on this? Is it a concern?
>
> Thanks,
> Kevin
>
>
>
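The `nodetool ring` output above is consistent with evenly spaced RandomPartitioner tokens: that partitioner hashes row keys with MD5 onto tokens between 0 and 2**127, and a balanced N-node ring places node i at token i * 2**127 / N. The sketch below (the helper names `balanced_tokens`, `ownership`, and `load_skew` are hypothetical) reproduces the three tokens, recomputes the 33.33% "Owns" figures, and measures how far the on-disk Load column strays from the mean. It only checks token and data balance; it says nothing about how many client requests each node coordinates.

```python
# RandomPartitioner hashes row keys (MD5) onto tokens in the range 0 .. 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Evenly spaced initial tokens: token_i = i * 2**127 // node_count."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def ownership(tokens):
    """Fraction of the ring each node owns, in ascending token order.

    A node owns the range (previous token, its own token]; the lowest token
    wraps around and owns everything above the highest token.
    """
    ordered = sorted(tokens)
    if len(ordered) == 1:
        return [1.0]
    return [((t - ordered[i - 1]) % RING_SIZE) / RING_SIZE
            for i, t in enumerate(ordered)]


def load_skew(loads):
    """Relative excess of the most loaded node over the mean load."""
    mean = sum(loads.values()) / len(loads)
    return max(loads.values()) / mean - 1


tokens = balanced_tokens(3)
print(tokens)  # the same three tokens nodetool ring reports
print([f"{share:.2%}" for share in ownership(tokens)])  # ['33.33%', '33.33%', '33.33%']

# "Load" column from nodetool ring, in GB.
loads_gb = {"10.0.1.2": 25.77, "10.0.1.3": 24.76, "10.0.1.4": 23.74}
print(f"{load_skew(loads_gb):.1%}")  # 4.1%
```

On these numbers the most loaded node holds about 4% more data than the average, which on its own seems unlikely to account for a CPU load of ~10 versus 2-3.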
--
Tyler Hobbs
DataStax <http://datastax.com/>