Message from discussion 3 node cluster, seeing very heavy loads on one of the nodes???

MIME-Version: 1.0
In-Reply-To: <33094183.430.1333993506993.JavaMail.geo-discussion-forums@yngr3>
References: <33094183.430.1333993506993.JavaMail.geo-discussion-forums@yngr3>
Date: Mon, 9 Apr 2012 20:25:11 -0500
Message-ID: <CAAam9st7At9HBvVuPY7NgUzJyDbJhm7uaQXNhvO=hqAwQ2a...@mail.gmail.com>
Subject: Re: 3 node cluster, seeing very heavy loads on one of the nodes???
From: Tyler Hobbs <ty...@datastax.com>
To: pycassa-discuss@googlegroups.com
Content-Type: multipart/alternative; boundary=f46d04088d8b7d5c7204bd48fe09

--f46d04088d8b7d5c7204bd48fe09
Content-Type: text/plain; charset=ISO-8859-1

What parameters are you using when creating the connection pool?
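
The reason for asking: the node a client connects to acts as the coordinator for that request, so the server list handed to the pool determines which nodes do the coordination work. Below is a toy model of that effect, not pycassa's actual implementation. The function name, the addresses with port 9160, and the round-robin behaviour are assumptions made for illustration; the only detail taken from this thread is that the server list gets shuffled.

```python
import itertools
import random
from collections import Counter

# Hypothetical addresses, modelled on the "nodetool ring" output in this thread.
NODES = ["10.0.1.2:9160", "10.0.1.3:9160", "10.0.1.4:9160"]


def coordinator_counts(server_list, pool_size, requests, seed=0):
    """Count how many requests each server coordinates in a toy pool.

    Assumption: the pool shuffles the server list once, opens
    connections round-robin over it, and requests take turns on
    the open connections.
    """
    rng = random.Random(seed)
    servers = list(server_list)
    rng.shuffle(servers)

    # Open pool_size connections, cycling through the shuffled servers.
    cycle = itertools.cycle(servers)
    connections = [next(cycle) for _ in range(pool_size)]

    # Each request uses the next connection in turn.
    counts = Counter()
    for i in range(requests):
        counts[connections[i % pool_size]] += 1
    return counts


if __name__ == "__main__":
    # All three nodes listed: coordination work is spread evenly.
    print(coordinator_counts(NODES, pool_size=6, requests=600))
    # Only one node listed: that node coordinates every request.
    print(coordinator_counts(NODES[:1], pool_size=6, requests=600))
```

If the pool was created with only one node in its server list, the shuffle has nothing to spread across and that node would handle every request, which could look like the imbalance described below.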

On Mon, Apr 9, 2012 at 12:45 PM, stantonk <stant...@gmail.com> wrote:

> I have a 3-node cluster set up with RandomPlacement strategy. We run a
> weekly job that produces heavy load for several hours while reading data
> for all our users and producing summary data. When that happens, I'm seeing
> a CPU load of ~10 on one of our nodes, and only loads of 2-3 on the other
> two.
>
> We are using PyCassa as our client to read all of the data to generate the
> reports, and I'm wondering if there is some way PyCassa could be the
> culprit? I looked at the source and saw the random.shuffle() call, so it
> doesn't seem like that would be the issue.
>
> Provided PyCassa (or the way in which we're using it) is not the culprit,
> that leaves me with a couple other ideas:
>
> 1) The data isn't evenly distributed across the nodes?
>
> Running "nodetool ring" produces the following output:
>
> Address   DC           Rack   Status  State   Load      Owns    Token
>                                                                 113427455640312821154458202477256070485
> 10.0.1.2  datacenter1  rack1  Up      Normal  25.77 GB  33.33%  0
> 10.0.1.3  datacenter1  rack1  Up      Normal  24.76 GB  33.33%  56713727820156410577229101238628035242
> 10.0.1.4  datacenter1  rack1  Up      Normal  23.74 GB  33.33%  113427455640312821154458202477256070485
>
> So, at a high level at least, this doesn't seem to be the issue. The data
> is written using a row-key of user_id, and the data sets being pulled back
> for each user_id should be about the same size.
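
The token values can be cross-checked against the "Owns" column. Assuming the cluster uses RandomPartitioner (token range 0 to 2**127, which the values above suggest but the thread does not confirm), evenly spaced tokens for N nodes are i * 2**127 // N. A minimal sketch; `balanced_tokens` is a hypothetical helper, not part of nodetool or pycassa:

```python
# Assumption: RandomPartitioner, whose tokens range over 0 .. 2**127.
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Return evenly spaced initial tokens for node_count nodes."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


# Tokens copied from the "nodetool ring" output above.
reported = [
    0,
    56713727820156410577229101238628035242,
    113427455640312821154458202477256070485,
]

if __name__ == "__main__":
    # Compare each reported token with the evenly spaced value.
    for expected, actual in zip(balanced_tokens(3), reported):
        print(expected == actual, actual)
```

Under that assumption, matching tokens would mean each node owns an equal third of the ring, consistent with the 33.33% figures.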
>
> 2) The seed node in a cluster naturally sees more load?
>
> I'm not entirely sure why it would, but the node that is experiencing a
> load of 10 is the seed node.
>
> Can anyone shed some light on this? Is it a concern?
>
> Thanks,
> Kevin
>
>
>


-- 
Tyler Hobbs
DataStax <http://datastax.com/>

--f46d04088d8b7d5c7204bd48fe09--