wsrep_sst_method skip and IST.
Date: Fri, 28 Sep 2012 16:54:29 +0300
From: Alex Yurchenko <alexey.yurche...@codership.com>
To: <codership-team@googlegroups.com>
Subject: Re: [codership-team] wsrep_sst_method skip and IST.
Organization: Codership Oy
On 2012-09-28 00:10, Ilias Bertsimas wrote:
> Henrik,
>
> Thank you for your answer. I understand the importance of SST and the need
> for it to run when the nodes are inconsistent, and that is the way I use it
> (xtrabackup) on another production cluster with ~120GB of data.
> From what I am seeing examining the SST scripts, it seems IST is done there
> in the BYPASS case, so a good idea would be to modify one of the scripts to
> work for BYPASS and fail with an error on full SST, so I can handle the data
> consistency manually from there.
> Alexey, can you please confirm the above?
Yes, you're absolutely correct there. But I immediately have another
question: why do it manually when you can write a script?
Well, in any case, you need to cook a new SST script - whether to fail
or to handle data consistency.
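
For illustration only, here is a minimal sketch of the "fail on full SST, allow BYPASS" idea as a wrapper script. It assumes the server invokes the script with a `--role` option and, on the donor, a bare `--bypass` flag when IST is used, which is how I read the stock scripts; check those names against the scripts shipped with your version. The wrapper name, the `decide` helper and the choice to hand off to `wsrep_sst_rsync` are hypothetical.

```python
import argparse
import os
import sys


def parse_args(argv):
    """Pick out only the options this wrapper cares about; ignore the rest."""
    parser = argparse.ArgumentParser(allow_abbrev=False)
    # Assumption: option names as used by the stock SST scripts.
    parser.add_argument("--role", choices=["donor", "joiner"], required=True)
    parser.add_argument("--bypass", action="store_true")
    args, _unknown = parser.parse_known_args(argv)
    return args


def decide(role, bypass):
    """Return 'refuse' for a donor asked for a full SST, else 'delegate'."""
    if role == "donor" and not bypass:
        return "refuse"
    return "delegate"


def exec_stock_script(argv):
    """Replace this process with the stock rsync script (assumed to be in PATH)."""
    os.execvp("wsrep_sst_rsync", ["wsrep_sst_rsync"] + list(argv))


def main(argv, delegate=None):
    args = parse_args(argv)
    if decide(args.role, args.bypass) == "refuse":
        print("full SST requested, refusing; fix the joiner manually",
              file=sys.stderr)
        return 1  # any non-zero exit status reports failure
    # BYPASS (IST) on the donor, or the joiner side: let the real script work.
    return (delegate or exec_stock_script)(argv)


# A real script would end with: sys.exit(main(sys.argv[1:]))
```

The refusal lives on the donor side because, in this sketch, only the donor is told whether the transfer is a bypass. How the joiner reacts to the donor's failure should be tested on an idle test cluster before relying on it.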
> Kind Regards,
> Ilias.
>
> On Thursday, September 27, 2012 9:41:29 PM UTC+1, Henrik Ingo wrote:
>>
>> Ilias
>>
>> My point is, rather than leaving wsrep_sst_method=skip, you should leave
>> it set to something else so that SST will *fail* and the node is not
>> allowed to return to the cluster. With the skip method, the node will
>> "succeed" in joining the cluster but will still not have the same data.
>>
>> As a quick and dirty solution, I would set wsrep_sst_method=rsync and then
>> uninstall rsync from the servers, so that SST will fail if it is tried. A
>> nicer solution of course is to create your own SST script (or ask Codership
>> to) that will just return an error immediately. (Heh, that would then be
>> wsrep_sst_method=fail :-)
>>
>> Alex: You didn't answer the actual question: Will IST be used even when
>> wsrep_sst_method=skip? (I assume yes, but I've been wrong before...)
O tempora! What? You people can't just try and see what happens? OK, I
can tell you, because I tried and saw. No state transfer will happen:
neither SST nor IST. But it is not in the spec, so this behavior should
not be relied on. wsrep_sst_method=skip was invented to assemble _idle_
clusters from nodes which are known to be identical, because some
users were uncomfortable with copying grastate.dat from node to node.
Well, it turns out it is even worse than copying grastate.dat, because it
requires an absolutely idle cluster.
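
As a rough sketch of what "known to be identical" could be checked against, the snippet below compares the saved positions from several nodes' grastate.dat files. It assumes the file consists of `#` comment lines and simple `key: value` lines including `uuid` and `seqno`, and that a `seqno` of -1 means the position is not known; both are assumptions to verify on your version. The function names are hypothetical, and matching positions only suggest, not prove, that the data is the same.

```python
def parse_grastate(text):
    """Parse 'key: value' lines, skipping blank lines and '#' comments."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            state[key.strip()] = value.strip()
    return state


def same_position(states):
    """True only if every node reports the same uuid and a known seqno."""
    positions = {(s.get("uuid"), s.get("seqno")) for s in states}
    if len(positions) != 1:
        return False
    uuid, seqno = next(iter(positions))
    # Assumption: seqno -1 marks an unknown position (e.g. unclean shutdown).
    return uuid is not None and seqno is not None and seqno != "-1"
```

Reading each node's file into a string and passing the parsed dictionaries to `same_position` gives a quick sanity check before assembling a cluster by hand.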
>> henrik
>>
>> On Thu, Sep 27, 2012 at 3:22 PM, Ilias Bertsimas <awar...@gmail.com>
>> wrote:
>> > Hello Henrik,
>> >
>> > Yes, I know the purpose of the skip SST method is for setting up a
>> > cluster manually.
>> > The only reason I use it is because I do not want an SST to happen under
>> > any circumstances, and it happens once it can't do an IST, and sometimes
>> > it can happen without really being needed, based on my experience.
>> > An SST is impractical on a 5TB dataset.
>> > I have a big enough gcache size to cover at least 12 hours of data
>> > changes.
>> >
>> > Thanks!
>> >
>> > On Thursday, September 27, 2012 1:11:04 PM UTC+1, Henrik Ingo wrote:
>> >>
>> >> On Thu, Sep 27, 2012 at 2:28 PM, Ilias Bertsimas <awar...@gmail.com>
>> >> wrote:
>> >> > I have a galera cluster with a huge amount of data where a full SST
>> >> > would be pointless, as it will take 3-4 days plus the amount of time
>> >> > needed to apply the new writesets to catch up.
>> >> > I have set the cluster's wsrep_sst_method to skip, but it is not clear
>> >> > if it will skip IST as well.
>> >>
>> >> Actually, I don't think you are supposed to use the skip method as a
>> >> permanent setting. If I understood correctly, Percona developed it to
>> >> be used when initially starting the cluster. In this case you could
>> >> manually restore the same data to all nodes, so you know they are in
>> >> the same state before you start any nodes at all.
>> >>
>> >> Otoh if you have a running cluster and some node is disconnected long
>> >> enough to need an SST, then you can't leave wsrep_sst_method set to
>> >> skip, since the node would then have inconsistent data.
>> >>
>> >> > Can someone confirm how it will react if it needs an IST?
>> >>
>> >> No. (I have my guess, but that's not what you want, so I'll leave it to
>> >> the Codership guys to confirm.)
>> >>
>> >> But referring to what I said above, you should just make sure that
>> >> your gcache.size is large enough that SST never needs to happen. And
>> >> if a node is disconnected long enough that IST won't work, then you
>> >> are back to square one.
>> >>
>> >> henrik
>> >> --
>> >> henri...@avoinelama.fi
>> >> +358-40-8211286 skype: henrik.ingo irc: hingo
>> >> www.openlife.cc
>> >>
>> >> My LinkedIn profile:
>> http://www.linkedin.com/profile/view?id=9522559
--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011