Re: [codership-team] wsrep_local_state 2

Alex Yurchenko

Mar 17, 2013, 9:37:04 PM
to codersh...@googlegroups.com
Hi Igor,

Strictly speaking there may be several reasons for this, starting with
an inaccurate description, but the most likely is that you're using an
older, buggy version of the software. Any further diagnosis is
impossible without full logs from all three nodes and a description of
the load.

On 2013-03-17 17:01, Egor Shevtsov wrote:
> Hi Guys,
> Sorry, I'm very new to Galera, so the question sounds like a basic
> one, but I can't find the explanation.
> I'm playing with a 3-node Galera installation in our Dev env. I made a
> change to innodb_buffer_pool_size on all 3 nodes and restarted them
> one by one.
> Restarted the first one, it came back synchronised quickly; restarted
> the second one, it restarted, failed, and on the second attempt started
> SST,

This one is suspicious, it might be the case of misconfiguration.

> which succeeded in the end.
> When SST was underway, I checked the wsrep% status on the Donor host;
> it showed:
> wsrep_local_state | 2
> wsrep_local_state_comment | Donor/Desynced
>
> Which, as far as I understand, means the donor node starts writing to
> the cache.
> After I completed the restart of all 3 nodes, ALL nodes look good:
> connected to the same Primary Component, the same
> wsrep_cluster_conf_id and wsrep_last_committed, wsrep_connected ON.
> But one of the nodes shows:
> wsrep_local_state | 2
> wsrep_local_state_comment | Donor/Desynced
>
> whereas the other 2 show:
> wsrep_local_state | 4
> wsrep_local_state_comment | Synced
>
> The question is why the Donor is reported as Desynced when it looks
> perfectly OK.
> Many thanks.
>
> Igor

--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011

Egor Shevtsov

Mar 18, 2013, 9:23:30 AM
to codersh...@googlegroups.com
Hi Alex,
Thanks for replying to me.
I run our Development Galera cluster on 
Server version:         5.5.29-MariaDB-log MariaDB Server, wsrep_23.7.3.rXXXX
The latest one I believe, installed from MariaDB repositories for CentOS 6.3 64Bit.
wsrep_provider_version 23.2.2(r137)

I upgraded CentOS on the joiner node to 6.4, but the affected node (donor) and the 3rd node are still on CentOS 6.3.


my.cnf: http://pastebin.com/tz3riCBB is identical on all 3 nodes except for wsrep_node_name=, which differs and refers to the node name, 'dev-db-th-slave-1' for example.
Error.log http://pastebin.com/gWc0mxbx is from the donor machine, which changed its state during SST at 130317 13:38:22 (Shifting SYNCED -> DONOR/DESYNCED)
and stayed in this state with wsrep_local_state = 2 until I restarted the box at 130317 16:06:54.
After that it joined nicely: Synced, wsrep_local_state = 4.

This is the joiner node log: http://pastebin.com/PKWWgguY. At 130317 13:21:49 I can see:
 Prepared IST, then Selected 0 (dev-db-th-slave-1)(SYNCED) as donor.
State transfer to 1 (db-gb-pwr-1-01) failed: -113 (No route to host) -- could be our DNS.
Then on the second startup:
Failed to prepare for incremental state transfer: Local state UUID
Selected 0 (dev-db-th-slave-1)(SYNCED) as donor. -- and this time there was no name resolution issue, SST succeeded.

So the main question for me here:
Why did the donor, dev-db-th-slave-1, change its state from SYNCED -> DONOR/DESYNCED during the SST transfer?
I was able to write to this node and query it normally, and didn't see any issues except for wsrep_local_state_comment = DONOR/DESYNCED and wsrep_local_state = 2.

I hope it's not too much.
We want to move to Galera in prod ASAP, but I need to learn about the behaviour of this lovely beast before we commit to that.
 
Many thanks,
Igor




Alex Yurchenko

Mar 18, 2013, 10:21:05 AM
to codersh...@googlegroups.com
On 2013-03-18 15:23, Egor Shevtsov wrote:
> Hi Alex,
> Thanks for replying to me.
> I run our Development Galera cluster on
> Server version: 5.5.29-MariaDB-log MariaDB Server,
> wsrep_23.7.3.rXXXX
> The latest one I believe, installed from MariaDB repositories for
> CentOS
> 6.3 64Bit.
> wsrep_provider_version 23.2.2(r137)

This is a tad too old, we have a newer release here:
https://launchpad.net/galera/+download

You want to use that one especially when you're using custom
evs.*send_window settings.

> So the main question for me here:
> Why the donor: dev-db-th-slave-1 changed its state from SYNCED ->
> DONOR/DESYNCED during the SST transfer?

Well, this is quite natural - the node becomes a donor, so naturally it
changes state. So that, for example, it is not picked as donor for the
second time.

The question is why it didn't change the state back
(DONOR -> JOINED -> SYNCED) after SST was over.
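
For reference, a minimal sketch of how the numeric wsrep_local_state values relate to the comments discussed here. Values 2 and 4 are taken from the status output quoted in this thread; 1 (Joining) and 3 (Joined) are the usual Galera values and are an assumption as far as this thread goes. The helper name is hypothetical.

```python
# Numeric wsrep_local_state values and their wsrep_local_state_comment text.
# 2 and 4 appear in this thread; 1 and 3 are the usual Galera values (assumed).
WSREP_LOCAL_STATES = {
    1: "Joining",
    2: "Donor/Desynced",
    3: "Joined",
    4: "Synced",
}


def describe_state(code):
    """Return (comment, is_synced) for a numeric wsrep_local_state.

    Hypothetical helper for illustration only, not part of Galera.
    """
    comment = WSREP_LOCAL_STATES.get(code, "Unknown")
    return comment, code == 4
```

With this, describe_state(2) yields the Donor/Desynced comment and flags the node as not synced, which is the condition worth alerting on if it persists after SST has finished.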

> I was able to write to this node, query it normally and didn't see
> any
> issues except of the wsrep_local_state_comment = DONOR/DESYNCED and
> wsrep_local_state =2 .

And it looks like while the SST script signalled mysqld that it can
continue committing transactions (and that's why you didn't see any
issues):

130317 13:38:22 [Note] WSREP: Tables flushed.
130317 13:43:39 [Note] WSREP: Provider resumed.

it didn't actually return (or there is a bug that ignores the SST script's
return, but that is very unlikely, as it has never been reported before). Why
the SST script could not return is for you to discover, but the bottom line
is that as long as the SST script is running, the node has to stay in the
DONOR state.

I'm pretty sure that's all that there is to it.
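
The interval between the two log messages quoted above can be worked out directly from their timestamps. A small sketch, assuming the 'YYMMDD HH:MM:SS' prefix seen in this log; the function name is made up for illustration.

```python
from datetime import datetime


def log_time(line):
    """Parse the leading 'YYMMDD HH:MM:SS' timestamp of an error-log line."""
    return datetime.strptime(line[:15], "%y%m%d %H:%M:%S")


# The two lines quoted from the donor's error log in this thread.
flushed = "130317 13:38:22 [Note] WSREP: Tables flushed."
resumed = "130317 13:43:39 [Note] WSREP: Provider resumed."

interval = log_time(resumed) - log_time(flushed)
print(interval)  # 0:05:17
```

The same helper can be pointed at the restart time (130317 16:06:54) to see how long the node stayed in the DONOR state before it was restarted.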

Egor Shevtsov

Mar 18, 2013, 11:05:13 AM
to codersh...@googlegroups.com
Thanks Alex,
I installed Galera as part of MariaDB Galera solution using yum and their repository.
Now as I try to run
rpm -ivh galera-23.2.4-1.rhel5.x86_64.rpm 
Preparing...                ########################################### [100%]
        file /usr/bin/garbd from install of galera-23.2.4-1.rhel5.x86_64 conflicts with file from package galera-23.2.2-1.rhel5.x86_64
        file /usr/lib64/galera/libgalera_smm.so from install of galera-23.2.4-1.rhel5.x86_64 conflicts with file from package galera-23.2.2-1.rhel5.x86_64


I got a conflict.
Do you know by chance whether it is safe to run

rpm -ivh --replacefiles galera-23.2.4-1.rhel5.x86_64.rpm

Thanks,

Alex Yurchenko

Mar 18, 2013, 11:15:53 AM
to codersh...@googlegroups.com
On 2013-03-18 17:05, Egor Shevtsov wrote:
> Thanks Alex,
> I installed Galera as part of MariaDB Galera solution using yum and
> their
> repository.
> Now as I try to run
> rpm -ivh galera-23.2.4-1.rhel5.x86_64.rpm
> Preparing...
> ###########################################
> [100%]
> file /usr/bin/garbd from install of
> galera-23.2.4-1.rhel5.x86_64
> conflicts with file from package galera-23.2.2-1.rhel5.x86_64
> file /usr/lib64/galera/libgalera_smm.so from install of
> galera-23.2.4-1.rhel5.x86_64 conflicts with file from package
> galera-23.2.2-1.rhel5.x86_64
>
>
> I got a conflict.
> Do you know by chance is it safe to
>
> rpm -ivh --replacefiles galera-23.2.4-1.rhel5.x86_64.rpm

It is. But I think that what you really want is

rpm -Uvh galera-23.2.4-1.rhel5.x86_64.rpm

BTW, why are you using rhel5 RPMs? I thought you had CentOS 6...

Egor Shevtsov

Mar 18, 2013, 11:25:45 AM
to codersh...@googlegroups.com
This is the name of the package for CentOS:

galera-23.2.4-1.rhel6.x86_64.rpm (md5) Galera wsrep provider binary for RHEL/CentOS 6 (64bit)

I just noticed I had downloaded the rpm for CentOS 5 and had to get the proper one. I use Debian mostly.

Yeah, the name of the package for CentOS is galera-23.2.4-1.rhel6.x86_64.rpm

Nothing we can do here.

Egor Shevtsov

Mar 18, 2013, 11:40:31 AM
to codersh...@googlegroups.com
Updated the wsrep provider package in 3 steps, no issues:
1. service mysql stop
2. rpm -Uvh galera-23.2.4-1.rhel6.x86_64.rpm 
3. service mysql start

>>>show status like 'wsrep_provider_ver%';
wsrep_provider_version | 23.2.4(r147) 

NICE. Thank you Alex.
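
When upgrading node by node, it helps to compare the reported provider versions rather than eyeball them. A minimal sketch, assuming the 'major.minor.patch(rNNN)' shape of the two version strings seen in this thread; the parser is illustrative and not part of Galera.

```python
import re


def parse_provider_version(text):
    """Split a string like '23.2.4(r147)' into ((23, 2, 4), 147)."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)\(r(\d+)\)", text.strip())
    if match is None:
        raise ValueError("unexpected version format: %r" % text)
    major, minor, patch, revision = map(int, match.groups())
    return (major, minor, patch), revision


old = parse_provider_version("23.2.2(r137)")  # before the upgrade
new = parse_provider_version("23.2.4(r147)")  # after the upgrade
print(new > old)  # True
```

Tuples compare element by element, so the version triple decides first and the revision number only breaks ties.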

Egor Shevtsov

Mar 18, 2013, 5:17:42 PM
to codersh...@googlegroups.com
OK. It was a firewall issue that prevented the Donor from communicating properly with the Joiner node.
That is fixed now: IST works as it should, and all the nodes are in sync after the SST process completes.
Thanks Alex. 
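
Since the root cause here was a firewall, a quick TCP reachability check between nodes can save some time. A rough sketch, assuming Galera's default ports (4567 for group communication, 4568 for IST, 4444 for SST); these are configurable, so treat the numbers as assumptions about your setup. The function names are made up for illustration.

```python
import socket

# Default Galera ports (assumed; check your own configuration).
GALERA_PORTS = {"group communication": 4567, "IST": 4568, "SST": 4444}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_node(host):
    """Map each Galera service name to whether its port is reachable."""
    return {name: port_open(host, port) for name, port in GALERA_PORTS.items()}
```

Run from the donor, check_node("db-gb-pwr-1-01") would probe the joiner named in this thread. Note that the IST and SST ports typically only have a listener while a transfer is in progress, so a False for those two is not conclusive on its own; the group communication port is the more reliable signal.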

Henrik Ingo

Mar 19, 2013, 8:16:31 AM
to Alex Yurchenko, codersh...@googlegroups.com
On Mon, Mar 18, 2013 at 4:21 PM, Alex Yurchenko
<alexey.y...@codership.com> wrote:
> On 2013-03-18 15:23, Egor Shevtsov wrote:
>>
>> Hi Alex,
>> Thanks for replying to me.
>> I run our Development Galera cluster on
>> Server version: 5.5.29-MariaDB-log MariaDB Server,
>> wsrep_23.7.3.rXXXX
>> The latest one I believe, installed from MariaDB repositories for CentOS
>> 6.3 64Bit.
>> wsrep_provider_version 23.2.2(r137)
>
>
> This is a tad too old, we have a newer release here:
> https://launchpad.net/galera/+download

To be clear, the MariaDB binaries are the most recent ones. For some
reason the Galera replication library was not. You can update only the
library from the link Alex pointed to. I don't know why the MariaDB
repository doesn't have the newest library.

henrik

--
henri...@avoinelama.fi
+358-40-8211286 skype: henrik.ingo irc: hingo
www.openlife.cc

My LinkedIn profile: http://www.linkedin.com/profile/view?id=9522559

Egor Shevtsov

Mar 19, 2013, 9:29:13 AM
to codersh...@googlegroups.com, Alex Yurchenko, henri...@avoinelama.fi
Hi Henrik,
That was done yesterday. The library rpm update ran nicely on all 3 nodes.
I changed evs.send_window and evs.user_send_window back to the defaults, fixed the outbound firewall rules on the affected node, and all looks rosy, as it should.
The more I play with Galera, the more I think how great this product is and what an amazing job was done to bring it to the masses.
Thanks a lot guys.

igor 