Repairing out-of-sync cluster node connections


Oleksandr Drach

Jan 24, 2013, 6:59:47 AM
to codersh...@googlegroups.com
Dear Codership Community!

I had a 5-node wsrep cluster working well. Two nodes were switched off while the remaining three continued to operate normally, and some changes were made to the database in the meantime.
Now both of the disabled nodes fail to connect to the cluster.
The current wsrep_sst_method is rsync_wan.
Is there any way to fix the cluster without re-installation?

Thanks!

Ilias Bertsimas

Jan 24, 2013, 7:06:45 AM
to codersh...@googlegroups.com
Hello,

Can you please define what you mean by "failed to connect"? Please also post the Galera configuration options of the nodes involved, plus the logs from when you start them up and they try to connect to the cluster.
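For reference, something like the following would capture that information (a sketch only, assuming a Debian/Ubuntu layout with the configuration in /etc/mysql/my.cnf):

# Galera-related configuration of the node
grep -i wsrep /etc/mysql/my.cnf

# runtime wsrep status, if the node comes up at all
mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep%';"

# startup attempt as seen by syslog
grep WSREP /var/log/syslog | tail -n 200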

Kind Regards,
Ilias.

Oleksandr Drach

Jan 24, 2013, 7:21:18 AM
to codersh...@googlegroups.com
Hello,
Sure! That's my fault - I should have included them initially.

Node configuration:

[mysqld]
# (This must be substituted by wsrep_format)
binlog_format=ROW

# Currently only InnoDB storage engine is supported
default-storage-engine=innodb

# to avoid issues with 'bulk mode inserts' using autoinc
innodb_autoinc_lock_mode=2

# This is a must for parallel applying
innodb_locks_unsafe_for_binlog=1

# Query Cache is not supported with wsrep
query_cache_size=0
query_cache_type=0

# Override bind-address
# In some systems bind-address defaults to 127.0.0.1, and with mysqldump SST
# it will have (most likely) disastrous consequences on donor node
bind-address=10.0.1.28

##
## WSREP options
##
# Full path to wsrep provider library or 'none'
wsrep_provider=/usr/lib/galera/libgalera_smm.so
# Provider specific configuration options
#wsrep_provider_options=
wsrep_provider_options="gmcast.listen_addr = tcp://10.0.1.28:4567; gcs.fc_limit = 128; evs.send_window=512; evs.user_send_window=512; evs.keepalive_period = PT3S;  evs.inactive_check_period = PT10S; evs.suspect_timeout = PT30S; evs.inactive_timeout = PT1M; evs.consensus_timeout = PT1M;"
# Logical cluster name. Should be the same for all nodes.
wsrep_cluster_name="my_wsrep_mysql_cluster"
# Group communication system handle
wsrep_cluster_address="gcomm://10.0.1.207"

# Human-readable node name (non-unique). Hostname by default.
#wsrep_node_name=

# Base replication <address|hostname>[:port] of the node.
# The values supplied will be used as defaults for state transfer receiving,
# listening ports and so on. Default: address of the first network interface.
wsrep_node_address=10.0.1.28

# Address for incoming client connections. Autodetect by default.
wsrep_node_incoming_address=10.0.1.28

# How many threads will process writesets from other nodes
wsrep_slave_threads=16

# DBUG options for wsrep provider
#wsrep_dbug_option

# Generate fake primary keys for non-PK tables (required for multi-master
# and parallel applying operation)
wsrep_certify_nonPK=1

# Maximum number of rows in write set
wsrep_max_ws_rows=131072

# Maximum size of write set
wsrep_max_ws_size=1073741824

# to enable debug level logging, set this to 1
wsrep_debug=0

# convert locking sessions into transactions
wsrep_convert_LOCK_to_trx=0

# how many times to retry deadlocked autocommits
wsrep_retry_autocommit=1

# change auto_increment_increment and auto_increment_offset automatically
wsrep_auto_increment_control=1

# retry autoinc insert, which failed for duplicate key error
wsrep_drupal_282555_workaround=0

# enable "strictly synchronous" semantics for read operations
wsrep_causal_reads=0

# Command to call when node status or cluster membership changes.
# Will be passed all or some of the following options:
# --status  - new status of this node
# --uuid    - UUID of the cluster
# --primary - whether the component is primary or not ("yes"/"no")
# --members - comma-separated list of members
# --index   - index of this node in the list
wsrep_notify_cmd=yes

##
## WSREP State Transfer options
##

# State Snapshot Transfer method
#wsrep_sst_method=skip
wsrep_sst_method=rsync_wan
#wsrep_sst_method=mysqldump

# Address on THIS node to receive SST at. DON'T SET IT TO DONOR ADDRESS!!!
# (SST method dependent. Defaults to the first IP of the first interface)
wsrep_sst_receive_address=10.0.1.28

# SST authentication string. This will be used to send SST to joining nodes.
# Depends on SST method. For mysqldump method it is root:<root password>
wsrep_sst_auth=wsrep:MySECRETPaSS

# Desired SST donor name.
#wsrep_sst_donor=

# Protocol version to use
# wsrep_protocol_version=

Syslog on unsuccessful connection retries:
Jan 24 05:11:59 db5 mysqld_safe: Starting mysqld daemon with databases from /var/lib/mysql
Jan 24 05:11:59 db5 mysqld_safe: WSREP: Running position recovery with --log_error=/tmp/tmp.qEcgNvaTJu
Jan 24 05:12:01 db5 mysqld_safe: WSREP: Recovered position 7ceca664-661a-11e2-0800-d387abc379ea:0
Jan 24 05:12:01 db5 mysqld: 130124  5:12:01 [Note] WSREP: wsrep_start_position var submitted: '7ceca664-661a-11e2-0800-d387abc379ea:0'
Jan 24 05:12:01 db5 mysqld: 130124  5:12:01 [Note] WSREP: Read nil XID from storage engines, skipping position init
Jan 24 05:12:01 db5 mysqld: 130124  5:12:01 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
Jan 24 05:12:01 db5 mysqld: 130124  5:12:01 [Note] WSREP: wsrep_load(): Galera 23.2.2(r137) by Codership Oy <in...@codership.com> loaded succesfully.
Jan 24 05:12:01 db5 mysqld: 130124  5:12:01 [Note] WSREP: Found saved state: de5fcacb-11fc-11e2-0800-a4371bcdc02d:-1
Jan 24 05:12:01 db5 mysqld: 130124  5:12:01 [Note] WSREP: Reusing existing '/var/lib/mysql//galera.cache'.
Jan 24 05:12:01 db5 mysqld: 130124  5:12:01 [Note] WSREP: Passing config to GCS: base_host = 10.0.1.28; base_port = 4567; cert.log_conflicts = no; evs.consensus_timeout = PT1M; evs.inactive_check_period = PT10S; evs.inactive_timeout = PT1M; evs.keepalive_period = PT3S; evs.send_window = 512; evs.suspect_timeout = PT30S; evs.user_send_window = 512; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 128; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; gmcast.listen_addr = tcp://10.0.1.28:4567; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
Jan 24 05:12:01 db5 mysqld: 130124  5:12:01 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
Jan 24 05:12:01 db5 mysqld: 130124  5:12:01 [Note] WSREP: wsrep_sst_grab()
Jan 24 05:12:01 db5 mysqld: 130124  5:12:01 [Note] WSREP: Start replication
Jan 24 05:12:01 db5 mysqld: 130124  5:12:01 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
Jan 24 05:12:01 db5 mysqld: 130124  5:12:01 [Note] WSREP: protonet asio version 0
Jan 24 05:12:01 db5 mysqld: 130124  5:12:01 [Note] WSREP: backend: asio
Jan 24 05:12:01 db5 mysqld: 130124  5:12:01 [Note] WSREP: GMCast version 0
Jan 24 05:12:01 db5 mysqld: 130124  5:12:01 [Note] WSREP: (42182cc1-661f-11e2-0800-87ba3178bbb2, 'tcp://10.0.1.28:4567') listening at tcp://10.0.1.28:4567
Jan 24 05:12:01 db5 mysqld: 130124  5:12:01 [Note] WSREP: (42182cc1-661f-11e2-0800-87ba3178bbb2, 'tcp://10.0.1.28:4567') multicast: , ttl: 1
Jan 24 05:12:01 db5 mysqld: 130124  5:12:01 [Note] WSREP: EVS version 0
Jan 24 05:12:01 db5 mysqld: 130124  5:12:01 [Note] WSREP: PC version 0
Jan 24 05:12:01 db5 mysqld: 130124  5:12:01 [Note] WSREP: gcomm: connecting to group 'my_wsrep_mysql_cluster', peer '10.0.1.207:'
Jan 24 05:12:01 db5 mysqld: 130124  5:12:01 [Note] WSREP: (42182cc1-661f-11e2-0800-87ba3178bbb2, 'tcp://10.0.1.28:4567') turning message relay requesting on, nonlive peers: tcp://174.37.16.89:4567 
Jan 24 05:12:01 db5 mysqld: 130124  5:12:01 [Note] WSREP: (42182cc1-661f-11e2-0800-87ba3178bbb2, 'tcp://10.0.1.28:4567') turning message relay requesting off
Jan 24 05:12:03 db5 mysqld: 130124  5:12:03 [Note] WSREP: declaring f11806d8-661c-11e2-0800-3a1ac26d46f3 stable
Jan 24 05:12:03 db5 mysqld: 130124  5:12:03 [Note] WSREP: declaring f637d776-661c-11e2-0800-e1f7f6800e01 stable
Jan 24 05:12:06 db5 mysqld: 130124  5:12:06 [Note] WSREP: view(view_id(PRIM,42182cc1-661f-11e2-0800-87ba3178bbb2,9) memb {
Jan 24 05:12:06 db5 mysqld: #01142182cc1-661f-11e2-0800-87ba3178bbb2,
Jan 24 05:12:06 db5 mysqld: #011f11806d8-661c-11e2-0800-3a1ac26d46f3,
Jan 24 05:12:06 db5 mysqld: #011f637d776-661c-11e2-0800-e1f7f6800e01,
Jan 24 05:12:06 db5 mysqld: } joined {
Jan 24 05:12:06 db5 mysqld: } left {
Jan 24 05:12:06 db5 mysqld: } partitioned {
Jan 24 05:12:06 db5 mysqld: })
Jan 24 05:12:06 db5 mysqld: 130124  5:12:06 [Note] WSREP: gcomm: connected
Jan 24 05:12:06 db5 mysqld: 130124  5:12:06 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
Jan 24 05:12:06 db5 mysqld: 130124  5:12:06 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
Jan 24 05:12:06 db5 mysqld: 130124  5:12:06 [Note] WSREP: Opened channel 'my_wsrep_mysql_cluster'
Jan 24 05:12:06 db5 mysqld: 130124  5:12:06 [Note] WSREP: Waiting for SST to complete.
Jan 24 05:12:06 db5 mysqld: 130124  5:12:06 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
Jan 24 05:12:06 db5 mysqld: 130124  5:12:06 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 4560c67c-661f-11e2-0800-cf50fbeb0855
Jan 24 05:12:06 db5 mysqld: 130124  5:12:06 [Note] WSREP: STATE EXCHANGE: sent state msg: 4560c67c-661f-11e2-0800-cf50fbeb0855
Jan 24 05:12:06 db5 mysqld: 130124  5:12:06 [Note] WSREP: STATE EXCHANGE: got state msg: 4560c67c-661f-11e2-0800-cf50fbeb0855 from 0 (db5)
Jan 24 05:12:06 db5 mysqld: 130124  5:12:06 [Note] WSREP: STATE EXCHANGE: got state msg: 4560c67c-661f-11e2-0800-cf50fbeb0855 from 1 (db2.example.org)
Jan 24 05:12:06 db5 mysqld: 130124  5:12:06 [Note] WSREP: STATE EXCHANGE: got state msg: 4560c67c-661f-11e2-0800-cf50fbeb0855 from 2 (db1.example.org)
Jan 24 05:12:06 db5 mysqld: 130124  5:12:06 [Note] WSREP: Quorum results:
Jan 24 05:12:06 db5 mysqld: #011version    = 2,
Jan 24 05:12:06 db5 mysqld: #011component  = PRIMARY,
Jan 24 05:12:06 db5 mysqld: #011conf_id    = 6,
Jan 24 05:12:06 db5 mysqld: #011members    = 2/3 (joined/total),
Jan 24 05:12:06 db5 mysqld: #011act_id     = 147294,
Jan 24 05:12:06 db5 mysqld: #011last_appl. = -1,
Jan 24 05:12:06 db5 mysqld: #011protocols  = 0/4/2 (gcs/repl/appl),
Jan 24 05:12:06 db5 mysqld: #011group UUID = de5fcacb-11fc-11e2-0800-a4371bcdc02d
Jan 24 05:12:06 db5 mysqld: 130124  5:12:06 [Note] WSREP: Flow-control interval: [222, 222]
Jan 24 05:12:06 db5 mysqld: 130124  5:12:06 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 147294)
Jan 24 05:12:06 db5 mysqld: 130124  5:12:06 [Note] WSREP: State transfer required: 
Jan 24 05:12:06 db5 mysqld: #011Group state: de5fcacb-11fc-11e2-0800-a4371bcdc02d:147294
Jan 24 05:12:06 db5 mysqld: #011Local state: de5fcacb-11fc-11e2-0800-a4371bcdc02d:-1
Jan 24 05:12:06 db5 mysqld: 130124  5:12:06 [Note] WSREP: New cluster view: global state: de5fcacb-11fc-11e2-0800-a4371bcdc02d:147294, view# 7: Primary, number of nodes: 3, my index: 0, protocol version 2
Jan 24 05:12:06 db5 mysqld: 130124  5:12:06 [Warning] WSREP: Gap in state sequence. Need state transfer.
Jan 24 05:12:08 db5 mysqld: 130124  5:12:08 [Note] WSREP: Running: 'wsrep_sst_rsync_wan --role 'joiner' --address '10.0.1.28' --auth 'wsrep:MySECRETPaSS' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '19360''
Jan 24 05:12:08 db5 mysqld: 130124  5:12:08 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_rsync_wan --role 'joiner' --address '10.0.1.28' --auth 'wsrep:MySECRETPaSS' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '19360'
Jan 24 05:12:08 db5 mysqld: #011Read: 'rsync daemon already running.'
Jan 24 05:12:08 db5 mysqld: 130124  5:12:08 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync_wan --role 'joiner' --address '10.0.1.28' --auth 'wsrep:MySECRETPaSS' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '19360': 114 (Operation already in progress)
Jan 24 05:12:08 db5 mysqld: 130124  5:12:08 [ERROR] WSREP: Failed to prepare for 'rsync_wan' SST. Unrecoverable.
Jan 24 05:12:08 db5 mysqld: 130124  5:12:08 [ERROR] Aborting
Jan 24 05:12:08 db5 mysqld: 
Jan 24 05:12:10 db5 mysqld: 130124  5:12:10 [Note] WSREP: Closing send monitor...
Jan 24 05:12:10 db5 mysqld: 130124  5:12:10 [Note] WSREP: Closed send monitor.
Jan 24 05:12:10 db5 mysqld: 130124  5:12:10 [Note] WSREP: gcomm: terminating thread
Jan 24 05:12:10 db5 mysqld: 130124  5:12:10 [Note] WSREP: gcomm: joining thread
Jan 24 05:12:10 db5 mysqld: 130124  5:12:10 [Note] WSREP: gcomm: closing backend

Thanks!

On Thursday, January 24, 2013 at 2:06:45 PM UTC+2, Ilias Bertsimas wrote:

Ilias Bertsimas

Jan 24, 2013, 7:30:03 AM
to codersh...@googlegroups.com
Hi,

From a first look at the log, which I assume comes from a node trying to connect to the cluster, you seem to have a synced cluster of 2 nodes, with the 3rd one joining now:
Jan 24 05:12:06 db5 mysqld: #011members    = 2/3 (joined/total),

It tries to take a full SST but fails on rsync - it seems rsync is already running. Can you check whether there is an rsync process running on the joining node and stop it?
Then you could try again by starting mysql (see the sketch after the quoted log below).

Jan 24 05:12:08 db5 mysqld: 130124  5:12:08 [ERROR] WSREP: Failed to read 'ready <addr>' from: wsrep_sst_rsync_wan --role 'joiner' --address '10.0.1.28' --auth 'wsrep:MySECRETPaSS' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '19360'
Jan 24 05:12:08 db5 mysqld: #011Read: 'rsync daemon already running.'
Jan 24 05:12:08 db5 mysqld: 130124  5:12:08 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync_wan --role 'joiner' --address '10.0.1.28' --auth 'wsrep:MySECRETPaSS' --datadir '/var/lib/mysql/' --defaults-file '/etc/mysql/my.cnf' --parent '19360': 114 (Operation already in progress)
Jan 24 05:12:08 db5 mysqld: 130124  5:12:08 [ERROR] WSREP: Failed to prepare for 'rsync_wan' SST. Unrecoverable.
Jan 24 05:12:08 db5 mysqld: 130124  5:12:08 [ERROR] Aborting
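For illustration, the check could look like this on the joining node (a sketch only; the actual rsync PID and the mysql service name may differ on your system):

# look for a leftover rsync daemon from a previous SST attempt
ps ax | grep '[r]sync'

# if one is found, stop it (replace <pid> with the actual process id)
kill <pid>

# then retry joining the cluster
service mysql start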


Kind Regards,
Ilias.

Oleksandr Drach

Jan 24, 2013, 8:18:20 AM
to codersh...@googlegroups.com
Perfect!
It was true - the third node was not able to join the cluster. I helped it connect by killing the leftover processes; then deleting /var/lib/mysql//grastate.dat on the 4th node helped it join as well.
On the 5th node I had already started to play with /var/lib/mysql, so only a reinstallation of the wsrep server helped there.
Thanks a lot, Ilias!

On Thursday, January 24, 2013 at 2:30:03 PM UTC+2, Ilias Bertsimas wrote:

Ilias Bertsimas

Jan 24, 2013, 8:24:01 AM
to codersh...@googlegroups.com
Glad I could help Oleksandr!

Oleksandr Drach

Jan 24, 2013, 11:21:00 AM
to codersh...@googlegroups.com
I am wondering why those machines could not connect to the cluster again.

What is the proper way to temporarily remove a node from the cluster and then return it, syncing the data?

I have used "service mysql stop" and "service mysql start" on Ubuntu, but that has not helped me to restore the cluster.
Perhaps some tweaking of Galera variables is needed?

Thanks!

Ilias Bertsimas

Jan 24, 2013, 11:40:17 AM
to codersh...@googlegroups.com

In my experience, what you describe works for me. But I have noticed that sometimes the SST tools - netcat in the case of xtrabackup, or rsync - may keep running after a failure, or even after completing successfully, so the next time they are needed to perform an SST they may fail. Checking for their processes before starting the node's mysql might help.
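As a rough illustration, a pre-start check along those lines might be (a sketch; it assumes the SST helpers show up as plain rsync/nc processes and that it is safe to act on them on this node):

# before starting mysqld on the joining node, warn about leftover SST helpers
for p in rsync nc; do
    pgrep -x "$p" > /dev/null && echo "WARNING: leftover $p process found, stop it before starting mysql"
done
service mysql start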

Kind Regards,
Ilias.

Alex Yurchenko

Jan 24, 2013, 1:06:00 PM
to codersh...@googlegroups.com
On 2013-01-24 18:21, Oleksandr Drach wrote:
> I am wondering why those machines could not connect to the cluster again.
>
> What is the proper way to temporarily remove a node from the cluster
> and then return it, syncing the data?
>
> I have used "service mysql stop" and "service mysql start" on Ubuntu,
> but that has not helped me to restore the cluster.

Yes, graceful shutdown is one of the proper ways to remove a node from
the cluster.

> Perhaps some tweaking of Galera variables is needed?

Well, I think the issue was that you shut down the node while it was in
a transitional state, most likely while it was sending or receiving a
snapshot and the rsync daemon was running. I guess the cleanup on
shutdown was incomplete and the rsync daemon was left running there.
Donor cleanup in the case of the rsync_wan script is rather uncharted
territory, unfortunately.
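One way to reduce the chance of that, as a rough sketch, is to check the node's wsrep state before a planned shutdown and only stop it once it reports Synced:

# check the node's current wsrep state before shutting it down
mysql -e "SHOW STATUS LIKE 'wsrep_local_state_comment';"
# expected value on a healthy idle node: Synced
# anything else (e.g. Donor/Desynced, Joining) suggests a state transfer may be in flight
service mysql stop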

> Thanks!

--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011

Oleksandr Drach

Jan 25, 2013, 8:13:39 AM
to codersh...@googlegroups.com

Alex,
Thanks for your reply.

On Thursday, January 24, 2013 at 8:06:00 PM UTC+2, Alexey Yurchenko wrote:
And I guess the cleanup on shutdown was incomplete and the rsync daemon was left running there. Donor cleanup in the case of the rsync_wan script is rather uncharted territory, unfortunately.
 
I am using rsync_wan and I have noticed that after "service mysql stop" nodes often fail to rejoin the cluster.
So, after Ilias's reply I did some investigating and found that the following commands always help deal with the issue.

After:
# ps ax | grep rsync
# kill {rsyncID}
# rm /var/lib/mysql//grastate.dat
# service mysql start
the node joins the cluster again without issues!

Perhaps this workaround could also help someone else and could be added to the Q&A.
Thanks!

Alex Yurchenko

Jan 25, 2013, 8:35:59 AM
to codersh...@googlegroups.com
On 2013-01-25 15:13, Oleksandr Drach wrote:
Hi Oleksandr,

I'm afraid that either you're giving not-so-well-informed advice or we
have some serious issues there. Specifically, removing grastate.dat
_forces_ the node to use SST instead of IST, and it generally should not
be advised, especially in a case like yours, where you want to minimize
WAN traffic. Galera takes care to invalidate the contents of this file
whenever recovery by IST is not safe, so removing it normally does not
have any beneficial effect.
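For context, grastate.dat is a small text file recording the last known cluster UUID and sequence number; with the state seen in this thread's logs it would contain something along these lines (illustrative only, the exact layout depends on the Galera version):

# GALERA saved state
version: 2.1
uuid:    de5fcacb-11fc-11e2-0800-a4371bcdc02d
seqno:   147294

With a valid uuid/seqno pair the joining node can request an incremental state transfer (IST) from that point; deleting the file (or a seqno of -1) leaves a full SST as the only option.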

Rather, perhaps we should look in more detail into what is happening
there. Specifically, are you calling "service mysql stop" on a node
which is in the JOINER or DONOR state, why do you have to do that, and
what's in the logs?

Regards,
Alex

> Alex,
> Thanks for your reply.
>
> On Thursday, January 24, 2013 at 8:06:00 PM UTC+2, Alexey Yurchenko wrote:
>>
>> And I guess the cleanup on shutdown was incomplete and the rsync
>> daemon was left running there. Donor cleanup in the case of the
>> rsync_wan script is rather uncharted territory, unfortunately.
>>
>
> I am using rsync_wan and I have noticed that after "service mysql stop"
> nodes often fail to rejoin the cluster.
> So, after Ilias's reply I did some investigating and found that the
> following commands always help deal with the issue.
>
> After:
> # ps ax | grep rsync
> # kill {rsyncID}
> # rm /var/lib/mysql//grastate.dat
> # service mysql start
> the node joins the cluster again without issues!
>
> Perhaps this workaround could also help someone else and could be
> added to the Q&A.
> Thanks!

Oleksandr Drach

Jan 25, 2013, 12:37:53 PM
to codersh...@googlegroups.com


On Friday, January 25, 2013 at 3:35:59 PM UTC+2, Alexey Yurchenko wrote:
Hi Oleksandr,

I'm afraid that either you're giving not-so-well-informed advice or we
have some serious issues there. Specifically, removing grastate.dat
_forces_ the node to use SST instead of IST, and it generally should not
be advised, especially in a case like yours, where you want to minimize
WAN traffic. Galera takes care to invalidate the contents of this file
whenever recovery by IST is not safe, so removing it normally does not
have any beneficial effect.
You are right, a large amount of traffic is generated, but this way definitely works.
And while a node is in the syncing state, the failover scripts take care of directing production traffic to the other nodes.
 
Rather, perhaps we should look in more detail into what is happening
there. Specifically, are you calling "service mysql stop" on a node
which is in the JOINER or DONOR state, why do you have to do that, and
what's in the logs?
OK, I'll try to reproduce the situation where a regular "service mysql start" does not return the node to the cluster and post some logs here.

Oleksandr Drach

Jan 29, 2013, 11:07:52 AM
to codersh...@googlegroups.com
Ilias, you were right, the rsync daemon was causing the startup issue.
So the following command solves the connectivity issue:

killall rsync && service mysql start

Ilias Bertsimas

Jan 29, 2013, 11:19:40 AM
to codersh...@googlegroups.com
Yes, I have had the same issue with xtrabackup sometimes, but I considered it a slight annoyance.
It seems there is a bug report for that issue, or at least one occurrence of it, on the PXC bug report page: https://bugs.launchpad.net/percona-xtradb-cluster/+bug/1076816

Oleksandr Drach

Jan 29, 2013, 11:28:24 AM
to codersh...@googlegroups.com
Thanks for the useful link - it seems this is a known issue for xtrabackup!
But the problem is that this situation also occurs with Codership Galera using rsync_wan.
It is probably a broader Galera issue.