Multiple instances wsrep connecting to wrong address

Michael Eklund

Oct 17, 2013, 10:32:00 PM

to codersh...@googlegroups.com

We are running 2 instances per node. This setup runs without issue on 3 iron nodes with Percona XtraDB Cluster (GPL) 5.5.31-23.7.5, Revision 438, wsrep_23.7.5.r3880. I have added 2 new nodes that are VMware VMs running 5.5.33-55-log Percona XtraDB Cluster (GPL), wsrep_23.7.6.r3915. I can bring up either one of these nodes on its own without issue, but if I try to bring up both nodes, wsrep connects to the wrong IP.

Here is the wsrep config info on node 4:

instance1:

##########################
# Galera Cluster Settings #
###########################
[mysqld]
wsrep_cluster_name = mysqlro
# Human-readable node name (non-unique). Hostname by default.
#wsrep_node_name = node0
wsrep_node_address = 10.1.10.200
wsrep_provider_options = "gmcast.listen_addr=tcp://10.1.10.200:4567;ist.recv_addr=10.1.10.200:4568;"
wsrep_node_incoming_address = 10.1.10.200
# this is used for bootstrapping only
#wsrep_cluster_address = gcomm://
wsrep_cluster_address = gcomm://10.1.10.27,10.1.10.28,10.1.10.29
wsrep_provider = /usr/lib/libgalera_smm.so
wsrep_sst_method = xtrabackup
wsrep_slave_threads = 16
innodb_locks_unsafe_for_binlog = 1
innodb_autoinc_lock_mode = 2
binlog_format = ROW
# Query Cache is not supported with wsrep
query_cache_size=0
query_cache_type=0
# secrets for state transfer.
!include /etc/mysql/ro.secrets.wsrep
[mysqld]
wsrep_sst_auth=root:gunpey911
# WARNING: This file is managed by the puppet mysql module
# ANY LOCAL CHANGES WILL BE OVERWRITTEN

instance 2:

##########################
# Galera Cluster Settings #
###########################
[mysqld]
wsrep_cluster_name = mysqluser_data
# Human-readable node name (non-unique). Hostname by default.
#wsrep_node_name = node0
wsrep_node_address = 10.1.10.201
wsrep_provider_options = "gmcast.listen_addr=tcp://10.1.10.201:4567;ist.recv_addr=10.1.10.201:4568;"
wsrep_node_incoming_address = 10.1.10.201
# this is used for bootstrapping only
#wsrep_cluster_address = gcomm://
wsrep_cluster_address = gcomm://10.1.10.30,10.1.10.31,10.1.10.32
wsrep_provider = /usr/lib/libgalera_smm.so
wsrep_sst_method = xtrabackup
wsrep_slave_threads = 16
innodb_locks_unsafe_for_binlog = 1
innodb_autoinc_lock_mode = 2
binlog_format = ROW
# Query Cache is not supported with wsrep
query_cache_size=0
query_cache_type=0
# secrets for state transfer.
!include /etc/mysql/user_data.secrets.wsrep

On node 5:

instance 1:

##########################
# Galera Cluster Settings #
###########################
[mysqld]
wsrep_cluster_name = mysqlro
# Human-readable node name (non-unique). Hostname by default.
#wsrep_node_name = node0
wsrep_node_address = 10.1.10.202
wsrep_provider_options = "gmcast.listen_addr=tcp://10.1.10.202:4567;ist.recv_addr=10.1.10.202:4568;"
wsrep_node_incoming_address = 10.1.10.202
# this is used for bootstrapping only
#wsrep_cluster_address = gcomm://
wsrep_cluster_address = gcomm://10.1.10.27,10.1.10.28,10.1.10.29
wsrep_provider = /usr/lib/libgalera_smm.so
wsrep_sst_method = xtrabackup
wsrep_slave_threads = 16
innodb_locks_unsafe_for_binlog = 1
innodb_autoinc_lock_mode = 2
binlog_format = ROW
# Query Cache is not supported with wsrep
query_cache_size=0
query_cache_type=0
# secrets for state transfer.
!include /etc/mysql/ro.secrets.wsrep
# WARNING: This file is managed by the puppet mysql module
# ANY LOCAL CHANGES WILL BE OVERWRITTEN

instance 2:

##########################
# Galera Cluster Settings #
###########################
[mysqld]
wsrep_cluster_name = mysqluser_data
# Human-readable node name (non-unique). Hostname by default.
#wsrep_node_name = node0
wsrep_node_address = 10.1.10.203
wsrep_provider_options = "gmcast.listen_addr=tcp://10.1.10.203:4567;ist.recv_addr=10.1.10.203:4568;"
wsrep_node_incoming_address = 10.1.10.203
# this is used for bootstrapping only
#wsrep_cluster_address = gcomm://
wsrep_cluster_address = gcomm://10.1.10.30,10.1.10.31,10.1.10.32
wsrep_provider = /usr/lib/libgalera_smm.so
wsrep_sst_method = xtrabackup
wsrep_slave_threads = 16
innodb_locks_unsafe_for_binlog = 1
innodb_autoinc_lock_mode = 2
binlog_format = ROW
# Query Cache is not supported with wsrep
query_cache_size=0
query_cache_type=0
# secrets for state transfer.
!include /etc/mysql/user_data.secrets.wsrep

As soon as I bring up either instance I get errors. Here is an example of bringing up instance2:

On node 4:

==> user_data.err <==
131017 22:24:53 [Note] WSREP: Node 1 (atl-mysqlro5) requested state transfer from '*any*'. Selected 0 (atl-mysqlro4)(SYNCED) as donor.
131017 22:24:53 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 61796743)
131017 22:24:53 [Note] WSREP: IST request: 3ca49689-2ea0-11e2-0800-77c8d18762d3:61794351-61796742|tcp://10.1.10.203:4568
131017 22:24:53 [Note] WSREP: IST first seqno 61794352 not found from cache, falling back to SST
131017 22:24:53 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
131017 22:24:53 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role 'donor' --address '10.1.10.203:4444/xtrabackup_sst' --auth 'root:gunpey911' --socket '/var/run/mysqld/user_data.sock' --datadir '/dealnews/mysqldb/user_data/' --defaults-file '/etc/mysql/user_data.cnf' --gtid '3ca49689-2ea0-11e2-0800-77c8d18762d3:61796743''
131017 22:24:53 [Note] WSREP: sst_donor_thread signaled with 0
WSREP_SST: [INFO] Streaming with xbstream (20131017 22:24:53.354)
WSREP_SST: [INFO] Using socat as streamer (20131017 22:24:53.357)
WSREP_SST: [INFO] Streaming GTID file before SST (20131017 22:24:53.368)
WSREP_SST: [INFO] Evaluating xbstream -c ${INFO_FILE} | socat -u stdio TCP:10.1.10.203:4444; RC=( ${PIPESTATUS[@]} ) (20131017 22:24:53.373)
WSREP_SST: [INFO] Sleeping before data transfer for SST (20131017 22:24:53.381)
131017 22:24:53 [Note] WSREP: (5b7d9132-379c-11e3-ad45-7f6c0f23935f, 'tcp://10.1.10.201:4567') reconnecting to 7716abfc-379c-11e3-99b0-ee61208245b2 (tcp://10.1.10.202:4567), attempt 0

==> ro.err <==
131017 22:24:53 [Note] WSREP: handshake failed, my group: 'mysqlro', peer group: 'mysqluser_data'
131017 22:24:55 [Note] WSREP: handshake failed, my group: 'mysqlro', peer group: 'mysqluser_data'
131017 22:24:56 [Note] WSREP: handshake failed, my group: 'mysqlro', peer group: 'mysqluser_data'
131017 22:24:58 [Note] WSREP: handshake failed, my group: 'mysqlro', peer group: 'mysqluser_data'
131017 22:24:59 [Note] WSREP: handshake failed, my group: 'mysqlro', peer group: 'mysqluser_data'
131017 22:25:01 [Note] WSREP: handshake failed, my group: 'mysqlro', peer group: 'mysqluser_data'
131017 22:25:02 [Note] WSREP: handshake failed, my group: 'mysqlro', peer group: 'mysqluser_data'

On node 5:

131017 22:26:31 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
131017 22:27:07 [Note] WSREP: (7716abfc-379c-11e3-99b0-ee61208245b2, 'tcp://10.1.10.203:4567') reconnecting to 5b7d9132-379c-11e3-ad45-7f6c0f23935f (tcp://10.1.10.200:4567), attempt 90
131017 22:27:52 [Note] WSREP: (7716abfc-379c-11e3-99b0-ee61208245b2, 'tcp://10.1.10.203:4567') reconnecting to 5b7d9132-379c-11e3-ad45-7f6c0f23935f (tcp://10.1.10.200:4567), attempt 120
131017 22:28:37 [Note] WSREP: (7716abfc-379c-11e3-99b0-ee61208245b2, 'tcp://10.1.10.203:4567') reconnecting to 5b7d9132-379c-11e3-ad45-7f6c0f23935f (tcp://10.1.10.200:4567), attempt 150
131017 22:29:22 [Note] WSREP: (7716abfc-379c-11e3-99b0-ee61208245b2, 'tcp://10.1.10.203:4567') reconnecting to 5b7d9132-379c-11e3-ad45-7f6c0f23935f (tcp://10.1.10.200:4567), attempt 180

I cannot for the life of me figure out why node5/instance2 is trying to connect to 10.1.10.200:4567, which is node4/instance1.

Any insight?

Regards,

Mike E.
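The "reconnecting to" notes are the clearest evidence of the cross-wiring: node5/instance2 (listening on 10.1.10.203) keeps dialing 10.1.10.200, which is not in its wsrep_cluster_address. As a minimal sketch, assuming the log wording shown in the excerpts above (other Galera versions may phrase these notes differently), a short Python script can pull the local and peer address out of each such note and flag peers outside the expected member list. The helper names `reconnect_targets` and `unexpected_peers` and the member set are hypothetical, derived from the configs in this post.

```python
import re

# Matches Galera notes of the form seen in the logs above, e.g.
# (7716abfc-..., 'tcp://10.1.10.203:4567') reconnecting to 5b7d9132-... (tcp://10.1.10.200:4567), attempt 90
RECONNECT_RE = re.compile(
    r"\((?P<self_uuid>[0-9a-f-]+), 'tcp://(?P<self_addr>[^']+)'\) "
    r"reconnecting to (?P<peer_uuid>[0-9a-f-]+) \(tcp://(?P<peer_addr>[^)]+)\)"
)


def reconnect_targets(lines):
    """Yield (local address, peer address) for every 'reconnecting to' note."""
    for line in lines:
        match = RECONNECT_RE.search(line)
        if match:
            yield match.group("self_addr"), match.group("peer_addr")


def unexpected_peers(lines, expected_hosts):
    """Return the sorted peer addresses whose host is not an expected member."""
    unexpected = set()
    for _local, peer in reconnect_targets(lines):
        host = peer.rsplit(":", 1)[0]  # strip the port, keep the IP
        if host not in expected_hosts:
            unexpected.add(peer)
    return sorted(unexpected)


# One of the node 5 lines quoted above.
node5_log = [
    "131017 22:27:07 [Note] WSREP: (7716abfc-379c-11e3-99b0-ee61208245b2, "
    "'tcp://10.1.10.203:4567') reconnecting to 5b7d9132-379c-11e3-ad45-7f6c0f23935f "
    "(tcp://10.1.10.200:4567), attempt 90",
]
# Hosts node5/instance2 (mysqluser_data) should talk to, per the configs above (assumption).
user_data_members = {"10.1.10.30", "10.1.10.31", "10.1.10.32", "10.1.10.201"}

print(unexpected_peers(node5_log, user_data_members))  # ['10.1.10.200:4567']
```

The same check run against the node 4 user_data log would report 10.1.10.202, the node 5 mysqlro instance, which matches the "handshake failed" notes in ro.err.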

Teemu Ollakka

Oct 18, 2013, 7:04:55 AM

to codersh...@googlegroups.com

Hi,

I believe this happens because outgoing Galera connections are not bound to a specific IP. The operating system picks the "wrong" IP address as the local endpoint, and the other nodes then use that address to connect to the newly joined node. If you run two instances on one node with different listen addresses in the same subnet, it is a matter of luck whether each instance gets the correct local IP for its outgoing connections. In your case, it is better to use different port ranges (the ports in gmcast.listen_addr and ist.recv_addr) for the instances until this is fixed. Also remember to add the proper ports to the addresses in wsrep_cluster_address.

- Teemu
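Teemu's workaround amounts to giving each instance its own port range and spelling out the ports in wsrep_cluster_address. The following minimal Python sketch renders those lines for one instance. The 5567/5568 ports for mysqluser_data and the `galera_settings` helper are hypothetical, and the sketch assumes every member of that cluster (including the existing iron nodes) is moved to the same port range, which the thread does not state.

```python
def galera_settings(node_ip, base_port, peer_ips):
    """Render the per-instance wsrep lines with a dedicated port range.

    base_port is used for group communication (gmcast.listen_addr) and
    base_port + 1 for IST (ist.recv_addr), mirroring the 4567/4568 pair in
    the configs above. Every peer in wsrep_cluster_address gets the same
    explicit gmcast port (assumption: the whole cluster uses one range).
    """
    ist_port = base_port + 1
    peers = ",".join(f"{ip}:{base_port}" for ip in peer_ips)
    return [
        f"wsrep_node_address = {node_ip}",
        f'wsrep_provider_options = "gmcast.listen_addr=tcp://{node_ip}:{base_port};'
        f'ist.recv_addr={node_ip}:{ist_port};"',
        f"wsrep_cluster_address = gcomm://{peers}",
    ]


# Hypothetical: keep mysqlro on 4567/4568 and move mysqluser_data to 5567/5568.
user_data_on_node5 = galera_settings(
    "10.1.10.203", 5567, ["10.1.10.30", "10.1.10.31", "10.1.10.32"]
)
print("\n".join(user_data_on_node5))
```

Generating the lines from one function keeps the gmcast port, the IST port and the ports in wsrep_cluster_address from drifting apart when several instances share a host.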

Jan Kirchhoff

Oct 18, 2013, 8:04:23 AM

to codersh...@googlegroups.com

On Linux, you can easily force the source IP address using iptables rules.

It should be something like:

iptables -t nat -A POSTROUTING -d <DESTINATION_IP> -o <INTERFACE> -j SNAT --to <SOURCE_IP>

Just a few iptables commands on bootup/ifup and you should be fine.

For example, Server 1 has 10.0.0.1 and 10.0.0.2, Server 2 has 10.0.0.11 and 10.0.0.12, and you want 10.0.0.1 and 10.0.0.11 to be one "connection" and .2/.12 a separate one:

on Server 1
iptables -t nat -A POSTROUTING -d 10.0.0.11 -o eth0 -j SNAT --to 10.0.0.1
iptables -t nat -A POSTROUTING -d 10.0.0.12 -o eth0 -j SNAT --to 10.0.0.2

and on Server 2
iptables -t nat -A POSTROUTING -d 10.0.0.1 -o eth0 -j SNAT --to 10.0.0.11
iptables -t nat -A POSTROUTING -d 10.0.0.2 -o eth0 -j SNAT --to 10.0.0.12

This should enforce that all packets from server 1 going to IP 10.0.0.11
have the source IP 10.0.0.1 and all packets going to 10.0.0.12 have the
source IP 10.0.0.2. And the same in reverse for server 2.

That should be the correct syntax, but have a look at "man iptables" or Google. iptables is very powerful, sometimes very handy, and much easier to use than many people think.

Jan
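Jan's rules follow a simple pattern: one SNAT rule per destination, rewriting the source to the local IP that belongs to the same cluster. The minimal Python sketch below only builds those commands as strings and never runs them. The `snat_rules` helper is hypothetical, the interface name eth0 comes from Jan's example and may differ on the VMware nodes, and the node 4 mapping is an assumption derived from the configs in this thread.

```python
def snat_rules(peers_by_source, interface="eth0"):
    """Return iptables commands that rewrite the source address per destination.

    peers_by_source maps a local IP to the remote IPs whose traffic should
    leave from that local IP. Commands are only built here, never executed.
    """
    rules = []
    for source_ip, destinations in peers_by_source.items():
        for dest_ip in destinations:
            rules.append(
                f"iptables -t nat -A POSTROUTING -d {dest_ip} -o {interface} "
                f"-j SNAT --to {source_ip}"
            )
    return rules


# Jan's example: pair 10.0.0.1 with 10.0.0.11, and 10.0.0.2 with 10.0.0.12.
server1 = snat_rules({"10.0.0.1": ["10.0.0.11"], "10.0.0.2": ["10.0.0.12"]})
server2 = snat_rules({"10.0.0.11": ["10.0.0.1"], "10.0.0.12": ["10.0.0.2"]})

# Hypothetical mapping for node 4 in this thread (interface assumed to be eth0):
# mysqlro traffic leaves from .200, mysqluser_data traffic from .201.
node4 = snat_rules({
    "10.1.10.200": ["10.1.10.27", "10.1.10.28", "10.1.10.29", "10.1.10.202"],
    "10.1.10.201": ["10.1.10.30", "10.1.10.31", "10.1.10.32", "10.1.10.203"],
})

for rule in server1 + server2 + node4:
    print(rule)
```

Applying the generated commands still requires root, and, as Jan notes, they would have to be reapplied on bootup/ifup.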

Michael Eklund

Oct 18, 2013, 10:59:45 AM

to codersh...@googlegroups.com

It is my understanding that wsrep_node_incoming_address is supposed to take care of this:

"wsrep_node_incoming_address

Usually wsrep_node_address is sufficient. But if the clients need to use another address to connect to the node (another NIC, NAT, etc.) it can be set here and will override wsrep_node_address value. Note that it is not necessarily the address to which mysqld binds to (it can bind to 0.0.0.0), but the address that the clients (or, possibly, load balancer) need to use to connect to the server."

It seems to be working as advertised on my Percona XtraDB Cluster (GPL) 5.5.31-23.7.5, Revision 438, wsrep_23.7.5.r3880 servers, but not on the new 5.5.33-55-log Percona XtraDB Cluster (GPL), wsrep_23.7.6.r3915 ones.

We have multiple clusters where this is working but these two new nodes are not.

Mike E.

Alexey Yurchenko

Oct 19, 2013, 4:27:16 AM

to codersh...@googlegroups.com

On Friday, October 18, 2013 5:59:45 PM UTC+3, Michael Eklund wrote:

> It is my understanding that wsrep_node_incoming_address is supposed to take care of this:
>
> wsrep_node_incoming_address

This is for MySQL *client* connections, for when *clients* need to connect at a different address. It is also entirely optional: its only effect is that this address gets into the wsrep_incoming_addresses status variable.
