130131 17:21:31 [Note] WSREP: IST receiver using ssl
130131 17:21:31 [Note] WSREP: Prepared IST receiver, listening at: ssl://<ec2_local_ip>:4568
130131 17:22:34 [ERROR] WSREP: IST failed: IST sender, failed to connect 'ssl://<ec2_local_ip>:4568': Connection timed out: 110 (Connection timed out) at galera/src/ist.cpp:Sender():628

From the receiver:

130131 17:21:29 [Note] WSREP: Flow-control interval: [14, 28]
130131 17:21:29 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 8745)
130131 17:21:29 [Note] WSREP: State transfer required:
    Group state: b973ab86-61b9-11e2-0800-e1666f261698:8745
    Local state: b973ab86-61b9-11e2-0800-e1666f261698:8744
130131 17:21:29 [Note] WSREP: New cluster view: global state: b973ab86-61b9-11e2-0800-e1666f261698:8745, view# 43: Primary, number of nodes: 3, my index: 0, protocol version 2
130131 17:21:29 [Warning] WSREP: Gap in state sequence. Need state transfer.
130131 17:21:31 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role 'joiner' --address '<ec2_external_ip>' --auth 'root:PASSWORDREDACTED' --datadir '/mnt/mysql/data/' --defaults-file '/local/mysql/etc/mysql/my.cnf' --parent '14945''
130131 17:21:31 [Note] WSREP: Prepared SST request: xtrabackup|'<ec2_external_ip>:4444/xtrabackup_sst
130131 17:21:31 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
130131 17:21:31 [Note] WSREP: Assign initial position for certification: 8745, protocol version: 2
130131 17:21:31 [Note] WSREP: IST receiver using ssl
130131 17:21:31 [Note] WSREP: Prepared IST receiver, listening at: ssl://'<ec2_local_ip>:4568
130131 17:21:31 [Note] WSREP: Node 0 (<requestor_node>) requested state transfer from '*any*'. Selected 1 (<donor_node>)(SYNCED) as donor.
130131 17:21:31 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 8745)
130131 17:21:31 [Note] WSREP: Requesting state transfer: success, donor: 1
130131 17:21:31 [Note] WSREP: SST complete, seqno: 8744
...
130131 17:21:34 [Note] /usr/sbin/mysqld: ready for connections.
130131 17:22:34 [Warning] WSREP: 1 (<donor_node>): State transfer to 0 (<requestor_node>) failed: -110 (Connection timed out)
130131 17:22:34 [ERROR] WSREP: gcs/src/gcs_group.c:gcs_group_handle_join_msg():712: Will never receive state. Need to abort.
130131 17:22:34 [Note] WSREP: gcomm: terminating thread
130131 17:22:34 [Note] WSREP: gcomm: joining thread
130131 17:22:34 [Note] WSREP: gcomm: closing backend

From the donor:

130131 17:21:29 [Note] WSREP: Flow-control interval: [14, 28]
130131 17:21:29 [Note] WSREP: New cluster view: global state: b973ab86-61b9-11e2-0800-e1666f261698:8745, view# 43: Primary, number of nodes: 3, my index: 1, protocol version 2
130131 17:21:29 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
130131 17:21:29 [Note] WSREP: Assign initial position for certification: 8745, protocol version: 2
130131 17:21:31 [Note] WSREP: Node 0 (<requestor_node>) requested state transfer from '*any*'. Selected 1 (<donor_node>)(SYNCED) as donor.
130131 17:21:31 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 8745)
130131 17:21:31 [Note] WSREP: IST request: b973ab86-61b9-11e2-0800-e1666f261698:8744-8745|ssl://<requestor_ec2_local_ip>:4568
130131 17:21:31 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
130131 17:21:31 [Note] WSREP: Running: 'wsrep_sst_xtrabackup --role 'donor' --address '/<requestor_ec2_external_ip>:4444/xtrabackup_sst' --auth 'root:PASSWORDREDACTED' --socket '/var/run/mysqld/mysqld.sock' --datadir '/mnt/mysql/data/' --defaults-file '/local/mysql/etc/mysql/my.cnf' --gtid 'b973ab86-61b9-11e2-0800-e1666f261698:8744' --bypass'
130131 17:21:31 [Note] WSREP: sst_donor_thread signaled with 0
130131 17:21:31 [Note] WSREP: IST sender using ssl
130131 17:22:34 [ERROR] WSREP: IST failed: IST sender, failed to connect 'ssl:///<requestor_ec2_local_ip>:4568': Connection timed out: 110 (Connection timed out) at galera/src/ist.cpp:Sender():628
130131 17:22:34 [Warning] WSREP: 1 (/<donor_node>): State transfer to 0 (/<requestor_node>) failed: -110 (Connection timed out)
130131 17:22:34 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 8745)

Thank you,
Joshua

Hey folks.

I have an XtraDB Cluster in AWS.
Each cluster node is fronted by an Amazon Elastic IP. Thus, the wsrep_cluster_address and wsrep_node_address are set to internet-routable addresses, the ec2_public_ipv4. These, in turn, are mapped by Amazon to each node's ec2_private_ipv4, its regional LAN address.
This was done to allow nodes in different Amazon regions to communicate with each other globally (using SSL communication, of course).

The cluster, in both dev and production environments, is working well:
- SST instantiation of new nodes works just fine with xtrabackup across regions.
- All nodes are in sync globally.
If, however, a node is taken offline for any reason, upon a restart we are seeing:
30131 3:04:19 [Warning] WSREP: Failed to prepare for incremental state transfer: Failed to open IST listener at ssl://ec2_public_ipv4:4568', asio error 'Cannot assign requested address': 99 (Cannot assign requested address)
at galera/src/ist.cpp:prepare():309. IST will be unavailable.
What I believe this means:
- The node is trying to listen on its ec2_public_ipv4:4568 which is the Elastic IP, not local, and thus the node can't bind to it.
- I want the node to listen on its ec2_private_ipv4:4568 which is its local LAN address.
- I want an IST sender to continue to identify this node by its ec2_public_ipv4:4568, so that IST occurs just as Galera replication does: on the Elastic IPs, transparently NAT'd to the nodes' local LAN IPs.
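
To illustrate the first point, here is a minimal Python sketch of the bind behaviour, assuming a Linux host with default settings. The helper name try_bind is hypothetical, 192.0.2.1 is a documentation-only address (RFC 5737) standing in for the Elastic IP, and port 0 (any free port) is used instead of 4568 so the sketch does not clash with a running node. It is not how Galera opens its listener, only a demonstration that a socket cannot bind to an address that is not configured on a local interface.

```python
import errno
import socket


def try_bind(host: str, port: int = 0):
    """Try to bind a TCP socket to host:port.

    Returns (True, None) on success, or (False, errno) on failure.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            return False, exc.errno
        return True, None


# The loopback address is configured locally, so the bind succeeds.
print("127.0.0.1 ->", try_bind("127.0.0.1"))

# 192.0.2.1 stands in for the Elastic IP (assumption): it is not
# assigned to any local interface, so the bind is expected to fail.
ok, err = try_bind("192.0.2.1")
print("192.0.2.1 ->", ok, errno.errorcode.get(err))
```

On Linux the second call would be expected to report EADDRNOTAVAIL, which is errno 99, the same "Cannot assign requested address" seen in the warning above.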
I thought the wsrep_sst_receive_address might provide for this, but in my testing it does not appear to. Is there something similar to a wsrep_ist_receive_address that needs to be set?

Cheers,
Joshua