Hi Alex
I've tried both options and this is what happened:
1. Setting wsrep_provider='none':
The node gets disconnected and stays functional. But when I set wsrep_provider back to the path of libgalera_smm.so, the node stalls in the initialized state.
Node Log:
120510 14:33:41 [Note] WSREP: Stop replication
120510 14:33:43 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/libgalera_smm.so'
120510 14:33:43 [Note] WSREP: wsrep_load(): Galera 2.1dev(r109) by Codership Oy <in...@codership.com> loaded succesfully.
120510 14:33:43 [Note] WSREP: Preallocating 134219048/134219048 bytes in '/vol01/var//galera.cache'...
120510 14:33:43 [Note] WSREP: Passing config to GCS: gcache.dir = /vol01/var/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /vol01/var//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3
After that, nothing else happened.
2. Setting wsrep_cluster_address='gcomm://'
The node gets disconnected and bootstraps a new cluster with itself as the only member. OK
I did some inserts on the "new" cluster and a delete on the "old" cluster (with 2 nodes as members). The deleted rows also exist on the disconnected node, but since that node isn't in the old cluster, the rows stay on it. OK
Then I restored wsrep_cluster_address to its original value. The node joined the cluster with no problems, but the data never got synced: the rows I deleted on the old cluster (which were present on the disconnected node) are still available on the rejoined node.
Still, the rejoined node can perform selects and new inserts with no problem.
But when I ran the same delete on the rejoined node, the entire cluster failed (because of the classic row replication error HA_ERR_KEY_NOT_FOUND), and the 2 nodes that had never been disconnected from the original cluster asked for SST. In other words, it was as if a new cluster had been bootstrapped, with the aggravating factor that SST on one node failed with "Resource temporarily unavailable". SST method: Xtrabackup.
So I ended up with a single cluster, but with a single node.
Log of one of the nodes from original cluster:
120510 14:51:19 [Note] WSREP: Flow-control interval: [12, 23]
120510 14:51:19 [ERROR] Slave SQL: Could not execute Delete_rows event on table test.dani; Can't find record in 'dani', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 1085, Error_code: 1032
120510 14:51:19 [Warning] WSREP: RBR event 2 Delete_rows apply warning: 120, 8728
120510 14:51:19 [ERROR] WSREP: Failed to apply trx: source: 1c0e4463-9aaf-11e1-0800-499c4e2eb871 version: 2 local: 0 state: CERTIFYING flags: 1 conn_id: 4 trx_id: 51510 seqnos (l: 8768, g: 8728, s: 8727, d: 8721, ts: 1336661498271159426)
120510 14:51:19 [ERROR] WSREP: Failed to apply app buffer: �իO , seqno: 8728, status: WSREP_FATAL
at galera/src/replicator_smm.cpp:apply_wscoll():51
at galera/src/replicator_smm.cpp:apply_trx_ws():122
120510 14:51:19 [ERROR] WSREP: Node consistency compromized, aborting...
120510 14:51:19 [Note] WSREP: Closing send monitor...
120510 14:51:19 [Note] WSREP: Closed send monitor.
120510 14:51:19 [Note] WSREP: gcomm: terminating thread
120510 14:51:19 [Note] WSREP: gcomm: joining thread
120510 14:51:19 [Note] WSREP: gcomm: closing backend
120510 14:51:19 [Note] WSREP: view(view_id(NON_PRIM,1c0e4463-9aaf-11e1-0800-499c4e2eb871,105) memb {
719b79c6-9954-11e1-0800-f06b656b08da,
} joined {
} left {
} partitioned {
1c0e4463-9aaf-11e1-0800-499c4e2eb871,
})
120510 14:51:19 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
120510 14:51:19 [Note] WSREP: view((empty))
120510 14:51:19 [Note] WSREP: gcomm: closed
120510 14:51:19 [Note] WSREP: Flow-control interval: [8, 16]
120510 14:51:19 [Note] WSREP: Received NON-PRIMARY.
120510 14:51:19 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 8728)
120510 14:51:19 [Note] WSREP: Received self-leave message.
120510 14:51:19 [Note] WSREP: Flow-control interval: [0, 0]
120510 14:51:19 [Note] WSREP: Received SELF-LEAVE. Closing connection.
120510 14:51:19 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 8728)
120510 14:51:19 [Note] WSREP: RECV thread exiting 0: Success
120510 14:51:19 [Note] WSREP: recv_thread() joined.
120510 14:51:19 [Note] WSREP: Closing slave action queue.
120510 14:51:19 [Note] WSREP: /usr/sbin/mysqld: Terminated.
120510 14:51:19 mysqld_safe Number of processes running now: 0
120510 14:51:19 mysqld_safe mysqld restarted
My question is: when the node rejoins the cluster, shouldn't the cluster realize that the sequence number of the joiner node in grastate.dat (and, I suppose, the UUID as well) is different, and request an SST?
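To make the question concrete, here is a minimal Python sketch of the kind of comparison I have in mind. It assumes grastate.dat contains plain `uuid:` and `seqno:` lines; the function names and the sample values are made up for illustration and are not taken from my nodes. It only compares two saved states and says nothing about what Galera does internally on join.

```python
def parse_grastate(text):
    """Parse grastate.dat-style 'key: value' lines into a dict (comments skipped)."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


def same_state(a, b):
    """True only if both saved states report the same uuid and the same seqno."""
    return a.get("uuid") == b.get("uuid") and a.get("seqno") == b.get("seqno")


# Hypothetical contents, for illustration only.
OLD_CLUSTER_NODE = """# GALERA saved state
uuid:    aaaaaaaa-0000-11e1-0800-000000000001
seqno:   8727
"""

REJOINED_NODE = """# GALERA saved state
uuid:    bbbbbbbb-0000-11e1-0800-000000000002
seqno:   12
"""

old_state = parse_grastate(OLD_CLUSTER_NODE)
new_state = parse_grastate(REJOINED_NODE)
print("uuid match: ", old_state["uuid"] == new_state["uuid"])
print("seqno match:", old_state["seqno"] == new_state["seqno"])
print("same state: ", same_state(old_state, new_state))
```

With two files that differ like this, I would have expected the joiner not to be treated as already in sync.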
Thank you!
Regards
Daniel