temporarily remove server from cluster
6 messages
Andy Thompson  
From: "Andy Thompson" <athomp...@mooreheadcomm.com>
Date: Wed, 09 May 2012 12:48:40 -0400
Subject: temporarily remove server from cluster
If I want to temporarily remove a server from my cluster to test various processes against, can I just set wsrep_on=off, do my testing and then re-enable it?  I can't recall if that will make the server stop responding entirely.  Or do I need to shut it off, set it up in its own cluster for a short time and then add it back to the live cluster?  I just want to make sure that if data is changed by some misfortune, those changes aren't replicated to the cluster.

thanks


 
Alex Yurchenko  
From: Alex Yurchenko <alexey.yurche...@codership.com>
Date: Wed, 09 May 2012 23:40:25 +0300
Subject: Re: [codership-team] temporarily remove server from cluster
On 2012-05-09 19:48, Andy Thompson wrote:

> If I want to temporarily remove a server from my cluster to test
> various processes against, can I just set wsrep_on=off, do my testing
> and then re-enable it?  I can't recall if that will make the server
> stop responding entirely.  Or do I need to shut it off, set it up
> in its own cluster for a short time and then add it back to the live
> cluster?  I just want to make sure that if data is changed by some
> misfortune, those changes aren't replicated to the cluster.

> thanks

Yes, you can set wsrep_on=off globally, and nothing will be replicated
TO the cluster. However, the node will still receive and apply all events
FROM the cluster.

If you want the node to disconnect from the cluster entirely, you need to
- either start a separate cluster by setting
wsrep_cluster_address='gcomm://'
- or unload the wsrep provider completely by setting wsrep_provider='none'
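
For reference, the three options above can be written out as SQL statements. This is a minimal sketch wrapped in a small Python helper: the mode names and the helper itself are made up for illustration, and only the wsrep variables and values come from the reply above.

```python
# SQL statements for the three isolation options named in the reply above.
# The mode names (dictionary keys) are hypothetical; the wsrep variables
# and their values are the ones mentioned in the thread.
ISOLATION_SQL = {
    # Stop replicating local changes TO the cluster; events FROM the
    # cluster are still received and applied.
    "stop_outbound": "SET GLOBAL wsrep_on=OFF;",
    # Leave the cluster and start a separate one-node cluster.
    "own_cluster": "SET GLOBAL wsrep_cluster_address='gcomm://';",
    # Unload the wsrep provider completely.
    "unload_provider": "SET GLOBAL wsrep_provider='none';",
}


def isolation_statement(mode: str) -> str:
    """Return the SQL statement for a (hypothetical) mode name."""
    try:
        return ISOLATION_SQL[mode]
    except KeyError:
        known = ", ".join(sorted(ISOLATION_SQL))
        raise ValueError(f"unknown mode {mode!r}; expected one of: {known}") from None


if __name__ == "__main__":
    for mode in ISOLATION_SQL:
        print(f"{mode}: {isolation_statement(mode)}")
```

The statements would be run from a privileged client session on the node being isolated. Which option is safe depends on whether the node is expected to rejoin afterwards.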

Regards,
Alex

--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011


 
Daniel Mauricio Guzmán Burgos  
From: Daniel Mauricio Guzmán Burgos <neth...@gmail.com>
Date: Thu, 10 May 2012 12:47:13 -0300
Subject: Re: [codership-team] temporarily remove server from cluster

Hi Alex

I've tried both options and this is what happened:

1. Setting wsrep_provider='none':

The node gets disconnected and stays functional. But when I set
wsrep_provider again to the path of libgalera_smm.so, the node stalls in
the initialized state.

Node Log:
120510 14:33:41 [Note] WSREP: Stop replication
120510 14:33:43 [Note] WSREP: wsrep_load(): loading provider library
'/usr/lib64/libgalera_smm.so'
120510 14:33:43 [Note] WSREP: wsrep_load(): Galera 2.1dev(r109) by
Codership Oy <i...@codership.com> loaded succesfully.
120510 14:33:43 [Note] WSREP: Preallocating 134219048/134219048 bytes in
'/vol01/var//galera.cache'...
120510 14:33:43 [Note] WSREP: Passing config to GCS: gcache.dir =
/vol01/var/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name =
/vol01/var//galera.cache; gcache.page_size = 128M; gcache.size = 128M;
gcs.fc_debug = 0; gcs.fc_factor = 0.5; gcs.fc_limit = 16;
gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle =
0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit =
0.25; replicator.causal_read_timeout = PT30S; replicator.commit_order = 3

And nothing else happened

2. Setting wsrep_cluster_address='gcomm://'

The node gets disconnected and bootstraps a new cluster with itself as the
only member. OK

I did some inserts on the "new" cluster and a delete on the "old" cluster
(which has 2 nodes as members). The deleted rows also existed on the
disconnected node, but since that node is no longer in the old cluster, the
rows remain on it. OK

Then I restored wsrep_cluster_address to its original value. The node
joined the cluster with no problems, but the data was never synced: the rows
I deleted on the old cluster (which were present on the disconnected node)
are still available on the rejoined node.
Still, the rejoined node can perform selects and new inserts with no problem.

But when I ran the same delete on the rejoined node, the entire cluster
failed (because of the classic row replication error HA_ERR_KEY_NOT_FOUND),
and the 2 nodes that had never been disconnected from the original cluster
asked for SST. In other words, it was as if a new cluster had been
bootstrapped, with the aggravating factor that SST on one node failed with
"Resource temporarily unavailable". SST method: Xtrabackup.
So I ended up with a single cluster, but with a single node.

Log of one of the nodes from original cluster:

120510 14:51:19 [Note] WSREP: Flow-control interval: [12, 23]
120510 14:51:19 [ERROR] Slave SQL: Could not execute Delete_rows event on
table test.dani; Can't find record in 'dani', Error_code: 1032; handler
error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 1085,
Error_code: 1032
120510 14:51:19 [Warning] WSREP: RBR event 2 Delete_rows apply warning:
120, 8728
120510 14:51:19 [ERROR] WSREP: Failed to apply trx: source:
1c0e4463-9aaf-11e1-0800-499c4e2eb871 version: 2 local: 0 state: CERTIFYING
flags: 1 conn_id: 4 trx_id: 51510 seqnos (l: 8768, g: 8728, s: 8727, d:
8721, ts: 1336661498271159426)
120510 14:51:19 [ERROR] WSREP: Failed to apply app buffer: իO , seqno:
8728, status: WSREP_FATAL
     at galera/src/replicator_smm.cpp:apply_wscoll():51
     at galera/src/replicator_smm.cpp:apply_trx_ws():122
120510 14:51:19 [ERROR] WSREP: Node consistency compromized, aborting...
120510 14:51:19 [Note] WSREP: Closing send monitor...
120510 14:51:19 [Note] WSREP: Closed send monitor.
120510 14:51:19 [Note] WSREP: gcomm: terminating thread
120510 14:51:19 [Note] WSREP: gcomm: joining thread
120510 14:51:19 [Note] WSREP: gcomm: closing backend
120510 14:51:19 [Note] WSREP:
view(view_id(NON_PRIM,1c0e4463-9aaf-11e1-0800-499c4e2eb871,105) memb {
    719b79c6-9954-11e1-0800-f06b656b08da,

} joined {
} left {
} partitioned {

    1c0e4463-9aaf-11e1-0800-499c4e2eb871,
})

120510 14:51:19 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no,
my_idx = 0, memb_num = 1
120510 14:51:19 [Note] WSREP: view((empty))
120510 14:51:19 [Note] WSREP: gcomm: closed
120510 14:51:19 [Note] WSREP: Flow-control interval: [8, 16]
120510 14:51:19 [Note] WSREP: Received NON-PRIMARY.
120510 14:51:19 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 8728)
120510 14:51:19 [Note] WSREP: Received self-leave message.
120510 14:51:19 [Note] WSREP: Flow-control interval: [0, 0]
120510 14:51:19 [Note] WSREP: Received SELF-LEAVE. Closing connection.
120510 14:51:19 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 8728)
120510 14:51:19 [Note] WSREP: RECV thread exiting 0: Success
120510 14:51:19 [Note] WSREP: recv_thread() joined.
120510 14:51:19 [Note] WSREP: Closing slave action queue.
120510 14:51:19 [Note] WSREP: /usr/sbin/mysqld: Terminated.
120510 14:51:19 mysqld_safe Number of processes running now: 0
120510 14:51:19 mysqld_safe mysqld restarted

My question is: when the node rejoins the cluster, shouldn't the cluster
realize that the sequence number on the joiner node in grastate.dat (and, I
suppose, the UUID too) is different, and request an SST?

Thank you!

Regards
Daniel

Alex Yurchenko  
From: Alex Yurchenko <alexey.yurche...@codership.com>
Date: Thu, 10 May 2012 21:09:46 +0300
Subject: Re: [codership-team] temporarily remove server from cluster
Hi Daniel,

On 2012-05-10 18:47, Daniel Mauricio Guzmán Burgos wrote:

Alright, you didn't tell the new Galera instance where to connect to.
It is waiting for wsrep_cluster_address.
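
To make that concrete, here is a rough sketch of the two statements involved when reloading the provider: first load the library, then tell it where to connect. The helper name is hypothetical and the peer address in the usage note is a placeholder; the library path is the one shown in the log above. This only covers the stalled "initialized" state and says nothing about whether rejoining is safe after local changes.

```python
from typing import List


def reload_provider_statements(provider_path: str, cluster_address: str) -> List[str]:
    """Build the SQL to reload the wsrep provider and then connect it.

    Hypothetical helper for illustration. Values are embedded in
    single-quoted SQL literals, so inputs containing a quote are rejected
    rather than escaped.
    """
    for value in (provider_path, cluster_address):
        if "'" in value:
            raise ValueError(f"refusing to embed value containing a quote: {value!r}")
    return [
        # Step 1: load the Galera library again.
        f"SET GLOBAL wsrep_provider='{provider_path}';",
        # Step 2: give the freshly loaded provider a cluster to connect to.
        f"SET GLOBAL wsrep_cluster_address='{cluster_address}';",
    ]
```

For example, `reload_provider_statements("/usr/lib64/libgalera_smm.so", "gcomm://192.0.2.10")` returns the two statements in the order they would be run, where `192.0.2.10` stands in for the address of a live cluster member.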

The problem here is that at no point does the UUID become different.
Sorry, I should have foreseen that. I guess setting wsrep_on=0 globally
would have been the safest option. There is an idea to generate a new
UUID every time wsrep_cluster_address is set to gcomm://, but that would
break GTID continuity between cluster restarts, and that may not be what
we want.
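
A simplified model of the point about the UUID, as a sketch only: it is not Galera's actual code, it ignores seqno handling entirely, and the `key: value` layout of grastate.dat and the sample UUIDs are assumptions for illustration. It shows why a node whose data has diverged can still look compatible when its saved UUID matches the cluster's.

```python
from typing import Dict

# Made-up sample contents; the layout is an assumption about grastate.dat.
SAMPLE_GRASTATE = """\
# GALERA saved state
version: 2.1
uuid:    00000000-0000-0000-0000-000000000001
seqno:   8728
"""


def parse_grastate(text: str) -> Dict[str, str]:
    """Parse 'key: value' lines, skipping blank lines and '#' comments."""
    state: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


def uuid_mismatch(joiner_state: Dict[str, str], cluster_uuid: str) -> bool:
    """Simplified check: True only when the joiner's saved UUID differs.

    Local writes made while the node ran as its own cluster do not change
    the UUID, so this returns False even though the data has diverged.
    """
    return joiner_state.get("uuid") != cluster_uuid


if __name__ == "__main__":
    joiner = parse_grastate(SAMPLE_GRASTATE)
    # Same UUID as the cluster: no mismatch is seen, despite any divergence.
    print(uuid_mismatch(joiner, "00000000-0000-0000-0000-000000000001"))
```

In this model a node with no saved state at all (an empty dictionary) always counts as a mismatch, which is the effect of starting without a state file.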

--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011

 
Andy Thompson  
From: "Andy Thompson" <athomp...@mooreheadcomm.com>
Date: Fri, 11 May 2012 12:32:17 -0400
Subject: Re: [codership-team] temporarily remove server from cluster
So is the "cleanest" way to put the node back in the cluster to just
shut it down and restart it with a correct gcomm:// address?

-andy

Alex Yurchenko  
From: Alex Yurchenko <alexey.yurche...@codership.com>
Date: Fri, 11 May 2012 20:06:18 +0300
Subject: Re: [codership-team] temporarily remove server from cluster
On 2012-05-11 19:32, Andy Thompson wrote:

> So is the "cleanest" way to put the node back in the cluster to just
> shut it down and restart it with a correct gcomm:// address?

The "cleanest" way would also include deleting the grastate.dat file
after shutting the server down.
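
A minimal sketch of the file-deletion step, assuming grastate.dat sits directly in the MySQL data directory. The helper name is hypothetical, and stopping and restarting the server are deliberately left out because the commands vary by system.

```python
from pathlib import Path


def remove_saved_state(datadir: str) -> bool:
    """Delete grastate.dat from the given data directory.

    Run only while the server is shut down. Returns True if the file
    existed and was removed, False if there was nothing to remove.
    """
    state_file = Path(datadir) / "grastate.dat"
    if state_file.exists():
        state_file.unlink()
        return True
    return False
```

The order would be: shut the server down, call `remove_saved_state()` with the node's data directory, then start the server with wsrep_cluster_address pointing at a live cluster member. With no saved state to present, the node should, as far as I understand it, receive a full state transfer when it rejoins.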

--
Alexey Yurchenko,
Codership Oy, www.codership.com
Skype: alexey.yurchenko, Phone: +358-400-516-011

 