Hi, this is my first post to the group and I am hoping to find some answers to my questions. I apologize for the long post, but I think giving you all the details will make debugging easier.
So here is a detailed description of the issue.
Server version: Ubuntu 10.04 LTS
Percona version: 5.5.24-55-log Percona XtraDB Cluster (GPL), wsrep_23.6.r341
++++++++++++++++++++++++++++
# This was formally known as [safe_mysqld]. Both versions are currently parsed.
[mysqld_safe]
socket = /var/run/mysqld/mysqld.sock
nice = 0
wsrep_urls = gcomm://10.1.6.118:4567,gcomm://10.1.3.30:4567,gcomm://10.1.3.101:4567,gcomm://
Question 1: How can I change the rsync process to use the private IP instead of the public IP?
4. Once the sync is completed on node 3, clustercheck still shows that the node is down, and the node is not usable as a cluster node.
5. Then I have to issue sudo service mysql stop and then sudo /etc/init.d/mysql start. It says the database failed to start, but the rsync process starts, and after it completes node 3 becomes part of the cluster.
Question 2: How can I make MySQL start via /etc/init.d/mysql instead of service mysql start at boot time?
Question 3: If node1 becomes a donor it stops accepting connections, which makes the application unusable. One suggestion is to add +if [ "$WSREP_STATUS" == "4" ] || [ "$WSREP_STATUS" == "2" ] to clustercheck, but if I do that, how accurate is the data during the rsync? Or should I be using xtrabackup?
Question 4: How do I configure the nodes to use incremental state transfer (IST) to avoid this error?
120807 11:48:00 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (afc4ea7d-dc5e-11e1-0800-0616c529eebe): 1 (Operation not permitted)
at galera/src/replicator_str.cpp:prepare_for_IST():439. IST will be unavailable.
I will have many more questions as I go on testing the configuration, but if someone can answer these I think it will clear up a lot of my doubts.
> 4. Once the sync is completed on node 3, clustercheck still shows that the node is down, and the node is not usable as a cluster node.
> 5. Then I have to issue sudo service mysql stop and then sudo /etc/init.d/mysql start. It says the database failed to start, but the rsync process starts, and after it completes node 3 becomes part of the cluster.
> Question 2: How can I make MySQL start via /etc/init.d/mysql instead of service mysql start at boot time?
> Question 3: If node1 becomes a donor it stops accepting connections, which makes the application unusable. One suggestion is to add +if [ "$WSREP_STATUS" == "4" ] || [ "$WSREP_STATUS" == "2" ] to clustercheck, but if I do that, how accurate is the data during the rsync? Or should I be using xtrabackup?
The rsync will flush tables and pause replication on the donor node while the rsync is copying. The xtrabackup method allows for replication to continue during the donation, but it does briefly block on replication right at the end of the data copy.
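To make that trade-off concrete, here is a rough sketch of the kind of check being discussed. It is not the clustercheck script that ships with the package: the function name node_is_available and the idea of passing the SST method as an argument are made up for illustration. It relies only on wsrep_local_state being 4 for Synced and 2 for Donor/Desynced.

```shell
#!/bin/sh
# Hypothetical helper, NOT the packaged clustercheck script.
# Decide whether a node should receive traffic, given its wsrep_local_state
# value and the SST method in use.
#   4 = Synced, 2 = Donor/Desynced
node_is_available() {
    state=$1
    sst_method=$2
    if [ "$state" = "4" ]; then
        return 0
    fi
    # Assumption: a donor is only worth keeping in rotation when the SST
    # method lets replication continue (xtrabackup), unlike rsync, which
    # pauses replication on the donor for the whole copy.
    if [ "$state" = "2" ] && [ "$sst_method" = "xtrabackup" ]; then
        return 0
    fi
    return 1
}

# Example: a donor that is serving an rsync SST is reported as down.
if node_is_available 2 rsync; then echo "up"; else echo "down"; fi
```

Even with xtrabackup the donor briefly blocks at the end of the copy, so a check like this can still report "up" for a node that stalls for a moment.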
> Question4: how do i configure the nodes to use incremental to avoid this error?
A shortcut here is using the undocumented wsrep_node_address setting, which sets the listen, ist, and sst addresses automatically if they are all on the same IP.
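As a sketch of what that could look like, the snippet below prints a [mysqld] fragment that sets wsrep_node_address. The function name render_node_address_cnf is hypothetical, and 10.1.6.118 is just the private IP used elsewhere in this thread; substitute each node's own address.

```shell
#!/bin/sh
# Hypothetical helper: print a my.cnf fragment that pins Galera traffic
# to one address via wsrep_node_address.
# $1 = this node's private IP (an assumption about your network layout).
render_node_address_cnf() {
    node_ip=$1
    printf '[mysqld]\n'
    printf 'wsrep_node_address=%s\n' "$node_ip"
}

# Example: show the fragment for the node whose private IP is 10.1.6.118.
render_node_address_cnf 10.1.6.118
```

The output can be pasted into the [mysqld] section of my.cnf; whether the separate listen, IST and SST address settings are then still needed depends on your version, so treat this as a starting point.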
Thanks Jay, your suggestions were indeed helpful. And yes, I will investigate the MySQL startup in detail and then file a bug report. Another thing to note: could this be caused by wsrep_urls being in the [mysqld_safe] section, since the database then has to be started through mysqld_safe?
Can that parameter be moved to the [mysqld] section? Or is there another variable that I can use, e.g. wsrep_cluster_address=gcomm://10.1.6.118:4567,gcomm://10.1.3.30:4567,gcomm://10.1.3.101:4567,gcomm://
So after your suggestions I noticed that if the db is restarted the node becomes available very soon. Is that due to IST taking effect? Here are my new parameters:
[mysqld_safe]
socket = /var/run/mysqld/mysqld.sock
nice = 0
wsrep_urls = gcomm://10.1.6.118:4567,gcomm://10.1.3.30:4567,gcomm://10.1.3.101:4567,gcomm://

[mysqld]
#
# * Basic Settings
#
server_id=1
binlog_format=ROW
wsrep_provider=/usr/lib64/libgalera_smm.so
wsrep_slave_threads=2
wsrep_cluster_name=dev_cluster
wsrep_sst_method=xtrabackup # changed to xtrabackup from rsync in order to use IST
wsrep_node_name=node1
innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2
log_slave_updates
wsrep_replicate_myisam=1
wsrep_sst_receive_address=10.1.6.118 # I believe this should be the private IP of this node?
wsrep_provider_options = "gmcast.listen_addr=tcp://0.0.0.0:4567; ist.recv_addr=10.1.6.118:4568; " # I believe this should be the private IP of this node?
On Aug 8, 2012, at 11:39 AM, amol <ajke...@gmail.com> wrote:
> Thanks Jay, your suggestions were indeed helpful. And yes, I will investigate the MySQL startup in detail and then file a bug report.
> Another thing to note: could this be caused by wsrep_urls being in the [mysqld_safe] section, since the database then has to be started through mysqld_safe?
It's possible, yes.
> Can that parameter be moved to the [mysqld] section? Or is there another variable that I can use, e.g.
> wsrep_cluster_address=gcomm://10.1.6.118:4567,gcomm://10.1.3.30:4567,gcomm://10.1.3.101:4567,gcomm://
No, you cannot. wsrep_urls is the only variable that supports multiple gcomm:// addresses, and it's really just a bit of a hack that finds an open port in the list and passes that to the mysqld as the wsrep_cluster_address for you.
Again, I'd defer to either filing a bug or joining the discussion on an existing bug, if there is one.
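To illustrate the "find an open port in the list" behaviour described above, here is a minimal sketch. It is not the actual mysqld_safe code: pick_cluster_address and probe are hypothetical names, the default probe assumes an nc that supports -z, and the default port 4567 is taken from the addresses in this thread.

```shell
#!/bin/sh
# Sketch only, NOT the real mysqld_safe logic.
# probe HOST PORT: succeed when the port accepts connections.
# Assumes an nc binary that supports -z; replace it if yours does not.
probe() {
    nc -z "$1" "$2" >/dev/null 2>&1
}

# pick_cluster_address "gcomm://a:4567,gcomm://b:4567,gcomm://"
# Prints the first entry that answers the probe. An empty "gcomm://" entry
# has nothing to probe, so it is returned as-is (in Galera an empty address
# means "start a new cluster").
pick_cluster_address() {
    urls=$1
    old_ifs=$IFS
    IFS=,
    set -- $urls
    IFS=$old_ifs
    for url in "$@"; do
        hostport=${url#gcomm://}
        if [ -z "$hostport" ]; then
            echo "$url"
            return 0
        fi
        host=${hostport%%:*}
        port=${hostport##*:}
        # No explicit port given: fall back to 4567.
        if [ "$port" = "$hostport" ]; then
            port=4567
        fi
        if probe "$host" "$port"; then
            echo "$url"
            return 0
        fi
    done
    return 1
}
```

Calling pick_cluster_address with the wsrep_urls value from my.cnf would print the single address to hand to mysqld as wsrep_cluster_address, which matches the one-address --wsrep_cluster_address seen in the ps output in this thread.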
> So after your suggestions I noticed that if the db is restarted the node becomes available very soon. Is that due to IST taking effect?
Check the log, it should tell you when IST or SST is happening.
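One way to do that check, as a rough sketch: filter the error log for the WSREP lines that mention IST, SST or "state transfer". The function name state_transfer_lines is made up, and the pattern is based only on the log messages quoted in this thread, so other versions may word them differently.

```shell
#!/bin/sh
# Hypothetical helper: print the state-transfer related WSREP lines from a
# MySQL error log. The pattern is derived from messages seen in this thread.
# $1 = path to the error log
state_transfer_lines() {
    grep -E 'WSREP: .*(IST|SST|[Ss]tate transfer)' "$1"
}

# Usage (log path as shown in the mysqld command line in this thread):
#   state_transfer_lines /var/log/mysql/error.log | tail -n 20
```

A line such as "IST will be unavailable" or "Prepared SST request" in the output tells you which kind of transfer the node ended up doing.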
> Here are my new parameters:
> [mysqld_safe]
> socket = /var/run/mysqld/mysqld.sock
> nice = 0
> wsrep_urls = gcomm://10.1.6.118:4567,gcomm://10.1.3.30:4567,gcomm://10.1.3.101:4567,gcomm://
> [mysqld]
> #
> # * Basic Settings
> #
> server_id=1
> binlog_format=ROW
> wsrep_provider=/usr/lib64/libgalera_smm.so
> wsrep_slave_threads=2
> wsrep_cluster_name=dev_cluster
> wsrep_sst_method=xtrabackup # changed to xtrabackup from rsync in order to use IST
This has nothing to do with IST.
> wsrep_node_name=node1
> innodb_locks_unsafe_for_binlog=1
> innodb_autoinc_lock_mode=2
> log_slave_updates
> wsrep_replicate_myisam=1
> wsrep_sst_receive_address=10.1.6.118 # i believe this should be the private ip of this node?
> wsrep_provider_options = "gmcast.listen_addr=tcp://0.0.0.0:4567; ist.recv_addr=10.1.6.118:4568; " # i believe this should be the private ip of this node?
What IP you run SST and IST on is up to you and your environment.
Hi Jay, thanks for the answers. Another question, and this might be a side track, so let me know if I should open a new thread for it.
We were running some load tests on the entire setup (1 haproxy lb + 3 nodes), and I am noticing that after a few connections the scripts stop running with the error "Error connecting to mysql", then start working again after a while. I checked innotop and did not see any locks or deadlocks on the db node, and I am running just 1 thread at a time, so I don't think it should be using too many connections. But I wasn't sure whether any of the system user connections is causing the db to lock up. Here is what my process list looks like while the load test is running:
Amol,
You should try to check exactly where the test scripts are failing (is it on connect?, is it on a query?, etc.) and, if possible, see if there is a more precise mysql error code associated with the problem. Are your scripts reconnecting every time they query?
The system users there seem normal.
> mysql> show full processlist;
> | Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined | Rows_read |
The scripts are connecting to haproxy, and here is the config I used in haproxy to avoid lock conflicts:
backend pxc-onenode-back
mode tcp
balance leastconn
option httpchk
server c2 10.1.3.3:3306 check port 9200 inter 12000 rise 3 fall 3
server c1 10.1.6.8:3306 check port 9200 inter 12000 rise 3 fall 3 backup
server c3 10.1.3.1:3306 check port 9200 inter 12000 rise 3 fall 3 backup
I also tried connecting to server c2 directly, and during the load test I got similar errors.
I'd look at the HA proxy dashboard to see if you can see any error counters increasing, likewise in c2 -- specifically things like 'aborted_connections' and so forth.
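On the MySQL side, the standard status counters in this area are Aborted_connects and Aborted_clients. A minimal sketch for seeing whether they grow during a load test is below; counter_deltas and the snapshot file names are hypothetical, and the input is assumed to be the tab-separated output of the mysql client in batch mode.

```shell
#!/bin/sh
# Hypothetical helper: compare two snapshots of status counters and print
# how much each one grew.
# Each snapshot holds tab-separated "Variable_name<TAB>Value" rows, e.g.:
#   mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'Aborted%'" > before.tsv
#   ... run the load test ...
#   mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'Aborted%'" > after.tsv
#   counter_deltas before.tsv after.tsv
counter_deltas() {
    awk -F '\t' '
        NR == FNR { before[$1] = $2; next }   # first file: remember values
        { print $1 "=" ($2 - before[$1]) }    # second file: print growth
    ' "$1" "$2"
}
```

If Aborted_connects grows while the scripts report "Error connecting to mysql", the failures are reaching mysqld itself; if it stays flat, the connections are more likely being refused before they get there, for example at the proxy.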
This is after all the changes I did earlier in the day on node 1's my.cnf for IST:
++++++++++++++++++++++++++++
[mysqld_safe]
socket = /var/run/mysqld/mysqld.sock
nice = 0
wsrep_urls = gcomm://10.1.6.3:4567,gcomm://10.1.3.1:4567,gcomm://10.1.6.8:4567,gcomm://
Something about how you have SST configured is causing the ultimate problem here.
I can't say why the local state was reset to all zeros on reboot. How was the machine restarted? If the local server had kept its state correctly, an IST should have been possible.
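One thing that can help here: Galera keeps the node's saved position in a grastate.dat file in the data directory, with "uuid:" and "seqno:" lines. A small sketch for reading it before and after a reboot follows. The function name saved_state is made up, and the /var/lib/mysql path is an assumption based on the datadir shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print "uuid:seqno" from a Galera grastate.dat file.
# $1 = path to grastate.dat (assumed to live in the MySQL datadir)
saved_state() {
    awk '
        $1 == "uuid:"  { uuid = $2 }
        $1 == "seqno:" { seqno = $2 }
        END { print uuid ":" seqno }
    ' "$1"
}

# Usage, with the datadir used in this thread:
#   saved_state /var/lib/mysql/grastate.dat
```

If this prints the all-zero UUID after a reboot while it showed the group UUID before, the saved state was lost somewhere during shutdown or startup, which would explain why IST was refused.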
> On Tuesday, August 7, 2012 1:03:48 PM UTC-4, amol wrote:
> Testing Scenario: Setup haproxy with node1 up and node2 and node3 as backup (so the connections always go to one node)
> When i reboot node 3:
> node1 becomes the donor: wsrep_local_state_comment | Donor (+)
> node2 is up and running
> node3 comes back up and starts to sync
> node3:~$ ps -ef | grep mysql
> mysql 2429 1 0 11:27 ? 00:00:00 /usr/sbin/mysqld
> root 2549 1 0 11:27 ? 00:00:00 /bin/sh /usr/bin/mysqld_safe
> mysql 3031 2549 0 11:27 ? 00:00:00 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --log-error=/var/log/mysql/error.log --pid-file=/var/lib/mysql/dev-db-node3.pid --socket=/var/run/mysqld/mysqld.sock --port=3306 --wsrep_cluster_address=gcomm://10.1.6.118:4567
> mysql 3188 3031 0 11:27 ? 00:00:00 sh -c wsrep_sst_rsync 'joiner' '<public_ip_node3>' '' '/var/lib/mysql/' '/etc/mysql/conf.d/mysqld_safe_syslog.cnf' '3031' 2>sst.err
> mysql 3189 3188 0 11:27 ? 00:00:01 /bin/bash -ue /usr//bin/wsrep_sst_rsync joiner <public_ip_node3> /var/lib/mysql/ /etc/mysql/conf.d/mysqld_safe_syslog.cnf 3031
> mysql 3203 1 0 11:27 ? 00:00:00 rsync --daemon --port 4444 --config /var/lib/mysql//rsync_sst.conf
> mysql 3243 3203 0 11:27 ? 00:00:00 rsync --daemon --port 4444 --config /var/lib/mysql//rsync_sst.conf
> mysql 3248 3243 1 11:27 ? 00:00:08 rsync --daemon --port 4444 --config /var/lib/mysql//rsync_sst.conf
> mysql 5279 3189 0 11:35 ? 00:00:00 sleep 1
> akedar 5281 3771 0 11:35 pts/0 00:00:00 grep --color=auto mysql
> node3:~$
Yes, I just did a reboot of the machine.
After some searching I found this error on the donor node:
innobackupex: Error: mysql child process has died: ERROR 1045 (28000): Access denied for user 'mysql'@'localhost' (using password: NO)
So I resolved that error by creating a user:
grant process on *.* to 'mysql'@'localhost' identified by '';
flush privileges;
Then on the next server reboot I saw that the donor was a different node, and it showed this error:
innobackupex: Error: mysql child process has died: ERROR 1044 (42000) at line 3: Access denied for user 'mysql'@'localhost' to database 'mysql' while waiting for reply to MySQL request: 'USE mysql;' at /usr//bin/innobackupex line 374.
Now I see that the mysql user needs more privileges, so I have granted all privileges to mysql. Next I have to try getting the node back up using SST and then try the reboot.
> Now when I reboot node1 the database does not start (even after using /etc/init.d/mysql start)
> and I see these errors in mysql/error.log:
> 120808 16:39:19 [Note] WSREP: Flow-control interval: [14, 28]
> 120808 16:39:19 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 151412)
> 120808 16:39:19 [Note] WSREP: State transfer required:
> Group state: afc4ea7d-dc5e-11e1-0800-0616c529eebe:151412
> Local state: 00000000-0000-0000-0000-000000000000:-1
> 120808 16:39:19 [Note] WSREP: New cluster view: global state: afc4ea7d-dc5e-11e1-0800-0616c529eebe:151412, view# 31: Primary, number of nodes: 3, my index: 0, protocol version 2
> 120808 16:39:19 [Warning] WSREP: Gap in state sequence. Need state transfer.
> 120808 16:39:21 [Note] WSREP: Running: 'wsrep_sst_xtrabackup 'joiner' '10.1.6.8' '' '/var/lib/mysql/' '/etc/mysql/conf.d/mysqld_safe_syslog.cnf' '4411' 2>sst.err'
> 120808 16:39:21 [Note] WSREP: Prepared SST request: xtrabackup|10.1.6.8:4444/xtrabackup_sst
> 120808 16:39:21 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
> 120808 16:39:21 [Note] WSREP: Assign initial position for certification: 151412, protocol version: 2
> 120808 16:39:21 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (afc4ea7d-dc5e-11e1-0800-0616c529eebe): 1 (Operation not permitted)
> at galera/src/replicator_str.cpp:prepare_for_IST():439. IST will be unavailable.
> 120808 16:39:21 [Note] WSREP: Node 0 (node1) requested state transfer from '*any*'. Selected 1 (node2)(SYNCED) as donor.
> 120808 16:39:21 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 151412)
> 120808 16:39:21 [Note] WSREP: Requesting state transfer: success, donor: 1
> 120808 16:39:27 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup 'joiner' '10.1.6.8' '' '/var/lib/mysql/' '/etc/mysql/conf.d/mysqld_safe_syslog.cnf' '4411' 2>sst.err: 32 (Broken pipe)
> 120808 16:39:27 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
> 120808 16:39:27 [ERROR] WSREP: SST failed: 32 (Broken pipe)
> 120808 16:39:27 [ERROR] Aborting
The notable part here is that there is absolutely no error when I just stop the db (/etc/init.d/mysql stop) and start it again (/etc/init.d/mysql start).
So the question is: does IST only work when you have stopped and started the db? If you reboot a node, does it always do SST?
And my observation is that SST using xtrabackup is slower than rsync, but the donor node is at least available for db connections. Is that a valid statement?
Well, that seems to have done the trick for now. Once the permissions were set for the mysql user, all nodes rebooted fine. I just need to remove some privileges, as "all" is not ideal for a user with no password. :)
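A narrower grant than "all" could look like the sketch below. It assumes the privilege set that the Percona documentation lists for xtrabackup-based SST (RELOAD, LOCK TABLES, REPLICATION CLIENT). The `sst_user_sql` helper, the `sstuser` account name and the password are placeholders, not anything from this thread, and pointing the SST script at that account through `wsrep_sst_auth=sstuser:<password>` in my.cnf is likewise an assumption about this setup.

```shell
#!/bin/sh
# Hypothetical helper: prints the SQL for a dedicated, password-protected SST account.
# The account name and password are passed in by the caller; nothing here comes
# from the thread. Values are not escaped, so avoid quotes in them.
sst_user_sql() {
    sst_user="$1"
    sst_pass="$2"
    cat <<EOF
CREATE USER '${sst_user}'@'localhost' IDENTIFIED BY '${sst_pass}';
GRANT RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO '${sst_user}'@'localhost';
FLUSH PRIVILEGES;
EOF
}

# Review the statements first, then pipe them into the mysql client, e.g.:
#   sst_user_sql sstuser 'choose-a-password' | mysql -u root -p
```

Printing the statements instead of executing them keeps the step reviewable. Whether this set is sufficient for the innobackupex version in use here would need to be checked against a test SST before the broad grant on 'mysql'@'localhost' is revoked.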
On Thursday, August 9, 2012 12:33:00 PM UTC-4, amol wrote:
> Yes, I just did a reboot of the machine,
> and after some searching I found this error on the donor node:
> innobackupex: Error: mysql child process has died: ERROR 1045 (28000): Access denied for user 'mysql'@'localhost' (using password: NO)
> So I resolved that error by creating a user:
> grant process on *.* to 'mysql'@'localhost' identified by '';
> flush privileges;
> Then, on the next server reboot, I saw that the donor was a different node, and it showed this error:
> innobackupex: Error: mysql child process has died: ERROR 1044 (42000) at line 3: Access denied for user 'mysql'@'localhost' to database 'mysql' while waiting for reply to MySQL request: 'USE mysql;' at /usr//bin/innobackupex line 374.
> Now I see that the mysql user needs more privileges, so I have granted all privileges to mysql. Next I have to try getting the node back up using SST and then try the reboot.
> On Thursday, August 9, 2012 8:00:35 AM UTC-4, Jay Janssen wrote:
>> Something about how you have SST configured is causing the ultimate problem here.
>> I can't say why the local state was reset to all zeros on reboot. How was the machine restarted? If the local server had kept its state correctly, an IST should have been possible.
>> On Aug 8, 2012, at 4:54 PM, amol <ajk...@gmail.com> wrote:
>> Now when I reboot node1, the database does not start (even after using /etc/init.d/mysql start), and mysql/error.log shows the same errors quoted above, ending in "SST failed: 32 (Broken pipe)".
On Friday, August 10, 2012 3:13:20 AM UTC+7, amol wrote:
> So the question is: does IST only work when you have stopped the db and started it?
> If you reboot a node, does it always do SST?
IST happens whenever it is possible. That is:
1) joiner position can be reliably established (e.g. if a server crashes during DDL it can't)
2) donor cache contains enough writesets to cover the gap
These are the only two conditions which IST depends on. If at least one of these conditions is not met, donor will defer to SST.
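The first condition can be eyeballed on the joiner before it starts. The sketch below assumes the saved state lives in `grastate.dat` in the datadir, with `uuid:` and `seqno:` lines; the `ist_candidate` function name, the default path and the output labels are made up for illustration. A `seqno` of -1 in that file only means the position was not recorded there, and the server may still be able to recover it at startup, so treat the result as a hint rather than a verdict.

```shell
#!/bin/sh
# Sketch: report whether a node's saved Galera state looks usable for IST.
# Assumes a grastate.dat with "uuid:" and "seqno:" lines; path is a guess.
ist_candidate() {
    state_file="${1:-/var/lib/mysql/grastate.dat}"
    [ -r "$state_file" ] || { echo "no-state-file"; return 1; }

    # Pull the two fields out of the "key:   value" lines.
    uuid=$(awk '$1 == "uuid:" {print $2}' "$state_file")
    seqno=$(awk '$1 == "seqno:" {print $2}' "$state_file")

    # An all-zero UUID is what the log in this thread shows as "Local state".
    if [ -z "$uuid" ] || [ "$uuid" = "00000000-0000-0000-0000-000000000000" ]; then
        echo "sst-likely: no saved cluster UUID"
        return 1
    fi
    # Position not recorded in the file; it may or may not be recoverable.
    if [ -z "$seqno" ] || [ "$seqno" = "-1" ]; then
        echo "position-unknown: uuid=$uuid seqno=${seqno:-missing}"
        return 1
    fi
    echo "ist-candidate: uuid=$uuid seqno=$seqno"
}
```

Calling `ist_candidate` with no argument checks the assumed default path. Even an "ist-candidate" result only covers condition 1; whether the donor still holds the missing writesets in its cache cannot be seen from the joiner's file.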
On Thursday, August 23, 2012 10:13:29 PM UTC-4, Alexey Yurchenko wrote:
> On Friday, August 10, 2012 3:13:20 AM UTC+7, amol wrote:
>> So the question is: does IST only work when you have stopped the db and started it?
>> If you reboot a node, does it always do SST?
> IST happens whenever it is possible. That is:
> 1) joiner position can be reliably established (e.g. if a server crashes during DDL it can't)
> 2) donor cache contains enough writesets to cover the gap
> These are the only two conditions which IST depends on. If at least one of these conditions is not met, donor will defer to SST.