Issue on ubuntu when i reboot the machine Percona cluster fails on node
amol
From: amol <ajke...@gmail.com>
Date: Tue, 7 Aug 2012 10:03:48 -0700 (PDT)
Local: Tues, Aug 7 2012 1:03 pm
Subject: Issue on ubuntu when i reboot the machine Percona cluster fails on node

Hi, this is my first post to the group and I am hoping to find some answers
to my questions. I apologize for the long post, but I think giving you all
the details will make debugging easier.

So here is a detailed description of the issue

Server Version : ubuntu 10.04 LTS
percona version: 5.5.24-55-log Percona XtraDB Cluster (GPL), wsrep_23.6.r341

*Configuration details: 3 nodes (node1, node2, node3)*
*(my.cnf) in node 1 *
++++++++++++++++++++++++++++
[mysqld_safe]
socket = /var/run/mysqld/mysqld.sock
nice = 0
wsrep_urls = gcomm://10.1.6.118:4567,gcomm://10.1.3.30:4567,gcomm://10.1.3.101:4567,gcomm://

[mysqld]
#
# * Basic Settings
#
server_id=1
binlog_format=ROW  
wsrep_provider=/usr/lib64/libgalera_smm.so  
#wsrep_cluster_address=gcomm://
wsrep_slave_threads=2
wsrep_cluster_name=dev_cluster
wsrep_sst_method=rsync
wsrep_node_name=node1  
innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2
log_slave_updates
wsrep_replicate_myisam=1
++++++++++++++++++++++++++++
*(my.cnf) in node 2 *
++++++++++++++++++++++++++++
# This was formally known as [safe_mysqld]. Both versions are currently parsed.
[mysqld_safe]
socket = /var/run/mysqld/mysqld.sock
nice = 0
wsrep_urls = gcomm://10.1.6.118:4567,gcomm://10.1.3.30:4567,gcomm://10.1.3.101:4567,gcomm://

[mysqld]
#
# * Basic Settings
#
server_id=2
binlog_format=ROW  
wsrep_provider=/usr/lib64/libgalera_smm.so  
#wsrep_cluster_address=gcomm://10.1.6.118:4567
wsrep_slave_threads=2
wsrep_cluster_name=dev_cluster
wsrep_sst_method=rsync
wsrep_node_name=node2  
innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2
log_slave_updates
wsrep_replicate_myisam=1
++++++++++++++++++++++++++++

*(my.cnf) in node 3*

++++++++++++++++++++++++++++
# This was formally known as [safe_mysqld]. Both versions are currently parsed.
[mysqld_safe]
socket = /var/run/mysqld/mysqld.sock
nice = 0
wsrep_urls = gcomm://10.1.6.118:4567,gcomm://10.1.3.30:4567,gcomm://10.1.3.101:4567,gcomm://

[mysqld]
#
# * Basic Settings
#
server_id=3
binlog_format=ROW
wsrep_provider=/usr/lib64/libgalera_smm.so
#wsrep_cluster_address=gcomm://10.1.6.118:4567
wsrep_slave_threads=2
wsrep_cluster_name=dev_cluster
wsrep_sst_method=rsync
wsrep_node_name=node3
innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2
log_slave_updates
wsrep_replicate_myisam=1
++++++++++++++++++++++++++++

Testing Scenario: Setup haproxy with node1 up and node2 and node3 as backup
(so the connections always go to one node)

When i reboot node 3:

   1. node1 becomes the donor:  wsrep_local_state_comment  | Donor (+)
   2. node2 is up and running
   3. node3 comes back  up and starts to sync

node3:~$ ps -ef | grep mysql
mysql     2429     1  0 11:27 ?        00:00:00 /usr/sbin/mysqld
root      2549     1  0 11:27 ?        00:00:00 /bin/sh /usr/bin/mysqld_safe
mysql     3031  2549  0 11:27 ?        00:00:00 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib/mysql/plugin --user=mysql --log-error=/var/log/mysql/error.log --pid-file=/var/lib/mysql/dev-db-node3.pid --socket=/var/run/mysqld/mysqld.sock --port=3306 --wsrep_cluster_address=gcomm://10.1.6.118:4567
mysql     3188  3031  0 11:27 ?        00:00:00 sh -c wsrep_sst_rsync 'joiner' '<public_ip_node3>' '' '/var/lib/mysql/' '/etc/mysql/conf.d/mysqld_safe_syslog.cnf' '3031' 2>sst.err
mysql     3189  3188  0 11:27 ?        00:00:01 /bin/bash -ue /usr//bin/wsrep_sst_rsync joiner <public_ip_node3>  /var/lib/mysql/ /etc/mysql/conf.d/mysqld_safe_syslog.cnf 3031
mysql     3203     1  0 11:27 ?        00:00:00 rsync --daemon --port 4444 --config /var/lib/mysql//rsync_sst.conf
mysql     3243  3203  0 11:27 ?        00:00:00 rsync --daemon --port 4444 --config /var/lib/mysql//rsync_sst.conf
mysql     3248  3243  1 11:27 ?        00:00:08 rsync --daemon --port 4444 --config /var/lib/mysql//rsync_sst.conf
mysql     5279  3189  0 11:35 ?        00:00:00 sleep 1
akedar    5281  3771  0 11:35 pts/0    00:00:00 grep --color=auto mysql
node3:~$

Question1: How can i change the rsync process to use private IP instead of
public IP?

   4. Once the sync is completed on node 3, clustercheck still shows that the node is down, and the node is not usable as a cluster node.
   5. Then I have to issue sudo service mysql stop and then sudo /etc/init.d/mysql start. It says the database failed to start, but the rsync process starts, and after the process completes node3 becomes part of the cluster.

Question2: How can I make the mysql process start using /etc/init.d/mysql
instead of service mysql start at boot time?

Question3: If node1 becomes a donor it stops accepting connections, which
makes the application unusable. One suggestion is to add +if [
"$WSSREP_STATUS" == "4" ] || [ "$WSSREP_STATUS" == "2" ] to the cluster
check, but if I do that, how accurate is the data during the rsync? Or should I
be using xtrabackup?

Question4: How do I configure the nodes to use incremental state transfer (IST)
to avoid this error?
120807 11:48:00 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (afc4ea7d-dc5e-11e1-0800-0616c529eebe): 1 (Operation not permitted)
         at galera/src/replicator_str.cpp:prepare_for_IST():439. IST will be unavailable.

I have many more questions as i go on and test the configuration but if
someone can answer these, i think i can clear a lot of my doubts...


 
Jay Janssen
From: Jay Janssen <jay.jans...@percona.com>
Date: Tue, 7 Aug 2012 13:22:28 -0400
Local: Tues, Aug 7 2012 1:22 pm
Subject: Re: Issue on ubuntu when i reboot the machine Percona cluster fails on node

Hi Amol,
  I'll try to answer your questions below:

On Aug 7, 2012, at 1:03 PM, amol <ajke...@gmail.com> wrote:

> Question1: How can i change the rsync process to use private IP instead of public IP?

Set wsrep_sst_receive_address (http://www.codership.com/wiki/doku.php?id=mysql_options_0.8).

>          4. Once the sync is completed on node 3, the clustercheck still shows that the node is down and node is not usable as a cluster node
>          5. Then i have to issue sudo service mysql stop and then sudo /etc/init.d/mysql start and it says database failed to start but the rsync process starts and after the process is completed node3 becomes a part of the cluster

> Question2: How can i change the mysql process to start using /etc/init.d/mysql instead of service mysql start during the boot time.?

I think this is a bug.  Feel free to poke around the launchpad project  (https://bugs.launchpad.net/percona-xtradb-cluster/+bugs), and file a bug if one does not exist.

> Question3: if node1 becomes a donor it stops accepting connections which make the application unusable, one suggestion is to add +if [ "$WSSREP_STATUS" == "4" ] || [ "$WSSREP_STATUS" == "2" ] in the cluster check, but doing that how accurate is the data during the rsync or should i be using xtrabackup?

The rsync will flush tables and pause replication on the donor node while the rsync is copying.  The xtrabackup method allows for replication to continue during the donation, but it does briefly block on replication right at the end of the data copy.
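
To make the trade-off concrete, here is a minimal Python sketch of the decision a clustercheck-style health check makes. It is an illustration, not the actual clustercheck script: the function names and the allow_donor switch are hypothetical, and it assumes the usual wsrep_local_state numbering (1 = Joining, 2 = Donor/Desynced, 3 = Joined, 4 = Synced).

```python
# Hypothetical helper; names are illustrative, not taken from clustercheck.
# Assumed wsrep_local_state values: 1 Joining, 2 Donor/Desynced, 3 Joined, 4 Synced.
DONOR = 2
SYNCED = 4


def node_available(wsrep_local_state: int, allow_donor: bool = False) -> bool:
    """Return True if the health check should report this node as usable."""
    if wsrep_local_state == SYNCED:
        return True
    # Treating a donor as usable only makes sense when the SST method does
    # not block the donor for the whole copy (xtrabackup rather than rsync).
    return allow_donor and wsrep_local_state == DONOR


def http_status(wsrep_local_state: int, allow_donor: bool = False) -> int:
    """Map the decision to the status code an HTTP health check would return."""
    return 200 if node_available(wsrep_local_state, allow_donor) else 503
```

With allow_donor left off, a donor node answers 503 and the load balancer moves traffic elsewhere; turning it on is probably only reasonable together with wsrep_sst_method=xtrabackup.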

> Question4: how do i configure the nodes to use incremental to avoid this error?

The most obvious way to configure IST is using the ist.recv_addr in the wsrep_provider_options (http://www.codership.com/wiki/doku.php?id=galera_parameters).  It's not obvious, but IST uses its own port.

A shortcut here is using the undocumented wsrep_node_address setting, which sets the listen, ist, and sst addresses automatically if they are all on the same IP.  

Hope this helps.

Jay Janssen, Senior MySQL Consultant, Percona Inc.
http://about.me/jay.janssen
Percona Live in NYC Oct 1-2nd: http://www.percona.com/live/nyc-2012/


 
amol
From: amol <ajke...@gmail.com>
Date: Wed, 8 Aug 2012 08:39:19 -0700 (PDT)
Local: Wed, Aug 8 2012 11:39 am
Subject: Re: Issue on ubuntu when i reboot the machine Percona cluster fails on node

Thanks Jay, your suggestions were indeed helpful, and yes, I will
investigate the mysql startup in detail and then file a bug report.
Another thing to note: could this be caused by wsrep_urls being in the
mysqld_safe section, since the database then has to be started through
mysqld_safe?

Can that parameter be moved to the mysqld section? Or is there another
variable that I can use, e.g.
wsrep_cluster_address=gcomm://10.1.6.118:4567,gcomm://10.1.3.30:4567,gcomm://10.1.3.101:4567,gcomm://

After applying your suggestions I noticed that if the db is restarted the
node becomes available very quickly. Is that due to IST taking effect?
Here are my new parameters:
[mysqld_safe]
socket = /var/run/mysqld/mysqld.sock
nice = 0
wsrep_urls = gcomm://10.1.6.118:4567,gcomm://10.1.3.30:4567,gcomm://10.1.3.101:4567,gcomm://

[mysqld]
#
# * Basic Settings
#
server_id=1
binlog_format=ROW
wsrep_provider=/usr/lib64/libgalera_smm.so
wsrep_slave_threads=2
wsrep_cluster_name=dev_cluster
wsrep_sst_method=xtrabackup    # changed from rsync to xtrabackup in order to use IST
wsrep_node_name=node1
innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2
log_slave_updates
wsrep_replicate_myisam=1
wsrep_sst_receive_address=10.1.6.118  # I believe this should be the private IP of this node?
wsrep_provider_options = "gmcast.listen_addr=tcp://0.0.0.0:4567; ist.recv_addr=10.1.6.118:4568; "  # I believe this should be the private IP of this node?

Thanks


 
Jay Janssen
From: Jay Janssen <jay.jans...@percona.com>
Date: Wed, 8 Aug 2012 13:49:56 -0400
Local: Wed, Aug 8 2012 1:49 pm
Subject: Re: Issue on ubuntu when i reboot the machine Percona cluster fails on node

On Aug 8, 2012, at 11:39 AM, amol <ajke...@gmail.com> wrote:

> Thank Jay..you suggestions were indeed helpful... and yes i will investigate in detail about the mysql startup and then create a bug request
> another thing to note here is that is it because the wsrep_urls are in the mysqld_safe section that must be causing this? since the database has to be started in the mysqld_safe mode?

It's possible, yes.

> can that parameter be shifted to mysqld section? or is their another variable that i can use like e.g
> wsrep_cluster_address=gcomm://10.1.6.118:4567,gcomm://10.1.3.30:4567,gcomm://10.1.3.101:4567,gcomm://

No, you cannot.  wsrep_urls is the only variable that supports multiple gcomm:// addresses, and it's really just a bit of a hack that finds an open port in the list and passes that to the mysqld as the wsrep_cluster_address for you.
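
A rough sketch of the selection behaviour described above, written in Python rather than the shell that mysqld_safe actually uses. The function names, the injectable probe, the default port, and the treatment of a bare gcomm:// entry as a last-resort bootstrap are all assumptions made for illustration.

```python
import socket
from typing import Callable, Iterable

PREFIX = "gcomm://"


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_cluster_address(
    urls: Iterable[str],
    probe: Callable[[str, int], bool] = port_open,
) -> str:
    """Return the first URL whose host:port answers, walking the list in order."""
    for url in urls:
        if not url.startswith(PREFIX):
            raise ValueError("unexpected URL: %r" % url)
        target = url[len(PREFIX):]
        if not target:
            # Assumption: an empty gcomm:// means "start a new cluster".
            return url
        host, _, port = target.partition(":")
        if probe(host, int(port) if port else 4567):
            return url
    raise RuntimeError("none of the wsrep_urls entries is reachable")
```

The chosen URL is what would end up as the single --wsrep_cluster_address value on the mysqld command line, as in the ps output earlier in the thread.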

Again, I'd defer to either filing a bug or getting involved in any discussion on an existing bug (if exists).  

> So after your suggestions i noticed that the if the db is restarted the node becomes available very soon, is it due to the IST taking effect?

Check the log, it should tell you when IST or SST is happening.  

This has nothing to do with IST.

> wsrep_node_name=node1  
> innodb_locks_unsafe_for_binlog=1
> innodb_autoinc_lock_mode=2
> log_slave_updates
> wsrep_replicate_myisam=1
> wsrep_sst_receive_address=10.1.6.118  # i believe this should be the private ip of this node?
> wsrep_provider_options = "gmcast.listen_addr=tcp://0.0.0.0:4567; ist.recv_addr=10.1.6.118:4568; "  # i believe this should be the private ip of this node?

What IP you run SST and IST on is up to you and your environment.  

amol
From: amol <ajke...@gmail.com>
Date: Wed, 8 Aug 2012 11:42:16 -0700 (PDT)
Local: Wed, Aug 8 2012 2:42 pm
Subject: Re: Issue on ubuntu when i reboot the machine Percona cluster fails on node

Hi Jay, thanks for the answers. Another question (this might be a side
track, so let me know if I should open a new thread for it):

We were running some load tests on the entire setup (1 haproxy LB
+ 3 nodes), and I am noticing that after a few connections the scripts stop
running with the error "Error connecting to mysql",
and they start working again after a while.
I checked innotop and did not see any locks or deadlocks on the db
node. Plus I am running just 1 thread at a time, so I don't think it should
be using too many connections.
But I wasn't sure whether any of the system user connections is causing the
db to lock up?
Here is what my process list looks like while the load test is running:

mysql> show full processlist;
+--------+-------------+--------------------+---------+---------+-------+--------------------+-----------------------+-----------+---------------+-----------+
| Id     | User        | Host               | db      | Command | Time  | State              | Info                  | Rows_sent | Rows_examined | Rows_read |
+--------+-------------+--------------------+---------+---------+-------+--------------------+-----------------------+-----------+---------------+-----------+
|      1 | system user |                    | NULL    | Sleep   | 39005 | wsrep aborter idle | NULL                  |         0 |             0 |         1 |
|      2 | system user |                    | NULL    | Sleep   |  2815 | committed 81728    | NULL                  |         0 |             0 |         1 |
|      3 | system user |                    | NULL    | Sleep   |  2816 | committed 81727    | NULL                  |         0 |             0 |         1 |
| 136444 | user1       | localhost          | NULL    | Query   |     0 | sleeping           | show full processlist |         0 |             0 |         1 |
| 141856 | applusdev   | 10.1.4.6:34993     | demo    | Sleep   |     0 |                    | NULL                  |         0 |             0 |        59 |
| 141869 | applusdev   | 10.1.4.6:35006     | demo    | Sleep   |     0 |                    | NULL                  |         1 |             1 |         1 |
| 141871 | applusdev   | 10.1.4.6:35008     | demo    | Sleep   |     0 |                    | NULL                  |         0 |             0 |         1 |
+--------+-------------+--------------------+---------+---------+-------+--------------------+-----------------------+-----------+---------------+-----------+

I am open to suggestions to debug this issue, as i cannot proceed to
production with this issue lingering...


 
Jay Janssen
From: Jay Janssen <jay.jans...@percona.com>
Date: Wed, 8 Aug 2012 14:50:37 -0400
Local: Wed, Aug 8 2012 2:50 pm
Subject: Re: Issue on ubuntu when i reboot the machine Percona cluster fails on node

Amol,
  You should try to check exactly where the test scripts are failing (is it on connect?, is it on a query?, etc.) and, if possible, see if there is a more precise mysql error code associated with the problem.    Are your scripts reconnecting every time they query?

  The system users there seem normal.
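
One way to narrow this down is to wrap each test iteration so that it records which stage failed. The sketch below is only an illustration in Python: connect and query are hypothetical callables standing in for whatever the real load-test script does with its MySQL driver.

```python
import time
from typing import Any, Callable, Optional, Tuple


def run_once(
    connect: Callable[[], Any],
    query: Callable[[Any], Any],
) -> Tuple[str, Optional[str], float]:
    """Run one connect + query cycle.

    Returns (stage, error, seconds), where stage is 'connect', 'query' or 'ok'
    and error is the repr of the exception raised at that stage, if any.
    """
    start = time.monotonic()
    try:
        conn = connect()
    except Exception as exc:  # keep the driver's own error text for the report
        return ("connect", repr(exc), time.monotonic() - start)
    try:
        query(conn)
    except Exception as exc:
        return ("query", repr(exc), time.monotonic() - start)
    return ("ok", None, time.monotonic() - start)
```

Counting the stages over a full run, together with the elapsed times, should show whether the failures are all on connect and whether they cluster around a particular timeout.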

amol
From: amol <ajke...@gmail.com>
Date: Wed, 8 Aug 2012 11:58:45 -0700 (PDT)
Local: Wed, Aug 8 2012 2:58 pm
Subject: Re: Issue on ubuntu when i reboot the machine Percona cluster fails on node

yes the precise error in the log file is

PHP Warning:  mysqli::mysqli(): (HY000/2003): Can't connect to MySQL server on '<server_IP>'

and yes the script is using a new connection for every new record it
inserts, and closes the connection

some variables from the db...

mysql> show variables like 'max_connection%';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 151   |
+-----------------+-------+

mysql> show status like '%connections';
+----------------------+--------+
| Variable_name        | Value  |
+----------------------+--------+
| Connections          | 305747 |
| Max_used_connections | 115    |
+----------------------+--------+
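
As a quick sanity check on those numbers, a small Python sketch (the helper name is made up for illustration): the peak of 115 is below the limit of 151, so the connection limit itself was apparently never reached. Hitting it would normally show up as error 1040 (Too many connections) rather than the client-side 2003.

```python
def connection_headroom(max_connections: int, max_used_connections: int) -> int:
    """Connections still available at the busiest moment seen so far."""
    return max_connections - max_used_connections


# Values taken from the SHOW VARIABLES / SHOW STATUS output above.
headroom = connection_headroom(151, 115)
limit_was_hit = headroom <= 0
```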


 
Jay Janssen
From: Jay Janssen <jay.jans...@percona.com>
Date: Wed, 8 Aug 2012 15:12:42 -0400
Local: Wed, Aug 8 2012 3:12 pm
Subject: Re: Issue on ubuntu when i reboot the machine Percona cluster fails on node

Is that connecting directly to a cluster node or to a VIP/proxy like HAproxy?

amol
From: amol <ajke...@gmail.com>
Date: Wed, 8 Aug 2012 12:16:56 -0700 (PDT)
Local: Wed, Aug 8 2012 3:16 pm
Subject: Re: Issue on ubuntu when i reboot the machine Percona cluster fails on node

connecting to haproxy..
and here is the config i used in haproxy to avoid lock conflicts

backend pxc-onenode-back
        mode tcp
        balance leastconn
        option httpchk
        server c2 10.1.3.3:3306 check port 9200 inter 12000 rise 3 fall 3
        server c1 10.1.6.8:3306 check port 9200 inter 12000 rise 3 fall 3 backup
        server c3 10.1.3.1:3306 check port 9200 inter 12000 rise 3 fall 3 backup

I also tried connecting to server c2 directly, and during the load
test I got similar errors...


 
Jay Janssen
From: Jay Janssen <jay.jans...@percona.com>
Date: Wed, 8 Aug 2012 15:21:32 -0400
Local: Wed, Aug 8 2012 3:21 pm
Subject: Re: Issue on ubuntu when i reboot the machine Percona cluster fails on node

On Aug 8, 2012, at 3:16 PM, amol <ajke...@gmail.com> wrote:

> connecting to haproxy..
> and here is the config i used in haproxy to avoid lock conflicts

> backend pxc-onenode-back
>         mode tcp
>         balance leastconn
>         option httpchk
>         server c2 10.1.3.3:3306 check port 9200 inter 12000 rise 3 fall 3
>         server c1 10.1.6.8:3306 check port 9200 inter 12000 rise 3 fall 3 backup
>         server c3 10.1.3.1:3306 check port 9200 inter 12000 rise 3 fall 3 backup

> and i also tried connection to server c2 directly and when during the load test i got similar errors…

I'd look at the HAProxy dashboard to see if you can see any error counters increasing, and likewise on c2, specifically status counters such as Aborted_connects and Aborted_clients.

amol
From: amol <ajke...@gmail.com>
Date: Wed, 8 Aug 2012 12:41:46 -0700 (PDT)
Local: Wed, Aug 8 2012 3:41 pm
Subject: Re: Issue on ubuntu when i reboot the machine Percona cluster fails on node

where do i check for aborted_connections?

So here are my findings from haproxy (this is from csv output)

run 1
pxc-onenode-back,c2,0,0,0,13,,12693,21212854,541070495,,0,,6,0,25,1,UP,1,1,0,4,1,25,36,,1,6,1,,12668,,2,0,,115,L7OK,200,22,,,,,,,0,,,,0,0,
pxc-onenode-back,c1,0,0,0,0,,0,0,0,,0,,0,0,0,0,UP,1,0,1,3,1,23,36,,1,6,2,,0,,2,0,,0,L7OK,200,33,,,,,,,0,,,,0,0,
pxc-onenode-back,c3,0,0,0,7,,360,157380,383640,,0,,0,0,0,0,UP,1,0,1,4,1,30,40,,1,6,3,,360,,2,0,,97,L7OK,200,66,,,,,,,0,,,,0,0,
pxc-onenode-back,BACKEND,0,0,0,13,0,13442,21370234,541454135,0,0,,421,0,25,1,UP,1,1,2,,1,30,29,,1,6,0,,13028,,1,0,,115,,,,,,,,,,,,,,0,0,

run 2
pxc-onenode-back,c2,0,0,0,13,,17644,23366860,546367867,,0,,8,0,34,1,UP,1,1,0,5,1,240,36,,1,6,1,,17610,,2,0,,132,L7OK,200,23,,,,,,,0,,,,0,0,
pxc-onenode-back,c1,0,0,0,0,,0,0,0,,0,,0,0,0,0,UP,1,0,1,5,1,238,36,,1,6,2,,0,,2,0,,0,L7OK,200,23,,,,,,,0,,,,0,0,
pxc-onenode-back,c3,0,0,0,7,,360,157380,383640,,0,,0,0,0,0,UP,1,0,1,6,1,245,40,,1,6,3,,360,,2,0,,97,L7OK,200,21,,,,,,,0,,,,0,0,
pxc-onenode-back,BACKEND,0,0,0,13,0,18384,23524240,546751507,0,0,,423,0,34,1,UP,1,1,2,,1,245,29,,1,6,0,,17970,,1,0,,132,,,,,,,,,,,,,,0,0,

run 3
pxc-onenode-back,c2,0,0,0,13,,21574,25073755,550547926,,0,,10,0,48,1,UP,1,1,0,7,2,87,73,,1,6,1,,21526,,2,0,,132,L7OK,200,22,,,,,,,0,,,,0,0,
pxc-onenode-back,c1,0,0,0,0,,0,0,0,,0,,0,0,0,0,UP,1,0,1,7,2,97,72,,1,6,2,,0,,2,0,,0,L7OK,200,22,,,,,,,0,,,,0,0,
pxc-onenode-back,c3,0,0,0,10,,1393,606406,1508709,,0,,0,0,0,0,UP,1,0,1,8,1,522,40,,1,6,3,,1393,,2,0,,105,L7OK,200,22,,,,,,,0,,,,0,0,
pxc-onenode-back,BACKEND,0,0,0,13,0,23333,25680161,552056635,0,0,,425,0,48,1,UP,1,1,2,,1,522,29,,1,6,0,,22919,,1,0,,132,,,,,,,,,,,,,,0,0,

So I see an increase in the error counts.
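
To read those rows without counting commas, here is a minimal Python sketch that names the leading CSV columns and pulls out the error-related ones for one server. The column order is assumed to follow the HAProxy stats CSV layout (pxname, svname, qcur, ... downtime); the function name and the choice of counters (econ = connection errors, wretr = connection retries, chkfail = failed health checks) are illustrative.

```python
import csv
import io
from typing import Dict, List

# Leading columns of the stats CSV, in order (assumed HAProxy 1.4 layout).
FIELDS = [
    "pxname", "svname", "qcur", "qmax", "scur", "smax", "slim", "stot",
    "bin", "bout", "dreq", "dresp", "ereq", "econ", "eresp", "wretr",
    "wredis", "status", "weight", "act", "bck", "chkfail", "chkdown",
    "lastchg", "downtime",
]


def error_counters(csv_text: str, server: str) -> List[Dict[str, int]]:
    """Return econ/wretr/chkfail for every row belonging to the given server."""
    out = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < len(FIELDS):
            continue  # skip blank lines and labels such as "run 1"
        rec = dict(zip(FIELDS, row))
        if rec["svname"] == server:
            out.append({k: int(rec[k] or 0) for k in ("econ", "wretr", "chkfail")})
    return out


# The c2 rows from the three runs above.
RUNS = """\
pxc-onenode-back,c2,0,0,0,13,,12693,21212854,541070495,,0,,6,0,25,1,UP,1,1,0,4,1,25,36,,1,6,1,,12668,,2,0,,115,L7OK,200,22,,,,,,,0,,,,0,0,
pxc-onenode-back,c2,0,0,0,13,,17644,23366860,546367867,,0,,8,0,34,1,UP,1,1,0,5,1,240,36,,1,6,1,,17610,,2,0,,132,L7OK,200,23,,,,,,,0,,,,0,0,
pxc-onenode-back,c2,0,0,0,13,,21574,25073755,550547926,,0,,10,0,48,1,UP,1,1,0,7,2,87,73,,1,6,1,,21526,,2,0,,132,L7OK,200,22,,,,,,,0,,,,0,0,
"""

c2_errors = error_counters(RUNS, "c2")
```

Read this way, the growing numbers on c2 are connection errors, retries, and failed health checks; the response-error column stays at 0 in all three runs.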

Another thing I noticed is that the haproxy log shows an increase in
activity and then fails over from node 2 to node 3:

Aug  8 15:31:43 localhost haproxy[30255]: 10.1.4.5:40021 [08/Aug/2012:15:31:40.309] pxc-onenode-front pxc-onenode-back/c2 0/3005/3008 500 -- 2/2/2/2/0 0/0
Aug  8 15:32:04 localhost haproxy[30255]: 10.1.4.5:40027 [08/Aug/2012:15:31:51.310] pxc-onenode-front pxc-onenode-back/c2 0/13024/13028 500 -- 6/6/6/6/2 0/0
Aug  8 15:32:06 localhost haproxy[30255]: 10.1.4.5:40011 [08/Aug/2012:15:31:37.222] pxc-onenode-front pxc-onenode-back/c2 0/0/29621 3016 -- 4/4/4/4/0 0/0
Aug  8 15:32:06 localhost haproxy[30255]: 10.1.4.5:40020 [08/Aug/2012:15:31:37.299] pxc-onenode-front pxc-onenode-back/c2 0/3005/29554 765 -- 3/3/3/3/0 0/0
Aug  8 15:32:21 localhost haproxy[30255]: 10.1.4.5:40048 [08/Aug/2012:15:32:21.419] pxc-onenode-front pxc-onenode-back/c2 0/0/4 500 -- 4/4/4/4/0 0/0
Aug  8 15:32:25 localhost haproxy[30255]: 10.1.4.5:40026 [08/Aug/2012:15:31:45.262] pxc-onenode-front pxc-onenode-back/c2 0/3002/40118 3016 -- 2/2/2/2/0 0/0
Aug  8 15:32:37 localhost haproxy[30255]: 10.1.4.5:40029 [08/Aug/2012:15:31:58.300] pxc-onenode-front pxc-onenode-back/c2 15024/3010/39130 3016 -- 2/2/2/2/3 0/0
Aug  8 15:32:42 localhost haproxy[30255]: 10.1.4.5:40035 [08/Aug/2012:15:32:10.359] pxc-onenode-front pxc-onenode-back/c2 0/8015/32100 3016 -- 1/1/1/1/1 0/0
Aug  8 15:32:42 localhost haproxy[30255]: Backup Server pxc-onenode-back/c1 is DOWN, reason: Layer4 timeout, check duration: 12000ms. 1 active and 1 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Aug  8 15:32:51 localhost haproxy[30255]: Server pxc-onenode-back/c2 is DOWN, reason: Layer4 timeout, check duration: 12009ms. 0 active and 1 backup servers left. Running on backup. 1 sessions active, 0 requeued, 0 remaining in queue.
Aug  8 15:32:52 localhost haproxy[30255]: 10.1.4.5:40069 [08/Aug/2012:15:32:52.650] pxc-onenode-front pxc-onenode-back/c3 0/0/46 500 -- 3/3/2/1/0 0/0
Aug  8 15:32:52 localhost haproxy[30255]: 10.1.4.5:40070 [08/Aug/2012:15:32:52.696] pxc-onenode-front pxc-onenode-back/c3 0/0/3 995 -- 2/2/2/1/0 0/0
Aug  8 15:32:53 localhost haproxy[30255]: 10.1.4.5:40086 [08/Aug/2012:15:32:52.871] pxc-onenode-front pxc-onenode-back/c3 0/0/144 1185 -- 4/4/4/3/0 0/0
Aug  8 15:32:53 localhost haproxy[30255]: 10.1.4.5:40085 [08/Aug/2012:15:32:52.869] pxc-onenode-front pxc-onenode-back/c3 0/0/146 2276 -- 3/3/3/2/0 0/0
Aug  8 15:32:53 localhost haproxy[30255]: 10.1.4.5:40068 [08/Aug/2012:15:32:52.579] pxc-onenode-front pxc-onenode-back/c3 0/0/437 3016 -- 2/2/2/1/0 0/0
Aug  8 15:32:53 localhost haproxy[30255]: Backup Server stats-back/c3 is DOWN, reason: Layer4 timeout, check duration: 12000ms. 1 active and 1 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
Aug  8 15:32:53 localhost haproxy[30255]: 10.1.4.5:40081 [08/Aug/2012:15:32:52.839] pxc-onenode-front pxc-onenode-back/c3 0/0/187 669 -- 1/1/1/0/0 0/0
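
The session lines can be picked apart with a short Python sketch like the one below. It assumes these are HAProxy TCP-mode log lines, where the slash-separated triple is queue time / connect time / total time in milliseconds and the last number of the five-part counter group is the retry count. The class and function names are hypothetical, and the 3000 ms threshold is only a guess based on the connect times clustering just above 3 seconds, which often points at a retransmitted TCP SYN.

```python
from typing import Iterable, List, NamedTuple, Optional


class Session(NamedTuple):
    server: str       # backend/server, e.g. pxc-onenode-back/c2
    queue_ms: int     # time spent waiting in HAProxy queues
    connect_ms: int   # time to establish the TCP connection to the server
    total_ms: int     # total session duration
    retries: int      # connection retries for this session


def parse_session(line: str) -> Optional[Session]:
    """Parse one TCP-mode session line; return None for any other line."""
    tokens = line.split()
    if len(tokens) < 6 or tokens[-5].count("/") != 2:
        return None  # e.g. "Server ... is DOWN" status messages
    try:
        queue_ms, connect_ms, total_ms = (
            int(t.lstrip("+")) for t in tokens[-5].split("/")
        )
        retries = int(tokens[-2].split("/")[-1].lstrip("+"))
    except ValueError:
        return None
    return Session(tokens[-6], queue_ms, connect_ms, total_ms, retries)


def slow_connects(lines: Iterable[str], threshold_ms: int = 3000) -> List[Session]:
    """Return the sessions whose connect time reached the threshold."""
    parsed = (parse_session(line) for line in lines)
    return [s for s in parsed if s is not None and s.connect_ms >= threshold_ms]
```

Feeding the haproxy log through slow_connects would list the c2 sessions that needed 3 seconds or more just to connect, which lines up with the Layer4 check timeouts right before the failover to c3.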


 
amol
From: amol <ajke...@gmail.com>
Date: Wed, 8 Aug 2012 13:54:09 -0700 (PDT)
Local: Wed, Aug 8 2012 4:54 pm
Subject: Re: Issue on ubuntu when i reboot the machine Percona cluster fails on node

Now when I reboot node1 the database does not start (even after using
/etc/init.d/mysql start),

and I see these errors in mysql/error.log:

120808 16:39:19 [Note] WSREP: Flow-control interval: [14, 28]
120808 16:39:19 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 151412)
120808 16:39:19 [Note] WSREP: State transfer required:
Group state: afc4ea7d-dc5e-11e1-0800-0616c529eebe:151412
Local state: 00000000-0000-0000-0000-000000000000:-1
120808 16:39:19 [Note] WSREP: New cluster view: global state: afc4ea7d-dc5e-11e1-0800-0616c529eebe:151412, view# 31: Primary, number of nodes: 3, my index: 0, protocol version 2
120808 16:39:19 [Warning] WSREP: Gap in state sequence. Need state transfer.
120808 16:39:21 [Note] WSREP: Running: 'wsrep_sst_xtrabackup 'joiner' '10.1.6.8' '' '/var/lib/mysql/' '/etc/mysql/conf.d/mysqld_safe_syslog.cnf' '4411' 2>sst.err'
120808 16:39:21 [Note] WSREP: Prepared SST request: xtrabackup|10.1.6.8:4444/xtrabackup_sst
120808 16:39:21 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
120808 16:39:21 [Note] WSREP: Assign initial position for certification: 151412, protocol version: 2
120808 16:39:21 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (afc4ea7d-dc5e-11e1-0800-0616c529eebe): 1 (Operation not permitted)
 at galera/src/replicator_str.cpp:prepare_for_IST():439. IST will be unavailable.
120808 16:39:21 [Note] WSREP: Node 0 (node1) requested state transfer from '*any*'. Selected 1 (node2)(SYNCED) as donor.
120808 16:39:21 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 151412)
120808 16:39:21 [Note] WSREP: Requesting state transfer: success, donor: 1
120808 16:39:27 [ERROR] WSREP: Process completed with error: wsrep_sst_xtrabackup 'joiner' '10.1.6.8' '' '/var/lib/mysql/' '/etc/mysql/conf.d/mysqld_safe_syslog.cnf' '4411' 2>sst.err: 32 (Broken pipe)
120808 16:39:27 [ERROR] WSREP: Failed to read uuid:seqno from joiner script.
120808 16:39:27 [ERROR] WSREP: SST failed: 32 (Broken pipe)
120808 16:39:27 [ERROR] Aborting

120808 16:39:27 [Warning] WSREP: 1 (node2): State transfer to 0 (node1) failed: -1 (Operation not permitted)
120808 16:39:27 [ERROR] WSREP: gcs/src/gcs_group.c:gcs_group_handle_join_msg():712: Will never receive state. Need to abort.
120808 16:39:27 [Note] WSREP: gcomm: terminating thread
120808 16:39:27 [Note] WSREP: gcomm: joining thread
120808 16:39:27 [Note] WSREP: gcomm: closing backend
120808 16:39:27 [Note] WSREP: view(view_id(NON_PRIM,20ca3744-e199-11e1-0800-0de247e11b46,31) memb {
20ca3744-e199-11e1-0800-0de247e11b46,
} joined {
} left {
} partitioned {
5ffb372a-e118-11e1-0800-1e749dee7061,
71386e58-e109-11e1-0800-8855542b6c12,
})

120808 16:39:27 [Note] WSREP: view((empty))
120808 16:39:27 [Note] WSREP: gcomm: closed
120808 16:39:27 [Note] WSREP: /usr/sbin/mysqld: Terminated.
Aborted
120808 16:39:27 mysqld_safe mysqld from pid file /var/lib/mysql/dev2-db-upgrade.pid ended

this is after all the changes i did earlier in the day on node 1 my.cnf for
IST
++++++++++++++++++++++++++++
[mysqld_safe]
socket = /var/run/mysqld/mysqld.sock
nice = 0
wsrep_urls = gcomm://10.1.6.3:4567,gcomm://10.1.3.1:4567,gcomm://10.1.6.8:4567,gcomm://

[mysqld]
#
# * Basic Settings
#
server_id=1
binlog_format=ROW
wsrep_provider=/usr/lib64/libgalera_smm.so
wsrep_slave_threads=2
wsrep_cluster_name=dev_cluster
wsrep_sst_method=xtrabackup
wsrep_node_name=node1
innodb_locks_unsafe_for_binlog=1
innodb_autoinc_lock_mode=2
log_slave_updates
wsrep_replicate_myisam=1
wsrep_sst_receive_address=10.1.6.8
wsrep_provider_options = "gmcast.listen_addr=tcp://0.0.0.0:4567; ist.recv_addr=10.1.6.8:4568; "
++++++++++++++++++++++++++++


 
Jay Janssen
From: Jay Janssen <jay.jans...@percona.com>
Date: Thu, 9 Aug 2012 08:00:35 -0400
Local: Thurs, Aug 9 2012 8:00 am
Subject: Re: Issue on ubuntu when i reboot the machine Percona cluster fails on node

Something about how you have SST configured is causing the ultimate problem here.

I can't say why the local state was reset to all zeros on reboot. How was the machine restarted? If the local server had kept its state correctly, an IST should have been possible.
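
One way to check whether a node kept its state across a restart is to look at the saved-state file before starting mysqld. The sketch below is only an illustration: it assumes the state is stored as simple `key: value` lines (as in Galera's grastate.dat, typically under the data directory), that an all-zero uuid means the position was lost, and that a negative seqno means the position is unknown. The function names are made up for this example.

```python
# Minimal sketch: decide whether a saved Galera state looks usable for IST.
# Assumptions (not taken from the thread): "key: value" file layout,
# all-zero uuid = state lost, negative seqno = position unknown.

ZERO_UUID = "00000000-0000-0000-0000-000000000000"


def parse_grastate(text):
    """Parse 'key: value' lines, skipping blank lines and '#' comments."""
    state = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        state[key.strip()] = value.strip()
    return state


def position_is_known(state):
    """True if the saved state has a real uuid and a non-negative seqno."""
    uuid = state.get("uuid", ZERO_UUID)
    try:
        seqno = int(state.get("seqno", "-1"))
    except ValueError:
        return False
    return uuid != ZERO_UUID and seqno >= 0
```

To use it, read the contents of the state file into a string and pass it to `parse_grastate`; if `position_is_known` returns False, the joiner has no position to offer and a full SST is to be expected.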

On Aug 8, 2012, at 4:54 PM, amol <ajke...@gmail.com> wrote:

...

amol  
From: amol <ajke...@gmail.com>
Date: Thu, 9 Aug 2012 09:33:00 -0700 (PDT)
Subject: Re: Issue on ubuntu when i reboot the machine Percona cluster fails on node

Yes, I just rebooted the machine.

After some searching I found this error on the donor node:

innobackupex: Error: mysql child process has died: ERROR 1045 (28000):
Access denied for user 'mysql'@'localhost' (using password: NO)

So I resolved that error by creating a user:

grant process on *.* to 'mysql'@'localhost' identified by '';
flush privileges;

Then, on the next server reboot, I saw that the donor was a different node, and it showed this error:

innobackupex: Error: mysql child process has died: ERROR 1044 (42000) at
line 3: Access denied for user 'mysql'@'localhost' to database 'mysql'
 while waiting for reply to MySQL request: 'USE mysql;' at
/usr//bin/innobackupex line 374.

Now I see that the mysql user needs more privileges, so I have granted all
privileges to mysql. Next I have to try getting the node backup using SST
and then try the reboot.

...

amol  
From: amol <ajke...@gmail.com>
Date: Thu, 9 Aug 2012 13:13:20 -0700 (PDT)
Local: Thurs, Aug 9 2012 4:13 pm
Subject: Re: Issue on ubuntu when i reboot the machine Percona cluster fails on node

The notable part here is that there is absolutely no error when I just stop
the db (/etc/init.d/mysql stop) and start the db (/etc/init.d/mysql start).

So the question is: does IST only work when you have stopped the db and
started it? If you reboot a node, does it always do SST?

My observation is that SST using xtrabackup is slower than rsync, but the
donor node is at least available for db connections. Is that a valid
statement?

...

amol  
From: amol <ajke...@gmail.com>
Date: Thu, 9 Aug 2012 19:55:15 -0700 (PDT)
Local: Thurs, Aug 9 2012 10:55 pm
Subject: Re: Issue on ubuntu when i reboot the machine Percona cluster fails on node

Well, that seems to have done the trick for now: once the permissions were
set for the mysql user, all nodes rebooted fine. I just need to remove some
privileges, as "all" is not ideal for a user with no password... :)

...

Alexey Yurchenko  
From: Alexey Yurchenko <ayurc...@gmail.com>
Date: Thu, 23 Aug 2012 19:13:29 -0700 (PDT)
Local: Thurs, Aug 23 2012 10:13 pm
Subject: Re: Issue on ubuntu when i reboot the machine Percona cluster fails on node

On Friday, August 10, 2012 3:13:20 AM UTC+7, amol wrote:

> So the question is: does IST only work when you have stopped the db and
> started it?
> If you reboot a node, does it always do SST?

IST happens whenever it is possible. That is:

1) joiner position can be reliably established (e.g. if a server crashes
during DDL it can't)
2) donor cache contains enough writesets to cover the gap

These are the only two conditions which IST depends on. If at least one of
these conditions is not met, donor will defer to SST.
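
The two conditions above can be sketched as a small decision function. This is an illustration only, not Galera's actual code: the class and field names are invented, -1 is assumed as the marker for an unknown joiner position, and the donor cache is modelled as a contiguous range of writeset sequence numbers.

```python
# Illustrative sketch of the IST-vs-SST decision described above.
# All names are hypothetical; this is not the Galera implementation.
from dataclasses import dataclass

UNDEFINED_SEQNO = -1  # assumption: marker for "joiner position unknown"


@dataclass
class JoinerState:
    cluster_uuid: str
    seqno: int  # last writeset the joiner is known to have applied


@dataclass
class DonorCache:
    cluster_uuid: str
    first_cached_seqno: int  # oldest writeset still held in the cache
    last_cached_seqno: int   # newest writeset held in the cache


def choose_state_transfer(joiner, donor):
    """Return "IST" only if both conditions hold, otherwise "SST"."""
    # Condition 1: the joiner position can be reliably established.
    position_known = (
        joiner.seqno != UNDEFINED_SEQNO
        and joiner.cluster_uuid == donor.cluster_uuid
    )
    if not position_known:
        return "SST"
    # Condition 2: the donor cache covers the gap, i.e. it still holds
    # the first writeset the joiner is missing (joiner.seqno + 1).
    # Assumes the joiner is not ahead of the donor.
    first_missing = joiner.seqno + 1
    if donor.first_cached_seqno <= first_missing:
        return "IST"
    return "SST"
```

In this model, a joiner whose position was lost always gets SST regardless of how large the donor cache is, which is consistent with the behaviour seen after the reboot earlier in the thread.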


 
amol  
From: amol <ajke...@gmail.com>
Date: Thu, 23 Aug 2012 20:19:52 -0700 (PDT)
Local: Thurs, Aug 23 2012 11:19 pm
Subject: Re: Issue on ubuntu when i reboot the machine Percona cluster fails on node

Thanks, Alexey, for the clarification.


 