Thanks for the advice. I did this and didn't see any indication that
the IP was being moved between hosts. After I put both hosts back
online, mmm_mond.log on the monitoring server started to fill with
alternating info and warning messages for the rep_threads and
rep_backlog checks. Here's a timeline of events:
I put these hosts online at 1:49pm:
2011/05/11 13:49:09 FATAL Admin changed state of 'mysql1' from AWAITING_RECOVERY to ONLINE
2011/05/11 13:49:09 INFO Orphaned role 'writer(192.168.1.200)' has been assigned to 'mysql1'
2011/05/11 13:49:22 FATAL Admin changed state of 'mysql2' from AWAITING_RECOVERY to ONLINE
mysql1 logged continuous errors (192.168.1.15 is mysql1):
2011/05/11 13:49:26 FATAL Couldn't allow writes: ERROR: Can't connect to MySQL (host = 192.168.1.15:3306, user = agent)! Can't connect to MySQL server on '192.168.1.15' (4)
This went on until 3:57pm, when mysql1 logged the following:
2011/05/11 15:57:18 INFO We have some new roles added or old rules deleted!
2011/05/11 15:57:18 INFO Deleted: writer(192.168.1.200)
2011/05/11 15:57:21 FATAL Couldn't deny writes: ERROR: Can't connect to MySQL (host = 192.168.1.15:3306, user = agent)! Can't connect to MySQL server on '192.168.1.15' (4)
At the same time, mysql2 (192.168.1.16, which has no roles assigned
at all) logged:
2011/05/11 15:57:20 INFO We have some new roles added or old rules deleted!
2011/05/11 15:57:20 INFO Added: writer(192.168.1.200)
2011/05/11 15:57:23 FATAL Couldn't sync with master: ERROR: Can't connect to MySQL (host = 192.168.1.16:3306, user = agent)! Can't connect to MySQL server on '192.168.1.16' (4)
Now mysql2 starts logging:
2011/05/11 15:57:26 FATAL Couldn't allow writes: ERROR: Can't connect to MySQL (host = 192.168.1.16:3306, user = agent)! Can't connect to MySQL server on '192.168.1.16' (4)
This continues for a few minutes until the monitor deletes the writer
role from this server and tries to give it back to mysql1. mysql1
holds the writer role for a bit, then it gets assigned back to
mysql2. This goes back and forth for a while until logging stops on
both hosts around 4pm.
Starting around 3:57pm, the monitoring node has been logging both
hosts cycling from HARD_OFFLINE to AWAITING_RECOVERY to ONLINE and
back to HARD_OFFLINE:
2011/05/11 13:49:09 FATAL Admin changed state of 'mysql1' from AWAITING_RECOVERY to ONLINE
2011/05/11 13:49:22 FATAL Admin changed state of 'mysql2' from AWAITING_RECOVERY to ONLINE
2011/05/11 15:57:32 FATAL State of host 'mysql1' changed from HARD_OFFLINE to AWAITING_RECOVERY
2011/05/11 15:57:35 FATAL State of host 'mysql1' changed from AWAITING_RECOVERY to ONLINE because it was down for only 18 seconds
2011/05/11 16:00:27 FATAL State of host 'mysql2' changed from HARD_OFFLINE to AWAITING_RECOVERY
2011/05/11 16:00:30 FATAL State of host 'mysql2' changed from AWAITING_RECOVERY to ONLINE because it was down for only 12 seconds
This is pretty much the state of affairs every time. My theory is
that the mmm agent is not able to connect to the local MySQL
database, and that this is causing the mysql-mmm monitor to think the
host is offline. I tested local access with the agent's credentials
and, just in case, even added an agent@localhost user with the same
privileges as the subnet-specific user I'd already created. I still
get the same issue.
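To separate a pure networking problem from a privileges problem, it might also help to probe port 3306 on each host with a plain TCP connect and see whether it succeeds, is refused, or times out: a refused connect means nothing is listening on that address, while a timeout points at a firewall or routing issue. A minimal Python sketch (the probe() helper is mine, not part of mysql-mmm; hosts and port are taken from the logs above):

```python
import socket

def probe(host, port, timeout=1.0):
    """Plain TCP connect to host:port; classify the result."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"   # something is listening on the port
    except ConnectionRefusedError:
        return "refused"         # host reachable, but nothing listening
    except socket.timeout:
        return "timeout"         # packets silently dropped (firewall?)
    except OSError as exc:
        return f"error: {exc}"   # e.g. no route to host

# Hosts and port taken from the log excerpts above.
for host in ("192.168.1.15", "192.168.1.16"):
    print(host, probe(host, 3306))
```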
On May 11, 10:51 am, Manuel Arostegui Ramirez <manuel.todoli...@gmail.com> wrote:
> Make sure the IP ain't moving around both machines.
> Try to do a 'watch ip addr' and check if it's going back and forward between
> them.
>
> Manuel.
>
> 2011/5/10 iglablues <stevielivesh...@gmail.com>