Problems with hanging checks

Eric Lindvall

Sep 22, 2011, 7:10:19 PM
to mmm-...@googlegroups.com
With MMM 2.2.1, I have seen checks stop running after the monitor has been up for some time.

Here's an example of d01 being stuck in REPLICATION_DELAY even though it is actually fully caught up:

# mmm_control show
  d01(10.0.0.200) master/REPLICATION_DELAY. Roles: 
  d02(10.0.0.201) master/ONLINE. Roles: writer(10.0.0.100)

On d01:

# mysql -uroot -e 'show slave status\G' | grep Seconds_Behind
        Seconds_Behind_Master: 0

Here's what the checks show:

# mmm_control checks all all
d02  ping         [last change: 2011/09/19 00:34:57]  OK
d02  mysql        [last change: 2011/09/19 23:43:30]  OK
d02  rep_threads  [last change: 2011/09/19 00:34:57]  OK
d02  rep_backlog  [last change: 2011/09/19 00:34:57]  OK: Backlog is null
d01  ping         [last change: 2011/09/19 00:34:57]  OK
d01  mysql        [last change: 2011/09/20 01:55:01]  OK
d01  rep_threads  [last change: 2011/09/19 00:34:57]  OK
d01  rep_backlog  [last change: 2011/09/19 00:34:57]  ERROR: Backlog is too big
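
A mismatch like this one (rep_backlog reporting ERROR while the slave itself reports zero lag) could probably be caught by a small script. What follows is only a rough sketch, assuming the `mmm_control checks` line format shown above stays the same; `parse_checks` and `stale_backlog_hosts` are hypothetical helper names, not part of MMM, and the Seconds_Behind_Master values are assumed to be collected separately on each host:

```python
import re

# Matches lines such as:
#   d01  rep_backlog  [last change: 2011/09/19 00:34:57]  ERROR: Backlog is too big
# The layout is an assumption taken from the output pasted in this post.
CHECK_LINE = re.compile(
    r"^(?P<host>\S+)\s+(?P<check>\S+)\s+"
    r"\[last change: (?P<changed>[^\]]+)\]\s+(?P<status>[A-Z]+)"
)


def parse_checks(output):
    """Return {(host, check): status} parsed from `mmm_control checks all all` output."""
    results = {}
    for line in output.splitlines():
        match = CHECK_LINE.match(line.strip())
        if match:
            results[(match["host"], match["check"])] = match["status"]
    return results


def stale_backlog_hosts(checks, seconds_behind):
    """Return hosts whose rep_backlog check says ERROR although the slave reports 0 lag.

    `seconds_behind` maps host -> Seconds_Behind_Master as read on that host.
    Hosts missing from the mapping are never reported.
    """
    return sorted(
        host
        for (host, check), status in checks.items()
        if check == "rep_backlog"
        and status == "ERROR"
        and seconds_behind.get(host) == 0
    )
```

Fed the output above together with `{"d01": 0, "d02": 0}`, `stale_backlog_hosts` would report d01 as the host whose check looks stuck.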

I was able to solve the issue by killing the rep_backlog checker:

# ps auxwwf | grep mmm
root      7765  0.0  0.2 166212  8892 ?        S    Sep19   0:00 mmm_mond
root      7767  0.2  1.1 703152 48160 ?        Sl   Sep19  12:12  \_ mmm_mond
root      7889  0.1  0.1 155096  7728 ?        S    Sep19   9:45      \_ perl /usr/libexec/mysql-mmm/monitor/checker ping_ip
root      7892  0.1  0.2 185580  9196 ?        S    Sep19   8:47      \_ perl /usr/libexec/mysql-mmm/monitor/checker mysql
root      7894  0.1  0.2 155096  8408 ?        S    Sep19   7:49      \_ perl /usr/libexec/mysql-mmm/monitor/checker ping
root      7896  0.1  3.9 338732 161732 ?       S    Sep19   8:14      \_ perl /usr/libexec/mysql-mmm/monitor/checker rep_backlog
root      7898  0.1  3.9 338732 161892 ?       S    Sep19   7:57      \_ perl /usr/libexec/mysql-mmm/monitor/checker rep_threads

# kill 7896

# mmm_control checks all all
d02  ping         [last change: 2011/09/19 00:34:57]  OK
d02  mysql        [last change: 2011/09/19 23:43:30]  OK
d02  rep_threads  [last change: 2011/09/19 00:34:57]  OK
d02  rep_backlog  [last change: 2011/09/19 00:34:57]  OK: Backlog is null
d01  ping         [last change: 2011/09/19 00:34:57]  OK
d01  mysql        [last change: 2011/09/20 01:55:01]  OK
d01  rep_threads  [last change: 2011/09/19 00:34:57]  OK
d01  rep_backlog  [last change: 2011/09/22 15:55:31]  OK: Backlog is null

# mmm_control show
  d01(10.0.0.200) master/ONLINE. Roles: 
  d02(10.0.0.201) master/ONLINE. Roles: writer(10.0.0.100)

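This shows that killing the checker worked: the rep_backlog check for d01 refreshed (last change 2011/09/22 15:55:31) and d01 went back to ONLINE.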
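
To avoid picking the PID out of the `ps` listing by hand, the lookup could be scripted. This is a minimal sketch, assuming the `ps aux` column layout shown above (PID in the second column, checker name as the last word of the command line); `find_checker_pids` is a hypothetical helper, not something MMM provides:

```python
def find_checker_pids(ps_output, check_name):
    """Return the PIDs of `.../monitor/checker <check_name>` processes in `ps aux` output."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Assumed layout: USER PID ... COMMAND, where the command ends with
        # ".../monitor/checker <check_name>".
        if (
            len(fields) >= 3
            and fields[-1] == check_name
            and fields[-2].endswith("/monitor/checker")
        ):
            pids.append(int(fields[1]))
    return pids
```

The checker name is matched exactly, so asking for `ping` does not also return the `ping_ip` checker. The returned PIDs could then be passed to `kill` as in the manual steps above.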

Is this a known issue? Is there a workaround?

Thanks,
Eric