mysql DEFUNCT process issues

drk

Jun 3, 2009, 8:54:48 AM

to MySQL Multi Master Manager Development

Hi list,

I'm already using MMM 1.2.3 in production (to answer some earlier
questions) and it's working perfectly. I just have one very bad situation.
I know it's not about MMM, but it would be great to have a solution ;)

Because of a different solution (can't explain, sorry) I ran into an
interesting problem yesterday. The mysql process became a defunct
process. AFAIK a process becomes defunct when it has already stopped
but is still waiting for its parent process to reap it, yet this one
was eating 30-100% CPU (it looked like it was working on something).
Usually this happens when something goes very wrong; this time I think
it was caused by some very bad I/O issues.
I tried kill -9 and even a reboot, but nothing worked.

So the problem is, when I tried to connect to mysql, the connection
just became idle. It did nothing, and what is worse, it didn't time
out. I don't know why. Not even when the server was hard reset!! (I'm
not the wannabe-sysadmin kind of guy, so it really happened like
this :) )
So all the checker connections (nagios, mmm_mon and everything) were
stuck in a state where the connections went idle and made no traffic,
but didn't time out. So MMM didn't change the master and none of my
monitoring applications found out that something was wrong. Even after
the hard reset, when the master came back, replication was still
stopped. The slave thread didn't disconnect and didn't show any delay
or error or anything. I had to stop and start it manually on each
slave connected to that master.

You can simulate this situation with a bash fork bomb. Something
like:

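bomb.sh
<---
#!/bin/bash
# WARNING: fork bomb. Run only on a disposable test machine.

while true; do
    ./bomb.sh &   # each copy keeps starting more copies of itself
done
--!>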

After running this, your machine becomes unreachable, but it will
still answer ping requests. You can open an ssh connection, but it
won't time out or be interrupted, even after you reset the host machine.

When I was thinking about how to handle this, I thought some kind of
watchdog for the health checks could be useful. What about implementing
something like this in mmm2?
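
To make the watchdog idea concrete, here is a rough sketch of a check
that is forced to finish within a hard deadline, so a connection that
goes idle is reported as hung instead of blocking forever. This is not
how mmm_mon works. It assumes GNU coreutils timeout(1) is installed,
and the function name check_with_deadline is made up for illustration.

```shell
#!/bin/bash
# Run a health check command under a hard deadline.
# Usage: check_with_deadline SECONDS COMMAND [ARGS...]
# Prints "ok", "hung" or "failed" and returns 0, 2 or 1 respectively.
check_with_deadline() {
    local deadline="$1"
    shift
    # timeout(1) sends TERM at the deadline and KILL 2 seconds later
    # if the command is still running.
    timeout -k 2 "$deadline" "$@" >/dev/null 2>&1
    local status=$?
    if [ "$status" -eq 0 ]; then
        echo "ok"
        return 0
    elif [ "$status" -eq 124 ] || [ "$status" -eq 137 ]; then
        # 124: the deadline expired; 137: the command had to be killed with KILL.
        echo "hung"
        return 2
    else
        echo "failed"
        return 1
    fi
}
```

Something like `check_with_deadline 5 mysqladmin --host=db1 ping` could
then be called from the monitor (db1 is a placeholder host name). A
"hung" result can be treated the same way as a failed check.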

Regards,
Istvan Podor

Baron Schwartz

Jun 3, 2009, 11:27:54 AM

to mmm-...@googlegroups.com

Your server was probably doing InnoDB recovery. Did you check the error logs?

--
Baron Schwartz, Director of Consulting, Percona Inc.
Our Blog: http://www.mysqlperformanceblog.com/
Our Services: http://www.percona.com/services.html

Arjen Lentz

Jun 3, 2009, 7:38:08 PM

to mmm-...@googlegroups.com

Hi Baron

On 04/06/2009, at 1:27 AM, Baron Schwartz wrote:
> Your server was probably doing InnoDB recovery. Did you check the
> error logs?

Hmm, mysqld shouldn't survive a kill -9 unless something else is wrong
in kernel land.

Cheers,
Arjen.

--
Arjen Lentz, Director @ Open Query (http://openquery.com)
Affordable Training and ProActive Support for MySQL & related
technologies

Follow our blog at http://openquery.com/blog/
OurDelta: free enhanced builds for MySQL @ http://ourdelta.org

Baron Schwartz

Jun 3, 2009, 8:18:20 PM

to mmm-...@googlegroups.com

Arjen,

On Wed, Jun 3, 2009 at 7:38 PM, Arjen Lentz <ar...@openquery.com> wrote:
>
> Hi Baron
>
> On 04/06/2009, at 1:27 AM, Baron Schwartz wrote:
>> Your server was probably doing InnoDB recovery. Did you check the
>> error logs?
>
> Hmm, mysqld shouldn't survive a kill -9 unless something else is wrong
> in kernel land.

He said

>>> 30-100% CPU (looked like it was working on something). Usually this

That is not a defunct process.

>>> I tried kill -9 and even reboot, but nothing worked.

I'm not 100% sure here, but it sounds to me like "I killed it and
restarted it, and it was using 30-100% CPU but I could not connect.
So I rebooted, and after reboot it was using 30-100% CPU and I still
could not connect." In which case, it is probably just doing crash
recovery.

But I don't think there is enough information in the original thread
to make any better guess than that.
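
One way to collect more of that information is to record the process
state that ps reports while the problem is happening. A true zombie
shows state Z and uses no CPU, while a process stuck in the kernel
usually shows D. A minimal sketch, assuming a Linux host with the
procps version of ps; the function names are made up for illustration.

```shell
#!/bin/bash
# Map a ps(1) STAT value (first letter) to a short description.
describe_state() {
    case "$1" in
        Z*) echo "zombie (defunct)" ;;
        D*) echo "uninterruptible sleep (usually I/O)" ;;
        R*) echo "running or runnable" ;;
        S*) echo "interruptible sleep" ;;
        T*) echo "stopped" ;;
        *)  echo "other" ;;
    esac
}

# Print the state of one PID, or an error if it does not exist.
report_process() {
    local pid="$1" stat
    stat="$(ps -o stat= -p "$pid" | tr -d ' ')"
    if [ -z "$stat" ]; then
        echo "no such process: $pid"
        return 1
    fi
    echo "pid $pid: $stat: $(describe_state "$stat")"
}
```

For example, `report_process "$(pidof mysqld)"` prints one line that
can be pasted into a bug report together with the error log.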

Istvan Podor

Jun 4, 2009, 1:51:53 AM

to mmm-...@googlegroups.com

Morning Baron, Arjen and list,

So, reading your suggestions, it's weird. First of all, thanks for your help.

I saw in top that it was a defunct process. And I know mysql
shouldn't survive a kill -9.
So I saw it was defunct, it was eating up a CPU core, and I'm sure,
100% sure, that it wasn't restarted. I had to ask the 24/7 guys to
press the reset button for me, because it couldn't even reboot. The
Linux kernel couldn't shut down this process and would wait for it to
exit. I checked the logs and all the usual things before it was reset.
Saw nothing.. And I know what an InnoDB recovery looks like. It ran
right after the reboot :)

I think Baron's right, I mean, something is really going wrong with
this kernel ;) Since we have had this architecture in our servers,
this is about the 3rd time I have had to wake up in the evening and
fix it. I thought it was something to do with DRBD and that maybe DRBD
was to blame, but now without DRBD it still did the same. (Different
machines, not just one. I had the same problem on 6 different pieces
of hardware.)
So now I actually blame the RAID controller with the architecture we
got from Intel.

So I should have debugged it while it was up, but I had to fix it
ASAP ;) So you guys haven't seen anything like this before either,
right? Kind of relaxing, because I'm not the only one who doesn't
understand. ;)

regards,
Istvan Podor

PS.: My English is still far from perfect, sorry for that :)
