
Wait()ing on fork() but still have <defunct>s


Pat Deegan

Dec 23, 2001, 5:34:40 PM
Aloha.

I've tried a number of different methods and read the FAQ but still can't figure out
how to get rid of the <defunct>s popping up in the process table. Basically, a server
running on a Linux system detaches from the controlling tty and waits for connections.
It forks a child per TCP/IP connection, which exits at the end of the transaction.
This child remains <defunct> in the ps table until a new connection arrives, at which
point it is somehow reaped.

The code I am using now forks once to detach from the terminal, then again for each
incoming connection, and looks like this:



// First fork, used to detach from terminal
if (! fork())
{
    // This is the child process
    setsid();
    close(0);
    close(1);
    close(2);
} else
{
    // This is the parent process...

    // Can't use wait here as it causes the /etc/rc.d/init.d script to hang...
    // while(waitpid(-1,&status,WNOHANG) > 0); /* clean up child processes */

    // need to go away in order to relinquish the tty
    exit(0);
}

// Second fork, used to handle a connection
if (!fork()) { /* this is the child process */

    /* Handle the connection ... */
    exit(-1);
}


while(waitpid(-1,&status,WNOHANG) > 0); /* clean up child processes */


I've tried various combinations of wait/waitpid but can't seem to get it right - what
am I missing?

The complete code is available at http://moxy.psychogenic.com/

Thanks in advance!

Regards,
Pat Deegan

Artie Gold

Dec 23, 2001, 7:35:45 PM
Pat Deegan wrote:
>
> Aloha.

Aloha.
Nice to see you found the right room. ;-)

OK. Consider what happens at the waitpid() loop at the end of your code.
The first time that line is executed, it is likely that the
fork()-ed processes are still alive; with WNOHANG, waitpid() therefore
returns 0 and the loop terminates without reaping anything.

What you want is to keep track of the number of child processes you
create (in this case 2), and loop your call to waitpid() until its
return value is > 0 that number of times (again, in this case, twice).
That would be correct; alas, it would be far from optimal (as the loop
would eat up time).

A better approach would be to do the same thing but put a call to
sleep() in the loop.

HTH,
--ag
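
A minimal sketch of the counting approach described above, assuming the parent has nothing else to do while it waits (a server that must keep accepting connections would not want to sit in this loop). The helper name reap_children() and the one-second back-off are illustrative choices, not part of the original code.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: poll until `outstanding` children have been reaped.
 * Returns the number of children actually reaped. */
static int reap_children(int outstanding)
{
    int reaped = 0;

    assert(outstanding >= 0);
    while (reaped < outstanding) {
        pid_t pid = waitpid(-1, NULL, WNOHANG);

        if (pid > 0) {
            reaped++;       /* one terminated child collected */
        } else if (pid == 0) {
            sleep(1);       /* children exist, none has exited yet: back off */
        } else if (errno != EINTR) {
            break;          /* ECHILD or another error: nothing left to wait for */
        }
    }
    return reaped;
}
```

Called as reap_children(2) after two fork() calls, it returns once both children have exited; with no children at all it returns 0 immediately because waitpid() fails with ECHILD.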

Tobias Oed

Dec 26, 2001, 3:37:16 PM

You can use a signal handler for SIGCHLD. Here is a
snippet of some of my code. I'm sure that by tweaking the
bits of the sa_flags you can get rid of the while() and
use a blocking waitpid call. Anyone?

static void sig_chld_handler(int signal){
    /* Reap every child that has already exited; WNOHANG keeps this from blocking. */
    while(waitpid(-1,NULL,WNOHANG)>0){
        dfprintf((DEBP"one model done\n"));
    }
}

static void start_handling_signals(void){

    struct sigaction sig_chld;

    sig_chld.sa_handler=sig_chld_handler;
    sig_chld.sa_flags=0;
    (void) sigemptyset(&sig_chld.sa_mask);

    if(sigaction(SIGCHLD,&sig_chld,NULL)){
        perror("daemon: couldn't install sig_chld handler");
        exit(EXIT_FAILURE);
    }
}

int main(void){


start_handling_signals();

....

Tobias.
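
On the sa_flags question: one flag that does change the picture is SA_NOCLDWAIT (an XSI part of POSIX), which asks the system not to turn exiting children into zombies in the first place. This is a different technique from the handler above rather than a tweak to it, and it only fits when the parent never needs a child's exit status. A sketch, where disable_zombies() is a made-up name:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: ask the system not to create zombies for our children.
 * Returns 0 on success, -1 on failure (errno is set by sigaction()). */
static int disable_zombies(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_IGN;        /* no handler needed, nothing to reap */
    sa.sa_flags = SA_NOCLDWAIT;     /* children are discarded when they exit */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGCHLD, &sa, NULL);
}
```

With this in place a later wait() or waitpid() waits until all children have terminated and then fails with ECHILD, so code that relies on exit statuses should keep the handler-based approach instead.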

Eyck Warich

Jan 8, 2002, 7:02:48 AM
On Wed, 26 Dec 2001 15:37:16 -0500, Tobias Oed
<tob...@physics.odu.edu> wrote:

[...]

>You can use a signal handler for SIGCHLD. Here is a
>snippet of some of my code. I'm sure that by tweaking the
>bits of the sa_flags you can get rid of the while() and
>use a blocking waitpid call. Anyone?
>
>static void sig_chld_handler(int signal){
> while(waitpid(-1,NULL,WNOHANG)>0){
> dfprintf((DEBP"one model done\n"));
> }
>}

We have a tricky problem with that... We have a network server process
listening for connections and forking worker processes to do the
work. After a worker has finished, its zombie is (or should be) removed by
the server's SIGCHLD signal handler, which is quite similar to the one
shown above. Most of the time that works, but in some cases I get a
stuck server process (not running, no more connections accepted).
Looking with gdb attached to the server process, I see the process
hanging in its signal handler, waiting in waitpid() (assuming that gdb
shows things right). Of course that could be the reason for the whole
process to block, but:

1. I have no real idea why waitpid(WNOHANG) could/should block?

On the call stack I can see something like

fork()              From server process, forking a new worker child
...
sigchildhandler()   Server process signal handler
...
waitpid()           Blocking position

2. Is there any possibility of races between
fork()/exit()/signals()/signalhandler()? I mean, could it be a problem
that the server is in the middle of creating the new child process and gets
interrupted by the SIGCHLD signal?

With regards,
Eyck

Eyck Warich

Jan 8, 2002, 8:31:49 AM
On Tue, 08 Jan 2002 13:02:48 +0100, Eyck Warich <war...@secunet.de>
wrote:

>We have a tricky problem with that...

We moved on a bit and wrote a test program. It does nothing other than
create ONE child process which runs forever (no exit) and n
child processes which do nothing, return quite fast, and should get
collected by the server.

The effect is that the server gets stuck in its signal handler when
it comes across the forever-running child (which is not terminating, so
no exit status is available). When we leave out that single child,
everything runs fine.

It seems that WNOHANG does not have the intended effect (give me the
status back if a child terminated, but if no child terminated simply
go on)?

--

What we have in mind: we have different kinds of child processes, with
different lifetimes. The termination of the longer-lived children
should be caught just like the termination of the shorter-running ones. It
seems that waitpid() gets stuck when running over our longer-running
processes. Did we miss (or misunderstand) something about waitpid()?

--

Solaris 7 on Sparc5, gcc2.95.2

With regards,
Eyck

---

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <iostream>
#include <signal.h>

using namespace std;

int nChildCounter = 0;
int nChilds = 0;

static void OnChildTerm (int);
static void DoNothing (void);

int main (int argc, char **argv)
{
    if (argc != 2)
    {
        cout << "usage: " << argv [0] << " <no_of_childs>" << endl;
        return 0;
    }

    signal (SIGCHLD, OnChildTerm);

    pid_t pidDoNothing = fork ();

    if (pidDoNothing == 0) // child
    {
        DoNothing ();
    }
    else
    {
        // get number of childs:
        nChildCounter = atoi (argv [1]);

        cout << "Try to work with " << nChildCounter << " child processes." << endl;

        for (int i = 0; i < nChildCounter; i++)
        {
            pid_t pidChild = fork ();

            if (pidChild == 0) // child
            {
                cout << "child " << i << endl;
                usleep (30000);
                exit (0);
            }
            else if (pidChild > 0) // parent
            {
                ;
            }
            else // error
            {
                cout << "fork error: " << strerror (errno) << "!" << endl;
            }
        }

        for (int i = 0; i < 5; i++)
            usleep (999999);

        cout << "OK, test over; got " << nChilds << " child processes." << endl;
    }

    return 0;
}

void OnChildTerm (int signo)
{
    int nStatus = 0;
    pid_t pidChild = 0;
    int nOptions = WNOHANG;

    do
    {
        pidChild = waitpid (-1, &nStatus, nOptions);

        if (pidChild == -1)
        {
            //cout << "waitpid() returned; no more terminated childs" << endl;
            //cout << "errno: " << strerror(errno) << endl;
        }
        else if (pidChild == 0)
        {
            //cout << "waitpid() returned; status not available" << endl;
        }
        else
        {
            //cout << "waitpid() returned " << pidChild << ". " << endl;
            nChilds++;
            //cout << "Child counter = " << nChilds << "." << endl;
        }
    } while (pidChild >= 0);

    signal (SIGCHLD, OnChildTerm);
}

void DoNothing (void)
{
    cout << "Do nothing." << endl;

    while (true)
        usleep (1000);
}

Rainer Temme

Jan 8, 2002, 8:49:08 AM
"Eyck Warich" <war...@secunet.de> wrote:

> void OnChildTerm (int signo)
> {
> int nStatus = 0;
> pid_t pidChild = 0;
> int nOptions = WNOHANG;
>
> do
> {
> pidChild = waitpid (-1, &nStatus, nOptions);
>
> if (pidChild == -1)
> {
> //cout << "waitpid() returned; no more terminated childs" << endl;
> //cout << "errno: " << strerror(errno) << endl;
> }
> else if (pidChild == 0)
> {
> //cout << "waitpid() returned; status not available" << endl;
> }
> else
> {
> //cout << "waitpid() returned " << pidChild << ". " << endl;
> nChilds++;
> //cout << "Child counter = " << nChilds << "." << endl;
> }
> } while (pidChild >= 0);
>
> signal (SIGCHLD, OnChildTerm);
> }

Hi Eyck,

according to my manual pages,

waitpid(...WNOHANG) will return 0 (zero) if no more terminated
children are available, so your while-loop condition
should read ... while (pidChild > 0); ... (not while (pidChild >= 0);)

Remark: You might also switch to sigaction() instead of signal().

Regards ... Rainer
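
To make the three outcomes of a non-blocking waitpid() explicit, here is a small sketch that classifies a single poll. The enum and the function name poll_child() are invented for illustration; only the waitpid() return values themselves come from the manual pages.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum poll_result {
    CHILD_REAPED,   /* waitpid() > 0:   *pid_out holds the reaped pid       */
    NONE_READY,     /* waitpid() == 0:  children exist, none has exited yet */
    NO_CHILDREN,    /* waitpid() == -1 with errno == ECHILD                 */
    WAIT_ERROR      /* waitpid() == -1 with any other errno                 */
};

/* Hypothetical helper: one non-blocking poll for any child. */
static enum poll_result poll_child(pid_t *pid_out)
{
    pid_t pid;

    assert(pid_out != NULL);
    pid = waitpid(-1, NULL, WNOHANG);
    if (pid > 0) {
        *pid_out = pid;
        return CHILD_REAPED;
    }
    if (pid == 0)
        return NONE_READY;
    return errno == ECHILD ? NO_CHILDREN : WAIT_ERROR;
}
```

A reaping loop should continue only on CHILD_REAPED. Continuing on NONE_READY, which is what a `>= 0` condition does, keeps the loop spinning for as long as any child is still running.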

Andrew Chesnokov

Jan 8, 2002, 8:49:58 AM
Hi.

> We have a tricky problem with that... We have a network server process
> listening for connections and forking worker processes to do the
> work. After a worker has finished, its zombie is (or should be) removed by
> the server's SIGCHLD signal handler, which is quite similar to the one
> shown above. Most of the time that works, but in some cases I get a
> stuck server process (not running, no more connections accepted).

> 1. I have no real idea why waitpid(WNOHANG) could/should block?
>
> On the call stack i can see something like
>
> fork() From server process, forking a new worker child
> ...
> sigchildhandler() Server process signal handler
> ...
> waitpid() Blocking position
>
> 2. Any possibility for races between
> fork()/exit()/signals()/signalhandler(); i mean could it be a problem
> that the server is in generating the new child process and gets
> interrupted from the SIGCHLD signal?
>

I haven't seen your code, but maybe the following issue will be helpful.


Everything depends on how you save the information about a started
child process. Your sigchildhandler() can be called BEFORE your parent
process has saved any information about the started child. In other words,
if you have something like this:

pid = fork();
switch(pid) {

case -1: // error with fork()
    return -1;
case 0: // child
    do something;
    break;
default: // parent
    AddInfoAboutStartedChild(pid);
}


sigchildhandler() {
    pid = waitpid(...);
    UseInfoAboutStartedChild(pid); <--- This code can be called before your
                                        main process calls AddInfoAboutStartedChild(pid)
}

If your child process can exit very quickly, then you should avoid doing
any actual processing in sigchildhandler().
Instead, set some kind of flag there and act on it in the main parent process "thread".

Andrew
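
A sketch of the flag approach described above, assuming a single-threaded parent with a main loop that can call a reaping function regularly. All names here (child_exited, install_sigchld_flag(), reap_if_flagged()) are hypothetical; the bookkeeping call is only indicated by a comment.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the handler, consumed by the main loop. */
static volatile sig_atomic_t child_exited = 0;

static void on_sigchld(int signo)
{
    (void)signo;
    child_exited = 1;   /* only record the event; no bookkeeping in the handler */
}

/* Returns 0 on success, -1 on failure (errno is set by sigaction()). */
static int install_sigchld_flag(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Called from the main loop; returns the number of children reaped. */
static int reap_if_flagged(void)
{
    int reaped = 0;

    if (!child_exited)
        return 0;
    child_exited = 0;   /* clear first, so an exit during the loop sets it again */
    while (waitpid(-1, NULL, WNOHANG) > 0) {
        /* The per-child bookkeeping would go here, in the main flow, where it
         * cannot run in the middle of the code that records a new child. */
        reaped++;
    }
    return reaped;
}
```

Because the bookkeeping now runs only when the main loop calls reap_if_flagged(), it can no longer interrupt the parent between fork() returning and the child being recorded, provided the loop calls it only after that recording is done.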

Eyck Warich

Jan 8, 2002, 9:12:28 AM
On Tue, 8 Jan 2002 14:49:08 +0100, "Rainer Temme"
<Rainer...@Kstm.Sbs.NoSpam.De> wrote:

[...]

>waitpid(...WNOHANG) will return 0 (zero) if no more terminated
>children are available.... therefore, your while-loop condition
>should read like ... while (pidChild > 0); ... (not while (pidChild >= 0); )

I also found that, but I'm wondering about waitpid(WNOHANG) blocking
(maybe at the 2nd call with no children available?).

Eyck

Barry Margolin

Jan 8, 2002, 12:07:31 PM
In article <qcvl3usssc70pmnbq...@4ax.com>,

Are you sure it's really blocking in waitpid()? With the above error, your
program would go into an infinite loop repeatedly calling waitpid(), so
when you stop it in the debugger it's likely to be in that function.

Try using truss to see whether it's blocking or looping.

--
Barry Margolin, bar...@genuity.net
Genuity, Woburn, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

Tobias Oed

Jan 8, 2002, 5:24:11 PM

waitpid(WNOHANG) won't hang. You need to fix your signal handler:

void OnChildTerm (int signo)
{
    int nStatus = 0;
    pid_t pidChild = 0;
    int nOptions = WNOHANG;

    cout << "entering sighandler" << endl;

    do
    {
        pidChild = waitpid (-1, &nStatus, nOptions);
        if (pidChild == -1)
        {
            cout << " error in waitpid():" << strerror(errno) << endl;
        }
        else if (pidChild > 0)
        {
            nChilds++;
            cout << " Child counter = " << nChilds;
            cout << " after " << pidChild << " finished w status " << nStatus
                 << endl;
        }
    } while (pidChild > 0);

    cout << "exiting sighandler" << endl;

    signal (SIGCHLD, OnChildTerm);
}

This one works but will miss some of the children.
That's because you use signal() instead of sigaction(). When the signal
handler is entered, the signal's disposition is reset to the default action
(that's why you have to reinstall the handler). With sigaction()
the signal is blocked while the handler runs and will be delivered again as you
exit the signal handler.
Tobias.
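
For comparison, a sketch of the same handler installed with sigaction(), written in plain C and with the output calls left out. The names on_child_term(), install_child_handler() and reaped_children are illustrative. Saving and restoring errno is an addition that the thread does not discuss.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Number of children reaped so far; updated only inside the handler. */
static volatile sig_atomic_t reaped_children = 0;

static void on_child_term(int signo)
{
    int saved_errno = errno;    /* waitpid() may overwrite the interrupted code's errno */

    (void)signo;
    /* One SIGCHLD can stand for several exits, so collect everything available.
     * The loop ends on 0 (children still running) as well as on -1 (none left). */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped_children++;
    errno = saved_errno;
}

/* Returns 0 on success, -1 on failure (errno is set by sigaction()). */
static int install_child_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_child_term;
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;   /* restart interrupted calls, ignore stops */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGCHLD, &sa, NULL);
}
```

The handler stays installed after each delivery, so there is no need to re-register it at the end, and a long-running child no longer keeps the loop spinning because a return value of 0 ends it.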
