ECHILD error after system() call

Ahmed Rahal

Apr 5, 2005, 12:52:22 PM
Hi,

I am running into a strange problem.
I am trying to find out why an apparently harmless system() call returns
-1 with errno set to 10.
errno 10 is ECHILD, meaning either there is no child, or the child does not
belong to us.

This program ran flawlessly on several Linux versions, but we recently
moved to Fedora Core 3 (2.6.9-1.667smp kernel) and this error
started happening.

I have written a small test program and could not force the system() call
to produce a similar error.

The place where this happens is a forked child process. The parent process
forks on an incoming TCP connection, then the child process runs an external
command. Even running "/bin/ls" resulted in the ECHILD error, and forcing a 0
exit status didn't help. I guess that the problem happens after the command
has completed successfully.

I guess that this depends on the context, as a small standalone program
cannot reproduce the problem. Is there any detail I should look for in the
source that could trigger such behaviour?

thanks,

Ahmed RAHAL.


Kasper Dupont

Apr 5, 2005, 1:13:07 PM
Ahmed Rahal wrote:
>
> Hi,
>
> I am running into a strange problem.
> I try to find out why an apparently harmless system() call is returning
> error -1 with errno 10.
> ERRNO 10 is ECHILD, meaning either there is no child, or the child does not
> belong to us.

Does the program change the handler for SIGCHLD?

--
Kasper Dupont

Ahmed Rahal

Apr 5, 2005, 1:20:50 PM

"Kasper Dupont" <kas...@daimi.au.dk> wrote in message news:
4252C723...@daimi.au.dk...

The only signal used is SIGALRM.
The only "flaw" I could detect is that a handler is installed for SIGALRM
with signal() but never removed; alarm(0) is used to cancel the timer.
Besides, I looked for any manipulation of the SIGCHLD handler and found none.
The man page says that SIGCHLD is blocked during the system() call.
system() uses wait() to catch the end of the external program, but wait()
may fail with ECHILD if the process does not exist. However, I cannot see
how such a condition could arise.
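
One way such a condition can arise on Linux 2.6, offered as an assumption
rather than a diagnosis of this program: if SIGCHLD is ignored, terminated
children are reaped by the kernel and waitpid() has nothing left to collect.
A minimal sketch (the helper name wait_result_for_child is hypothetical):

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits at once, wait for it, and
 * return 0 if waitpid() reaped it, or the errno value if waitpid() failed. */
int wait_result_for_child(void)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return errno;
    if (pid == 0)
        _exit(0);               /* child: terminate immediately */

    if (waitpid(pid, &status, 0) == -1)
        return errno;           /* ECHILD if the kernel already reaped it */
    return 0;
}
```

Called with SIGCHLD at SIG_DFL this should return 0; after
signal(SIGCHLD, SIG_IGN) it should return ECHILD on a 2.6 kernel. The wait(2)
man page notes that Linux 2.4 and earlier did not behave this way, which
would be consistent with the error appearing only after the kernel change,
though nothing here confirms that cause.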

---

Ahmed RAHAL


Kasper Dupont

Apr 5, 2005, 3:24:11 PM
Ahmed Rahal wrote:
>
> The system() call uses wait() to catch the end of the external program,
> but wait() may return ECHILD if the process does not exist. I however
> cannot see how such a condition may rise.

I would suspect this could happen if SIGCHLD was set to SIG_IGN.
I'd try inserting code before and after the call of system, to
verify what SIGCHLD was really set to. And I'd try using strace
to find out what happens during the call of system.

At which point does it fail? Does it actually execute the command?
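
The check suggested above could be sketched roughly as follows. The function
names sigchld_disposition and traced_system are hypothetical; only
sigaction() and system() are real library calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: return a printable name for the current SIGCHLD
 * disposition without changing it. */
const char *sigchld_disposition(void)
{
    struct sigaction sa;

    /* A NULL new action means "query only". */
    if (sigaction(SIGCHLD, NULL, &sa) == -1)
        return "unknown";
    if (sa.sa_flags & SA_SIGINFO)
        return "handler";       /* sa_sigaction-style handler installed */
    if (sa.sa_handler == SIG_IGN)
        return "SIG_IGN";
    if (sa.sa_handler == SIG_DFL)
        return "SIG_DFL";
    return "handler";
}

/* Hypothetical wrapper: log the disposition on both sides of system(). */
int traced_system(const char *cmd)
{
    int rc;

    fprintf(stderr, "before system(): SIGCHLD is %s\n", sigchld_disposition());
    rc = system(cmd);
    fprintf(stderr, "after system():  SIGCHLD is %s\n", sigchld_disposition());
    return rc;
}
```

Replacing the failing system() call with traced_system() would show whether
the forked child is running with SIGCHLD ignored. An ignored disposition is
inherited across both fork() and exec, so it may have been set by whatever
started the program rather than by the program itself.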

--
Kasper Dupont

Basile Starynkevitch [news]

Apr 5, 2005, 3:44:05 PM
On 2005-04-05, Ahmed Rahal <ara...@taranis-services.fr> wrote:
> Hi,
>
> I am running into a strange problem.
> I try to find out why an apparently harmless system() call is returning
> error -1 with errno 10.
> ERRNO 10 is ECHILD, meaning either there is no child, or the child does not
> belong to us.
>
> This program was running flawless on several linux versions, but recently
> we moved to a FedoraCore 3 (2.6.9-1.667smp kernel) and this error
> started happening.

In addition to the other replies you've received, I suggest running strace
on your program, provided you can reproduce the bug.

Perhaps some limit is being reached (but this is only a naive guess).

regards


--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net
aliases: basile<at>tunes<dot>org = bstarynk<at>nerim<dot>net
8, rue de la Faïencerie, 92340 Bourg La Reine, France

Ahmed Rahal

Apr 5, 2005, 3:50:21 PM

"Kasper Dupont" <kas...@daimi.au.dk> wrote in message news:
4252E5DB...@daimi.au.dk...

I will check what SIGCHLD is set to before and after the call.
As the man page advises, wait() may fail because of SIG_IGN.

The most curious part is that the command does not fail: it actually
executes well, but the return value of system(), which is checked, indicates
the ECHILD error.
Therefore I think the problem arises only after the requested system command
has run to completion.
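
If the SIG_IGN test does turn out positive, one possible workaround (a
sketch under that assumption, not a confirmed fix) is to reset SIGCHLD to
SIG_DFL for the duration of the call. The name run_command is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical wrapper: run cmd through system() with SIGCHLD temporarily
 * reset to SIG_DFL. Returns the command's exit code (0-255), or -1 if
 * system() failed or the command did not exit normally. errno is only
 * meaningful when system() itself failed. */
int run_command(const char *cmd)
{
    struct sigaction dfl, saved;
    int status, saved_errno;

    dfl.sa_handler = SIG_DFL;
    dfl.sa_flags = 0;
    sigemptyset(&dfl.sa_mask);

    if (sigaction(SIGCHLD, &dfl, &saved) == -1)
        return -1;

    status = system(cmd);
    saved_errno = errno;

    /* Put back whatever the rest of the program had installed. */
    sigaction(SIGCHLD, &saved, NULL);
    errno = saved_errno;

    if (status == -1)
        return -1;              /* fork or wait failed */
    if (!WIFEXITED(status))
        return -1;              /* killed by a signal, for example */
    return WEXITSTATUS(status);
}
```

This is only reasonable in a process with no other children of its own, such
as the per-connection child described earlier: any other child that exits
while the default disposition is in place would be left as a zombie.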

Thanks for your help so far, I'll post as soon as I get SIG_IGN tested.

---

Ahmed RAHAL.

