
deadlock in waitchld()


Roman Rakus

May 20, 2013, 10:11:01 AM
to bug-...@gnu.org
Bash hangs in wait4() (WAITPID) if a TERM signal trap handler is executed
while a pipeline is running.

RR

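Reproducer:
#!/bin/bash

# Run an external command from the TERM trap handler.
trap "/bin/echo trapped $$" TERM

# Print our PID so the second script can signal us.
printf '%d\n' $$

while :; do
    dd if=/dev/zero bs=1k count=128 2>&1 | cat > /dev/null
done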

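and bombard the bash process with TERM signals:
#!/bin/bash
# Usage: pass the PID printed by the reproducer as $1.
while :; do
    kill -TERM "$1" || break
    usleep 100000   # 100 ms; "sleep 0.1" works where usleep is unavailable
done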
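
The signal loop above runs until the target exits, which is awkward for automated runs. A bounded variant might look like the sketch below. The function name `bombard` and its COUNT and INTERVAL parameters are hypothetical and not part of the original report; the default interval assumes a `sleep` that accepts fractional seconds (GNU coreutils does).

```shell
#!/bin/bash
# bombard PID [COUNT] [INTERVAL]   (hypothetical helper, not from the report)
# Send up to COUNT TERM signals to PID, pausing INTERVAL seconds after each.
# Returns 1 as soon as kill fails (for example, the target no longer exists),
# and 0 if all COUNT signals were sent.
bombard() {
    bombard_pid=$1
    bombard_count=${2:-100}
    bombard_interval=${3:-0.1}   # assumes sleep accepts fractional seconds
    bombard_i=0
    while [ "$bombard_i" -lt "$bombard_count" ]; do
        kill -TERM "$bombard_pid" 2>/dev/null || return 1
        sleep "$bombard_interval"
        bombard_i=$((bombard_i + 1))
    done
    return 0
}
```

For example, `bombard "$pid" 600 0.1` would send signals for roughly a minute and then return, so a wrapper script can inspect the reproducer afterwards.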


Roman Rakus

May 24, 2013, 8:12:41 AM
to bug-...@gnu.org
I have done a bit of debugging and have some results:
The problem is that waitchld() is called even when no children are
running, and it is called more than once. I don't know what the logic
behind that is.

Would checking whether any child exists help?

RR

Roman Rakus

May 24, 2013, 9:30:05 AM
to bug-...@gnu.org
The race is in the do-while loop in wait_for(). At the start of wait_for()
we block SIGCHLD; however, the echo process ends during the loop and
we don't register it (we don't handle the SIGCHLD that is sent).
Looking at the code, there is MUST_UNBLOCK_CHLD. Would it do any harm to
enable it by default?

RR

Chet Ramey

May 24, 2013, 10:00:34 AM
to Roman Rakus, bug-...@gnu.org, chet....@case.edu
On 5/24/13 9:30 AM, Roman Rakus wrote:
> The race is in the do-while loop in wait_for(). At the start of wait_for()
> we block SIGCHLD; however, the echo process ends during the loop and
> we don't register it (we don't handle the SIGCHLD that is sent).
> Looking at the code, there is MUST_UNBLOCK_CHLD. Would it do any harm to
> enable it by default?

I spent a lot of time looking at this yesterday, and I have it pretty much
fixed. The problem is in the trap handling code, not wait_for(). The trap
problem ends up corrupting the jobs data structure, which is why wait_for
misbehaves.

Chet

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU ch...@case.edu http://cnswww.cns.cwru.edu/~chet/
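
For context on when a trap handler is expected to run: the Bash manual states that if Bash is waiting for a command to complete and receives a signal for which a trap has been set, the trap is not executed until the command completes. The sketch below illustrates that documented behaviour only; it does not reproduce the hang or show the fix. The function name `trap_demo` and the messages it prints are hypothetical, and it assumes `bash` and `sh` are on the PATH and that `sleep` accepts fractional seconds.

```shell
#!/bin/bash
# trap_demo (hypothetical name): start a child bash that receives TERM
# while a foreground command is still running, and show the output order.
trap_demo() {
    bash -c '
        trap "echo trapped" TERM
        # Signal this bash from the background, part-way through the
        # foreground command below. $$ is the PID of this child bash.
        ( sleep 0.2; kill -TERM $$ ) &
        # Foreground command: takes about 1 second, then reports completion.
        sh -c "sleep 1; echo foreground done"
        echo "next command"
        wait
    '
}
```

Calling `trap_demo` on a Bash that follows the documented behaviour should print `foreground done`, then `trapped`, then `next command`: the handler runs only after the foreground command has finished, even though the signal arrived earlier.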

Roman Rakus

Jun 12, 2013, 3:00:30 AM
to chet....@case.edu, bug-...@gnu.org
On 05/24/13 16:00, Chet Ramey wrote:
> On 5/24/13 9:30 AM, Roman Rakus wrote:
>> The race is in the do-while loop in wait_for(). At the start of wait_for()
>> we block SIGCHLD; however, the echo process ends during the loop and
>> we don't register it (we don't handle the SIGCHLD that is sent).
>> Looking at the code, there is MUST_UNBLOCK_CHLD. Would it do any harm to
>> enable it by default?
> I spent a lot of time looking at this yesterday, and I have it pretty much
> fixed. The problem is in the trap handling code, not wait_for(). The trap
> problem ends up corrupting the jobs data structure, which is why wait_for
> misbehaves.
>
> Chet
>
Hi Chet,
Will the patch be available somewhere? I checked the devel branch in git.
I would like to apply or backport the patch to an older version.

RR

Chet Ramey

Jun 12, 2013, 10:07:05 AM
to Roman Rakus, bug-...@gnu.org, chet....@case.edu
I just pushed May's changes up to the devel branch. Look at the changes
from 5/23.