
Possible bug: Race condition when calling external commands during trap handling


Tillmann...@telekom.de

May 2, 2012, 7:16:23 AM
to bug-...@gnu.org
Hi,

I have a problem with a trap handler in a script that does some logging and needs external commands (date and hostname). Every once in a while the script fails with a syntax error. I assume a race condition is involved, because the syntax errors occur only very rarely.

I have produced the following script as a small example:

-----------------------

#!/bin/bash

log() {
    local text="$(date +'%Y-%m-%d %H:%M:%S') $(hostname -s) $1"
    echo $text >> /dev/null
}

thread() {
    while true; do
        log "Thread is running"
        kill -ALRM $$
        sleep 1
    done
}

trap "log 'received ALRM'" ALRM

thread &
trap "kill $?; exit 0" INT TERM


while true; do
    log "Main is running"
    sleep 1
done

-----------------------

Very infrequently this script will fail with a syntax error in line 5 (echo $text >> /dev/null). The actual error message is:

> /path/to/script.sh: command substitution: line 5: syntax error near unexpected token `)'
> /path/to/script.sh: command substitution: line 5: `hostname -s) $1'

Since there is no "hostname -s) $1" in line 5, I assume there is also an off-by-one error and line 4 is actually meant (local text="$(date +'%Y-%m-%d %H:%M:%S') $(hostname -s) $1").

I have encountered this problem both on bash 4.2.24(1)-release (x86_64-pc-linux-gnu) on Ubuntu 12.04 and on bash 4.1.2(1)-release (x86_64-redhat-linux-gnu) on RHEL 6.2.

There may be something wrong with the way traps are used in this case, but the documentation is very sparse on this topic. I also opened a question on StackOverflow.com (http://stackoverflow.com/questions/10194837/concurrent-logging-in-bash-scripts) but have not received any useful answers yet.

Since this is a race condition, it might take a while for the bug to hit. In some cases the script ran for up to 30 minutes before the bug triggered.

Please let me know if you have any further questions or hints on how to resolve this issue.

Thank you,
Till Crueger

Bob Proulx

May 3, 2012, 3:07:50 AM
to Tillmann...@telekom.de, bug-...@gnu.org
Tillmann...@telekom.de wrote:
> I have produced the following script as a small example:

A good small example! I do not understand the problem but I do have a
question about one of the lines in it and a comment about another.

> trap "kill $?; exit 0" INT TERM

What did you intend with "kill $?; exit 0"? Did you mean "kill $$"
instead?

> local text="$(date +'%Y-%m-%d %H:%M:%S') $(hostname -s) $1"

Note that GNU date can use "+%F %T" as a shortcut for "%Y-%m-%d %H:%M:%S".
It is useful to save typing.
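
As a quick check of the shortcut mentioned above, the following sketch formats one fixed timestamp with both format strings and compares the results. It assumes GNU date (the `-u` and `-d @0` options pin the time to the Unix epoch in UTC); the variable names are made up for illustration.

```shell
#!/bin/bash
# Format the same fixed instant (the Unix epoch, in UTC) with both format strings.
# Assumes GNU date: -d @0 selects the epoch, -u forces UTC.
long_form=$(date -u -d @0 +'%Y-%m-%d %H:%M:%S')
short_form=$(date -u -d @0 +'%F %T')

if [ "$long_form" = "$short_form" ]; then
    echo "identical: $short_form"
else
    echo "different: $long_form vs $short_form"
fi
```

With GNU date this should report that both strings are identical, so `+'%F %T'` can replace the longer format in the log function without changing its output.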

And lastly I will comment that you are doing quite a bit inside of an
interrupt routine. Typically in a C program it is not safe to perform
any operation that may call malloc() within an interrupt service
routine since malloc isn't reentrant. Bash is a C program and I
assume the same restriction would apply.

Bob

Tillmann...@telekom.de

May 3, 2012, 4:05:42 AM
to bug-...@gnu.org
Yes, you are correct, that line is buggy and contains a typo. I added it later in a hurry after I could reproduce the error, to ensure a clean shutdown of the script. What I meant to type was:

> trap "kill $!; exit 0" INT TERM

However thinking about it, this also does not work as intended.

The problem exists, though, even if that line is deleted (one just has to kill all remaining threads manually after the crash or after ^C).

If you need it, I can update the script with an INT and TERM handler that actually kills the background process. However, since this is not relevant to the problem in question, I did not send a correction after I noticed the typo.
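
For reference, one common way to write such a cleanup handler is sketched below. It is not taken from the original script: the names `worker_pid` and `cleanup` are hypothetical, and `sleep 30` merely stands in for the `thread` function. The sketch stores `$!` in a variable right after starting the background job and uses single quotes, so the trap body is expanded when the trap runs rather than when it is set.

```shell
#!/bin/bash
sleep 30 &              # stand-in for the background worker (hypothetical)
worker_pid=$!           # remember its PID immediately after starting it

cleanup() {
    kill "$worker_pid" 2>/dev/null            # ask the worker to terminate
    # Reap the worker; wait returns its (non-zero) exit status, which is ignored here.
    wait "$worker_pid" 2>/dev/null || true
}

# Single quotes: the body is evaluated when the trap fires, not when it is set.
trap 'cleanup; exit 0' INT TERM

# Calling the function directly only demonstrates that it stops the worker.
cleanup
if kill -0 "$worker_pid" 2>/dev/null; then
    worker_state="still running"
else
    worker_state="stopped"
fi
echo "worker is $worker_state"
```

In a real script the `cleanup` call at the end would be dropped and the function would run only from the trap. A worker that starts its own children (such as the `sleep 1` inside `thread`) may need additional handling that this sketch does not cover.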


I am also aware of the strict restrictions on operations allowed during signal handling in C and C++. I tried to find documentation on which operations are allowed in trap handlers for bash, but even after a prolonged search in the man and info pages as well as online, I could not find any resources on that topic. The low number of responses to the same question on SO also seems to show that hardly anyone is aware of such restrictions. If such documentation exists, then of course this is not a bug. In that case my personal suggestion would be to mention the available documentation in the man pages. This would be especially useful, since it is not very clear which operations need a malloc() internally (note that in C most kinds of exec() do malloc() and therefore are not thread safe; however, executing external commands is very common in bash, so the restrictions cannot just be derived from the C side).

I also noted that the behaviour is different from typical problems during signal handling. The most likely result of a forbidden operation during signal handling would be a deadlock (since the operation will try to lock the same resource twice in the same thread). In this case, however, the parser somehow seems to mess up its internal state, resulting in the parser error I am seeing.
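
A pattern sometimes used to keep trap bodies small is to record the event in a variable and do the actual logging from the main loop. The sketch below illustrates that idea only. It is an assumption that this would sidestep the error described above, not a confirmed fix, and the names `alrm_pending`, `handled_count` and `handle_pending` are made up for the example.

```shell
#!/bin/bash
alrm_pending=0
handled_count=0

log() {
    # Same shape as the log function in the example script;
    # uname -n stands in for hostname -s here.
    last_log="$(date +'%Y-%m-%d %H:%M:%S') $(uname -n) $1"
    echo "$last_log"
}

# The trap body only sets a flag; no command substitution happens inside it.
trap 'alrm_pending=1' ALRM

handle_pending() {
    # Called from the main loop: log the event outside of the trap body.
    if [ "$alrm_pending" -eq 1 ]; then
        alrm_pending=0
        handled_count=$((handled_count + 1))
        log 'received ALRM'
    fi
}

kill -ALRM $$      # simulate the signal sent by the background loop
handle_pending     # logs the pending event once
handle_pending     # no new signal since the last call, so nothing is logged
trap - ALRM
```

In the example script, `handle_pending` would be called once per iteration of the main `while` loop. Several signals that arrive between two calls are reported as a single event with this approach.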

I hope this makes the problem more clear.

Thank you for your feedback,
Till

-----Original Message-----
From: Bob Proulx [mailto:b...@proulx.com]
Sent: Thursday, May 3, 2012 09:08
To: Crueger, Tillmann
Cc: bug-...@gnu.org
Subject: Re: Possible bug: Race condition when calling external commands during trap handling

Andreas Schwab

May 3, 2012, 4:06:09 AM
to Tillmann...@telekom.de, bug-...@gnu.org
Bob Proulx <b...@proulx.com> writes:

> And lastly I will comment that you are doing quite a bit inside of an
> interrupt routine. Typically in a C program it is not safe to perform
> any operation that may call malloc() within an interrupt service
> routine since malloc isn't reentrant. Bash is a C program and I
> assume the same restriction would apply.

Traps are executed only at command boundaries. Executing them in a
signal handler would make them completely unusable, of course.
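
The deferral to a command boundary can be observed with a small experiment, sketched here under a few assumptions: the file and variable names are hypothetical, `sleep` is assumed to accept fractional seconds, and the timing relies on the signal arriving within the one-second window. A background subshell signals the main shell while it is waiting for a foreground command, and a log file records the order of events.

```shell
#!/bin/bash
log_file=$(mktemp)

trap 'echo "trap ran" >> "$log_file"' USR1

# Signal the main shell shortly after the foreground command has started.
( sleep 0.2; kill -USR1 $$ ) &

# Foreground external command; USR1 arrives while bash is waiting for it.
sh -c 'sleep 1; echo "foreground command finished"' >> "$log_file"

wait                # reap the background sender
trap - USR1

first_event=$(sed -n 1p "$log_file")
second_event=$(sed -n 2p "$log_file")
rm -f "$log_file"
printf '%s\n%s\n' "$first_event" "$second_event"
```

With bash, the line written by the foreground command is expected to appear before the line written by the trap, even though the signal was sent about 0.8 seconds earlier. The trap body runs only once the shell is back between commands.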

Andreas.

--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."

Linda Walsh

May 21, 2012, 8:06:28 PM
to Andreas Schwab, Tillmann...@telekom.de, bug-...@gnu.org


Andreas Schwab wrote:

> Bob Proulx <b...@proulx.com> writes:
>
>> And lastly I will comment that you are doing quite a bit inside of an
>> interrupt routine. Typically in a C program it is not safe to perform
>> any operation that may call malloc() within an interrupt service
>> routine since malloc isn't reentrant. Bash is a C program and I
>> assume the same restriction would apply.
>
> Traps are executed only at command boundaries. Executing them in a
> signal handler would make them completely unusable, of course.
>
> Andreas.

---
But a trap HAS to be executed in the trap handler
to reset itself, otherwise you risk losing a signal.

As soon as you exit your trap handler,
your trap handler had better already be in place.
If it isn't and a trap comes in before you catch it,
you've lost it.

the "perlipc" manpage explains some of the details
behind this problem. Their suggested handler:

use POSIX ":sys_wait_h";
sub REAPER {
    my $child;
    # If a second child dies while in the signal handler caused by the
    # first death, we won't get another signal. So must loop here else
    # we will leave the unreaped child as a zombie. And the next time
    # two children die we get another zombie. And so on.
    while (($child = waitpid(-1, WNOHANG)) > 0) {
        $Kid_Status{$child} = $?;
    }
    $SIG{CHLD} = \&REAPER; # still loathe SysV
}
$SIG{CHLD} = \&REAPER;
# do something that forks...

---
I think it was sysV compat libs that are unsafe (you have to reset
handler each signal)...


Andreas Schwab

May 22, 2012, 3:46:19 AM
to Linda Walsh, Tillmann...@telekom.de, bug-...@gnu.org
A trap is not a signal. It doesn't "come in". A trap handler is
executed because some condition is true at a command boundary.

Linda Walsh

May 22, 2012, 7:47:01 AM
to Andreas Schwab, Tillmann...@telekom.de, bug-...@gnu.org


Andreas Schwab wrote:

> A trap is not a signal. It doesn't "come in". A trap handler is
> executed because some condition is true at a command boundary.
>
> Andreas.
>

That still begs the question...

If you are in your trap handler, and you don't reset the signal --
how can you guarantee that your signal handler will be reset
before another event that would cause a trap occurs -- say a child
dying? If you don't have a trap in place and a child dies, do you
lose the indication that a child died? Suppose 2 die and now you install
your trap handler, will you get a second call to your trap handler
immediately upon exit from the first?

I honestly don't know how bash would handle some of these things.

I do know that the input subsystem can drop keys when it switches
from raw to cooked. I don't know if it should, but it drops keys
when you switch modes a lot, for example by calling
> more testin
#!/bin/bash
stty raw
if read -n1 -t0 char; then
    stty cooked
    read -n1 -t1 char
    echo "($char)"
else
    stty cooked
    echo "(undef)"
fi
repeatedly from a perl script (perl's mechanism for doing this never times out
and returns if a sigwinch comes in...)...so amusingly, it's safer (though a bit
flaky) to periodically call a shell script to poll the keyboard... ;-)...

But the point is, real time events are a pain to get right -- if you don't want
to lose traps corresponding to interrupts, in perl and C, at least, you need to
reset the event handler before returning from processing the current event.

But bash may be different...I was just warning of the possibility of there being
a problem NOT doing it in the handler, as such exists in many other similar
handlers.




Greg Wooledge

May 22, 2012, 8:28:52 AM
to Linda Walsh, bug-...@gnu.org
On Tue, May 22, 2012 at 04:47:01AM -0700, Linda Walsh wrote:
> If you are in your trap handler, and you don't reset the signal --
> how can you guarantee that your signal handler will be reset
> before another even that would cause a trap occurs

You are using the wrong words for everything.

A signal is an asynchronous event, potentially sent by another process.

A signal handler is a bit of code that you write, and then register to
be executed when you receive a signal.

A trap is the same as a signal handler. ''trap'' is the name of the
bash command to register a signal handler (assigning it to a set of
signals).

I do not know the answers to "How does bash implement traps? Is there
a guarantee that no signals will be lost?" Hopefully someone else does.

The man page has only a partial answer. Under JOB CONTROL:

Any trap on SIGCHLD is executed for each child that exits.

Roman Rakus

May 22, 2012, 8:41:57 AM
to bug-...@gnu.org
On 05/22/2012 02:28 PM, Greg Wooledge wrote:
> I do not know the answers to "How does bash implement traps? Is there
> a guarantee that no signals will be lost?" Hopefully someone else does.
I can just imagine a situation where bash is reading the trap from the
source (it is going through the script and is on the line where the trap
is) or has not read it yet when the signal is received. Then, of course,
your trap handler is not installed yet.
Another situation: you had a previous trap handler and you are installing
a new one. The received signals are "paused" for a while and are processed
right after the installation of the new trap handler. There was a bug report
against this; I'm not sure if it is fixed.

RR

Chet Ramey

May 22, 2012, 9:01:58 AM
to Roman Rakus, bug-...@gnu.org, chet....@case.edu
On 5/22/12 8:41 AM, Roman Rakus wrote:

> Another situation: You had previous trap handler and you are installing new
> one. The received signals are "paused" for a while and are processed right
> after the installation of new trap handler. There was a bug report against
> this, I'm not sure if it is fixed.

Can you provide a reference to that report?


--
``The lyf so short, the craft so long to lerne.'' - Chaucer
``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU ch...@case.edu http://cnswww.cns.cwru.edu/~chet/

Roman Rakus

May 22, 2012, 9:16:33 AM
to bug-...@gnu.org
On 05/22/2012 03:01 PM, Chet Ramey wrote:
> On 5/22/12 8:41 AM, Roman Rakus wrote:
>
>> Another situation: You had previous trap handler and you are installing new
>> one. The received signals are "paused" for a while and are processed right
>> after the installation of new trap handler. There was a bug report against
>> this, I'm not sure if it is fixed.
> Can you provide a reference to that report?
>
>
http://lists.gnu.org/archive/html/bug-bash/2011-04/msg00045.html

RR

Chet Ramey

May 22, 2012, 11:15:57 AM
to Roman Rakus, bug-...@gnu.org, chet....@case.edu
According to the bash change log, it was fixed three days later.