This is a bug report for perl from je...@hedden.us
generated with the help of perlbug 1.34 running under perl v5.8.2.
I have uncovered the following 'issue' (I feel it is a bug) with
Perl: Namely,
it is possible for a signal handler defined locally inside an eval
block to be
executed outside the scope of that eval block.
There appears to be a 'gap' between the end of an eval block and the
restoration of the %SIG hash to remove any signal handlers defined
locally
inside the eval block.
>From perl581delta.pod: Perl no longer handles signals immediately but
instead
"between opcodes" when it is safe to do so.
However, if one or more opcodes mark the end of the eval block, and
further
opcodes follow to restoring the %SIG hash, but a signal occurs in
between them,
then the (now defunct) signal handler defined inside the eval block
will get
executed outside of the eval block. This appears to be what is
happening.
It is not enough to suppress signal handling between opcodes. Perl
needs to go
further and ensure safe signal handling when the %SIG hash is
modified inside
an eval block.
In one system I have developed, I have encounted this bug in two very
different
scenarios. Further, I have been able to create a Perl script that
reproduces
this bug (below).
In my system, I can workaround this bug by nesting the eval block that
contains the locally defined signal handlers inside another eval
block:
eval {
eval {
local $SIG{'CHLD'} = sub { die("SIGCHLD\n"); };
# Do work
};
};
However, this may not be a suitable for every situation.
The following code reproduces the bug:
#!/usr/bin/perl
#####
#
# Test program to reproduce the following Perl bug:
# It is possible for a signal handler defined locally inside an
# eval blocks to be executed outside the scope of the eval block.
#
# Just execute this Perl script.
# It will eventually (after a few minutes) exit when the bug occurs.
#
#####
use strict;
use warnings;
use Time::HiRes qw( usleep );
my $CHILD_MAX = 25; # Max number of children to run
my $child_count = 0; # Count of children currently running
my $child_done = 0; # Flag that a child has terminated
my %child_pids; # Holds child processes' PIDs
# Set the flag that a child has terminated
$SIG{'CHLD'} = sub { $child_done = 1; };
# Loop until the bug occurs
do {
# Cleanup any terminated children
if ($child_done) {
$child_done = 0;
# Check all child processes using non-blocking waitpid() call
foreach my $pid (keys(%child_pids)) {
if (waitpid($pid, 1) == $pid) { # 1 = POSIX::WNOHANG
delete($child_pids{$pid});
$child_count--;
}
}
}
# Start more children
while ($child_count < $CHILD_MAX) {
my $pid;
if (($pid = fork()) == 0) {
# Child sleeps for a random amount of time and then exits
my $usec = 950000 + int(rand(100000));
usleep($usec);
exit(0);
}
# Parent remembers the child's PID for later cleanup
$child_pids{$pid} = undef;
$child_count++;
}
# Try to recreate the bug
eval {
eval {
# Local signal handler to 'kill' the sleep() call below
local $SIG{'CHLD'} = sub { die("SIGCHLD\n"); };
sleep(1); # Hang around a bit
};
# Set the flag for cleaning up terminated child processes
if ($@ && ($@ =~ /CHLD/)) {
$child_done = 1;
}
};
# Keep looping until the bug occurs
} while (! $@);
# When we get here, it shows that the signal handler
# defined inside the inner eval block above was
# executed OUTSIDE the scope of the inner eval block.
print("Bug detected: $@");
exit(1);
# EOF
---
Flags:
category=core
severity=high
---
Site configuration information for perl v5.8.2:
Configured by Gerrit at Fri Nov 7 12:03:56 2003.
Summary of my perl5 (revision 5.0 version 8 subversion 2)
configuration:
Platform:
osname=cygwin, osvers=1.5.5(0.9432),
archname=cygwin-thread-multi-64int
uname='cygwin_nt-5.0 troubardix 1.5.5(0.9432) 2003-09-20 16:31
i686 unknown unknown cygwin '
config_args='-de -Dmksymlinks -Duse64bitint -Dusethreads
-Doptimize=-O2 -Dman3ext=3pm'
hint=recommended, useposix=true, d_sigaction=define
usethreads=define use5005threads=undef useithreads=define
usemultiplicity=define
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=define use64bitall=undef uselongdouble=undef
usemymalloc=y, bincompat5005=undef
Compiler:
cc='gcc', ccflags ='-DPERL_USE_SAFE_PUTENV -fno-strict-aliasing',
optimize='-O2',
cppflags='-DPERL_USE_SAFE_PUTENV -fno-strict-aliasing'
ccversion='', gccversion='3.3.1 (cygming special)',
gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
d_longlong=define, longlongsize=8, d_longdbl=define,
longdblsize=12
ivtype='long long', ivsize=8, nvtype='double', nvsize=8,
Off_t='off_t', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='ld2', ldflags =' -s -L/usr/local/lib'
libpth=/usr/local/lib /usr/lib /lib
libs=-lgdbm -ldb -lcrypt -lgdbm_compat
perllibs=-lcrypt -lgdbm_compat
libc=/usr/lib/libc.a, so=dll, useshrplib=true, libperl=libperl.a
gnulibc_version=''
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' -s'
cccdlflags=' ', lddlflags=' -s -L/usr/local/lib'
Locally applied patches:
---
@INC for perl v5.8.2:
/usr/lib/perl5/5.8.2/cygwin-thread-multi-64int
/usr/lib/perl5/5.8.2
/usr/lib/perl5/site_perl/5.8.2/cygwin-thread-multi-64int
/usr/lib/perl5/site_perl/5.8.2
/usr/lib/perl5/site_perl
.
---
Environment for perl v5.8.2:
HOME=/home/jhedden
LANG=C
LANGUAGE=C
LC_ALL=C
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/home/jhedden/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin
:/cygdrive/c/blp/API/dde:/cygdrive/c/WINNT/system32:/cygdrive/c/WINNT:
/cygdrive/c/WINNT/System32/Wbem:/cygdrive/c/blp/API:/cygdrive/c/Progra
m Files/Hummingbird/Connectivity/7.10/Accessories/:.
PERL_BADLANG (unset)
SHELL (unset)
Thanks for your detailed report.
In fact perl restores the outer context and while doing so restores
the un-local()ized %SIG.
What you're suggesting is that %SIG should be restored before the
eval block context be destroyed ; or something like that. This is a
tricky area.
> In my system, I can workaround this bug by nesting the eval block that
> contains the locally defined signal handlers inside another eval
> block:
> eval {
> eval {
> local $SIG{'CHLD'} = sub { die("SIGCHLD\n"); };
>
> # Do work
> };
> };
> However, this may not be a suitable for every situation.
is the outer eval really necessary ? ie. can't
do {
eval {
local $SIG{'CHLD'} = sub { die("SIGCHLD\n"); };
# Do work
};
};
suit your needs as well ?
Did you mean the other way around: eval { do { ... } }? IIUC, the
problem he is having is that the die is sometimes not trapped (hence
the extra eval layer). Someone who knows the safe signals code well
ought to take a look at this.
That's right of course.
> Someone who knows the safe signals code well
> ought to take a look at this.
Safe signal experts, please raise hands.
I'm suggesting that the restoration of the context outside the eval
block plus the restoration of the $SIG hash be made into an atomic
operation. I.e., signal process needs to be suspended from the point
just before the eval block context is being destroyed until after the
%SIG has is restored.
I'll try this. Thanks.
Experts? Not sure we have any...
All Safe signals did was make C level handler set a flag
rather than call code. Then add something to op dispatch to check the
flag and call the code. If we are in op dispatch we ain't in malloc()
so the original un-safe behaviour is fixed.
There isn't a concept of "signal mask" (cf interrupt mask on HW),
it is just a question of when you look at the flag.
What we need is a scope-exit expert. To be used this way
the local-ness needs to be undone before the eval-ness.
(With before used in sense of ops and so polling of flag.)
The code looks dubious anyway - there is no certaintly that
sleep() will be interrupted to a high enough level to
reach an op-dispatch and do the die. It works most places
because of test for EINTR but if you have SA_RESTART on
you may not get there.
We don't have a mechanism to do that as far as I know.
Having looked into this further, the problem is that although a single
opcode (leavetry) is responsible for both popping the eval context and then
leaving scope (and so restoring the old signal handler), a
PERL_ASYNC_CHECK() is called as part of that restoration.
In particular, the act of restoring the old (magical) value of $SIG{CHLD}
causes Perl_magic_setsig() be called, which is responsible for updating
PL_psig_ptr[i] etc. However, Perl_magic_setsig() does roughly the
following:
block signals
ENTER;
save a destructor that will restore old sigmask
PERL_ASYNC_CHECK();
PL_psig_ptr[i] = old value of $SIG{CHLD};
LEAVE;
Presumably the logic is intended to process any pending signals using the
old handler before restoring the new.
However, the calling sequence will be something along the lines of:
pp_an_op_of_some_decription()
PERL_ASYNC_CHECK()
(*)
pp_leavetry()
POP eval context
LEAVE
*(SV**)ptr = old value of $SIG{CHLD};
SvSETMAGIC(value)
Perl_magic_setsig()
(**) block signals
PERL_ASYNC_CHECK()
PL_psig_ptr[i] = old value of $SIG{CHLD};
PERL_ASYNC_CHECK()
pp_another_op_of_some_decription()
(where indentation shows things called from the outdented thing above it)
So if a SIGHLD arrives between (*) and (**), the signal will be handled
after the eval context is popped, but before the old handler has been
fully restored.
That's the diagnosis. Its too late in the day for my brain to come
up with any solutions.
Happy 2004 and all that, p5pers!
Dave.
--
print+qq&$}$"$/$s$,$*${d}$g$s$@$.$q$,$:$.$q$^$,$@$*$~$;$.$q$m&if+map{m,^\d{0\,},,${$::{$'}}=chr($"+=$&||1)}q&10m22,42}6:17*2~2.3@3;^2dg3q/s"&=~m*\d\*.*g