SIGCONT misbehaviour in Linux

Eric PAIRE

Dec 8, 1999, 8:00:00 AM

to linux-...@vger.rutgers.edu, bug-...@gnu.org

Hi Linux gurus,

Michael Snyder is currently integrating my linuxthreads debugging support
into the GDB source tree at Cygnus, and he noticed what I think is a
generic kernel bug in signal handling:

When a process blocked in the kernel receives a stopping signal (POSIX says
SIGSTOP, SIGTSTP, SIGTTIN and SIGTTOU), then the process stops, and this is
correctly implemented by Linux. *BUT*, when such a process receives a SIGCONT,
then it must continue, whatever signal handling is configured in the process.

The specific problem here is that, if the process is blocked in
sys_nanosleep(), then receiving a SIGSTOP makes it exit from
sys_nanosleep() and enter the TASK_STOPPED state in do_signal().
When it is later woken up by a SIGCONT, it exits the kernel immediately,
however much time remains to sleep, even if no signal handler is attached
to SIGCONT. This is not the correct POSIX semantics (it should only return
if there is a signal handler attached to SIGCONT).
Notice also that the remaining time does not take into account the time
during which the process has been stopped.
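
The point about stopped time can be illustrated from user space. The following is only a sketch, assuming the POSIX clock_nanosleep() interface is available (it was standardized after this thread); the helper names sleep_until_ms_from_now() and elapsed_ms() are hypothetical. Sleeping to an absolute CLOCK_MONOTONIC deadline means that time spent stopped counts against the deadline, and that retrying after EINTR never lengthens the total sleep.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical helper: milliseconds elapsed on CLOCK_MONOTONIC since *start. */
long elapsed_ms(const struct timespec *start)
{
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);
    return (long)(now.tv_sec - start->tv_sec) * 1000L +
           (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/*
 * Hypothetical helper: sleep until `ms` milliseconds from now.
 * Returns 0 on success, or an error number on failure.
 */
int sleep_until_ms_from_now(long ms)
{
    struct timespec deadline;
    int err;

    assert(ms >= 0);
    if (clock_gettime(CLOCK_MONOTONIC, &deadline) != 0)
        return errno;

    /* Turn the relative interval into an absolute deadline. */
    deadline.tv_sec += ms / 1000;
    deadline.tv_nsec += (ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    /* clock_nanosleep() returns the error number directly, not via errno.
     * The deadline is fixed, so a retry after EINTR just waits for it again. */
    do {
        err = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL);
    } while (err == EINTR);

    return err;
}
```

A caller would use sleep_until_ms_from_now(1000) where it previously used a one-second relative nanosleep().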

The general problem here is that the kernel seems to *ALWAYS* return EINTR
when signals have been sent during system calls, *EVEN* when there is no
signal handler attached to the signal, which seems to be in contradiction
with the generic POSIX semantics of EINTR. I have added the glibc-bug
mailing list because I don't know whether the POSIX behaviour should be
handled correctly in the libc or in the kernel.

BTW, a funny user test to show this misbehaviour is to type the following
commands in bash:

sleep 1000
^Z
fg

and the process running sleep 1000 returns immediately on Linux. I tested it
on other systems and it works correctly (the sleep continues).
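
The same experiment can be run without a shell. This is a rough probe, not code from the thread: the function name nanosleep_broken_by_stop_cont() and the 400 ms / 100 ms intervals are arbitrary choices. It forks a child that sleeps in nanosleep() with no SIGCONT handler, stops and continues it, and reports whether the sleep came back early with EINTR. The answer depends on the kernel being tested, so the probe only reports what it observes.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical probe: returns 1 if the child's nanosleep() returned early
 * with EINTR after SIGSTOP + SIGCONT, 0 if it slept through, and -1 if
 * the experiment itself failed.
 */
int nanosleep_broken_by_stop_cont(void)
{
    const struct timespec child_nap = { .tv_sec = 0, .tv_nsec = 400 * 1000000L };
    const struct timespec settle = { .tv_sec = 0, .tv_nsec = 100 * 1000000L };
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: SIGCONT keeps its default disposition (no handler). */
        int rc = nanosleep(&child_nap, NULL);
        _exit(rc == -1 && errno == EINTR ? 1 : 0);
    }

    /* Parent: give the child time to block, then stop it. */
    nanosleep(&settle, NULL);
    if (kill(pid, SIGSTOP) != 0 || waitpid(pid, &status, WUNTRACED) != pid) {
        kill(pid, SIGKILL);
        waitpid(pid, &status, 0);
        return -1;
    }

    /* If the child really stopped, continue it and collect its verdict. */
    if (WIFSTOPPED(status)) {
        kill(pid, SIGCONT);
        if (waitpid(pid, &status, 0) != pid)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A result of 1 corresponds to the behaviour reported here; a result of 0 corresponds to what the other systems were observed to do.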

Best regards,
-Eric
P.S. Michael's original problem was with PTRACE_ATTACH, whose side effect
is to make a process executing nanosleep() exit from nanosleep()
immediately when GDB attaches to it, which makes gdb intrusive on the
process behaviour...
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ Eric PAIRE
Web : http://www.ri.silicomp.com/~paire | Group SILICOMP - Research Institute
Email: eric....@ri.silicomp.com | 2, avenue de Vignate
Phone: +33 (0) 476 63 48 71 | F-38610 Gieres
Fax : +33 (0) 476 51 05 32 | FRANCE

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

Simon Kirby

Dec 8, 1999, 8:00:00 AM

to Eric PAIRE

Hmm...This works properly on libc5 systems, btw. (glibc2.0 and glibc2.1
use nanosleep(), libc5 uses alarm() and sigsuspend()).

Simon-

[ Stormix Technologies Inc. ][ NetNation Communcations Inc. ]
[ s...@stormix.com ][ s...@netnation.com ]
[ Opinions expressed are not necessarily those of my employers. ]

Eric Paire

Dec 8, 1999, 8:00:00 AM

to Simon Kirby

This works for the special case of sleep(), which is the example I took,
just because the libc5 sleep implementation looks at the return value.
But what about the other blocking system calls (like nanosleep)? On an
EINTR errno, do they properly check that the received SIGCONT had a
signal handling function at the time it was received, and automagically
restart the system call that should not have been interrupted?
This is why my guess is that this should be fixed
in the kernel (if Linux is to be POSIX-compliant).

-Eric

+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ Eric PAIRE
Web : http://www.ri.silicomp.com/~paire | Group SILICOMP - Research Institute
Email: eric....@ri.silicomp.com | 2, avenue de Vignate
Phone: +33 (0) 476 63 48 71 | F-38610 Gieres
Fax : +33 (0) 476 51 05 32 | FRANCE

H. Peter Anvin

Dec 8, 1999, 8:00:00 AM

to linux-...@vger.rutgers.edu

Followup to: <1999120810...@stormix.com>
By author: Simon Kirby <s...@stormix.com>
In newsgroup: linux.dev.kernel

> >
> > and the process running sleep 1000 immediatly returns on Linux. I tested it
> > on other systems and it works correctly (the sleep continue).
>
> Hmm...This works properly on libc5 systems, btw. (glibc2.0 and glibc2.1
> use nanosleep(), libc5 uses alarm() and sigsuspend()).
>

It really could be argued what is the right behaviour here. When a
system call is interrupted by the signal, the normal thing is to
return EINTR.

-hpa

--
<h...@transmeta.com> at work, <h...@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."

Ulrich Drepper

Dec 8, 1999, 8:00:00 AM

to H. Peter Anvin

h...@transmeta.com (H. Peter Anvin) writes:

> > Hmm...This works properly on libc5 systems, btw. (glibc2.0 and glibc2.1
> > use nanosleep(), libc5 uses alarm() and sigsuspend()).
> >
>
> It really could be argued what is the right behaviour here. When a
> system call is interrupted by the signal, the normal thing is to
> return EINTR.

Right. The problem is that the ptrace() call to continue the process
(which implicitly sends a SIGCONT) also wakes up the process. We have
a test program which, if you'd run it normally, would not finish in
aeons. If you run it under gdb with all the ptrace() calls to stop
and continue all the threads, it finishes. This change in behaviour
is not wanted nor can it be avoided by gdb without a kernel change.

--
---------------. drepper at gnu.org ,-. 1325 Chesapeake Terrace
Ulrich Drepper \ ,-------------------' \ Sunnyvale, CA 94089 USA
Cygnus Solutions `--' drepper at cygnus.com `------------------------

Richard B. Johnson

Dec 8, 1999, 8:00:00 AM

to H. Peter Anvin

On 8 Dec 1999, H. Peter Anvin wrote:

> Followup to: <1999120810...@stormix.com>
> By author: Simon Kirby <s...@stormix.com>
> In newsgroup: linux.dev.kernel
> > >
> > > and the process running sleep 1000 immediatly returns on Linux. I tested it
> > > on other systems and it works correctly (the sleep continue).
> >

> > Hmm...This works properly on libc5 systems, btw. (glibc2.0 and glibc2.1
> > use nanosleep(), libc5 uses alarm() and sigsuspend()).
> >
>
> It really could be argued what is the right behaviour here. When a
> system call is interrupted by the signal, the normal thing is to
> return EINTR.
>

> -hpa
>

It becomes a definition of BSD_SIGNALS. If I remember correctly,
they, by default, use SA_RESTART as a flag. This way, sleep()
and other system calls automatically restart after a signal. At
the kernel level, any signal delivered to a process, causes a
co-pending system call to return to the caller with -EINTR. It
is the 'C' runtime library that decides, based upon this flag,
if the system call should be restarted or if -1 should be returned
to the caller with errno set to EINTR.

Cheers,
Dick Johnson

Penguin : Linux version 2.3.13 on an i686 machine (400.59 BogoMips).
Warning : The end of the world as we know it requires a new calendar.
Seconds : 2013503 (until Y2K)

Andrea Arcangeli

Dec 8, 1999, 8:00:00 AM

to Richard B. Johnson

On Wed, 8 Dec 1999, Richard B. Johnson wrote:

>co-pending system call to return to the caller with -EINTR. It
>is the 'C' runtime library that decides, based upon this flag,
>if the system call should be restarted or if -1 should be returned
>to the caller with errno set to EINTR.

glibc could also re-run the syscall without waiting again from the
beginning, by looking at the 'struct timespec *rem'. If nanosleep did not
have the `rem` parameter, glibc couldn't wrap the -EINTR
transparently. But it does.

NOTE: I can fix the kernel for this as well, but I agree with Peter that
returning -EINTR looks like the right thing to do. (I don't know what
the official semantics of the syscall are, though.)
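
A minimal sketch of that wrapping idea, assuming nothing about how glibc actually implements sleep(): the names sleep_fully(), on_alarm() and demo_interrupted_sleep() are hypothetical. The loop feeds the `rem` value from an interrupted nanosleep() back in as the next `req`. Note that each restart adds a little overhead, and that time spent stopped is not subtracted from `rem`.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/time.h>
#include <time.h>

/* Sleep for the whole interval, resuming with the remainder after EINTR. */
int sleep_fully(struct timespec req)
{
    struct timespec rem;

    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;              /* continue with what is left */
    }
    return 0;
}

static volatile sig_atomic_t alarms_seen;

static void on_alarm(int sig)
{
    (void)sig;
    alarms_seen++;
}

/*
 * Demo: a handler without SA_RESTART interrupts a 200 ms sleep after
 * 20 ms. Returns the elapsed time in milliseconds, or -1 on error.
 */
long demo_interrupted_sleep(void)
{
    struct sigaction sa = { 0 };
    struct itimerval one_shot = { .it_value = { .tv_sec = 0, .tv_usec = 20000 } };
    struct timespec req = { .tv_sec = 0, .tv_nsec = 200 * 1000000L };
    struct timespec t0, t1;

    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0 ||
        setitimer(ITIMER_REAL, &one_shot, NULL) != 0)
        return -1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (sleep_fully(req) != 0)
        return -1;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (long)(t1.tv_sec - t0.tv_sec) * 1000L +
           (t1.tv_nsec - t0.tv_nsec) / 1000000L;
}
```

Calling demo_interrupted_sleep() should report at least 200 ms even though the handler ran once in the middle of the sleep.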

Andrea

Richard B. Johnson

Dec 8, 1999, 8:00:00 AM

to Andrea Arcangeli

On Wed, 8 Dec 1999, Andrea Arcangeli wrote:

> On Wed, 8 Dec 1999, Richard B. Johnson wrote:
>
> >co-pending system call to return to the caller with -EINTR. It
> >is the 'C' runtime library that decides, based upon this flag,
> >if the system call should be restarted or if -1 should be returned
> >to the caller with errno set to EINTR.
>
> glibc could also return to run the syscall without waiting again from the
> beginning by looking at the 'struct timespec *rem'. If there wouldn't be
> the `rem` parameter in nanosleep, glibc couldn't wrap the -EINTR
> trasparently. But there is.
>
> NOTE: I can as well fix the kernel for this, but I agree with Peter that
> returning -INTR looks like the right thing to do. (I don't know which is
> the official semantic for the syscall though)
>
> Andrea
>

I think the kernel provides the correct result. The caller either has
to use '_BSD_SIGNALS_' or use code like this:

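#include <stdio.h>
#include <signal.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>     /* alarm(), read() */

/* puts() is not async-signal-safe; tolerable only in a toy test. */
void foo(int unused) { (void)unused; puts("\7Alarm"); }

int main(int argc, char **argv)
{
    struct sigaction sa;
    char buf[1];
    int i;

    (void)argv;
    memset(&sa, 0x00, sizeof(sa));
    if (argc > 1)       /* any argument: restart the interrupted read() */
        sa.sa_flags = SA_RESTART;
    sa.sa_handler = foo;
    sigaction(SIGALRM, &sa, NULL);
    alarm(1);
    errno = 0;          /* so a successful read() prints "Success" */
    i = read(0, buf, 1);
    printf("%d, %s\n", i, strerror(errno));
    return 0;
}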

Depending upon whether anything is on the command-line, the SA_RESTART
flag is set. This allows one to get both kinds of behavior with no
problems. I think the kernel code is correct.

Cheers,
Dick Johnson

Penguin : Linux version 2.3.13 on an i686 machine (400.59 BogoMips).
Warning : The end of the world as we know it requires a new calendar.

Seconds : 2010151 (until Y2K)

Richard B. Johnson

Dec 8, 1999, 8:00:00 AM

to Simon Kirby

> By author: Simon Kirby <s...@stormix.com>
> In newsgroup: linux.dev.kernel
>
> and the process running sleep 1000 immediatly returns on Linux.
> I tested it on other systems and it works correctly (the sleep
> continue).

This shows the operation of the SA_RESTART flag. If you don't want
the system call to return to the caller with -1 and EINTR, you
have to use this.

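#include <stdio.h>
#include <signal.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>     /* alarm(), read() */

/* puts() is not async-signal-safe; tolerable only in a toy test. */
void foo(int unused) { (void)unused; puts("\7Alarm"); }

int main(int argc, char **argv)
{
    struct sigaction sa;
    char buf[1];
    int i;

    (void)argv;
    memset(&sa, 0x00, sizeof(sa));
    if (argc > 1)       /* any argument: restart the interrupted read() */
        sa.sa_flags = SA_RESTART;
    sa.sa_handler = foo;
    sigaction(SIGALRM, &sa, NULL);
    alarm(1);
    errno = 0;          /* so a successful read() prints "Success" */
    i = read(0, buf, 1);
    printf("%d, %s\n", i, strerror(errno));
    return 0;
}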

Cheers,
Dick Johnson

Penguin : Linux version 2.3.13 on an i686 machine (400.59 BogoMips).
Warning : The end of the world as we know it requires a new calendar.

Seconds : 2010448 (until Y2K)

Ulrich Drepper

Dec 8, 1999, 8:00:00 AM

to Andrea Arcangeli

Andrea Arcangeli <and...@suse.de> writes:

> NOTE: I can as well fix the kernel for this, but I agree with Peter that
> returning -INTR looks like the right thing to do. (I don't know which is
> the official semantic for the syscall though)

You don't understand the initial problem. This is that

kill(SIGSTOP);
ptrace(PTRACE_CONTINUE)

is interrupting syscalls as well. It is fine if signals in general
interrupt syscalls. But SIGSTOP & friends, undone by a ptrace() call,
should not make the syscall return, since these things happen for
reasons outside the program: a user hitting ^Z, or gdb stopping and
restarting a process. The behaviour of the program is changed
dramatically.

--
---------------. drepper at gnu.org ,-. 1325 Chesapeake Terrace
Ulrich Drepper \ ,-------------------' \ Sunnyvale, CA 94089 USA
Cygnus Solutions `--' drepper at cygnus.com `------------------------

Andrea Arcangeli

Dec 9, 1999, 8:00:00 AM

to Ulrich Drepper

On 8 Dec 1999, Ulrich Drepper wrote:

>You don't understand the initial problem. This is that

I am not even considering it now. I was considering what the kernel should
do after a:

kill(SIGSTOP);
kill(SIGCONT);

Richard was talking about what happens after a _signal_ and not after a
ptrace_continue. These are two different things and we can make them
behave in completely different ways inside the kernel. I don't think you
should compare SIGSTOP+SIGCONT with SIGSTOP+PTRACE_CONTINUE.

>reasons outside the program. I user hitting ^Z or gdb stopping and

I think we should distinguish between ^Z and gdb. The signal code is
full of ugly special cases exactly because they are different things,
AFAIK.

Do you agree that for ^Z it is simply correct to return -EINTR immediately
at SIGCONT time (aka `fg` time)?

Should we make PTRACE_CONTINUE force nanosleep to continue (unlike the
SIGCONT case)? BTW, I am not sure if nanosleep is the only place that you
may like to change in this respect...

Andrea

Ulrich Drepper

Dec 9, 1999, 8:00:00 AM

to Andrea Arcangeli

Andrea Arcangeli <and...@suse.de> writes:

> Do you agree that ^Z is just correct returning -EINTR immediatly at
> SIGCONT time (aka `fg` time)?

This is not what happens on other platforms. At least with my limited
testing I found that if you do on Solaris

sleep 10
^Z
fg

the process will continue to sleep.

> Should we make PTRACE_CONTINUE to force nanosleep to continue (unlike the
> SIGCONT case?)?

This is the least that has to happen.

> BTW, I am not sure if nanosleep is the only place that you may like
> to change in this respect...

No, it's not the only place (e.g., a blocking read call). I think this
is a general change. Whenever the continue happens through
PTRACE_CONTINUE, no EINTR should be generated.

--
---------------. drepper at gnu.org ,-. 1325 Chesapeake Terrace
Ulrich Drepper \ ,-------------------' \ Sunnyvale, CA 94089 USA
Cygnus Solutions `--' drepper at cygnus.com `------------------------

Jason Gunthorpe

Dec 9, 1999, 8:00:00 AM

to Ulrich Drepper

On 8 Dec 1999, Ulrich Drepper wrote:

> is interrupting syscalls as well. It is fine if signals in general
> interrrupt syscalls. But SIGSTOP & friends, undone by a ptrace() call
> should not return since these kind of things happen when because of

I've noticed some general dysfunction with Linux and attaching strace to
running processes. It seems that strace cannot attach without affecting
the state of the process it is attaching to. I never had time to track
the particular problem down, but this sounds like a plausible
explanation [strace causes a slow system call to return?].

Jason

Andrea Arcangeli

Dec 9, 1999, 8:00:00 AM

to Ulrich Drepper

On 8 Dec 1999, Ulrich Drepper wrote:

>This is not what happens on other platforms. At least with my limited
>testing I found that if you do on Solaris
>
> sleep 10
> ^Z
> fg
>
>the process will continue to sleep.

That's not enough to tell what the kernel is doing, maybe they have a bit
smarter sleep(1) program. `sleep` can be changed to run nanosleep again if
it received -EINTR and `req` is not null. You only have to pass as `req`
the `rem` that you got back from the previous nanosleep call.

>> Should we make PTRACE_CONTINUE to force nanosleep to continue (unlike the
>> SIGCONT case?)?
>
>This is the least what has to happen.

Ok.

>> BTW, I am not sure if nanosleep is the only place that you may like
>> to change in this respect...
>
>No, it's not the only place (e.g., blocking read call). I think this
>is a general change. Whenever the continue happens throug
>PTRACE_CONTINUE no EINTR should be generated.

Ok.

Andrea

Andrea Arcangeli

Dec 9, 1999, 8:00:00 AM

to Jason Gunthorpe

On Wed, 8 Dec 1999, Jason Gunthorpe wrote:

>I've noticed some general dysfunction with Linux and attaching strace to
>running processes. It seems that strace cannot attach without effecting

There are things that you should expect to break. For example, if you
SIGSTOP your parent (which is always strace) while you are traced, then
you'll deadlock the first time you try to return to userspace
immediately after you sent the signal to strace. This is normal and it's
not trivial to fix.

Ulrich Drepper

Dec 9, 1999, 8:00:00 AM

to Andrea Arcangeli

Andrea Arcangeli <and...@suse.de> writes:

> That's not enough to tell what the kernel is doing, maybe they have a bit
> smarter sleep(1) program. `sleep` can be changed to run nanosleep again if
> it received -EINTR and `req` is not null. You only have to pass as `req`
> the `rem` that you got back from the previous nanosleep call.

I ran it under truss, you can do the same. The syscall does not return.

--
---------------. drepper at gnu.org ,-. 1325 Chesapeake Terrace
Ulrich Drepper \ ,-------------------' \ Sunnyvale, CA 94089 USA
Cygnus Solutions `--' drepper at cygnus.com `------------------------

Jason Gunthorpe

Dec 9, 1999, 8:00:00 AM

to Andrea Arcangeli

On Thu, 9 Dec 1999, Andrea Arcangeli wrote:

> There are things that you should expect to break. For example if you
> SIGSTOP your parent (that is always strace) while you are traced, then

Er, I'm sorry, I meant a case like:

sleep 100 &
strace -p `pidof sleep`

In particular, I've been doing this to wu-ftpd a lot to try to isolate what
is causing it to get 'stuck'. Often the strace command will 'unstick'
wu-ftpd :< Strace is not the parent in this instance.

Jason

Andrea Arcangeli

Dec 9, 1999, 8:00:00 AM

to Jason Gunthorpe

On Wed, 8 Dec 1999, Jason Gunthorpe wrote:

>wu-ftpd :< Strace is not the parent in this instance.

strace always becomes the parent, or it won't get the SIGCHLD from the task
when it gets stopped (after the SIGCHLD, strace wakes up and looks at the
syscall parameters or at the syscall return value, depending on whether it
is a kernel entry or exit).

Nevertheless I agree that the different wu-ftpd behaviour could probably be
made closer to reality ;). OTOH additional PF_PTRACE* special cases
are not very nice...

Andrea

H. Peter Anvin

Dec 9, 1999, 8:00:00 AM

to linux-...@vger.rutgers.edu

Followup to: <m3bt81n...@localhost.localnet>
By author: Ulrich Drepper <dre...@cygnus.com>
In newsgroup: linux.dev.kernel

>
> Andrea Arcangeli <and...@suse.de> writes:
>
> > That's not enough to tell what the kernel is doing, maybe they have a bit
> > smarter sleep(1) program. `sleep` can be changed to run nanosleep again if
> > it received -EINTR and `req` is not null. You only have to pass as `req`
> > the `rem` that you got back from the previous nanosleep call.
>
> I ran it under truss, you can do the same. The syscall does not return.
>

Yes, tracing a program having this effect is not acceptable. I
consider this to be a tracing problem, however. I have seen the same
thing with strace -- in fact, stracing programs that rely on SIGSTOP
is largely impossible.

-hpa
--
<h...@transmeta.com> at work, <h...@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."

Eric Paire

Dec 9, 1999, 8:00:00 AM

to H. Peter Anvin

h...@transmeta.com (H. Peter Anvin) writes:

> > Hmm...This works properly on libc5 systems, btw. (glibc2.0 and glibc2.1
> > use nanosleep(), libc5 uses alarm() and sigsuspend()).
> >
>
> It really could be argued what is the right behaviour here. When a
> system call is interrupted by the signal, the normal thing is to
> return EINTR.
>

I think I did not explain the problem clearly:

My reading of the POSIX philosophy is that it is not legal for a
blocking system call to return EINTR when it has been interrupted by a
signal that does not have a signal handler attached to it at the time
the signal is delivered to the process.

The behaviour of the Linux kernel for that is that it always returns
EINTR when a blocking system call has been interrupted by a signal,
whether there was a signal handler attached *OR NOT*. THIS IS THE
REASON WHY I STATED THAT LINUX IS NOT POSIX-COMPLIANT ON THIS SPECIAL
SIGNAL MANAGEMENT.

I gave the example of sleep because this was a very simple one that
everybody can exercise with bash. BUT THIS IS IMHO A MORE GENERAL PROBLEM.
Notice that this only concerns signals whose effect is to start/stop
the process. The management of other signals is perfectly POSIX-compliant.

IMHO, the SIGSTOP management (which is much simpler than the others since
the signal can never be ignored nor caught) should be taken into account
in the schedule loop, and not in the signal management on syscall return.
Notice that SIGTSTP, SIGTTIN and SIGTTOU should be handled at the same
place when the default signal behaviour is applied, as well as some other
special cases like ignored SIGCHLD,... Part of this code is currently in
the machine-dependent do_signal() function.

The advantage of such a modification is that a blocking system call will
remain in the actual schedule loop whenever SIGSTOP/SIGTSTP and SIGCONT
are sent to the process (thus eliminating the EINTR problem, and being POSIX
compatible). The other advantage is that for a traced process, the SIGSTOP
handling may also be managed in the schedule loop, thus avoiding the side
effect of being woken up by PTRACE_ATTACH/PTRACE_CONTINUE.

*BUT*, the schedule loop should only be managed by knowledgeable people.
And I let them look at the problem and at what the actual fix could be...

-Eric
BTW, notice that SA_RESTART is *NOT* the right solution, since it is
only handled when a signal handler has been attached. If no signal
handler is attached (which is the default behaviour), then this
solution does not work.

+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ Eric PAIRE
Web : http://www.ri.silicomp.com/~paire | Group SILICOMP - Research Institute
Email: eric....@ri.silicomp.com | 2, avenue de Vignate
Phone: +33 (0) 476 63 48 71 | F-38610 Gieres
Fax : +33 (0) 476 51 05 32 | FRANCE

Andi Kleen

Dec 10, 1999, 8:00:00 AM

to Eric Paire

pa...@ri.silicomp.fr (Eric Paire) writes:
>
> The behaviour of the Linux kernel for that is that it always returns
> EINTR when a blocking system call has been interrupted by a signal,
> whether there was a signal handler attached *OR NOT*. THIS IS THE
> REASON WHY I STATED THAT LINUX IS NOT POSIX-COMPLIANT ON THIS SPECIAL
> SIGNAL MANAGEMENT.

send_sig_info has this code

    /* Optimize away the signal, if it's a signal that can be
       handled immediately (ie non-blocked and untraced) and
       that is ignored (either explicitly or by default). */

    if (ignored_signal(sig, t))
        goto out;

So when the signal is blocked it should never even reach the process.
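
The user-visible side of this can be checked with a small experiment. This is only a sketch from user space and says nothing about the kernel internals quoted above; the function name sleep_with_ignored_alarm() is hypothetical. It sets SIGALRM to SIG_IGN, arms a timer that fires in the middle of a nanosleep(), and returns what nanosleep() returned.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/time.h>
#include <time.h>

/*
 * Hypothetical experiment: returns nanosleep()'s result (0, or -1 if it
 * was interrupted) for a 100 ms sleep during which an explicitly ignored
 * SIGALRM fires after 20 ms. Returns -2 if the setup failed.
 */
int sleep_with_ignored_alarm(void)
{
    struct sigaction ign = { 0 }, old;
    struct itimerval one_shot = { .it_value = { .tv_sec = 0, .tv_usec = 20000 } };
    struct timespec req = { .tv_sec = 0, .tv_nsec = 100 * 1000000L };
    int rc;

    ign.sa_handler = SIG_IGN;
    sigemptyset(&ign.sa_mask);
    if (sigaction(SIGALRM, &ign, &old) != 0 ||
        setitimer(ITIMER_REAL, &one_shot, NULL) != 0)
        return -2;

    rc = nanosleep(&req, NULL);

    /* Put the previous SIGALRM disposition back before returning. */
    sigaction(SIGALRM, &old, NULL);
    return rc;
}
```

A return value of 0 means the ignored signal did not cut the sleep short.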

-Andi

--
This is like TV. I don't like TV.

Simon Patience

Dec 10, 1999, 8:00:00 AM

to linux-...@vger.rutgers.edu, Andrea Arcangeli

In article <82mvls$aq...@fido.engr.sgi.com>, you write:
|> On 8 Dec 1999, Ulrich Drepper wrote:
|> >This is not what happens on other platforms. At least with my limited
|> >testing I found that if you do on Solaris
|> >
|> > sleep 10
|> > ^Z
|> > fg
|> >
|> >the process will continue to sleep.
|>

|> That's not enough to tell what the kernel is doing, maybe they have a bit
|> smarter sleep(1) program. `sleep` can be changed to run nanosleep again if
|> it received -EINTR and `req` is not null. You only have to pass as `req`
|> the `rem` that you got back from the previous nanosleep call.

No, the problem is that you shouldn't have interrupted it in the first
place. What is the point of interrupting a blocked process so that you
can block it?

Simon.

--
Simon Patience Phone: (650) 933-4644
Silicon Graphics, Inc FAX: (650) 962-8404
1600 Amphitheatre Pkwy Email: s...@sgi.com
Mountain View, CA 94043-1389

Simon Patience

Dec 10, 1999, 8:00:00 AM

to linux-...@vger.rutgers.edu, Eric Paire

Eric Paire wrote:
|> My reading of the The POSIX philosophy is that it is not legal for a
|> blocking system call to return EINTR when it has been interrupted by a
|> signal that does not have a signal handler attached to it at the time
|> the signal has been delivered to the process.

I agree. If you look at the description of EINTR, that is quite clear.

[snip]

|> IMHO, the SIGSTOP management (which is much simpler than the others since
|> the signal can never be ignored nor caught) should be taken into account
|> in the schedule loop, and not in the signal management on syscall return.

You really don't want job control to be implemented in the scheduler! It
should be implemented in the job control code on syscall/trap return. I
know there isn't such code at the moment but that is why you are seeing the
problems :-)

|> Notice that SIGTSTP, SIGTTIN and SIGTTOU should be handled at the same
|> place when the default signal behaviour is applied, as well as some other

Agreed.

|> special cases like ignored SIGCHLD,... Part of this code is currently in
|> machine-dependent do_signal() function.

|> The advantage of such modification is that a blocking system call will
|> remain in the actual schedule loop whenever SIGSTOP/SIGTSTP and SIGCONT
|> are sent to him (thus eliminating the EINTR problem, and being POSIX
|> compatible). The other advantage is that for a traced process, the SIGSTOP
|> handling may also be managed in the schedule loop, thus avoiding the side
|> effect of being awaken by PTRACE_ATTACH/PTRACE_CONTINUE.

I don't see this as an advantage. Stop signals should stop the process from
advancing in user space. You don't need to do anything to them while they
are in the kernel.

Ulrich Drepper

Dec 10, 1999, 8:00:00 AM

to Simon Patience

s...@albion.engr.sgi.com (Simon Patience) writes:

> |> > sleep 10
> |> > ^Z
> |> > fg
> |> >
> |> >the process will continue to sleep.

> [...]

> No, the problem is that you shouldn't have interrupted it in the first
> place. What is the point of interrupting a blocked process so that you
> can block it?

I agree, that is what seems to happen. With one little addition: at
least on Solaris the syscall in the end nevertheless returns EINTR.
I'm not sure whether this is useful but it might be ok since
a) code today already has to handle EINTR
b) it provides the user more information (e.g., that she could find out
that the process has possibly slept for a long time)
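
Point a) usually looks like the retry loop below in application code. This is a generic sketch of the common idiom, not something taken from any particular program; the name read_retry() is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* read() that is retried when a signal interrupts it before any data
 * has been transferred (read() == -1 with errno == EINTR). */
ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL);
    do {
        n = read(fd, buf, len);
    } while (n == -1 && errno == EINTR);

    return n;   /* bytes read, 0 at end of file, or -1 on a real error */
}
```

Code written this way keeps working whether or not the kernel reports EINTR after a stop/continue cycle, at the cost of hiding the interruption from the caller.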

--
---------------. drepper at gnu.org ,-. 1325 Chesapeake Terrace
Ulrich Drepper \ ,-------------------' \ Sunnyvale, CA 94089 USA
Cygnus Solutions `--' drepper at cygnus.com `------------------------

Simon Patience

Dec 10, 1999, 8:00:00 AM

to linux-...@vger.rutgers.edu

|> This is not what happens on other platforms. At least with my limited
|> testing I found that if you do on Solaris
|>

|> sleep 10
|> ^Z
|> fg
|>
|> the process will continue to sleep.

I am with Ulrich on this one. The problem with job control signals is
that they are not really signals directed towards the process, they are
signals directed to the kernel to do something to the process. In the
case of STOP/CONT it is a request to not allow/allow the process to
make forward progress _in user space_. Sending the signal should not
interrupt the process at all.

Sending SIGSTOP should simply mark the process as not to return to user
space. If the process happens to be already blocked in the kernel
waiting for something, there is no reason to interrupt so it can be
blocked somewhere else in the kernel. If it wakes up then it can
complete the system call successfully but then block before returning
to user space. SIGCONT would simply clear that flag and wake the
process up if it was in a job control stop.
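
The scheme described in the two paragraphs above can be written down as a toy state machine. This is purely an illustrative model: the type task_model and the model_*() functions are hypothetical names, and nothing here corresponds to real kernel data structures.

```c
#include <assert.h>
#include <stdbool.h>

enum task_state { RUNNING_USER, BLOCKED_IN_SYSCALL, JOB_STOPPED };

struct task_model {
    enum task_state state;
    bool stop_flag;         /* "do not return to user space" */
    int completed_calls;    /* syscalls that finished normally */
};

void model_init(struct task_model *t)
{
    t->state = RUNNING_USER;
    t->stop_flag = false;
    t->completed_calls = 0;
}

/* The task enters a blocking system call. */
void model_block(struct task_model *t)
{
    assert(t->state == RUNNING_USER);
    t->state = BLOCKED_IN_SYSCALL;
}

/* SIGSTOP only marks the task; a task blocked in the kernel stays put. */
void model_sigstop(struct task_model *t)
{
    t->stop_flag = true;
    if (t->state == RUNNING_USER)
        t->state = JOB_STOPPED;
}

/* SIGCONT clears the mark and wakes the task only from a job-control stop. */
void model_sigcont(struct task_model *t)
{
    t->stop_flag = false;
    if (t->state == JOB_STOPPED)
        t->state = RUNNING_USER;
}

/* The awaited event arrives: the syscall completes successfully, then the
 * task either returns to user space or stops at the kernel boundary. */
void model_wakeup(struct task_model *t)
{
    if (t->state != BLOCKED_IN_SYSCALL)
        return;
    t->completed_calls++;
    t->state = t->stop_flag ? JOB_STOPPED : RUNNING_USER;
}
```

In this model a STOP followed by a CONT while the task is blocked leaves it blocked, and no path ever produces an interrupted system call.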

Simon.

----

Simon Patience Phone: (650) 933-4644
Silicon Graphics, Inc FAX: (650) 962-8404
1600 Amphitheatre Pkwy Email: s...@sgi.com
Mountain View, CA 94043-1389

Brian Pomerantz

Dec 10, 1999, 8:00:00 AM

to linux-...@vger.rutgers.edu

On Fri, Dec 10, 1999 at 07:29:25AM -0800, Simon Patience wrote:
> In article <82mvls$aq...@fido.engr.sgi.com>, you write:
> |> On 8 Dec 1999, Ulrich Drepper wrote:

> |> >This is not what happens on other platforms. At least with my limited
> |> >testing I found that if you do on Solaris
> |> >
> |> > sleep 10
> |> > ^Z
> |> > fg
> |> >
> |> >the process will continue to sleep.
> |>

> |> That's not enough to tell what the kernel is doing, maybe they have a bit
> |> smarter sleep(1) program. `sleep` can be changed to run nanosleep again if
> |> it received -EINTR and `req` is not null. You only have to pass as `req`
> |> the `rem` that you got back from the previous nanosleep call.
>

> No, the problem is that you shouldn't have interrupted it in the first
> place. What is the point of interrupting a blocked process so that you
> can block it?
>

Isn't a process in a blocked state when it is waiting on I/O? I often
will hit ^Z for a long tarball extraction and run it in the
background. When I hit ^Z, the process could be waiting for I/O when
the signal comes through, thus a time when I want to interrupt a
blocked process to block it.

BAPper

Simon Patience

Dec 10, 1999, 8:00:00 AM

to linux-...@vger.rutgers.edu

Brian Pomerantz wrote:
|> On Fri, Dec 10, 1999 at 07:29:25AM -0800, Simon Patience wrote:
|> > In article <82mvls$aq...@fido.engr.sgi.com>, you write:
|> > |> On 8 Dec 1999, Ulrich Drepper wrote:
|> > |> >This is not what happens on other platforms. At least with my limited
|> > |> >testing I found that if you do on Solaris
|> > |> >
|> > |> > sleep 10
|> > |> > ^Z
|> > |> > fg
|> > |> >
|> > |> >the process will continue to sleep.
|> > |>
|> > |> That's not enough to tell what the kernel is doing, maybe they have a bit
|> > |> smarter sleep(1) program. `sleep` can be changed to run nanosleep again if
|> > |> it received -EINTR and `req` is not null. You only have to pass as `req`
|> > |> the `rem` that you got back from the previous nanosleep call.
|> >
|> > No, the problem is that you shouldn't have interrupted it in the first
|> > place. What is the point of interrupting a blocked process so that you
|> > can block it?
|>
|> Isn't a process in a blocked state when it is waiting on I/O? I often
|> will hit ^Z for a long tarball extraction and run it in the
|> background. When I hit ^Z, the process could be waiting for I/O when
|> the signal comes through, thus a time when I want to interrupt a
|> blocked process to block it.

My point was that you could just leave it blocked waiting for the I/O.
If you type fg before the I/O completes then the process will just
execute and return as if nothing had happened when the I/O finally
completes. If the I/O completes first then the process winds its way
back to the return from the system call, notices that it is stopped and
blocks itself there. When the SIGCONT arrives, the process unblocks and
returns from the system call normally.

This is far better (less code, less complexity) than unblocking the I/O
(which may have partial results) getting the process to block somewhere
else and then trying to work out what on earth the right thing to do is
when SIGCONT arrives.

Simon.

--
Simon Patience Phone: (650) 933-4644
Silicon Graphics, Inc FAX: (650) 962-8404
1600 Amphitheatre Pkwy Email: s...@sgi.com
Mountain View, CA 94043-1389

Eric Paire

Dec 15, 1999, 8:00:00 AM

to Simon Patience

> Eric Paire wrote:
>
> [snip]
>
> |> IMHO, the SIGSTOP management (which is much simpler than the others since
> |> the signal can never be ignored nor caught) should be taken into account
> |> in the schedule loop, and not in the signal management on syscall return.
>
> You really don't want job control to be implemented in the scheduler! It
> should be implemented in the joc control code on syscall/trap return. I
> know there isn't such code at the moment but that is why you are seeing the
> problems :-)
>

No, my opinion was to put only the STOP/START management in the scheduling
loop, so that a process does not have to leave the loop to be handled very
late (just before returning to user mode). If a process is then stopped and
restarted without any signal handler, it remains blocked in the scheduler
(which is transparent for functions that block a process).

>
> [snip]

>
> |> special cases like ignored SIGCHLD,... Part of this code is currently in
> |> machine-dependent do_signal() function.
>
> |> The advantage of such a modification is that a blocking system call will
> |> remain in the actual schedule loop whenever SIGSTOP/SIGTSTP and SIGCONT
> |> are sent to it (thus eliminating the EINTR problem, and being POSIX
> |> compatible). The other advantage is that for a traced process, the SIGSTOP
> |> handling may also be managed in the schedule loop, thus avoiding the side
> |> effect of being awakened by PTRACE_ATTACH/PTRACE_CONTINUE.
>
> I don't see this as an advantage. Stop signals should stop the process from
> advancing in user space. You don't need to do anything to them while they
> are in the kernel.
>

My point is that processes that are stopped and restarted leave the main
scheduler loop and prepare to return EINTR to user space (which is *not*
POSIX-compliant, and makes GDB very intrusive), because the current
implementation of restart does not force those in the INTERRUPTED state back
into the scheduler loop. The idea of handling stop/restart without signal
handlers inside schedule() is to make a simple, machine-independent
modification that corrects this signal mishandling.
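
The EINTR behaviour discussed here can be probed from user space with a small test. What follows is a minimal sketch, not a definitive test case: the function name `probe_stop_cont` and the timings are assumptions made for illustration. A child sleeps in nanosleep() with no SIGCONT handler installed, the parent stops and continues it, and the child's exit status reports whether the sleep came back early with EINTR.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Returns 1 if the child's nanosleep() came back early with EINTR,
 * 0 if it slept to completion, -1 on a setup error.
 * The name and the 100 ms / 1 s timings are arbitrary choices. */
int probe_stop_cont(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: SIGCONT keeps its default disposition (no handler). */
        struct timespec req = { .tv_sec = 1, .tv_nsec = 0 };
        if (nanosleep(&req, NULL) == 0)
            _exit(0);
        _exit(errno == EINTR ? 1 : 2);
    }

    /* Parent: give the child time to block inside nanosleep(). */
    struct timespec settle = { .tv_sec = 0, .tv_nsec = 100000000L };
    nanosleep(&settle, NULL);

    int status = 0;
    if (kill(pid, SIGSTOP) < 0)
        return -1;
    /* WUNTRACED makes waitpid() report the stop, so SIGCONT is sent
     * only once the child is really stopped. */
    if (waitpid(pid, &status, WUNTRACED) < 0 || !WIFSTOPPED(status))
        return -1;
    if (kill(pid, SIGCONT) < 0)
        return -1;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;

    return WEXITSTATUS(status) <= 1 ? WEXITSTATUS(status) : -1;
}
```

A caller can simply print the return value. A result of 1 corresponds to the behaviour reported in this thread, and 0 to the sleep continuing after SIGCONT; which one is seen depends on the kernel being tested.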

Any scheduler guru opinion ???

Best regards,
-Eric

+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ Eric PAIRE
Web : http://www.ri.silicomp.com/~paire | Group SILICOMP - Research Institute
Email: eric....@ri.silicomp.com | 2, avenue de Vignate
Phone: +33 (0) 476 63 48 71 | F-38610 Gieres
Fax : +33 (0) 476 51 05 32 | FRANCE

-

Simon Patience

Dec 15, 1999, 8:00:00 AM

to Eric Paire

Eric Paire wrote:

> > Eric Paire wrote:
> > |> IMHO, the SIGSTOP management (which is much simpler than the others since
> > |> the signal can never be ignored nor caught) should be taken into account
> > |> in the schedule loop, and not in the signal management on syscall return.
> >
> > You really don't want job control to be implemented in the scheduler! It
> > should be implemented in the job control code on syscall/trap return. I
> > know there isn't such code at the moment but that is why you are seeing the
> > problems :-)
> >
> No, my suggestion was to put only the STOP/START management in the scheduling
> loop, so that a process does not have to leave the loop just to have its stop
> handled very late (just before returning to user mode). That way, if a process
> is stopped and then restarted without any signal handler, it remains blocked
> in the scheduler (which is transparent to the functions that block a process).

Why are you trying to do this? I can't see the objection to code just before
the return to user space that says: if I am stopped, wait for SIGCONT. As you
haven't interrupted the process, you won't get EINTR. You don't have to
muck with the scheduler, which is always a tricky thing to do, and
everything works wonderfully.

> > I don't see this as an advantage. Stop signals should stop the process from
> > advancing in user space. You don't need to do anything to them while they
> > are in the kernel.
> >
> My point is that processes that are stopped and restarted, exit from the
> main scheduler loop, and prepare themselves for returning EINTR in user space
> (which is *not* POSIX-compliant, and makes GDB very intrusive), since the

But you don't need to change the scheduler to fix that: just don't
interrupt the process when it gets the STOP signal in the first place.
Mark the process as stopped (having SIGSTOP in the pending set is good
enough), but don't wake the process up. Then in do_signal() you special
case STOP signals and wait on a semaphore or something (actually a
synchronization/condition variable would be good for this situation but
Linux doesn't have them). When someone sends SIGCONT, they clear the STOP
signal from the pending set (as today) and then signal the semaphore.
No interrupt, no scheduler hack, POSIX compliant, simple.
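
The scheme proposed above can be modelled in user space with a POSIX condition variable standing in for the kernel-side primitive. This is only a sketch under stated assumptions: `stop_gate`, `worker` and all the function names are hypothetical, and pthread calls are used here only as an analogy for whatever the kernel would use. It shows the sender of a stop recording it without waking the sleeper, the exit path waiting while a stop is pending, and the SIGCONT sender clearing the stop and releasing the waiter.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for a task's stop state; not kernel code. */
struct stop_gate {
    pthread_mutex_t lock;
    pthread_cond_t  resumed;   /* signalled by the SIGCONT sender */
    bool stop_pending;         /* models SIGSTOP sitting in the pending set */
};

void gate_init(struct stop_gate *g)
{
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->resumed, NULL);
    g->stop_pending = false;
}

/* Sender of a stop signal: record it, but do not wake the sleeper. */
void gate_send_stop(struct stop_gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->stop_pending = true;
    pthread_mutex_unlock(&g->lock);
}

/* Sender of SIGCONT: clear the pending stop, then release any waiter. */
void gate_send_cont(struct stop_gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->stop_pending = false;
    pthread_cond_broadcast(&g->resumed);
    pthread_mutex_unlock(&g->lock);
}

/* Called where do_signal() would run, just before returning to user
 * space: wait while a stop is pending. No EINTR is involved. */
void gate_wait_if_stopped(struct stop_gate *g)
{
    pthread_mutex_lock(&g->lock);
    while (g->stop_pending)
        pthread_cond_wait(&g->resumed, &g->lock);
    pthread_mutex_unlock(&g->lock);
}

/* Hypothetical worker that models a task finishing its system call. */
struct worker {
    struct stop_gate *gate;
    bool returned_to_user;     /* protected by gate->lock */
};

void *worker_run(void *arg)
{
    struct worker *w = arg;
    gate_wait_if_stopped(w->gate);        /* parks here while stopped */
    pthread_mutex_lock(&w->gate->lock);
    w->returned_to_user = true;           /* normal return, not EINTR */
    pthread_mutex_unlock(&w->gate->lock);
    return NULL;
}
```

The predicate loop around `pthread_cond_wait()` is what makes the order of events irrelevant: if the continue arrives before the worker reaches the exit path, the flag is already clear and the worker passes straight through.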

> current implementation of restart does not force them to return to the
> scheduler loop for those in INTERRUPTED state. The idea of managing stop
> restart without signal handlers within schedule() is to make a simple
> machine-independent modification to correct this signal mishandling.
>
> Any scheduler guru opinion ???

Simon

Simon Patience Phone: (650) 933-4644
Silicon Graphics, Inc FAX: (650) 962-8404
1600 Amphitheatre Pkwy Email: s...@sgi.com
Mountain View, CA 94043-1389

-

Eric Paire

Dec 16, 1999, 8:00:00 AM

to Simon Patience

I agree that your idea of moving the STOP/CONT management into the calling
process rather than the managed process seems good. However, you will also
have to move the stop/continue management of a traced process (which is
similar to STOP/CONT) from do_signal() into ptrace(), so that ptrace does
not modify the process scheduling (gdb would be intrusive otherwise).

> > current implementation of restart does not force them to return to the
> > scheduler loop for those in INTERRUPTED state. The idea of managing stop
> > restart without signal handlers within schedule() is to make a simple
> > machine-independent modification to correct this signal mishandling.
> >
> > Any scheduler guru opinion ???
>

-Eric

+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ Eric PAIRE
Web : http://www.ri.silicomp.com/~paire | Group SILICOMP - Research Institute
Email: eric....@ri.silicomp.com | 2, avenue de Vignate
Phone: +33 (0) 476 63 48 71 | F-38610 Gieres
Fax : +33 (0) 476 51 05 32 | FRANCE

-