Dangling background jobs in openssh

Nico Kadel-Garcia

Jul 12, 2001, 8:47:44 AM

Hi, folks. I'm working with openssh-2.9p2, and I've noticed a change in
behavior from ssh-1.2.27. When I connect to the OpenSSH server, and try
to leave a background job running before I exit, I cannot exit the ssh
session until after the background job completes. For example:

ssh mybox
sleep 3600 &
exit

If the server is ssh-1.2.27, I exit immediately and successfully and the
background job remains executing. If the server is openssh-2.9p2 or
several other recent versions I've tested, the connection remains open
until the background job completes.

Has anyone else noticed this? Can I do anything to fix it?

Markus Friedl

Jul 13, 2001, 11:02:59 AM
In <3B4D9C72...@bellatlantic.net> Nico Kadel-Garcia <nka...@bellatlantic.net> writes:


>Hi, folks. I'm working with openssh-2.9p2, and I've noticed a change in
>behavior from ssh-1.2.27. When I connect to the OpenSSH server, and try
>to leave a background job running before I exit, I cannot exit the ssh
>session until after the background job completes. For example:

> ssh mybox
> sleep 3600 &
> exit

does
sleep 3600 < /dev/null > /dev/null 2>&1 &

hang, too?

Richard E. Silverman

Jul 15, 2001, 3:53:07 AM

> Hi, folks. I'm working with openssh-2.9p2, and I've noticed a change in
> behavior from ssh-1.2.27. When I connect to the OpenSSH server, and try
> to leave a background job running before I exit, I cannot exit the ssh
> session until after the background job completes. For example:
>
> ssh mybox
> sleep 3600 &
> exit
>
> If the server is ssh-1.2.27, I exit immediately and successfully and the
> background job remains executing. If the server is openssh-2.9p2 or
> several other recent versions I've tested, the connection remains open
> until the background job completes.

You will find the same behavior in recent versions of SSH1 as well, 1.2.29
and later.

The reason is that your background process does not close its stdout file
descriptor. sshd waits to exit until it receives end-of-file on its end
of those pipes, and that can't happen until all references to the other
ends have been closed. Thus, your shell exiting is not sufficient, since
"sleep" inherited references to those pipes.
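The same effect can be reproduced locally without ssh. The following is only a sketch under an assumed analogy: "cat" stands in for sshd reading its end of the pipe, the subshell stands in for the login shell, and the "elapsed" helper is a hypothetical name introduced here for timing.

```shell
#!/bin/sh
# Local stand-in for the sshd situation (an analogy, not sshd itself):
# "cat" plays the reader of the pipe, the subshell plays the login shell,
# and "sleep" is the background job.

# Hypothetical helper: print the whole seconds taken by the command in "$1".
elapsed() {
    start=$(date +%s)
    sh -c "$1"
    end=$(date +%s)
    echo $((end - start))
}

# sleep inherits the write end of the pipe, so cat only sees end-of-file
# once sleep itself has exited, even though the subshell is long gone.
held=$(elapsed '( sleep 3 & ) | cat')

# With stdout redirected, nothing holds the pipe open after the subshell
# exits, and cat returns at once.
released=$(elapsed '( sleep 3 > /dev/null & ) | cat')

echo "inherited stdout: ${held}s, redirected stdout: ${released}s"
```

The first pipeline should take about as long as the sleep, the second should return almost immediately.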

The reason for the change was that sshd can't know which behavior the user
wants -- close the connection when the remote command exits, or when sshd
receives EOF. Using the former rule in all cases can cause data loss,
notably with scp. Imagine using scp to copy a file from server to client:
sshd uses the shell to start scp on the server side, and waits for the
shell to exit. Eventually, the remote scp writes the last of the file to
its stdout, and exits, causing the shell to exit. The data flowing from
the remote scp to sshd is not buffered through the shell, though; it goes
directly from scp to sshd via scp's inherited copy of the shell's stdout.
It is not defined which will happen first: that sshd receives a SIGCHLD
notifying it of the shell's demise, or that the last of the data from scp
will be delivered via the pipe. If the latter happens first, then the
file copy loses data from the end of the file.

To prevent this problem, sshd was modified to take the more conservative
approach. All you have to do to get around this is redirect stdout,
e.g.

ssh server 'sleep 5 &'

will wait, but

ssh server 'sleep 5 > /dev/null &'

will not. Note that you need to redirect stderr as well if there is a pty
present (as in your interactive example), so:

ssh mybox
sleep 3600 >& /dev/null &
exit
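For Bourne-style shells, the redirections can be wrapped in a small function. This is a sketch only: "detach" is a hypothetical name, not part of ssh or any shell, and it also redirects stdin, which goes slightly beyond the stdout and stderr advice above. The pipeline at the end is a local check, with "cat" assumed to stand in for sshd waiting on the pipe.

```shell
#!/bin/sh
# Hypothetical helper: run a command in the background with stdin, stdout
# and stderr all pointed at /dev/null, so that it keeps none of the
# descriptors it would otherwise inherit from the calling shell.
detach() {
    "$@" < /dev/null > /dev/null 2>&1 &
}

# Local check: the reader of the pipe gets end-of-file as soon as the
# subshell exits, because the detached sleep no longer holds the pipe.
start=$(date +%s)
( detach sleep 3 ) | cat
took=$(( $(date +%s) - start ))
echo "pipeline returned after ${took}s"
```

In an interactive session the usage would be "detach sleep 3600" followed by "exit".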

--
Richard Silverman
sl...@shore.net

Nico Kadel-Garcia

Jul 16, 2001, 8:07:12 AM

"Richard E. Silverman" wrote:

> > Hi, folks. I'm working with openssh-2.9p2, and I've noticed a change in
> > behavior from ssh-1.2.27. When I connect to the OpenSSH server, and try
> > to leave a background job running before I exit, I cannot exit the ssh
> > session until after the background job completes. For example:
> >
> > ssh mybox
> > sleep 3600 &
> > exit
> >
> > If the server is ssh-1.2.27, I exit immediately and successfully and the
> > background job remains executing. If the server is openssh-2.9p2 or
> > several other recent versions I've tested, the connection remains open
> > until the background job completes.
>
> You will find the same behavior in recent versions of SSH1 as well, 1.2.29
> and later.
>
> The reason is that your background process does not close its stdout file
> descriptor. sshd waits to exit until it receives end-of-file on its end
> of those pipes, and that can't happen until all references to the other
> ends have been closed. Thus, your shell exiting is not sufficient, since
> "sleep" inherited references to those pipes.

Gack. Ick. P-thah. You are a nice man for explaining this, but this is
amazingly silly, and as near as I can tell, a change in behavior that is
entirely undocumented.

> To prevent this problem, sshd was modified to take the more conservative
> approach. All you have to do to get around this is redirect stdout,
> e.g.
>
> ssh server 'sleep 5 &'
>
> will wait, but
>
> ssh server 'sleep 5 > /dev/null &'
>
> will not. Note that you need to redirect stderr as well if there is a pty
> present (as in your interactive example), so:
>
> ssh mybox
> sleep 3600 >& /dev/null &
> exit

Hmm. Let me think on this and test it a bit. It's a bit of non-transparency
to the use of ssh rather than a local session that seems pretty awkward.

Simon Tatham

Jul 16, 2001, 8:55:07 AM
Nico Kadel-Garcia <nka...@bellatlantic.net> wrote:
> Hmm. Let me think on this and test it a bit. It's a bit of
> non-transparency to the use of ssh rather than a local session that
> seems pretty awkward.

It's not adding any non-transparency that wasn't already there.
Suppose, using the old behaviour, you'd done something less
innocuous than sleep:

ssh host 'large_compile_command &'

and the large compile command had tried to report an error at the
end of its run. Of course the error output couldn't have come back
down the ssh connection, because the connection would have closed.
Bingo, you've lost transparency.

Conclusion: backgrounding a task _at the remote end_ was always
going to lose you some degree of transparency. If instead you'd done

ssh host large_compile_command &

then the _local_ ssh process would be backgrounded, and the large
compile command could report all the errors it wanted to, and you
really would have proper transparency.
--
Simon Tatham "Every person has a thinking part that wonders what
<ana...@pobox.com> the part that isn't thinking isn't thinking about."

Nico Kadel-Garcia

Jul 25, 2001, 6:22:40 AM

"Markus Friedl" <msfr...@cip.informatik.uni-erlangen.de> wrote in message
news:9in2j3$4sa$1...@rznews2.rrze.uni-erlangen.de...

Give the man a kewpie doll. Everybody else who forgot about the stderr
redirection: a slap on the wrist for forgetting stitches inside the patient.

And for all of you who used "2>&1" for redirection, please don't forget
about the /bin/tcsh users out there and explain what that is *for*.
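For the record, a sketch of what "2>&1" does in Bourne-style shells: it points file descriptor 2 (stderr) at whatever file descriptor 1 (stdout) refers to at that moment, so the order of redirections matters. csh and tcsh do not accept that syntax; there, ">& file" redirects stdout and stderr together. The "noisy" function below is a hypothetical command used only for illustration.

```shell
#!/bin/sh
# Hypothetical command that writes one line to stdout and one to stderr.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

# Order matters. Here stderr is first pointed at the capture pipe (where
# stdout currently goes), and only then is stdout sent to /dev/null, so
# the stderr line is the one that gets captured.
only_err=$(noisy 2>&1 > /dev/null)

# Here stdout goes to /dev/null first, and stderr is then pointed at the
# same place, so both lines are discarded and nothing is captured.
both_discarded=$(noisy > /dev/null 2>&1)

echo "captured: '${only_err}' and '${both_discarded}'"
```

The second form is the one that matters for the ssh case, since it leaves neither descriptor attached to the session.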


Nico Kadel-Garcia

Jul 30, 2001, 8:00:52 AM

Simon Tatham wrote:

> Nico Kadel-Garcia <nka...@bellatlantic.net> wrote:
> > Hmm. Let me think on this and test it a bit. It's a bit of
> > non-transparency to the use of ssh rather than a local session that
> > seems pretty awkward.
>
> It's not adding any non-transparency that wasn't already there.
> Suppose, using the old behaviour, you'd done something less
> innocuous than sleep:
>
> ssh host 'large_compile_command &'
>
> and the large compile command had tried to report an error at the
> end of its run. Of course the error output couldn't have come back
> down the ssh connection, because the connection would have closed.
> Bingo, you've lost transparency.
>
> Conclusion: backgrounding a task _at the remote end_ was always
> going to lose you some degree of transparency. If instead you'd done
>
> ssh host large_compile_command &
>
> then the _local_ ssh process would be backgrounded, and the large
> compile command could report all the errors it wanted to, and you
> really would have proper transparency.

Point taken. But when I background a series of different jobs in a screen,
I don't want them all sitting there suspended. I want them backgrounded,
gosh darn it. And if I log into the remote machine:

ssh host
large_compile_command &
exit

I want my job control back. And under the current model, I don't get it.
I have to leave the window dangling there with that particular command
running in order for the job to complete remotely, and that seems
Wrong(tm).

I understand the reasons for the change, for scp, but they also weren't
documented anywhere I noticed. This group was the only one that
explained it. And due to the issues with stderr, most of the people who
suggested a solution got it wrong....