Patch for avoiding infinite loop on network disconnect

Paul Ackersviller

Oct 27, 2019, 3:50:27 PM
to vim...@googlegroups.com
With the Athena GUI version on AIX, Vim will consistently go into an
infinite loop if the network connection drops. Running truss on such a
process points to a select system call, and I found this one, which has no
check on the return value. This patch mostly prevents the problem, although
not quite 100% of the time.

My EINTR check is just what I'm guessing some other OSes might need; it
doesn't seem to matter for AIX. In fact the error-handling I'm doing
below doesn't seem to matter either, as I can't get it to execute, nor
can I stop there in a debugger, but the check seems enough for exiting
the process via somewhere in the X libraries.

Though this code isn't GUI-specific, I haven't seen this behaviour in a
terminal; however, I mostly use screen for network sessions.


diff --git a/src/os_unix.c b/src/os_unix.c
index a3c09f75e..c04fd3061 100644
--- a/src/os_unix.c
+++ b/src/os_unix.c
@@ -587,6 +587,8 @@ mch_delay(long msec, int ignoreinput)

     if (ignoreinput)
     {
+        int poll_result = 0;
+
         /* Go to cooked mode without echo, to allow SIGINT interrupting us
          * here. But we don't want QUIT to kill us (CTRL-\ used in a
          * shell may produce SIGQUIT). */
@@ -628,7 +630,7 @@ mch_delay(long msec, int ignoreinput)
         usleep((unsigned int)(msec * 1000));
 # else
 # ifndef HAVE_SELECT
-        poll(NULL, 0, (int)msec);
+        poll_result = poll(NULL, 0, (int)msec);
 # else
         {
             struct timeval tv;
@@ -639,7 +641,14 @@ mch_delay(long msec, int ignoreinput)
              * NOTE: Solaris 2.6 has a bug that makes select() hang here. Get
              * a patch from Sun to fix this. Reported by Gunnar Pedersen.
              */
-            select(0, NULL, NULL, NULL, &tv);
+            poll_result = select(0, NULL, NULL, NULL, &tv);
+        }
+        if (poll_result < 0 && errno != EINTR)
+        {
+            OUT_STR(strerror(errno));
+            OUT_STR("Vim: poll or select failed, exiting\n");
+            out_flush();
+            getout(errno);
         }
 # endif /* HAVE_SELECT */
 # endif /* HAVE_NANOSLEEP */

Bram Moolenaar

Oct 27, 2019, 5:07:22 PM
to vim...@googlegroups.com, Paul Ackersviller

Paul Ackersviller wrote:

> With athena gui version on AIX, vim will consistently go into an
> infinite loop if the network connection drops. Trussing such a process
> points to a select system call, so I found this one without any check on
> the return value. This patch mostly prevents the problem, although
> not quite 100% of the time.
>
> My EINTR check is just what I'm guessing some other OSes might need, it
> doesn't seem to matter for AIX. In fact the error-handling I'm doing
> below doesn't seem to matter either, as I can't get it to execute, nor
> can I stop there in a debugger, but the check seems enough for exiting
> the process via somewhere in X libraries.
>
> Though this code isn't gui-specific, I haven't seen this behaviour in a
> terminal, however I mostly use screen for network sessions.

It looks like this code depends on undocumented or system-specific
behavior. At least from what I could find, poll() and select() called
with no file descriptors will always wait until the timeout and then
return zero. Do you have documentation about when the error code would
be returned?
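
For readers following along, here is a minimal sketch of the idiom under discussion: select() called with no file descriptors, used purely as a millisecond sleep, with the return value passed back to the caller instead of being discarded. The helper name delay_with_select() is hypothetical and is not taken from the Vim source. POSIX lists EINTR and EINVAL among the possible select() errors; whether any error can actually occur in this fd-less form on a given system is exactly the open question in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Hypothetical helper, not taken from Vim: sleep for "msec" milliseconds by
 * calling select() with no file descriptors, and hand the result back to
 * the caller.
 * Returns 0 when the timeout elapsed, or -1 with errno set when select()
 * reported an error.
 */
static int
delay_with_select(long msec)
{
    struct timeval tv;

    assert(msec >= 0);
    tv.tv_sec = msec / 1000;
    tv.tv_usec = (msec % 1000) * 1000;

    /* nfds == 0 and three NULL sets: nothing to watch, only the timeout. */
    return select(0, NULL, NULL, NULL, &tv);
}
```

A caller could then branch on the result, for example treating a negative return with errno other than EINTR as unexpected, which is the shape of the check in the patch above.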

Also, I don't see how a hang can occur here when poll() or select()
returns without waiting. Vim would simply continue. Or is the delay
critical in some situation?


--
I used to be indecisive, now I'm not sure.

/// Bram Moolenaar -- Br...@Moolenaar.net -- http://www.Moolenaar.net \\\
/// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\ an exciting new programming language -- http://www.Zimbu.org ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

Paul Ackersviller

Oct 28, 2019, 8:46:21 PM
to Bram Moolenaar, vim...@googlegroups.com
On Sun, Oct 27, 2019 at 10:07:14PM +0100, Bram Moolenaar wrote:
> Paul Ackersviller wrote:
> > With athena gui version on AIX, vim will consistently go into an
> > infinite loop if the network connection drops. Trussing such a process
> > points to a select system call, so I found this one without any check on
> > the return value. This patch mostly prevents the problem, although
> > not quite 100% of the time.
> >
> > My EINTR check is just what I'm guessing some other OSes might need, it
> > doesn't seem to matter for AIX. In fact the error-handling I'm doing
> > below doesn't seem to matter either, as I can't get it to execute, nor
> > can I stop there in a debugger, but the check seems enough for exiting
> > the process via somewhere in X libraries.
>
> It looks like this code depends on undocumented or system-specific
> behavior. At least for what I could find poll() and select() called
> with no file descriptors will always wait until the timeout and then
> return zero. Do you have documentation about when the error code would
> be returned?

I can pass on man pages if you want to know possible errno values, but
that won't help with EINTR, as no OSes I use have that behaviour. I put
that check in only to mimic how vim is already handling select() errors
elsewhere, i.e. in RealWaitForChar() also in os_unix.c, as well as
can_write_buf_line() in the channel.c file.
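
As an illustration of the EINTR convention mentioned here, the following is a rough sketch of a delay that treats EINTR as "interrupted, keep waiting" and any other error as a real failure. It is not the code from RealWaitForChar() or can_write_buf_line(); the names now_msec() and delay_retry_eintr() are made up for this example, and the remaining time is recomputed from a monotonic clock because select() is not guaranteed to update the timeout on every system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <time.h>

/* Milliseconds on a monotonic clock, unaffected by wall-clock changes. */
static long long
now_msec(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000LL + ts.tv_nsec / 1000000LL;
}

/*
 * Hypothetical helper: wait "msec" milliseconds, resuming after EINTR.
 * Returns 0 once the full delay has passed, or -1 with errno set when
 * select() fails for any reason other than EINTR.
 */
static int
delay_retry_eintr(long msec)
{
    long long deadline;

    assert(msec >= 0);
    deadline = now_msec() + msec;
    for (;;)
    {
        long long left = deadline - now_msec();
        struct timeval tv;

        if (left <= 0)
            return 0;               /* deadline already reached */
        tv.tv_sec = left / 1000;
        tv.tv_usec = (left % 1000) * 1000;
        if (select(0, NULL, NULL, NULL, &tv) == 0)
            return 0;               /* timeout elapsed normally */
        if (errno != EINTR)
            return -1;              /* genuine error: report it */
        /* Interrupted by a signal: loop and wait for the remainder. */
    }
}
```

The design choice is only that a signal should shorten neither the delay nor the process: EINTR loops, everything else is returned to the caller to decide on.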

> Also, I don't see how a hang can occur here when poll() or select()
> returns without waiting. Vim would simply continue. Or is the delay
> critical in some situation?

Yes, it continues infinitely, which is the issue: chewing up 100% of a CPU
until killed. Not sure how you got the idea of a hang.
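
To make the "100% of a CPU" symptom concrete, here is a generic sketch of how a wait loop can spin when select() fails immediately and the caller ignores the result. This is an assumption-laden illustration only: it is not code from Vim, Xt or Athena, the helper names are hypothetical, and a closed descriptor (EBADF) merely stands in for whatever condition a dropped network connection produces on AIX, which this thread does not establish.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to one second for "fd" to become readable.
 * Returns whatever select() returned.
 */
static int
wait_readable(int fd)
{
    fd_set rfds;
    struct timeval tv;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = 1;
    tv.tv_usec = 0;
    return select(fd + 1, &rfds, NULL, NULL, &tv);
}

/*
 * Count how many of "rounds" waits fail.  With an invalid descriptor every
 * call returns -1 at once, so a loop that ignored the result and simply
 * waited again would never sleep.
 */
static int
count_failed_waits(int fd, int rounds)
{
    int failures = 0;
    int i;

    for (i = 0; i < rounds; ++i)
        if (wait_readable(fd) < 0)
            ++failures;
    return failures;
}
```

An event loop written as "wait, then process, then wait again" without checking for -1 would behave like the counting loop above with no upper bound on the rounds.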

Bram Moolenaar

Oct 28, 2019, 10:03:54 PM
to vim...@googlegroups.com, Paul Ackersviller

Paul Ackersviller wrote:

> > > With athena gui version on AIX, vim will consistently go into an
> > > infinite loop if the network connection drops. Trussing such a process
> > > points to a select system call, so I found this one without any check on
> > > the return value. This patch mostly prevents the problem, although
> > > not quite 100% of the time.
> > >
> > > My EINTR check is just what I'm guessing some other OSes might need, it
> > > doesn't seem to matter for AIX. In fact the error-handling I'm doing
> > > below doesn't seem to matter either, as I can't get it to execute, nor
> > > can I stop there in a debugger, but the check seems enough for exiting
> > > the process via somewhere in X libraries.
> >
> > It looks like this code depends on undocumented or system-specific
> > behavior. At least for what I could find poll() and select() called
> > with no file descriptors will always wait until the timeout and then
> > return zero. Do you have documentation about when the error code would
> > be returned?
>
> I can pass on man pages if you want to know possible errno values, but
> that won't help with EINTR, as no OSes I use have that behaviour. I put
> that check in only to mimic how vim is already handling select() errors
> elsewhere, i.e. in RealWaitForChar() also in os_unix.c, as well as
> can_write_buf_line() in the channel.c file.

Not errno values, but just why it would return an error at all. It's
documented that select() without any file descriptors can be used to
wait with sub-second accuracy, for systems that don't have usleep(). But
nowhere does it say it returns any error.

> > Also, I don't see how a hang can occur here when poll() or select()
> > returns without waiting. Vim would simply continue. Or is the delay
> > critical in some situation?
>
> Yes, continues infinitely, which is the issue... chewing up 100% of a CPU
> until killed. Not sure how you got the idea of a hang.

Where does it loop then? The place where you have the change doesn't
loop, it returns.

--
All good vision statements are created by groups of people with bloated
bladders who would rather be doing anything else.
(Scott Adams - The Dilbert principle)

Paul Ackersviller

Oct 29, 2019, 9:56:31 PM
to Bram Moolenaar, vim...@googlegroups.com
On Tue, Oct 29, 2019 at 03:03:43AM +0100, Bram Moolenaar wrote:
>
> Paul Ackersviller wrote:
>
> > > > With athena gui version on AIX, vim will consistently go into an
> > > > infinite loop if the network connection drops. Trussing such a process
> > > > points to a select system call, so I found this one without any check on
> > > > the return value. This patch mostly prevents the problem, although
> > > > not quite 100% of the time.
> > >
> > > It looks like this code depends on undocumented or system-specific
> > > behavior. At least for what I could find poll() and select() called
> > > with no file descriptors will always wait until the timeout and then
> > > return zero. Do you have documentation about when the error code would
> > > be returned?
> >
> > I can pass on man pages if you want to know possible errno values, but
> > that won't help with EINTR, as no OSes I use have that behaviour. I put
> > that check in only to mimic how vim is already handling select() errors
> > elsewhere, i.e. in RealWaitForChar() also in os_unix.c, as well as
> > can_write_buf_line() in the channel.c file.
>
> Not errno values, but just why it would return an error at all. It's
> documented that select() without any file descriptors can be used to
> wait with sub-second accuracy, for systems that don't have usleep(). But
> nowhere does it say it returns any error.

I'm attaching the system's select man page, and it looks like EINTR is
about the only candidate in this situation. I'd say it's ambiguous
whether an error could happen while waiting on a timeout, as it's not
mentioned.

> > > Also, I don't see how a hang can occur here when poll() or select()
> > > returns without waiting. Vim would simply continue. Or is the delay
> > > critical in some situation?
> >
> > Yes, continues infinitely, which is the issue... chewing up 100% of a CPU
> > until killed. Not sure how you got the idea of a hang.
>
> Where does it loop then? The place where you have the change doesn't
> loop, it returns.

You've got me wondering if the loop isn't really in Athena or X, and the
change I did is affecting timing somehow... I'll let you know if I get
anywhere, and thanks for your attention.

Attachment: select-aix.txt