Lots of test errors in package zmq4 with Go version 1.14, no errors with earlier versions


Peter Kleiweg

Feb 26, 2020, 6:33:05 AM
to golang-nuts
With Go version 1.14 I get a lot of errors when I run:

    go test -v github.com/pebbe/zmq4

I didn't see this with Go 1.13.8 or any earlier version.

Is this a problem with Go 1.14, or am I doing something wrong and just got lucky until now?

How do I debug this? The errors are different for each run. Below is a sample of some errors.
Line numbers are not always accurate, because I inserted some calls to test.Log().

    === RUN   TestSocketEvent
        TestSocketEvent: socketevent_test.go:73: rep.Bind: interrupted system call


    === RUN   TestMultipleContexts
        TestMultipleContexts: zmq4_test.go:131: sock1.Connect: interrupted system call

    freeze:
    === RUN   TestMultipleContexts
    ^CFAIL  github.com/pebbe/zmq4   30.226s

    freeze:
    === RUN   TestMultipleContexts
        TestMultipleContexts: zmq4_test.go:148: sock1.RecvMessage: expected <nil> [tcp://127.0.0.1:9997 tcp://127.0.0.1:9997], got interrupted system call []
    ^CFAIL  github.com/pebbe/zmq4   21.445s



    freeze:
    === RUN   TestSecurityCurve
    ^CFAIL  github.com/pebbe/zmq4   31.143s



    freeze:
    === RUN   TestSecurityNull
        TestSecurityNull: zmq4_test.go:1753: server.Recv 1: resource temporarily unavailable
    ^CFAIL  github.com/pebbe/zmq4   44.828s


    === RUN   TestDisconnectInproc
        TestDisconnectInproc: zmq4_test.go:523: Poll: interrupted system call
        TestDisconnectInproc: zmq4_test.go:623: isSubscribed

    === RUN   TestHwm
        TestHwm: zmq4_test.go:823: bind_socket.Bind: interrupted system call
        TestHwm: zmq4_test.go:1044: test_inproc_bind_first(0, 0): expected 10000, got -1

    freeze:
    === RUN   TestSecurityPlain
    ^CFAIL  github.com/pebbe/zmq4   46.395s

    === RUN   TestPairIpc
        TestPairIpc: zmq4_test.go:1124: client.Send SNDMORE|DONTWAIT: interrupted system call

Peter Kleiweg

Feb 26, 2020, 6:34:14 AM
to golang-nuts
This is with go version go1.14 linux/amd64


On Wednesday, February 26, 2020 at 12:33:05 UTC+1, Peter Kleiweg wrote:

Manlio Perillo

Feb 26, 2020, 7:05:40 AM
to golang-nuts
On Wednesday, February 26, 2020 at 12:33:05 PM UTC+1, Peter Kleiweg wrote:
> With Go version 1.14 I get a lot of errors when I run:
>
>     go test -v github.com/pebbe/zmq4
>
> I didn't see this with Go 1.13.8 or any earlier version.
>
> Is this a problem with Go 1.14, or am I doing something wrong and just got lucky until now?
>
> How do I debug this? The errors are different for each run. Below is a sample of some errors.
> Line numbers are not always accurate, because I inserted some calls to test.Log().

The errors are probably caused by https://golang.org/doc/go1.14#runtime.

The solution is to update zmq4 to explicitly handle interrupted system calls.
However, it is strange that they happen in the tests. Is this caused by SIGPIPE?

> [...]


Manlio 

Peter Kleiweg

Feb 26, 2020, 9:51:54 AM
to golang-nuts
On Wednesday, February 26, 2020 at 13:05:40 UTC+1, Manlio Perillo wrote:
> The errors are probably caused by https://golang.org/doc/go1.14#runtime.
>
> The solution is to update zmq4 to explicitly handle interrupted system calls.

Often the program freezes before I get an interrupted system call. It hangs inside a ZeroMQ C++ library function.
zmq4 is just a wrapper for ZeroMQ. I can't "fix" ZeroMQ to make it work with Go.

Is there a way to stop Go from interrupting my system calls? It happens rather randomly all over the place.

> However, it is strange that they happen in the tests. Is this caused by SIGPIPE?

I don't know about this. The only flag to `go test` I used was -v.



Gregor Best

Feb 26, 2020, 10:08:04 AM
to golan...@googlegroups.com

This looks like fallout of the 1.14 changes that made Goroutines preemptively schedulable.

It seems likely that this code wasn't correct before either; the failure cases were just masked because fewer signals were delivered (and thus had less chance of interrupting system calls).

--
  Gregor Best
  be...@pferdewetten.de

Manlio Perillo

Feb 26, 2020, 10:11:10 AM
to golang-nuts
On Wednesday, February 26, 2020 at 3:51:54 PM UTC+1, Peter Kleiweg wrote:
> Often the program freezes before I get an interrupted system call. It hangs inside a ZeroMQ C++ library function.
> zmq4 is just a wrapper for ZeroMQ. I can't "fix" ZeroMQ to make it work with Go.
>
> Is there a way to stop Go from interrupting my system calls? It happens rather randomly all over the place.


ZeroMQ may return an EINTR error, but zmq4 does not list it in errors.go.
ZeroMQ asks the caller to handle EINTR, so zmq4 should handle it internally or return it to the caller.

https://golang.org/doc/go1.14#runtime should have mentioned that not only programs that use packages like syscall or golang.org/x/sys/unix will see more slow system calls fail with EINTR errors, but also programs that use Cgo.


Manlio

Ian Lance Taylor

Feb 26, 2020, 10:14:38 AM
to Manlio Perillo, golang-nuts
I don't know ZeroMQ. If the ZeroMQ calls correspond closely to system
calls, then it could work for them to return EINTR. In that case the
fix is going to be for the Go wrapper around ZeroMQ to check whether
the error returned is syscall.EINTR, and to retry the call if it is.

Ian

Manlio Perillo

Feb 26, 2020, 12:10:48 PM
to golang-nuts
On Wednesday, February 26, 2020 at 4:14:38 PM UTC+1, Ian Lance Taylor wrote:

Amnon Baron Cohen

Feb 26, 2020, 12:33:08 PM
to golang-nuts
from https://www.jwz.org/doc/worse-is-better.html

Two famous people, one from MIT and another from Berkeley (but working on Unix) once met to discuss operating system issues. The person from MIT was knowledgeable about ITS (the MIT AI Lab operating system) and had been reading the Unix sources. He was interested in how Unix solved the PC loser-ing problem.

The PC loser-ing problem occurs when a user program invokes a system routine to perform a lengthy operation that might have significant state, such as IO buffers. If an interrupt occurs during the operation, the state of the user program must be saved. Because the invocation of the system routine is usually a single instruction, the PC of the user program does not adequately capture the state of the process. The system routine must either back out or press forward. The right thing is to back out and restore the user program PC to the instruction that invoked the system routine so that resumption of the user program after the interrupt, for example, re-enters the system routine. It is called "PC loser-ing" because the PC is being coerced into "loser mode," where "loser" is the affectionate name for "user" at MIT.


The MIT guy did not see any code that handled this case and asked the New Jersey guy how the problem was handled. The New Jersey guy said that the Unix folks were aware of the problem, but the solution was for the system routine to always finish, but sometimes an error code would be returned that signaled that the system routine had failed to complete its action. A correct user program, then, had to check the error code to determine whether to simply try the system routine again. The MIT guy did not like this solution because it was not the right thing.


The New Jersey guy said that the Unix solution was right because the design philosophy of Unix was simplicity and that the right thing was too complex. Besides, programmers could easily insert this extra test and loop. The MIT guy pointed out that the implementation was simple but the interface to the functionality was complex. The New Jersey guy said that the right tradeoff has been selected in Unix: namely, implementation simplicity was more important than interface simplicity.

Peter Kleiweg

Feb 26, 2020, 12:34:46 PM
to golang-nuts
That leaves the problem that often, the program just waits forever in C code, not returning an interrupted system call.

On Wednesday, February 26, 2020 at 16:14:38 UTC+1, Ian Lance Taylor wrote:

Robert Engels

Feb 26, 2020, 12:50:44 PM
to Peter Kleiweg, golang-nuts
It is especially difficult when the driver has complex timing and/or hardware flags. Interrupting these calls usually leads to unretriable failures.

On Feb 26, 2020, at 11:41 AM, Peter Kleiweg <pkle...@xs4all.nl> wrote:



pboam...@gmail.com

Feb 26, 2020, 6:31:04 PM
to golang-nuts
How should I handle EINTR from syscall.Dup2 (Linux)?

- The effect of SA_RESTART on dup2(2) is undocumented.
- If dup2 returns EINTR, can I be sure that nothing has been done? (Otherwise retrying is racy.)
- libuv uv__dup2_cloexec [1] never retries the syscall but treats EINTR as a failure.
- Python os.dup2 [2] also never retries, but it seems that it treats EINTR as success ("ignores EINTR error").
- See also unresolved question: https://stackoverflow.com/questions/15930013/can-dup2-really-return-eintr

All very confusing.
Also, can os.File.Close / syscall.Close return EINTR? Is it affected by SA_RESTART?

[1] https://github.com/libuv/libuv/issues/462
[2] https://www.python.org/dev/peps/pep-0475

Ian Lance Taylor

Feb 26, 2020, 8:15:14 PM
to Manlio Perillo, golang-nuts
Thanks for the links. Note that these issues don't really have
anything to do with Go. For certain system calls, you need to handle
EINTR one way or another. The Go runtime does as much as it can to
avoid these problems, but on Unix systems it is impossible to avoid
them entirely.

Ian

Michael Jones

Feb 26, 2020, 8:43:24 PM
to Ian Lance Taylor, Manlio Perillo, golang-nuts
There is the BSD notion of sa_restart, a convenience to loop for the caller as appropriate.


Go could adopt such a notion if desired.



--
Michael T. Jones
michae...@gmail.com

Ian Lance Taylor

Feb 26, 2020, 8:48:51 PM
to Michael Jones, Manlio Perillo, golang-nuts
On Wed, Feb 26, 2020 at 5:42 PM Michael Jones <michae...@gmail.com> wrote:
>
> There is the BSD notion of sa_restart, a convenience to loop for the caller as appropriate.
>
> https://www.freebsd.org/cgi/man.cgi?query=sigaction
>
> Go could adopt such a notion if desired.

We already do. We install all signal handlers with the SA_RESTART flag set.

Unfortunately if you peruse "man 7 signal" on a GNU/Linux system you
will see a list of system calls that return EINTR even if the handler
has SA_RESTART set.

Ian

Michael Jones

Feb 26, 2020, 8:51:50 PM
to Ian Lance Taylor, Manlio Perillo, golang-nuts
Sorry...I meant the go system signal interface could loop if desired. (Not recommending, just saying that panicky people could be coddled if desired)

Ian Lance Taylor

Feb 26, 2020, 9:39:32 PM
to Michael Jones, Manlio Perillo, golang-nuts
On Wed, Feb 26, 2020 at 5:51 PM Michael Jones <michae...@gmail.com> wrote:
>
> Sorry...I meant the go system signal interface could loop if desired. (Not recommending, just saying that panicky people could be coddled if desired)

Ah, I see. Except, no, I don't. Could we really do that? Even if
the signal arrived while executing some function written in C and
called via cgo?

Robert Engels

Feb 26, 2020, 11:38:36 PM
to Ian Lance Taylor, Michael Jones, Manlio Perillo, golang-nuts
The problem is that Go designers are taking the position that any system call should be able to be interrupted. This is invalid. For the vast majority of “Unix” OSes an interrupt is a very rare condition and so they treat it as an error. If you issue interrupts continually you are creating an unexpected context.

> On Feb 26, 2020, at 8:39 PM, Ian Lance Taylor <ia...@golang.org> wrote:

Michael Jones

Feb 27, 2020, 12:33:56 AM
to Robert Engels, Ian Lance Taylor, Manlio Perillo, golang-nuts
Ian, I guess not. I’d not thought of that indirectness. I withdraw my musings. 

Manlio Perillo

Feb 27, 2020, 6:05:52 AM
to golang-nuts
On Thursday, February 27, 2020 at 2:15:14 AM UTC+1, Ian Lance Taylor wrote:
> On Wed, Feb 26, 2020 at 9:11 AM Manlio Perillo <manlio...@gmail.com> wrote:
> >
> > On Wednesday, February 26, 2020 at 4:14:38 PM UTC+1, Ian Lance Taylor wrote:
> >>
> >> On Wed, Feb 26, 2020 at 7:11 AM Manlio Perillo <manlio...@gmail.com> wrote:
> > [...]
> >> >
> >> > https://stackoverflow.com/questions/36040547/zeromq-how-to-react-on-different-signal-types-on-eintr
> >> >
> >> > ZeroMQ may return an EINTR error, but zmq4 does not list it in errors.go.
> >> > ZeroMQ asks the caller to handle EINTR, so zmq4 should handle it internally or return it to the caller.
> >> >
> >> > https://golang.org/doc/go1.14#runtime should have mentioned that not only programs that use packages like syscall or golang.org/x/sys/unix will see more slow system calls fail with EINTR errors, but also programs that use Cgo.
> >>
> >> I don't know ZeroMQ. If the ZeroMQ calls correspond closely to system
> >> calls, then it could work for them to return EINTR. In that case the
> >> fix is going to be for the Go wrapper around ZeroMQ to check whether
> >> the error returned is syscall.EINTR, and to retry the call if it is.
> >>
> >
> > Unfortunately it is not that simple:
> >
> > http://250bpm.com/blog:12
> > https://alobbs.com/post/54503240599/close-and-eintr
> > http://man7.org/linux/man-pages/man7/signal.7.html
> > https://github.com/golang/go/issues/11180
> > https://www.python.org/dev/peps/pep-0475/
> >
> > The second entry about close and EINTR is enlightening.
>
> Thanks for the links. Note that these issues don't really have
> anything to do with Go. For certain system calls, you need to handle
> EINTR one way or another. The Go runtime does as much as it can to
> avoid these problems, but on Unix systems it is impossible to avoid
> them entirely.


I suspect people will complain whatever you do.
If EINTR is returned to the caller, people will complain that they don't want to handle it. Probably they don't even know what EINTR means.
If EINTR is handled by the runtime and all the exported API of the stdlib never return EINTR, some people may complain that they want to handle EINTR in their program, since only the main program can decide how to handle EINTR.

I ported the example http://250bpm.com/blog:12 to Go:

For a server it is the right thing to do to ignore EINTR, but for an interactive client it is not.
By the way, cmd/go uses this pattern, closing the Interrupted channel when it receives a SIGINT or SIGQUIT signal.

One thing Go can probably do is to not treat EINTR as a temporary error (as defined by the Errno.Temporary method in syscall), and instead add a new method Errno.Interrupted.
In this way the stdlib has more control over EINTR.


Thanks
Manlio 
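
The ported code itself does not appear in this message. As a rough sketch of the general pattern mentioned above (closing a channel when SIGINT or SIGQUIT arrives), and explicitly not the author's port or the cmd/go source, something like the following could be used. The names `notifyInterrupted` and `isClosed` are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

// notifyInterrupted returns a channel that is closed once the first
// SIGINT or SIGQUIT is delivered to the process.
// Hypothetical helper for illustration only.
func notifyInterrupted() <-chan struct{} {
	interrupted := make(chan struct{})
	sig := make(chan os.Signal, 1) // buffered so the runtime never blocks on delivery
	signal.Notify(sig, os.Interrupt, syscall.SIGQUIT)
	go func() {
		<-sig
		close(interrupted) // a closed channel is readable by every waiter
	}()
	return interrupted
}

// isClosed reports, without blocking, whether ch has been closed.
func isClosed(ch <-chan struct{}) bool {
	select {
	case <-ch:
		return true
	default:
		return false
	}
}

func main() {
	interrupted := notifyInterrupted()
	// Long-running code can poll isClosed, or select on the channel,
	// and return normally instead of calling os.Exit.
	fmt.Println("interrupted:", isClosed(interrupted))
}
```

With this shape, the code that issues a slow call decides what an interruption means: a server can keep retrying, while an interactive client can check the channel after an EINTR and stop.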

Jesper Louis Andersen

Feb 27, 2020, 6:29:37 AM
to Amnon Baron Cohen, golang-nuts
PC loser-ing is a brilliant thing which also finds its use in various proofs about programming languages.

I guess the primary problem is that you have code out there which doesn't assume the presence of a signal handler in the code, yet suddenly you get EINTR back because of signal delivery in other libraries. Correctly programming Unix systems requires attention to detail :)



--
J.

Ian Lance Taylor

Feb 27, 2020, 10:52:25 AM
to Robert Engels, Michael Jones, Manlio Perillo, golang-nuts
On Wed, Feb 26, 2020 at 8:36 PM Robert Engels <ren...@ix.netcom.com> wrote:
>
> The problem is that Go designers are taking the position that any system call should be able to be interrupted. This is invalid. For the vast majority of “Unix” OSes an interrupt is a very rare condition and so they treat it as an error. If you issue interrupts continually you are creating an unexpected context.

You're right, I do disagree. I don't think it's the Go developers who
are taking that position. It's the Unix library developers. While a
main program can make its own choices about signals, a library has to
consider the possibility that it will be included in a program that
uses signals. A rare error is still an error. Just because Go 1.14
interrupts system calls more often doesn't mean the system calls were
not interrupted. And non-Go programs use signals too, e.g. SIGURG and
SIGIO.

Ian

Robert Engels

Feb 27, 2020, 12:30:03 PM
to Ian Lance Taylor, Michael Jones, Manlio Perillo, golang-nuts
I think the difference is that a user program can block signals when working with certain devices. With Go that is not possible so the only choice is to not use Go.

Unless I am missing something else?

> On Feb 27, 2020, at 9:52 AM, Ian Lance Taylor <ia...@golang.org> wrote:

Robert Engels

Feb 27, 2020, 12:42:20 PM
to Ian Lance Taylor, Michael Jones, Manlio Perillo, golang-nuts

I re-read your comments, and I agree that a rare error is still an error, and needs to be handled, but if the platform is introducing lots of errors, is that the library writer's issue?

Maybe an easy solution is a flag to disable the signal usage for tight-loop preemption as a "backwards compatibility" mode ?

As the OP pointed out, he can't really change ZeroMQ, and this is a fairly established product, maybe more so than Go, so doesn't it make more sense that Go adapts rather than the other way around?

Ian Lance Taylor

Feb 27, 2020, 12:50:56 PM
to Robert Engels, Michael Jones, Manlio Perillo, golang-nuts
On Thu, Feb 27, 2020 at 9:41 AM Robert Engels <ren...@ix.netcom.com> wrote:
>
>
> I re-read your comments, and I agree that a rare error is still an error, and needs to be handled, but if the platform is introducing lots of errors, is that the library writer's issue?
>
> Maybe an easy solution is a flag to disable the signal usage for tight-loop preemption as a "backwards compatibility" mode ?
>
> As the OP pointed out, he can't really change ZeroMQ, and this is a fairly established product, maybe more so than Go, so doesn't it make more sense that Go adapts rather than the other way around?

We already have that flag: GODEBUG=noasyncpreempt=1.

The discussion upthread explains that the Go wrapper for ZeroMQ should
handle EINTR, and take the appropriate action such as retrying the
operation when appropriate. The response to that was a bit of
distraction, as it discussed generic problems with EINTR. At this
point there is no reason to assume that any of those problems actually
apply to using ZeroMQ.

Ian

Peter Kleiweg

Feb 27, 2020, 12:59:25 PM
to golang-nuts
On Thursday, February 27, 2020 at 18:50:56 UTC+1, Ian Lance Taylor wrote:
> We already have that flag: GODEBUG=noasyncpreempt=1.
>
> The discussion upthread explains that the Go wrapper for ZeroMQ should
> handle EINTR, and take the appropriate action such as retrying the
> operation when appropriate. The response to that was a bit of
> distraction, as it discussed generic problems with EINTR. At this
> point there is no reason to assume that any of those problems actually
> apply to using ZeroMQ.

Yes, a lot is said about handling EINTR.

Nothing is said about the code just freezing. How to handle that?

Robert Engels

Feb 27, 2020, 1:07:54 PM
to Peter Kleiweg, golang-nuts
Does it freeze if you use GODEBUG=noasyncpreempt=1 ?

Ian Lance Taylor

Feb 27, 2020, 1:55:40 PM
to Robert Engels, Peter Kleiweg, golang-nuts
On Thu, Feb 27, 2020 at 10:07 AM Robert Engels <ren...@ix.netcom.com> wrote:
>
> Does it freeze if you use GODEBUG=noasyncpreempt=1 ?

Sorry, I got it wrong earlier. It's GODEBUG=asyncpreemptoff=1.

I can verify that the tests seem to pass with
GODEBUG=asyncpreemptoff=1, and hang without it.

I took a quick look at TestMultipleContexts, which sometimes freezes.
I haven't verified this but I think it's because the server is
failing, probably due to an EINTR error. The server is reporting an
error on a channel, but the test carries on and only looks at the
channel at the end. My guess is that since the server has failed the
test is hanging waiting for the server to respond.

Ian

Peter Kleiweg

Feb 27, 2020, 3:25:51 PM
to golang-nuts
GODEBUG=noasyncpreempt=1 makes no difference.

I added the option -race and I got some warnings from that, all happening in a call to reactor.Run().
When I disable all tests that use reactor.Run() the test run no longer freezes. So I have to look at
the implementation of the reactor. 

I still get the interrupted system calls, so I have to fix those too.

It looks like these are two different issues.


On Thursday, February 27, 2020 at 19:07:54 UTC+1, Robert Engels wrote:

Robert Engels

Feb 27, 2020, 4:36:20 PM
to Peter Kleiweg, golang-nuts
As Ian pointed out you need to use 

GODEBUG=asyncpreemptoff=1

On Feb 27, 2020, at 2:26 PM, Peter Kleiweg <pkle...@xs4all.nl> wrote:



Peter Kleiweg

Feb 28, 2020, 10:04:01 AM
to golang-nuts
Retry after EINTR solved the code lock-up too.

On Wednesday, February 26, 2020 at 12:33:05 UTC+1, Peter Kleiweg wrote:

Robert Engels

Feb 28, 2020, 10:13:50 AM
to Peter Kleiweg, golang-nuts

Can you clarify that a bit? Did you change the code to look for EINTR errors and then retry the system call?

Peter Kleiweg

Feb 28, 2020, 10:18:26 AM
to golang-nuts
On Friday, February 28, 2020 at 16:13:50 UTC+1, Robert Engels wrote:
> Can you clarify that a bit? Did you change the code to look for EINTR errors and then retry the system call?

Yes, I did. But as an option that must be enabled by the user.

Ian Lance Taylor

Feb 28, 2020, 10:57:00 AM
to Peter Kleiweg, golang-nuts
On Fri, Feb 28, 2020 at 7:18 AM Peter Kleiweg <pkle...@xs4all.nl> wrote:
>
> On Friday, February 28, 2020 at 16:13:50 UTC+1, Robert Engels wrote:
>>
>>
>> Can you clarify that a bit? Did you change the code to look for EINTR errors and then retry the system call?
>
>
> Yes, I did. But as an option that must be enabled by the user.

I don't understand why you're making it an option. The README
suggests that you would not want to enable it if you want to handle
^C, but in Go the ^C will be delivered on a channel, presumably to a
separate goroutine. At that point your program will either exit or do
some other operation. If the program doesn't exit, then it's not
going to want the interrupted system call to fail. It's going to want
it to be retried.

(As a minor side note, calls like getsockopt will never return EINTR,
it's not necessary to retry them. But it doesn't hurt.)

Ian

Peter Kleiweg

Feb 28, 2020, 11:13:14 AM
to golang-nuts
On Friday, February 28, 2020 at 16:57:00 UTC+1, Ian Lance Taylor wrote:
> I don't understand why you're making it an option. The README
> suggests that you would not want to enable it if you want to handle
> ^C, but in Go the ^C will be delivered on a channel, presumably to a
> separate goroutine. At that point your program will either exit or do
> some other operation. If the program doesn't exit, then it's not
> going to want the interrupted system call to fail. It's going to want
> it to be retried.

I leave it to the end user to decide. I was inspired by this: http://250bpm.com/blog:12

> (As a minor side note, calls like getsockopt will never return EINTR,
> it's not necessary to retry them. But it doesn't hurt.)

The man page says that zmq_getsockopt can return EINTR.

And some zmq functions can return EINTR even though their man page doesn't mention it.

Manlio Perillo

Feb 28, 2020, 11:27:15 AM
to golang-nuts
On Friday, February 28, 2020 at 4:57:00 PM UTC+1, Ian Lance Taylor wrote:
> I don't understand why you're making it an option. The README
> suggests that you would not want to enable it if you want to handle
> ^C, but in Go the ^C will be delivered on a channel, presumably to a
> separate goroutine. At that point your program will either exit or do
> some other operation. If the program doesn't exit, then it's not
> going to want the interrupted system call to fail. It's going to want
> it to be retried.

But what if the program doesn't want to call os.Exit from the goroutine handling signals, because the function calling a slow syscall wants to return normally?


Manlio

Ian Lance Taylor

Feb 28, 2020, 11:36:09 AM
to Manlio Perillo, golang-nuts
To me that sounds like a theoretical argument that will never arise in
an actual program. There is a reason to write C programs that way,
because it's annoying to have multiple threads of execution, but there
is no reason to ever write Go programs that way. Go programs always
have multiple threads of execution. Just let a goroutine sit in the
slow syscall; who cares?

Ian

Ian Lance Taylor

Feb 28, 2020, 11:58:49 AM
to Peter Kleiweg, golang-nuts
On Fri, Feb 28, 2020 at 8:13 AM Peter Kleiweg <pkle...@xs4all.nl> wrote:
>
> Op vrijdag 28 februari 2020 16:57:00 UTC+1 schreef Ian Lance Taylor:
>>
>> On Fri, Feb 28, 2020 at 7:18 AM Peter Kleiweg <pkle...@xs4all.nl> wrote:
>> >
>> > Op vrijdag 28 februari 2020 16:13:50 UTC+1 schreef Robert Engels:
>> >>
>> >>
>> >> Can you clarify that a bit? Did you change the code to look for EINTR errors and then retry the system call?
>> >
>> >
>> > Yes, I did. But as an option that must be enabled by the user.
>>
>> I don't understand why you're making it an option. The README
>> suggests that you would not want to enable it if you want to handle
>> ^C, but in Go the ^C will be delivered on a channel, presumably to a
>> separate goroutine. At that point your program will either exit or do
>> some other operation. If the program doesn't exit, then it's not
>> going to want the interrupted system call to fail. It's going to want
>> it to be retried.
>
>
> I leave it to the end user to decide. I was inspired by this: http://250bpm.com/blog:12

That blog post is about programming in C. Go is different. In C it
makes sense to let the program decide how to handle EINTR. In Go, in
my opinion, it does not.


>> (As a minor side note, calls like getsockopt will never return EINTR,
>> it's not necessary to retry them. But it doesn't hurt.)
>
>
> zmq_getsockopt can return EINTR says the man page.

Odd. OK.

Ian

Manlio Perillo

Feb 28, 2020, 12:14:29 PM2/28/20
to golang-nuts
On Friday, February 28, 2020 at 5:36:09 PM UTC+1, Ian Lance Taylor wrote:
A user running a client program from a terminal may care.
If it takes too long to read data from a remote server, a user expects ^C to interrupt the program.

However, one solution is to register an atexit-style handler, using a closure to do some cleanup, so this is probably not an issue worth making the Go runtime more complex for.

Manlio

Ian Lance Taylor

Feb 28, 2020, 12:29:56 PM2/28/20
to Manlio Perillo, golang-nuts
In Go, a ^C will interrupt a program if you write code like

c := make(chan os.Signal, 1)
signal.Notify(c, syscall.SIGINT)
go func() {
    <-c
    fmt.Printf("exiting due to ^C")
    os.Exit(1)
}()

That process is entirely independent of whether the function zmq4.Poll
returns EINTR or not.

Ian

Manlio Perillo

Feb 28, 2020, 12:38:46 PM2/28/20
to golang-nuts
On Friday, February 28, 2020 at 6:29:56 PM UTC+1, Ian Lance Taylor wrote:
On Fri, Feb 28, 2020 at 9:14 AM Manlio Perillo <manlio...@gmail.com> wrote:
>
> On Friday, February 28, 2020 at 5:36:09 PM UTC+1, Ian Lance Taylor wrote:
>>
>> On Fri, Feb 28, 2020 at 8:27 AM Manlio Perillo <manlio...@gmail.com> wrote:
>> >
> [...] 
>>  Go programs always
>> have multiple threads of execution.  Just let a goroutine sit in the
>> slow syscall; who cares?
>>
>
> A user running a client program from a terminal may care.
> If it takes too long to read data from a remote server, a user expects ^C to interrupt the program.
>
> However, one solution is to register an atexit-style handler, using a closure to do some cleanup, so this is probably not an issue worth making the Go runtime more complex for.

In Go, a ^C will interrupt a program if you write code like

c := make(chan os.Signal, 1)
signal.Notify(c, syscall.SIGINT)
go func() {
    <-c
    fmt.Printf("exiting due to ^C")
    os.Exit(1)
}()

That process is entirely independent of whether the function zmq4.Poll
returns EINTR or not.


Calling os.Exit will cause the deferred functions to not be called.
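
One possible way around this, sketched under assumptions: keep the program logic in a function that returns an exit code, and call os.Exit only from main once that function has returned. The `run` function, the `done` channel and the sleeping goroutine are hypothetical stand-ins for a real program's slow call.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// run waits for either the work to finish or an interrupt. It returns an
// exit code instead of calling os.Exit, so its deferred cleanup always runs.
func run(interrupt <-chan os.Signal, done <-chan struct{}, cleanup func()) int {
	defer cleanup()
	select {
	case <-interrupt:
		fmt.Println("exiting due to ^C")
		return 1
	case <-done:
		return 0
	}
}

func main() {
	c := make(chan os.Signal, 1)
	signal.Notify(c, syscall.SIGINT)

	done := make(chan struct{})
	go func() {
		// Hypothetical stand-in for a slow blocking call.
		time.Sleep(100 * time.Millisecond)
		close(done)
	}()

	// os.Exit is reached only after run has returned and its defers have run.
	if code := run(c, done, func() { fmt.Println("cleanup done") }); code != 0 {
		os.Exit(code)
	}
}
```

The goroutine doing the slow work is simply abandoned when ^C arrives, so only the deferred functions in `run` are executed, not any defers inside that goroutine.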


Manlio 

Ian Lance Taylor

Feb 28, 2020, 2:36:20 PM2/28/20
to Manlio Perillo, golang-nuts
Sure, there are many different ways to organize this. You're right:
you have to be aware that zmq4.Poll can block for a while, and won't
be interrupted if a signal occurs. That is exactly how every other Go
I/O function works; for example, net.Conn.Read and net.Conn.Write
behave that way. That's how Go works.
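
As a rough sketch of how a blocking read is usually bounded in Go without relying on signals, the example below sets a read deadline on a net.Conn. The helpers `readWithTimeout`, `isTimeout` and `readFromSilentPeer` are hypothetical names for this illustration, and net.Pipe stands in for a real network connection whose peer never sends anything.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// readWithTimeout bounds a blocking Read by setting a read deadline first.
func readWithTimeout(conn net.Conn, buf []byte, d time.Duration) (int, error) {
	if err := conn.SetReadDeadline(time.Now().Add(d)); err != nil {
		return 0, err
	}
	return conn.Read(buf)
}

// isTimeout reports whether err is a network timeout, such as an expired deadline.
func isTimeout(err error) bool {
	var ne net.Error
	return errors.As(err, &ne) && ne.Timeout()
}

// readFromSilentPeer reads from an in-memory connection whose peer never
// writes, and reports whether the read ended with a timeout after ms
// milliseconds.
func readFromSilentPeer(ms int) bool {
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()

	buf := make([]byte, 16)
	_, err := readWithTimeout(client, buf, time.Duration(ms)*time.Millisecond)
	return isTimeout(err)
}

func main() {
	fmt.Println("timed out:", readFromSilentPeer(50))
}
```

Whether an equivalent timeout is available for a given zmq4 call is not something this sketch shows; it only illustrates the net.Conn behaviour mentioned above.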

Ian

Liam

Mar 3, 2020, 10:21:06 AM3/3/20
to golang-nuts