net/http test failure on windows

788 views
Skip to first unread message

Dave Cheney

unread,
Apr 24, 2012, 9:54:09 PM4/24/12
to golan...@googlegroups.com
Hi,

I would like to exclude windows from running the faulting http test until someone has the time to look at it.

What does everyone think?

Dave

Russ Cox

unread,
Apr 24, 2012, 10:11:20 PM4/24/12
to Dave Cheney, golan...@googlegroups.com
Please hg undo gustavo's CL instead.
That's better motivation to fix things than
a disabled test and a TODO we'll forget about.

Thanks.
Russ

Dave Cheney

unread,
Apr 24, 2012, 10:49:49 PM4/24/12
to r...@golang.org, golan...@googlegroups.com
I would prefer to leave the test broken on win64 rather than roll back
the CL. I think it may be a local problem with that builder as it
hasn't broken on any other builder, even the 32bit version of windows.

When http/conn.serve() exits, it calls c.close() which will send a FIN
to the other side. This error looks more like an RST is being sent to
the client, but not from the server.

Random googling suggests that this might be related to SYN protection
inside the OS, http://msdn.microsoft.com/en-us/library/ms187005.aspx.

Cheers

Dave

Russ Cox

unread,
Apr 24, 2012, 11:23:31 PM4/24/12
to Dave Cheney, golan...@googlegroups.com
On Tue, Apr 24, 2012 at 22:49, Dave Cheney <da...@cheney.net> wrote:
> I would prefer to leave the test broken on win64 rather than roll back
> the CL. I think it may be a local problem with that builder as it
> hasn't broken on any other builder, even the 32bit version of windows.

It's not okay to leave the test broken. Then when we break other
things we don't find out.

I will roll back the CL. It isn't a judgement about the CL, and
I certainly expect the rollback to be rolled back once we understand
things. But for now it is keeping us from noticing other breakages
and that can't be allowed to go on too long.

Russ

Gustavo Niemeyer

unread,
May 16, 2012, 11:02:23 PM5/16/12
to r...@golang.org, Dave Cheney, golan...@googlegroups.com
Has anyone had time to investigate why Windows 64 breaks when a
connection is closed on its face?

It's sad to have the http package misbehaving with clients asking for
Connection: close. It'd be great to have that issue sorted in time for
Go 1.1.
gustavo @ http://niemeyer.net

brainman

unread,
May 17, 2012, 3:42:09 AM5/17/12
to golan...@googlegroups.com, r...@golang.org, Dave Cheney
On Thursday, 17 May 2012 13:02:23 UTC+10, Gustavo Niemeyer wrote:
Has anyone had time to investigate why Windows 64 breaks when a
connection is closed on its face?



The test that fails is net/http.TestServerExpect. I am not sure that test logic is valid. Here is I took http out of the picture to demonstrate similar problem:

func TestTCPClose(t *testing.T) {
       msg := []byte("hello")
       l, err := Listen("tcp", "127.0.0.1:0")
       if err != nil {
               t.Fatalf("Listen failed: %v", err)
       }
       defer l.Close()

       go func() {
               b := make([]byte, len(msg))
               for {
                       c, err := l.Accept()
                       if err != nil {
                               break
                       }

                       _, err = io.ReadFull(c, b)
                       if err != nil {
                               c.Close()
                               t.Fatalf("Server read failed: %v", err)
                       }

                       _, err = c.Write(msg)
                       if err != nil {
                               c.Close()
                               t.Fatalf("Server write failed: %v", err)
                       }

                       c.Close()
               }
       }()

       for i := 0; i < 100; i++ {
               b := make([]byte, len(msg))

               c, err := Dial("tcp", l.Addr().String())
               if err != nil {
                       t.Fatalf("Dial failed: %v", err)
               }

               _, err = c.Write(msg)
               if err != nil {
                       c.Close()
                       t.Fatalf("Client write failed: %v", err)
               }

               _, err = c.Write(msg)
               if err != nil {
                       c.Close()
                       t.Fatalf("Client write failed: %v", err)
               }

               _, err = io.ReadFull(c, b)
               if err != nil {
                       c.Close()
                       t.Fatalf("Client read failed: %v", err)
               }
               c.Close()
       }
}

Should this PASS or FAIL ?

Peer reads / writes are mismatched:

1) client sends packet;
2) server receives packets (1);
... so far they are in sync ...
3) client sends another packet;
4) server sends packet too;
... now we have a problem - client sent packet, but server have no
plans to receive it ...
5) server done its task - it closes connection;
6) client receives packet (4);
7) client done its task - it closes connection.

Client sends some data and server intentionaly ignores it. I think we should decide if that is behaviour we want to support and in what form.

If we say that our net package should support this, then, it looks to me, that the problem is in net/fd_windows.go during connection closing.

The way I understand our code, it uses syscall.SetNonblock to make sure that socketclose would block if there is pending data to be sent. But windows syscall.SetNonblock does nothing - it is a NOOP.

If that is what we want to do, then we should use syscall.setsockopt(SO_LINGER) instead - see http://goo.gl/BgBBA for closesocket doco. We do not set SO_LINGER anywhere, and default setting here is l_onoff = 0, l_linger = 0 - means closesocket closes socket hard and returns immediately. So that is what we are seeing. Alternatively we could call shutdown(READ|WRITE) before closesocket.

Also windows syscall.Linger is wrong: it should be 2 x uint16, not 2 x int32. We have never used syscall.Linger, so it has never been tested. So, if we are to use it, we would have to correct the structure, and that means "api change".

So here is what I need:

1) If we could decide that TestTCPClose should PASS;
1) If someone could confirm my assumption that "we want closesocket to block";
2) If we are happy to use "LINGER" to control that;
3) If we are OK to do "api change" to accomodate change of syscall.Linger;
4) Should I include above mentioned TestTCPClose as part of go build?

Then I will send changes and we could reapply

changeset:   13081:97d027b3aa68
date:        Mon Apr 23 22:00:16 2012 -0300
summary:     net/http: allow clients to disable keep-alive

again.

Thank you.

Alex

Gustavo Niemeyer

unread,
May 17, 2012, 11:28:08 PM5/17/12
to brainman, golan...@googlegroups.com, r...@golang.org, Dave Cheney
On Thu, May 17, 2012 at 4:42 AM, brainman <alex.b...@gmail.com> wrote:
> Peer reads / writes are mismatched:
(...)
> Client sends some data and server intentionaly ignores it. I think we should
> decide if that is behaviour we want to support and in what form.

I don't quite get how this is an issue in the case of the net/http
test. The sequence of operations in the test are:

(1) Client connects to server
(2) Client sends POST
(3) Client attempts to send body
(4) Concurrently with 3, server responds with unauthorized and closes
(5) Concurrently with 2, 3 and 4, client tries to read response from server
(6) Client gets connection closed while *reading*!? What about the
data from (4)!?

This is the error from (6):

--- FAIL: TestServerExpect (0.01 seconds)
serve_test.go:691: ReadString: WSARecv tcp 127.0.0.1:62344: An
existing connection was forcibly closed by the remote host.

(why "forcibly", btw?)

I'm sending attached a dump from the network packets in case you want
to investigate a successful run. It seems quite straightforward.. I'm
curious to see an equivalent dump for a failed run.


gustavo @ http://niemeyer.net
unauthorized-100-bytes-body.dump

brainman

unread,
May 18, 2012, 12:53:43 AM5/18/12
to golan...@googlegroups.com, brainman, r...@golang.org, Dave Cheney
On Friday, 18 May 2012 13:28:08 UTC+10, Gustavo Niemeyer wrote:
> > Peer reads / writes are mismatched:
> (...)
>
> I don't quite get how this is an issue in the case of the net/http
> test.

In net/http test server disregards client's data (3) and request connection to be closed before io is completed. So, it is similar to my example.

> ... The sequence of operations in the test are:

>
> (1) Client connects to server
> (2) Client sends POST
> (3) Client attempts to send body
> (4) Concurrently with 3, server responds with unauthorized and closes
> (5) Concurrently with 2, 3 and 4, client tries to read response from server
> (6) Client gets connection closed while *reading*!? What about the
> data from (4)!?

Like I said before (please re-read my previous message), our TCP connections are configured to be closed *immediately*. I AM JUST GUESSING HERE, but, perhaps, FIN packet is sent immediately, regardless of where data from (4) is - it could be sitting in the server network buffers for all I know. FIN packet arrives to the client, but network stack sees there are still some unacknowledged io or something. So it says, you know what, I am closing this connection, and client read gets an error.


>
> This is the error from (6):
>
> --- FAIL: TestServerExpect (0.01 seconds)
> serve_test.go:691: ReadString: WSARecv tcp 127.0.0.1:62344: An
> existing connection was forcibly closed by the remote host.
>
> (why "forcibly", btw?)
>

Like I just said.


> I'm sending attached a dump from the network packets in case you want
> to investigate a successful run. It seems quite straightforward.. I'm
> curious to see an equivalent dump for a failed run.

I tried to use WinDump program, but, it appears, it cannot monitor localhost. If you know a way, let me know.

Alex

Gustavo Niemeyer

unread,
May 18, 2012, 10:52:06 AM5/18/12
to brainman, golan...@googlegroups.com, r...@golang.org, Dave Cheney
On Fri, May 18, 2012 at 1:53 AM, brainman <alex.b...@gmail.com> wrote:
> Like I said before (please re-read my previous message), our TCP connections
> are configured to be closed *immediately*. I AM JUST GUESSING HERE, but,
> perhaps, FIN packet is sent immediately, regardless of where data from (4)

As I mentioned in my previous message, this sounds like buggy
behavior. If data + FIN is sent.. data should be read before socket is
considered closed.

> I tried to use WinDump program, but, it appears, it cannot monitor
> localhost. If you know a way, let me know.

Sorry, I know nothing about Windows these days.


gustavo @ http://niemeyer.net

Daniel Theophanes

unread,
May 18, 2012, 12:37:18 PM5/18/12
to golan...@googlegroups.com, brainman, r...@golang.org, Dave Cheney
On Thursday, May 17, 2012 9:53:43 PM UTC-7, brainman wrote:
I tried to use WinDump program, but, it appears, it cannot monitor localhost. If you know a way, let me know.

 
Would Fiddler2 be low level enough (http://www.fiddler2.com/)?

Russ Cox

unread,
May 18, 2012, 1:45:18 PM5/18/12
to brainman, golan...@googlegroups.com, Dave Cheney
On Thu, May 17, 2012 at 3:42 AM, brainman <alex.b...@gmail.com> wrote:
> 1) If we could decide that TestTCPClose should PASS;

I can't think of any reason it should fail. The client Close doesn't
know that the server isn't going to process the data, and the server
Close shouldn't care that there is data waiting to be read.

> 1) If someone could confirm my assumption that "we want closesocket to
> block";

I am not sure what block means in this context. We certainly want
Close to flush any written data so that it gets sent over the network.
If that's what block means, then yes.

That is, if there were a test that did

server:
100x:
accept, read "hello", close

client:
100x:
dial, write "hello", close

Then the server should see "hello" in all 100 cases. The close should
not keep the client from sending the written data.

> 2) If we are happy to use "LINGER" to control that;

I'm happy to use whatever we need to use.

> 3) If we are OK to do "api change" to accomodate change of syscall.Linger;

The current definition is completely wrong in syscall, so I think we
can correct it. An alternative would be to define the right structure
as syscall.linger and translate during the call.

> 4) Should I include above mentioned TestTCPClose as part of go build?

Sure.

Russ

brainman

unread,
May 20, 2012, 6:14:54 AM5/20/12
to golan...@googlegroups.com, brainman, r...@golang.org, Dave Cheney
On Saturday, May 19, 2012 12:52:06 AM UTC+10, Gustavo Niemeyer wrote:

As I mentioned in my previous message, this sounds like buggy
behavior. If data + FIN is sent.. data should be read before socket is
considered closed.


I changed my test little bit to do SetLinger(0) at the start

package net

import (
"testing"
"io"
)

func TestTCPClose(t *testing.T) {
       msg := []byte("hello")
       l, err := Listen("tcp", "127.0.0.1:0")
       if err != nil {
               t.Fatalf("Listen failed: %v", err)
       }
       defer l.Close()
       go func() {
               b := make([]byte, len(msg))
               for {
                       c, err := l.Accept()
                       if err != nil {
                               break
                       }
                       err = c.(*TCPConn).SetLinger(0)
                       if err != nil {
                               c.Close()
                               t.Fatalf("SetLinger failed: %v", err)

and now it fails on Linux

$ go test -v -run=TCPClose
=== RUN TestTCPClose
--- FAIL: TestTCPClose (0.05 seconds)
        alex_test.go:54: Client write failed: write tcp 127.0.0.1:37913: connection reset by peer
FAIL
exit status 1
FAIL    net     0.078s

So it is Linger setting that is important here. We use whatever default value OS provides, and these are different on Windows and Linux.

Alex

brainman

unread,
May 20, 2012, 6:15:50 AM5/20/12
to golan...@googlegroups.com, brainman, r...@golang.org, Dave Cheney
On Saturday, May 19, 2012 2:37:18 AM UTC+10, Daniel Theophanes wrote:
 
Would Fiddler2 be low level enough (http://www.fiddler2.com/)?

Thank you. But I can reproduce the problem on Linux now.

Alex 

brainman

unread,
May 20, 2012, 6:32:13 AM5/20/12
to golan...@googlegroups.com, brainman, Dave Cheney, r...@golang.org
On Saturday, May 19, 2012 3:45:18 AM UTC+10, rsc wrote:

> 1) If we could decide that TestTCPClose should PASS;

I can't think of any reason it should fail. ...

There is outstanding data sent by the client and not received by the server before both issue connection close. I am certain, whatever we do in the net package, this will cause some problem one day. In tcp your reads and writes should be matched, especially if you want not to lose data at the end of your connection. I personally always use shutdown(write) and wait for read to return eof, if I care about not to lose any bits at the end.
 
The client Close doesn't
know that the server isn't going to process the data, and the server
Close shouldn't care that there is data waiting to be read.

I think, my TestTCPClose contradicts your assumptions.
 
I am not sure what block means in this context.  We certainly want
Close to flush any written data so that it gets sent over the network.
 If that's what block means, then yes.

See closesocket manual for details http://goo.gl/FXAfS. But, if we set Linger accordingly, we should get closer to what you want.
 
That is, if there were a test that did

server:
    100x:
        accept, read "hello", close

client:
    100x:
        dial, write "hello", close

Then the server should see "hello" in all 100 cases.  The close should
not keep the client from sending the written data.

Yes, setting linger appropriately here should give you that. Alternatively, you could use shutdown, as I described before.
 

> 3) If we are OK to do "api change" to accomodate change of syscall.Linger;

The current definition is completely wrong in syscall, so I think we
can correct it.  ...

Yes, it is completely wrong, and I would like to correct it. What should I do about api command failing during build?

Alex 

Gustavo Niemeyer

unread,
May 20, 2012, 3:19:30 PM5/20/12
to brainman, golang-dev
On Sun, May 20, 2012 at 7:09 AM, brainman <alex.b...@gmail.com> wrote:
> I just modified my program slightly - I added call to SetLinger(0) at the
> start of connection:
(...)
> and now my test successfully fails on Linux too:

Have you managed to make the original net/http test case fail with that?

I've tried, and it still works. Maybe I'm doing something wrong?

(...)
> some unaccounted data there that client sends and server never receives. If
> you want data to be "ignored" by the server, then the server should read
> data from connection and discard it,

In the particular case of the net/http failing test case, the server
is sending data before closing the connection, and the client has not
closed the connection yet at this point. My understanding is still
that this data should arrive.


gustavo @ http://niemeyer.net

brainman

unread,
May 20, 2012, 7:48:12 PM5/20/12
to golan...@googlegroups.com, brainman
On Monday, 21 May 2012 05:19:30 UTC+10, Gustavo Niemeyer wrote:

Have you managed to make the original net/http test case fail with that?

I've tried, and it still works. Maybe I'm doing something wrong?


If I apply this:

diff -r 3b1efbd100bb src/pkg/net/http/serve_test.go
--- a/src/pkg/net/http/serve_test.go    Mon May 14 10:05:39 2012 +1000
+++ b/src/pkg/net/http/serve_test.go    Mon May 21 09:42:48 2012 +1000
@@ -630,7 +630,7 @@
     // so it never reads the body.
     {100, "100-continue", false, "401 Unauthorized"},
     // Likewise without 100-continue:
-    {100, "", false, "401 Unauthorized"},
+    {1000000, "", false, "401 Unauthorized"},
 
     // Non-standard expectations are failures
     {0, "a-pony", false, "417 Expectation Failed"},
@@ -660,11 +660,19 @@
         if err != nil {
             t.Fatalf("Dial: %v", err)
         }
+        err = conn.(*net.TCPConn).SetLinger(0)
+        if err != nil {
+            t.Fatalf("Linger: %v", err)
+        }
         defer conn.Close()
         sendf := func(format string, args ...interface{}) {
             _, err := fmt.Fprintf(conn, format, args...)
             if err != nil {
-                t.Fatalf("On test %#v, error writing %q: %v", test, format, err)
+                f := format
+                if len(f) > 20 {
+                    f = f[:20] + "..."
+                }
+                t.Fatalf("On test %#v, error writing %q: %v", test, f, err)
             }
         }
         go func() {
 
to "hg id"=3b1efbd100bb, then I get

=== RUN TestServerExpect
--- FAIL: TestServerExpect (0.04 seconds)
    serve_test.go:675: On test http_test.serverExpectTest{contentLength:1000000, expectation:"", readBody:false, expectedResponse:"401 Unauthorized"}, error writing "AAAAAAAAAAAAAAAAAAAA...": write tcp 127.0.0.1:43127: connection reset by peer
FAIL
exit status 1
FAIL    net/http    0.071s

Like I said earlier, result is unpredictable in corner cases.

Alex

Gustavo Niemeyer

unread,
May 20, 2012, 9:59:39 PM5/20/12
to brainman, golan...@googlegroups.com
On Sun, May 20, 2012 at 8:48 PM, brainman <alex.b...@gmail.com> wrote:
> On Monday, 21 May 2012 05:19:30 UTC+10, Gustavo Niemeyer wrote:
>> Have you managed to make the original net/http test case fail with that?
>>
>> I've tried, and it still works. Maybe I'm doing something wrong?
>
> If I apply this:

You've changed the test in a different way, and it's also failing in a
different way.


gustavo @ http://niemeyer.net

brainman

unread,
May 20, 2012, 10:08:35 PM5/20/12
to golan...@googlegroups.com, brainman
On Monday, 21 May 2012 11:59:39 UTC+10, Gustavo Niemeyer wrote:

You've changed the test in a different way, and it's also failing in a
different way.


Can you be more specific. Which of my changes are unacceptable in this test and why? Thank you.

Alex

Gustavo Niemeyer

unread,
May 20, 2012, 10:22:18 PM5/20/12
to brainman, golan...@googlegroups.com
I haven't said your changes were unacceptable. I've stated that you
changed the test in a different way, and got a different failure in a
different location. If you simply put Linger to zero on Linux, the
net/http test still does not fail.

That said, if the connection is closed and one continues writing on
it, it will fail eventually of course. That's not a reason to change
the default value for lingering.

What seems curious on the Windows 64 failed test, though, is that the
failure is happening on the read side, completely ignoring the data
that was written by the server.



gustavo @ http://niemeyer.net

brainman

unread,
May 20, 2012, 10:50:44 PM5/20/12
to golan...@googlegroups.com, brainman
On Monday, 21 May 2012 12:22:18 UTC+10, Gustavo Niemeyer wrote:
> ... If you simply put Linger to zero on Linux, the net/http test still does not fail.

I see your point now. Fair enough. But my point here is that we are talking about http here. It should be irrelevant how our tcp connection is configured and how big our read / writes are. It should work regardless.


> What seems curious on the Windows 64 failed test, though, is that the
> failure is happening on the read side, completely ignoring the data
> that was written by the server.

As I said before, according to closesocket manual (http://goo.gl/FXAfS), if you have linger=0, then all bets are off.

Alex

Gustavo Niemeyer

unread,
May 20, 2012, 11:24:53 PM5/20/12
to brainman, golan...@googlegroups.com
On Sun, May 20, 2012 at 11:50 PM, brainman <alex.b...@gmail.com> wrote:
> On Monday, 21 May 2012 12:22:18 UTC+10, Gustavo Niemeyer wrote:
>> ... If you simply put Linger to zero on Linux, the net/http test still
>> does not fail.
>
> I see your point now. Fair enough. But my point here is that we are talking
> about http here. It should be irrelevant how our tcp connection is
> configured and how big our read / writes are. It should work regardless.

I'm not sure about what you mean here, so I apologize in advance if
I'm mischaracterizing your point.

If what you mean is that we should configure the server so that client
side writes on a connection that has been closed will never fail, that
seems wrong. An HTTP server is not forced to read garbage it's not
interested in. Besides common sense, this is also clearly stated in
the RFC:

"(...) If it responds with a final status code, it MAY close the
transport connection or it MAY continue to read and discard the rest
of the request. (...)"

>> What seems curious on the Windows 64 failed test, though, is that the
>> failure is happening on the read side, completely ignoring the data
>> that was written by the server.
>
> As I said before, according to closesocket manual (http://goo.gl/FXAfS), if
> you have linger=0, then all bets are off.

That's not what this page points out.

"If the l_onoff member of the LINGER structure is zero on a stream
socket, the closesocket call will return immediately and does not
receive WSAEWOULDBLOCK whether the socket is blocking or nonblocking.
However, any data queued for transmission will be sent, if possible,
before the underlying socket is closed. This is also called a graceful
disconnect or close. In this case, the Windows Sockets provider cannot
release the socket and other resources for an arbitrary period, thus
affecting applications that expect to use all available sockets. This
is the default behavior for a socket."

On the table:

l_onoff: zero
l_linger: Do not care
Type of close: Graceful close
Wait for close?: No

if that's not working with 100 bytes on a localhost connection, that
documentation is wrong.


gustavo @ http://niemeyer.net

brainman

unread,
May 21, 2012, 12:26:22 AM5/21/12
to golan...@googlegroups.com, brainman
On Monday, 21 May 2012 13:24:53 UTC+10, Gustavo Niemeyer wrote:

> An HTTP server is not forced to read garbage it's not interested in.

Sure. But then it can't expect client to work as expectd.

> ... if that's not working with 100 bytes on a localhost connection, that documentation is wrong.

You right about that. I did misread documentation, rereading it now, it appears, default linger setting is suppose to do what we need. But it does not. Perhaps, there is bug somewhere there. I will play some more.

Alex

Russ Cox

unread,
May 22, 2012, 1:35:15 PM5/22/12
to brainman, golan...@googlegroups.com
Just to close this thread, this moved to CLs. CL 6221059 should fix this bug.
Reply all
Reply to author
Forward
0 new messages