code review 7126043: syscall: introduce WriteNB (issue 7126043)

105 views
Skip to first unread message

da...@cheney.net

unread,
Jan 15, 2013, 7:13:49 PM1/15/13
to golan...@googlegroups.com, re...@codereview-hr.appspotmail.com
Reviewers: golang-dev_googlegroups.com,

Message:
Hello golan...@googlegroups.com,

I'd like you to review this change to
https://code.google.com/p/go


Description:
syscall: introduce WriteNB

Update issue 3412.

This proposal introduces the underlying syscall.WriteNB mechanism to be
used by net.Conn.Write as discussed in CL 6813046. The WriteNB function
is defined for all platforms, but for the moment only issues a
RawSyscall on linux.

Please review this at https://codereview.appspot.com/7126043/

Affected files:
M src/pkg/syscall/syscall_darwin.go
M src/pkg/syscall/syscall_freebsd.go
M src/pkg/syscall/syscall_linux.go
M src/pkg/syscall/syscall_netbsd.go
M src/pkg/syscall/syscall_openbsd.go
M src/pkg/syscall/syscall_unix.go
M src/pkg/syscall/zsyscall_darwin_386.go
M src/pkg/syscall/zsyscall_darwin_amd64.go
M src/pkg/syscall/zsyscall_freebsd_386.go
M src/pkg/syscall/zsyscall_freebsd_amd64.go
M src/pkg/syscall/zsyscall_freebsd_arm.go
M src/pkg/syscall/zsyscall_linux_386.go
M src/pkg/syscall/zsyscall_linux_amd64.go
M src/pkg/syscall/zsyscall_linux_arm.go
M src/pkg/syscall/zsyscall_netbsd_386.go
M src/pkg/syscall/zsyscall_netbsd_amd64.go
M src/pkg/syscall/zsyscall_openbsd_386.go
M src/pkg/syscall/zsyscall_openbsd_amd64.go


ia...@golang.org

unread,
Jan 15, 2013, 7:57:45 PM1/15/13
to da...@cheney.net, golan...@googlegroups.com, re...@codereview-hr.appspotmail.com
exec_windows.go has a SetNonblock call; should Windows also have WriteNB
for consistency?

A random thought: the syscall package knows which descriptors are
non-blocking: they are the ones that have been passed to SetNonblock but
not Close. We could in principle track that and automatically use
non-blocking system calls if the descriptor is non-blocking.



https://codereview.appspot.com/7126043/diff/3019/src/pkg/syscall/syscall_darwin.go
File src/pkg/syscall/syscall_darwin.go (right):

https://codereview.appspot.com/7126043/diff/3019/src/pkg/syscall/syscall_darwin.go#newcode195
src/pkg/syscall/syscall_darwin.go:195: // TODO(dfc) check if writeNB is
truly non blocking on darwin
This TODO seems a bit obscure. You probably mean something like "check
if writes to descriptors with O_NONBLOCK set can never block on Darwin."

https://codereview.appspot.com/7126043/diff/3019/src/pkg/syscall/syscall_unix.go
File src/pkg/syscall/syscall_unix.go (right):

https://codereview.appspot.com/7126043/diff/3019/src/pkg/syscall/syscall_unix.go#newcode154
src/pkg/syscall/syscall_unix.go:154: // non-blocking mode. Do not call
this with a descriptor in blocking mode.
"set to non-blocking mode" by SetNonblock.

https://codereview.appspot.com/7126043/

brad...@golang.org

unread,
Jan 15, 2013, 8:18:34 PM1/15/13
to da...@cheney.net, golan...@googlegroups.com, ia...@golang.org, re...@codereview-hr.appspotmail.com
I agree Windows should have WriteNB for consistency.

> A random thought: the syscall package knows which
> descriptors are non-blocking:
> they are the ones that have been passed to SetNonblock
> but not Close. We could in principle track that
> and automatically use non-blocking system calls if the
> descriptor is non-blocking.

I thought of that too, but any way I could imagine to do that
bookkeeping feels too expensive for what should be just a syscall.


https://codereview.appspot.com/7126043/

Dave Cheney

unread,
Jan 15, 2013, 8:38:58 PM1/15/13
to Brad Fitzpatrick, ia...@golang.org, re...@codereview-hr.appspotmail.com, golan...@googlegroups.com

I would like to disagreed. This change is intended to work hand in hand with the change to fd_unix.go in the net package.  If I updated the description to say this was only for Unix platforms would that help make my case.

That is not saying windows may be able to implement a WriteNB syscall later on, but I don't want to make the scope of this change larger than necessary if possible.

Brad Fitzpatrick

unread,
Jan 15, 2013, 8:42:38 PM1/15/13
to Dave Cheney, ia...@golang.org, re...@codereview-hr.appspotmail.com, golan...@googlegroups.com

Oh, if package net can remain a one line patch I'm fine with pkg syscall inconsistency.

I had forgotten where in net the call site was.

I just didn't want to +build-split up net for one call.

I withdraw my confusion.

Dave Cheney

unread,
Jan 15, 2013, 8:45:33 PM1/15/13
to Brad Fitzpatrick, ia...@golang.org, re...@codereview-hr.appspotmail.com, golan...@googlegroups.com

Yes, we're lucky that this can be a surgical incision. Plan9 and Windows may be able to use a similar approach, but it isn't proven to work on Unix, and may not work the same way on non Unix platforms. 

da...@cheney.net

unread,
Jan 15, 2013, 9:40:41 PM1/15/13
to golan...@googlegroups.com, ia...@golang.org, brad...@golang.org, re...@codereview-hr.appspotmail.com
Please take another look

@iant. as discussed I have avoided adding support for windows at this
time. I like the idea of keeping a record of the blocking state of fds
in the syscall package, and making write/writeNB based on that. If we
can successfully activate writeNB for all unix's, then lets investigate
it then.


https://codereview.appspot.com/7126043/diff/3019/src/pkg/syscall/syscall_darwin.go
File src/pkg/syscall/syscall_darwin.go (right):

https://codereview.appspot.com/7126043/diff/3019/src/pkg/syscall/syscall_darwin.go#newcode195
src/pkg/syscall/syscall_darwin.go:195: // TODO(dfc) check if writeNB is
truly non blocking on darwin
On 2013/01/16 00:57:45, iant wrote:
> This TODO seems a bit obscure. You probably mean something like
"check if
> writes to descriptors with O_NONBLOCK set can never block on Darwin."

Done.

https://codereview.appspot.com/7126043/diff/3019/src/pkg/syscall/syscall_unix.go
File src/pkg/syscall/syscall_unix.go (right):

https://codereview.appspot.com/7126043/diff/3019/src/pkg/syscall/syscall_unix.go#newcode154
src/pkg/syscall/syscall_unix.go:154: // non-blocking mode. Do not call
this with a descriptor in blocking mode.
On 2013/01/16 00:57:45, iant wrote:
> "set to non-blocking mode" by SetNonblock.

Done.

https://codereview.appspot.com/7126043/

sebastien...@gmail.com

unread,
Jan 16, 2013, 3:27:12 AM1/16/13
to da...@cheney.net, golan...@googlegroups.com, ia...@golang.org, brad...@golang.org, re...@codereview-hr.appspotmail.com
Hi Dave,

Here's a bench on a "representative" (to us) box.

Median of 10 runs, 12 cores @ 2.27GHz, linux-2.6.32/amd64:

BenchmarkTCPOneShot 173746 167022 -3.87%
BenchmarkTCPOneShot-2 87490 89568 +2.38%
BenchmarkTCPOneShot-4 45649 45761 +0.25%
BenchmarkTCPOneShot-8 32219 32217 -0.01%
BenchmarkTCPOneShot-12 304442 304319 -0.04%
BenchmarkTCPOneShot-16 309886 309437 -0.14%

BenchmarkTCPOneShotTimeout 198172 197808 -0.18%
BenchmarkTCPOneShotTimeout-2 92151 91371 -0.85%
BenchmarkTCPOneShotTimeout-4 45978 46050 +0.16%
BenchmarkTCPOneShotTimeout-8 32311 32381 +0.22%
BenchmarkTCPOneShotTimeout-12 302204 304874 +0.88%
BenchmarkTCPOneShotTimeout-16 315046 305314 -3.09%

BenchmarkTCPPersistent 55874 56666 +1.42%
BenchmarkTCPPersistent-2 28429 28297 -0.46%
BenchmarkTCPPersistent-4 17330 17380 +0.29%
BenchmarkTCPPersistent-8 18793 18872 +0.42%
BenchmarkTCPPersistent-12 306567 306536 -0.01%
BenchmarkTCPPersistent-16 310066 309814 -0.08%

BenchmarkTCPPersistentTimeout 59690 59889 +0.33%
BenchmarkTCPPersistentTimeout-2 28978 28883 -0.33%
BenchmarkTCPPersistentTimeout-4 17356 17421 +0.37%
BenchmarkTCPPersistentTimeout-8 18847 18815 -0.17%
BenchmarkTCPPersistentTimeout-12 306600 306463 -0.04%
BenchmarkTCPPersistentTimeout-16 309747 310213 +0.15%

So not really conclusive, but not really damaging either.

Sebastien

https://codereview.appspot.com/7126043/

sebastien...@gmail.com

unread,
Jan 16, 2013, 5:02:16 AM1/16/13
to da...@cheney.net, golan...@googlegroups.com, ia...@golang.org, brad...@golang.org, re...@codereview-hr.appspotmail.com
As Remy judiciously noted, actually using the WriteNB call might help in
getting useful results.

I shouldn't run benchmarks on the early morning, sorry for the noise.

Here's an updated run, I've double check the assembly to make sure the
correct RawSyscall was made:

BenchmarkTCPOneShot 173746 142280 -18.11%
BenchmarkTCPOneShot-2 87490 96902 +10.76%
BenchmarkTCPOneShot-4 45649 47590 +4.25%
BenchmarkTCPOneShot-8 32219 31521 -2.17%
BenchmarkTCPOneShot-10 302213 301983 -0.08%
BenchmarkTCPOneShot-12 304442 317153 +4.18%
BenchmarkTCPOneShot-16 309886 312512 +0.85%

BenchmarkTCPOneShotTimeout 198172 214151 +8.06%
BenchmarkTCPOneShotTimeout-2 92151 101076 +9.69%
BenchmarkTCPOneShotTimeout-4 45978 48018 +4.44%
BenchmarkTCPOneShotTimeout-8 32311 31608 -2.18%
BenchmarkTCPOneShotTimeout-10 322381 320882 -0.46%
BenchmarkTCPOneShotTimeout-12 302204 312676 +3.47%
BenchmarkTCPOneShotTimeout-16 315046 305763 -2.95%

BenchmarkTCPPersistent 55874 56794 +1.65%
BenchmarkTCPPersistent-2 28429 32224 +13.35%
BenchmarkTCPPersistent-4 17330 16939 -2.26%
BenchmarkTCPPersistent-8 18793 14062 -25.17%
BenchmarkTCPPersistent-10 303544 302887 -0.22%
BenchmarkTCPPersistent-12 306567 304997 -0.51%
BenchmarkTCPPersistent-16 310066 308193 -0.60%

BenchmarkTCPPersistentTimeout 59690 56994 -4.52%
BenchmarkTCPPersistentTimeout-2 28978 30653 +5.78%
BenchmarkTCPPersistentTimeout-4 17356 17080 -1.59%
BenchmarkTCPPersistentTimeout-8 18847 14125 -25.05%
BenchmarkTCPPersistentTimeout-10 303528 302658 -0.29%
BenchmarkTCPPersistentTimeout-12 306600 304994 -0.52%
BenchmarkTCPPersistentTimeout-16 309747 307902 -0.60%

Results are bit more interesting, provided you're using at least 8 cores
(and not more than eight due to the spurious behavior then) restricted
to the `TCPPersistent' use-case..

As a side note, whilst some syscall are non-blocking, they're still not
free. The performance deterioration might just come from not "cheating"
by artificially increasing the thread count.

Sebastien

https://codereview.appspot.com/7126043/

Dave Cheney

unread,
Jan 17, 2013, 6:04:18 AM1/17/13
to da...@cheney.net, golan...@googlegroups.com, ia...@golang.org, brad...@golang.org, sebastien...@gmail.com, re...@codereview-hr.appspotmail.com
Hi Sebastien,

Thank you for taking the time to benchmark this change. I've done some
benchmarking on a small 4 core 386 system and found results that are
equally unimpressive. In summary (see results.txt), I can see no
benefit large enought to offset the code and Symbol bloat this change
would cost -- which really sucks. On paper it makes sense, but in
practice, right now, it just doesn't pay its way.

Do you think that there is anything I have missed ?

Cheers

Dave
results.txt

Sébastien

unread,
Jan 18, 2013, 4:34:08 PM1/18/13
to Dave Cheney, dvy...@google.com, golan...@googlegroups.com, ia...@golang.org, brad...@golang.org, re...@codereview-hr.appspotmail.com
Hi Dave,

Do you think that there is anything I have missed ?

Given that I made independently made the same tests & bench one few weeks ago, with the same conclusion I doubt you have. Tagging accept and connect as non-blocking has the same adverse effect.

My best (guess-only) explanations are still that syscalls switched to non-blocking
 - do occasionally block (suggested by Iant)
 - happen to adversely disturb the scheduler (by offering less/different resumption points)
 - are not cheap enough (whilst being non-blocking) and parking goroutines actually masquerades part of that cost.

Blaming the scheduler is tempting, but I can't see any other rational explanation.

Dmitriy may however have a deeper insight/feeling on the subject..

Cheers,
Sebastien


<results.txt>

Dave Cheney

unread,
Jan 18, 2013, 5:40:57 PM1/18/13
to Sébastien, dvy...@google.com, golan...@googlegroups.com, ia...@golang.org, brad...@golang.org, re...@codereview-hr.appspotmail.com
> Given that I made independently made the same tests & bench one few weeks
> ago, with the same conclusion I doubt you have. Tagging accept and connect
> as non-blocking has the same adverse effect.

Indeed, from the profiles I've taken the CPU time does move, as
expected, from syscall.Syscall to syscall.RawSyscall, but the effect
on the bottom line runtime and throughput is unaffected.

>> My best (guess-only) explanations are still that syscalls switched to
> non-blocking
> - do occasionally block (suggested by Iant)

Yes, and even if they don't block under a specific version of Linux,
I'd struggle to believe that guarentee exists for all supported POSIX
systems.

> - happen to adversely disturb the scheduler (by offering less/different
> resumption points)
> - are not cheap enough (whilst being non-blocking) and parking goroutines
> actually masquerades part of that cost.

If you are suggesting the cost of the syscall event is more than the
goroutine park/handoff, I could buy that.

> Blaming the scheduler is tempting, but I can't see any other rational
> explanation.

Or maybe we should be blaming our TCP stacks ? Maybe that is just as
fast as we can push data through the syscall interface. I don't know.

dvy...@google.com

unread,
Jan 23, 2013, 2:56:27 AM1/23/13
to da...@cheney.net, golan...@googlegroups.com, ia...@golang.org, brad...@golang.org, sebastien...@gmail.com, re...@codereview-hr.appspotmail.com
There are other scalability bottlenecks (scheduler, epoll poller), so it
can be the case that contention increases in other bottlenecks, they
degrade and net result is small.

Eventually we need to handle net read/write calls better one way or
another. This looks like the simplest option.



https://codereview.appspot.com/7126043/
Reply all
Reply to author
Forward
0 new messages