Goroutines, Nonblocking I/O, And Memory Usage

Rio

Mar 14, 2018, 2:58:46 PM
to golang-nuts
While implementing a SOCKS proxy in Go, I ran into an issue that is explained in detail by Evan Klitzke in this post: https://eklitzke.org/goroutines-nonblocking-io-and-memory-usage

In my case, each proxied connection costs two goroutines and two buffers in blocking reads. For TCP connections the buffer size can be small (e.g. 2kb), so the overhead per proxied TCP connection is 8kb (2 x 2kb goroutine stack + 2 x 2kb read buffer). For UDP connections the buffer size must be large enough to hold the largest possible packet, due to the packet-oriented nature of the protocol, so the overhead per proxied UDP connection is 132kb (2 x 2kb goroutine stack + 2 x 64kb read buffer for the largest UDP packet). Handling 10,000 proxied UDP connections requires at least 1.25GB of memory, most of which would be unnecessary if there were a way to poll for I/O readiness and use a shared read buffer.
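
For concreteness, the relay pattern looks roughly like this (a simplified sketch using net.PacketConn rather than my actual code; relayUDP and copyPackets are illustrative names):

    // One proxied UDP association: two goroutines, each parked in ReadFrom
    // with its own 64kb buffer held for the lifetime of the association.
    func relayUDP(client, remote net.PacketConn, clientAddr, remoteAddr net.Addr) {
        go copyPackets(remote, client, remoteAddr) // client -> remote
        go copyPackets(client, remote, clientAddr) // remote -> client
    }

    func copyPackets(dst, src net.PacketConn, to net.Addr) {
        buf := make([]byte, 64*1024) // must fit the largest possible UDP datagram
        for {
            n, _, err := src.ReadFrom(buf) // goroutine blocks here; buf stays live the whole time
            if err != nil {
                return
            }
            if _, err := dst.WriteTo(buf[:n], to); err != nil {
                return
            }
        }
    }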

I'm wondering if there's a better way, other than calling syscall.Epoll/Kqueue directly to build a custom poller?

Ian Lance Taylor

Mar 14, 2018, 3:37:51 PM
to Rio, golang-nuts
Even for TCP, that's an interesting point. I wonder if we should have
a way to specify a number of bytes to read such that we only allocate
the []byte when there is something to read.
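
Roughly something with this shape, purely as a hypothetical sketch of the idea (no such API exists):

    // Hypothetical: the caller names a maximum size, and the runtime only
    // allocates (or reuses) the slice once the descriptor has data to read.
    type LazyReader interface {
        ReadLazy(max int) (buf []byte, err error)
    }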

Ian

Bakul Shah

Mar 14, 2018, 4:23:41 PM
to Rio, golang-nuts
On Wed, 14 Mar 2018 11:58:46 -0700 Rio <m...@riobard.com> wrote:
>
> While implementing a SOCKS proxy in Go, I ran into an issue which is better
> explained in details by Evan Klitzke in this post
> https://eklitzke.org/goroutines-nonblocking-io-and-memory-usage
>
> In my case, each proxied connection costs two goroutines and two buffers in
> blocking read. For TCP connections the buffer size can be small (e.g. 2kb),
> so the overhead per proxied TCP connection is 8kb (2 x 2kb goroutine stack
> + 2 x 2kb read buffer). For UDP connections the buffer size must be large
> enough to hold the largest packet due to the nature of packet-oriented
> network, so the overhead per proxied UDP connection is 132kb (2 x 2kb
> goroutine stack + 2 x 64kb read buffer for largest UDP packet). Handling
> 10,000 UDP proxied connections requires at least 1.25GB memory, which is
> unnecessary if there's a way to poll I/O readiness and use a shared read
> buffer.

You will need 10K*64KiB*2 = 1.22GiB of kernel buffer space at a
minimum in any case. That is still only one datagram's worth of
buffering per UDP socket, so you can still lose some packets.
This is in addition to any user-level buffering passed in to
recv() or read(). So you need at least 2.48GiB!

> I'm wondering if there's a better way other than calling
> syscall.Epoll/Kqueue to create custom poller?

I did such a custom poller for a network proxy. It was a pain
to get right (I only did the Eww!poll version). In my case all
the traffic was via TCP. It has been a while now, but IIRC my
reason was not to save memory but to gain flexibility beyond
the Go runtime's networking. I don't remember the details now
though.

Rio

Mar 15, 2018, 11:58:10 AM
to golang-nuts


On Thursday, March 15, 2018 at 3:37:51 AM UTC+8, Ian Lance Taylor wrote:

Even for TCP, that's an interesting point.  I wonder if we should have
a way to specify a number of bytes to read such that we only allocate
the []byte when there is something to read.

I was thinking a better approach would be to split I/O readiness notification from the actual I/O operation. Currently the two are combined in one step, e.g.

    r.Read(buf) // blocks until r is readable, then copies data into buf

What if we could instead do

    for runtime.WaitRead(r, timeout) { // hypothetical: blocks until r is readable or the timeout expires
        buf := bufPool.Get().([]byte) // get a buffer from a sync.Pool
        r.Read(buf)                   // should not block here because r is readable
        process(buf)                  // perform application-specific work on the read data
        bufPool.Put(buf)              // return the buffer to the sync.Pool so other blocked goroutines can reuse it
    }

The additional runtime.WaitRead function should be a fairly small change that exposes the internal poller's readiness signal. In this model, a buffer is only needed once the read-ready signal arrives.

Rio

Mar 15, 2018, 12:09:18 PM
to golang-nuts

On Thursday, March 15, 2018 at 4:23:41 AM UTC+8, Bakul Shah wrote:

You will need 10K*64KiB*2 = 1.22GiB kernel bufferspace at a
minimum in any case.  That is still one datagram's worth of
buffering per UDP socket so you can still lose some packets.
This is in addition to any user level buffering passd in to
recv() or read(). So you need at least 2.48GiB!

Yeah, there's no way to avoid the kernel buffer waste. For the UDP sockets I'm fine with losing a few packets from time to time, because UDP does not promise reliable delivery anyway.

 
I did such a custom poller for a network proxy. It was a pain
to get right (I only did the Eww!poll version). In my case all
the traffic was via tcp. It has been a while now but IIRC my
reason was not to save memory but to gain flexibility over Go
runtime's networking. I don't remember the details now though.

I tried, and it was indeed a lot of work. So I'm trying to see if there's a middle ground that keeps using goroutines (a 2kb stack is acceptable) but avoids wasting the data buffers. Please see my previous reply about exposing a tiny bit of information from the internal poller to address the issue. I'd like to know if there's any downside to that. Thanks! :)

Ian Lance Taylor

Mar 15, 2018, 1:05:12 PM
to Rio, golang-nuts
That's inherently racy, though. It's quite normal to have multiple
goroutines reading from the same socket. That is awkward if we split
up WaitRead and Read. Better to have a single wait-then-read
operation, as we do today.

Ian

Michael Jones

Mar 15, 2018, 1:24:38 PM
to Ian Lance Taylor, Rio, golang-nuts
What about: "wait, then get a buffer from a pool, and return it; the client uses the buffer, then gives it back." The present API could be used if a non-nil input buffer were presumed to go directly into that pool, or a nil argument could mean what I suggest, or there could be a new ReadWithReturnedBufferToSubmitToYourSecretPool() entry point.
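
In rough interface form, the idea might look like this (ReadPooled and Release are made-up names, just to sketch the shape):

    // Hypothetical shape of the idea above.
    type PooledReader interface {
        // ReadPooled waits for data, reads it into a buffer taken from an
        // internal pool, and returns that buffer to the caller.
        ReadPooled() (buf []byte, err error)
        // Release hands the buffer back to the pool once the caller is done.
        Release(buf []byte)
    }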


--
Michael T. Jones
michae...@gmail.com

Burak Serdar

Mar 15, 2018, 1:48:06 PM
to Michael Jones, Ian Lance Taylor, Rio, golang-nuts
What about something like:

type TimedReader interface {
    TimedRead(out []byte, timeout int) (int, error)
}

so r.TimedRead(buf, 0) becomes a non-blocking read?
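
Something close can already be approximated on top of the existing deadline API (a rough sketch, using time.Duration instead of int; note it still requires the caller to supply the buffer up front, so it doesn't address the memory question):

    // timedRead arms a read deadline and then reads; if no data arrives in
    // time, Read returns a timeout error.
    func timedRead(c net.Conn, out []byte, timeout time.Duration) (int, error) {
        if err := c.SetReadDeadline(time.Now().Add(timeout)); err != nil {
            return 0, err
        }
        return c.Read(out)
    }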

Bakul Shah

Mar 15, 2018, 3:02:01 PM
to Ian Lance Taylor, Rio, golang-nuts
There is an old paper[1] about using a malloc-like interface
for stream I/O. The idea is that ReadAlloc(n) returns a buffer
of up to n bytes, filled with data from the underlying stream
or file. This allowed the implementation to use mmap for files
or buffer management for streams. On the next call, the buffer
returned by the previous call was assumed to be no longer in
use by the client. There were even ReadRealloc() and
ReadAllocAt(), the latter to reposition the file or stream.
For writes, a buffer was returned; on the next WriteAlloc() it
would be accepted to be written out (this decoupled the client
from the actual I/O and yet avoided extra copying). A final
blocking close() would push out any remaining writes to the
actual connection or file.

What Ian proposes strikes me as a similar model.
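
In Go terms the read side might look roughly like this (my guess at a translation; the signatures are not the paper's):

    type AllocReader interface {
        // ReadAlloc returns a buffer of up to n bytes filled from the
        // underlying stream or file. The buffer is only valid until the
        // next ReadAlloc call.
        ReadAlloc(n int) ([]byte, error)
    }

    type AllocWriter interface {
        // WriteAlloc returns a buffer of up to n bytes for the caller to
        // fill; its contents are committed on the next WriteAlloc or on Close.
        WriteAlloc(n int) ([]byte, error)
        Close() error
    }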

[1] I finally remembered one author's name, and a web search
brought up the paper: "Exploiting the Advantages of Mapped
Files for Stream I/O" by Krieger, Stumm & Unrau. I used it as
a model to implement some C++ classes at RealNetworks, using
nonblocking I/O and a select loop underneath.

Rio

Mar 15, 2018, 3:04:58 PM
to golang-nuts


On Friday, March 16, 2018 at 1:05:12 AM UTC+8, Ian Lance Taylor wrote:

That's inherently racy, though.  It's quite normal to have multiple
goroutines reading from the same socket.  That is awkward if we split
up WaitRead and Read.  Better to have a single wait-then-read
operation, as we do today.

I might not understand the racy part completely. Could you please explain a bit more about what would go wrong if multiple goroutines read from the same Reader? Is it that multiple goroutines would be woken up by WaitRead, but then we don't know which one will actually Read?

An alternative solution, without introducing any new API, could be to specify the behavior of Read(buf) when len(buf) == 0, so multiple goroutines could just do

    r.Read(nil) // all goroutines block here until r is readable
    buf := bufPool.Get().([]byte) // get a buffer from the sync.Pool
    r.Read(buf) // actual read here
    process(buf)
    bufPool.Put(buf) // return the buffer to the pool

This style should, in theory, avoid the raciness, right?

Ian Lance Taylor

Mar 15, 2018, 3:26:55 PM
to Rio, golang-nuts
On Thu, Mar 15, 2018 at 12:04 PM, Rio <m...@riobard.com> wrote:
>
> On Friday, March 16, 2018 at 1:05:12 AM UTC+8, Ian Lance Taylor wrote:
>>
>>
>> That's inherently racy, though. It's quite normal to have multiple
>> goroutines reading from the same socket. That is awkward if we split
>> up WaitRead and Read. Better to have a single wait-then-read
>> operation, as we do today.
>
>
> I might not understand the racy part completely. Could you please explain a
> bit more what would be wrong if multiple goroutines read from the same
> Reader? Is it when multiple goroutines will be woken up by WaitRead but then
> we don't know which one will actually Read?

Multiple goroutines will be woken up by WaitRead; they will all call
Read; only one of those Reads will succeed, and the others will block.
For your case it may not matter, but if there is other work for the
goroutine to do, the goroutine may expect that because WaitRead
succeeded, Read will not block, and that is not guaranteed. In effect
WaitRead is a trap: it means that Read may or may not block.


> An alternative solution without introducing any new API could be that we
> somehow specify the behavior of Read(buf) when len(buf) == 0, so multiple
> goroutines can just do
>
> r.Read(nil) // all goroutine blocked here till r is readable
> buf := bufPool.Get().([]byte) // get buf from sync.Pool
> r.Read(buf) // actual read here
> process(buf)
> bufPool.Put(buf) // return buf to pool
>
> This style should in theory avoid the racy question, right?

Still seems racy to me.

You may want to look at
https://github.com/golang/go/issues/15735#issuecomment-266574151 .
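
One way to get close to this with what exists today (Go 1.9+) is syscall.RawConn, obtained from (*net.TCPConn).SyscallConn; a rough Unix-only sketch (readPooled is a made-up helper):

    // Wait for readability via syscall.RawConn and only borrow a buffer from
    // the pool once the fd can actually be read. The callback may run before
    // the fd is ready; returning false tells the runtime to park on the
    // poller and call it again.
    func readPooled(conn *net.TCPConn, pool *sync.Pool) ([]byte, int, error) {
        rc, err := conn.SyscallConn()
        if err != nil {
            return nil, 0, err
        }
        var (
            buf  []byte
            n    int
            rerr error
        )
        err = rc.Read(func(fd uintptr) bool {
            buf = pool.Get().([]byte)
            n, rerr = syscall.Read(int(fd), buf)
            if rerr == syscall.EAGAIN { // would block: give the buffer back and wait
                pool.Put(buf)
                buf = nil
                return false
            }
            return true
        })
        if err != nil {
            return nil, 0, err
        }
        return buf, n, rerr
    }

Note the EAGAIN branch: with multiple readers, a wakeup does not guarantee there is data left for this goroutine, which is the same reason a split WaitRead/Read API is a trap; here the retry is at least folded into a single call.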

Ian

Alex Efros

Mar 16, 2018, 8:16:49 AM
to golang-nuts
Hi!

On Thu, Mar 15, 2018 at 09:09:18AM -0700, Rio wrote:
> Yeah, there's no way to avoid the kernel buffer waste.

You can tune the kernel buffer size to minimize it.
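
For example, with (*net.UDPConn).SetReadBuffer (a sketch; the kernel clamps the value to its configured limits, e.g. net.core.rmem_max on Linux, and the port below is just an example):

    conn, err := net.ListenUDP("udp", &net.UDPAddr{Port: 1080})
    if err != nil {
        log.Fatal(err)
    }
    // Ask for one max-size datagram's worth of kernel receive buffer
    // instead of the (typically larger) system default.
    if err := conn.SetReadBuffer(64 * 1024); err != nil {
        log.Println("SetReadBuffer:", err)
    }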

> > I did such a custom poller for a network proxy. It was a pain
> > to get right (I only did the Eww!poll version).
>
> I tried and it was indeed a lot of work.

There are some existing implementations which may help a bit, for example
https://godoc.org/github.com/mailru/easygo/netpoll

--
WBR, Alex.