Streaming RPC

Andrew Gerrand

Feb 10, 2011, 2:10:31 PM
to golang-dev, Noah Evans, ron minnich
I spent some time with Noah Evans at FOSDEM and we discussed adding to
the rpc package the ability to stream a response. His use case was in
the context of a large network of memory-constrained nodes that
receive large blocks of data (files), where they can't afford to store
them in memory.

In this situation there's also a large overhead to starting additional
TCP sessions, so his current approach is to make an RPC request that
initiates a file transfer and then to send the data on the network
connection directly. While this is happening the RPC package is
trusted not to talk over the connection, but for obvious reasons this
is far from perfect.

My straw man proposal (that I have implemented - will share code if
desired) works as follows.

A method of the form

func (t T) Name(args *Args, w io.Writer) (err os.Error)

is recognized by Register as a streaming RPC handler. That handler
method is expected to write to w and return.
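
A rough sketch of how such a method could be detected with reflection. This is not the code from the proposal: isStreamHandler and streamHandlers are hypothetical names, the toy FileServer here is for illustration only, and the sketch uses the built-in error type where the 2011 API used os.Error.

```go
package main

import (
	"io"
	"reflect"
)

var (
	writerType = reflect.TypeOf((*io.Writer)(nil)).Elem()
	errorType  = reflect.TypeOf((*error)(nil)).Elem()
)

// isStreamHandler reports whether m, a method obtained from
// reflect.Type.Method, has the form
//
//	func (t T) Name(args *Args, w io.Writer) error
func isStreamHandler(m reflect.Method) bool {
	mt := m.Type
	// In(0) is the receiver, so a matching method has three inputs.
	if mt.NumIn() != 3 || mt.NumOut() != 1 {
		return false
	}
	return mt.In(1).Kind() == reflect.Ptr &&
		mt.In(2) == writerType &&
		mt.Out(0) == errorType
}

// streamHandlers lists the names of the streaming methods of rcvr.
func streamHandlers(rcvr interface{}) []string {
	var names []string
	t := reflect.TypeOf(rcvr)
	for i := 0; i < t.NumMethod(); i++ {
		if m := t.Method(i); isStreamHandler(m) {
			names = append(names, m.Name)
		}
	}
	return names
}

// FileServer is a toy service used to exercise the check.
type FileServer struct{}

// Get has the streaming form: it writes its result to w.
func (s *FileServer) Get(filename *string, w io.Writer) error {
	_, err := io.WriteString(w, "contents of "+*filename)
	return err
}

// Size is an ordinary request/reply method and is not a stream handler.
func (s *FileServer) Size(filename *string, reply *int) error {
	*reply = len(*filename)
	return nil
}
```

Calling streamHandlers(&FileServer{}) picks out Get and skips Size, because Size's second argument is a reply pointer rather than an io.Writer.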

To use this, the client now calls a new method on Client

func (client *Client) Stream(serviceMethod string, args interface{}) (io.Reader, os.Error)

If err is nil, the first argument will be a Reader that should be read
from until EOF.

See the end of this email for a working example.

As for the wire protocol, the server will send a gob'd Response
followed by a gob'd []byte. It does this until the RPC method has
returned, after which it sends a Response containing an error string
(if the RPC method returned with a non-nil error) followed by a []byte
of zero length. The client's Reader then returns 0, os.EOF.
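
The framing described above can be sketched roughly as follows. This is an illustration under assumptions, not the implementation from the proposal: Response is a stand-in for the rpc package's header type, streamWriter, streamReader and roundTrip are hypothetical names, and the code uses today's encoding/gob and error rather than the 2011 gob and os.Error.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"errors"
	"io"
)

// Response is a stand-in for the rpc package's response header.
type Response struct {
	ServiceMethod string
	Seq           uint64
	Error         string
}

// streamWriter is the server side: each Write becomes one Response
// followed by one []byte chunk.
type streamWriter struct {
	enc  *gob.Encoder
	head Response
}

func (w *streamWriter) Write(p []byte) (int, error) {
	if len(p) == 0 {
		return 0, nil // a zero-length chunk is reserved for end of stream
	}
	if err := w.enc.Encode(&w.head); err != nil {
		return 0, err
	}
	if err := w.enc.Encode(p); err != nil {
		return 0, err
	}
	return len(p), nil
}

// finish sends the terminating Response (carrying the handler's error,
// if any) followed by a zero-length chunk.
func (w *streamWriter) finish(handlerErr error) error {
	last := w.head
	if handlerErr != nil {
		last.Error = handlerErr.Error()
	}
	if err := w.enc.Encode(&last); err != nil {
		return err
	}
	return w.enc.Encode([]byte{})
}

// streamReader is the client side: it hands out chunk bytes until it
// sees an error string or a zero-length chunk.
type streamReader struct {
	dec  *gob.Decoder
	rest []byte // unread part of the current chunk
	err  error  // sticky error, io.EOF after a clean end of stream
}

func (r *streamReader) Read(p []byte) (int, error) {
	for len(r.rest) == 0 {
		if r.err != nil {
			return 0, r.err
		}
		var head Response
		var chunk []byte
		if r.err = r.dec.Decode(&head); r.err != nil {
			continue
		}
		if r.err = r.dec.Decode(&chunk); r.err != nil {
			continue
		}
		switch {
		case head.Error != "":
			r.err = errors.New(head.Error)
		case len(chunk) == 0:
			r.err = io.EOF
		}
		r.rest = chunk
	}
	n := copy(p, r.rest)
	r.rest = r.rest[n:]
	return n, nil
}

// roundTrip streams data through the framing in chunks of at most
// chunkSize bytes (which must be positive) and returns what the
// client-side reader receives.
func roundTrip(data []byte, chunkSize int, handlerErr error) ([]byte, error) {
	var wire bytes.Buffer
	w := &streamWriter{
		enc:  gob.NewEncoder(&wire),
		head: Response{ServiceMethod: "FileServer.Get", Seq: 1},
	}
	for len(data) > 0 {
		n := chunkSize
		if n > len(data) {
			n = len(data)
		}
		if _, err := w.Write(data[:n]); err != nil {
			return nil, err
		}
		data = data[n:]
	}
	if err := w.finish(handlerErr); err != nil {
		return nil, err
	}
	return io.ReadAll(&streamReader{dec: gob.NewDecoder(&wire)})
}
```

Note that nothing here limits how far the writer can run ahead of the reader; on a real connection that is left entirely to TCP.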

I'd like some feedback on this API and wire protocol. There are a
number of ways of approaching this problem. Some choices I made:

- a Reader/Writer byte stream instead of a netchan-style stream of Go
values. I figured that if you want the latter you can just wrap the
Reader/Writer in a gob.Decoder/Encoder.
- at the wire level, instead of sending a Response followed by a
[]byte it could be some new type, eg Packet, which contains the same
fields as a Response and a []byte.
- instead of a new Stream method, the existing Call and Go methods
could instead be expected to take a pointer to an io.Reader value,
which then becomes the Reader. This would be more symmetrical but
slightly more verbose to use.
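
To illustrate the first choice above, here is a small sketch of layering a stream of Go values on a plain byte stream by wrapping each end in a gob.Encoder/Decoder. The Point type and the function names are made up for the example, and an io.Pipe stands in for the Reader/Writer pair that a streaming RPC would provide.

```go
package main

import (
	"encoding/gob"
	"io"
)

// Point is an arbitrary example value type.
type Point struct{ X, Y int }

// sendPoints plays the handler: it treats w as a plain byte stream and
// layers a gob.Encoder on top of it.
func sendPoints(w io.Writer, pts []Point) error {
	enc := gob.NewEncoder(w)
	for _, p := range pts {
		if err := enc.Encode(p); err != nil {
			return err
		}
	}
	return nil
}

// recvPoints plays the client: it decodes values until the stream ends.
func recvPoints(r io.Reader) ([]Point, error) {
	dec := gob.NewDecoder(r)
	var pts []Point
	for {
		var p Point
		err := dec.Decode(&p)
		if err == io.EOF {
			return pts, nil
		}
		if err != nil {
			return pts, err
		}
		pts = append(pts, p)
	}
}

// pipePoints connects the two ends with an in-memory pipe.
func pipePoints(pts []Point) ([]Point, error) {
	pr, pw := io.Pipe()
	defer pr.Close() // unblocks the sender if the receiver gives up early
	go func() {
		// Closing with a nil error makes the reader see io.EOF.
		pw.CloseWithError(sendPoints(pw, pts))
	}()
	return recvPoints(pr)
}
```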

No doubt there are more aspects to this problem that I haven't yet
thought about. Your feedback is most appreciated.

Andrew

---

type FileServer struct{}

// Get streams the (dummy) file contents to w.
func (s *FileServer) Get(filename *string, w io.Writer) (err os.Error) {
	// err is the named result, so assign with = rather than :=.
	_, err = io.Copy(w, dummyData())
	return err
}

func TestStream(t *testing.T) {
	Register(&FileServer{})
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		t.Fatal(err)
	}
	go Accept(l)

	client, err := Dial("tcp", l.Addr().String())
	if err != nil {
		t.Fatal(err)
	}
	r, err := client.Stream("FileServer.Get", "foo.bar")
	if err != nil {
		t.Fatal(err)
	}
	// Read the stream until EOF and compare it with what was sent.
	var buf bytes.Buffer
	_, err = buf.ReadFrom(r)
	if err != nil {
		t.Fatal(err)
	}
	if buf.String() != dummyData().String() {
		t.Fatal("sent doesn't match received")
	}
}

Rob 'Commander' Pike

Feb 10, 2011, 2:46:51 PM
to Andrew Gerrand, golang-dev, Noah Evans, ron minnich
Streaming on top of RPC is a subtle, difficult problem to get right. I haven't seen it accomplished well yet, and am leaning towards the position that it's not possible. Your design does have the merit of simplicity, though.

Flow control. It's always about flow control.

Netchan actually sounds like a better match for his problem.

-rob

ron minnich

Feb 10, 2011, 2:58:33 PM
to Rob 'Commander' Pike, Andrew Gerrand, golang-dev, Noah Evans
Streaming on top of RPC is one description of what we're doing. Maybe
a better one is that as part of the marshall/unmarshall, we'd like
some of the packet to end up in memory and some of it to end up in
files.

So let's recast the problem and see if that makes it more tractable. I
have a struct that looks like this:

struct whatever {
x, y, z int
name string
cmd string
cmddata []byte
}

I want to unmarshall the cmd into a file, not in-memory data. I have
an implementation of this that is working well (in one case a command
is a 1 Gbyte Windows disk image) but it would be nice if it were not
such a hack on top of gob.

I should mention that in the real thing there can be lots of
cmd/cmddata pairs. For now it looks like this:

struct whatever{
a, b, c int
files []os.fi /* fileinfo, which IIRC is in os */
totalbytes int
}

Once the whatever struct is read in, the code switches modes and sucks
in the data a bit at a time.

This is what I think got people locked onto the idea that it's a
stream. It need not be. It's just the easiest way I found to
unmarshall into files.

So, does recasting the problem in that way help? I want the unmarshall
process to direct some struct members to files? That's not really
streaming and maybe it is easier to manage.

thanks

ron

roger peppe

Feb 10, 2011, 4:05:47 PM
to ron minnich, Noah Evans, Andrew Gerrand, Rob 'Commander' Pike, golang-dev

I just made a blog post in which I demonstrate a program that layers RPC on top of netchan. Maybe something like that could be sufficient here. It seems like a very flexible approach, although I have done no performance measurements at all.

http://rogpeppe.wordpress.com/2011/02/10/bidirectional-rpc-with-netchan/

roger peppe

Feb 10, 2011, 5:19:51 PM
to ron minnich, Noah Evans, Andrew Gerrand, Rob 'Commander' Pike, golang-dev


On 10 Feb 2011 19:58, "ron minnich" <rmin...@gmail.com> wrote:
>
> Streaming on top of RPC is one description of what we're doing. Maybe
> a better one is that as part of the marshall/unmarshall, we'd like
> some of the packet to end up in memory and some of it to end up in
> files.
>
> So let's recast the problem and see if that makes it more tractable. I
> have a struct that looks like this:
>
> struct whatever {
>  x, y, z int
>  name string
>  cmd string
>  cmddata []byte
> }
>
> I want to unmarshall the cmd into a file, not in-memory data. I have
> an implementation of this that is working well (in one case a command
> is a 1 Gbyte Windows disk image) but it would be nice if it were not
> such a hack on top of gob.

Do you mean that cmddata holds a 1gb array, or that cmd is some kind of command that gets executed to produce the data?

ron minnich

Feb 10, 2011, 10:56:11 PM
to roger peppe, Noah Evans, Andrew Gerrand, Rob 'Commander' Pike, golang-dev

I think I added to the confusion. Let me try again.

the 'cmd' is the name of a file. In version 1 of this protocol, which
was derived from older C code, the file was actually held in 'cmddata'
and we used gob to manage it all. That kind of worked ok for small
files of a few 10s of KB. But when people come to you and say 'I want
to move a gigabyte', well, that's not so ok.

So the struct was changed to essentially this:

struct command_and_files {
param1, param2, param3 int /* let's not worry about this part --
not germane to this discussion */
files []os.FileInfo
totalbytes int
}

My code uses the gob package to unpack this struct and then my
code switches modes, and creates the files specified by the
os.FileInfo array, and uses the sizes in the FileInfo and the
totalbytes as guides on what to expect (the sum of the file sizes
should equal totalbytes; it's just a cheap check). The code takes over
the connection in essence and uses io.Copyn to move the data from the
stream to the file, file by file.
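
A minimal sketch of this "gob header, then raw bytes" pattern, written against today's Go library and not taken from ron's program. fileEntry and manifest are hypothetical stand-ins for the []os.FileInfo header, and the create callback is an assumption that lets the example run without touching the disk (in a real program it would call os.Create).

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/gob"
	"fmt"
	"io"
	"strings"
)

// fileEntry and manifest stand in for the header struct described above.
type fileEntry struct {
	Name string
	Size int64
}

type manifest struct {
	Files      []fileEntry
	TotalBytes int64
}

// send writes the gob-encoded manifest, then the raw bytes of each file.
func send(w io.Writer, m manifest, bodies []io.Reader) error {
	if err := gob.NewEncoder(w).Encode(m); err != nil {
		return err
	}
	for i, f := range m.Files {
		if _, err := io.CopyN(w, bodies[i], f.Size); err != nil {
			return fmt.Errorf("sending %s: %w", f.Name, err)
		}
	}
	return nil
}

// receive decodes the manifest, then copies each file's bytes straight
// to the writer returned by create, without buffering a whole file.
func receive(r io.Reader, create func(name string) (io.Writer, error)) error {
	// gob.NewDecoder adds its own read-ahead buffer unless the reader is
	// an io.ByteReader, so the decoder and the raw copies share one
	// bufio.Reader and no file bytes get stranded inside the decoder.
	br := bufio.NewReader(r)
	var m manifest
	if err := gob.NewDecoder(br).Decode(&m); err != nil {
		return err
	}
	var sum int64
	for _, f := range m.Files {
		sum += f.Size
	}
	if sum != m.TotalBytes { // the cheap consistency check
		return fmt.Errorf("sizes add up to %d, header says %d", sum, m.TotalBytes)
	}
	for _, f := range m.Files {
		dst, err := create(f.Name)
		if err != nil {
			return err
		}
		if _, err := io.CopyN(dst, br, f.Size); err != nil {
			return fmt.Errorf("receiving %s: %w", f.Name, err)
		}
	}
	return nil
}

// transfer pushes in-memory "files" through send and receive and
// returns what arrived, keyed by name.
func transfer(names, contents []string) (map[string]string, error) {
	var m manifest
	var bodies []io.Reader
	for i, name := range names {
		size := int64(len(contents[i]))
		m.Files = append(m.Files, fileEntry{Name: name, Size: size})
		m.TotalBytes += size
		bodies = append(bodies, strings.NewReader(contents[i]))
	}
	var wire bytes.Buffer
	if err := send(&wire, m, bodies); err != nil {
		return nil, err
	}
	got := map[string]*bytes.Buffer{}
	err := receive(&wire, func(name string) (io.Writer, error) {
		got[name] = new(bytes.Buffer)
		return got[name], nil
	})
	out := map[string]string{}
	for name, b := range got {
		out[name] = b.String()
	}
	return out, err
}
```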

During discussions with Noah I said "So I'm switching modes to a kind
of streaming RPC". It was the wrong thing to say because it got our
minds stuck at the wrong point. All I really need is the ability to
marshall and unmarshall some parts of an RPC struct into files, not
memory.

It's important: even for not very large files and not many of them, I
saved a lot of virtual memory when I stopped bringing the data into
memory and then writing it into files, and the size of the program
dropped from 12 MB to about 7 MB. When you're running 500K to 5M VMs
with a very small allowable memory footprint these sizes matter quite
a bit. When the old code stored the file data into memory and then
wrote to files, it exercised the storage leak case and the programs
grew seemingly without limit. And, finally, people want to move
gigabyte files ...

Does that clear it up at all? Again, I have to wonder if there is not
some way to create RPC support that as part of sending data out takes
data from a file, not memory; and, on the receive side, does the
converse. That's what I'm doing by hand right now.

The performance of the current code is more than adequate: we're
limited by network and disk speeds, not the speed of the code. Go is a
real pleasure that way, it has a great runtime. The users are happy.

We're just wondering if there is a way to do this that is not quite so
specialized to our one program. It seems there ought to be.

thanks

ron

roger peppe

Feb 11, 2011, 8:43:26 AM
to ron minnich, Noah Evans, Andrew Gerrand, Rob 'Commander' Pike, golang-dev
[...]

> We're just wondering if there is a way to do this that is not quite so
> specialized to our one program. It seems there ought to be.

how important is it to you that the file stream parts are
marshalled transparently along with the rest of the struct?

as i suggested earlier, it's easier to layer RPC on top of
a streaming protocol (in this case netchan) than the other
way around.

by way of illustration, i've written a program that does this.
the client can make an RPC request to ask for a list of files.
the server then streams the file data back over the same connection.

i've factored out some of the setting up code into a new package,
ncrpc, so that it doesn't seem so "special".

goinstall rog-go.googlecode.com/hg/cmd/rpcreader

it depends on a couple of changes i've recently made to the netchan
package, so unless they've been submitted, you'll need to
apply CL 4119055 to make the code work.

if this doesn't solve your problem, i haven't understood your problem
properly!

roger peppe

Feb 16, 2011, 9:50:03 AM
to Andrew Gerrand, golang-dev, Noah Evans, ron minnich
after a little more discussion with ron, i have realised that it is quite
possible to do streaming RPC without layering it onto netchan
(although that remains a useful possibility)

with the aid of a cute little reflective package, typeapply,
it becomes quite straightforward.

proof of concept here: rog-go.googlecode.com/hg/exp/filemarshal

it can easily be turned into an rpc implementation:
rog-go.googlecode.com/hg/exp/filemarshal/rpc

typeapply itself (suggestion for a better name welcome) is
a pleasingly general tool. rog-go.googlecode.com/typeapply

this might be counted as abuse by some, but it
was fun to write. i've seen many packages that layer
a Reader onto a Reader, but none so far that piggyback
an Encoder onto an Encoder (it might be nice if that type
was defined somewhere generally known - package
"encoding" perhaps?)

PACKAGE

package typeapply
import "rog-go.googlecode.com/hg/typeapply"

FUNCTIONS

func Do(f interface{}, x interface{})
Do calls the function f, which must
be of the form func(T) for some type T, on
each publicly accessible member of x
and recursively on members of those.
It will fail with infinite recursion if x
is cyclic.
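
For readers without access to the package, here is a rough sketch of what a function with this documented behaviour might look like. It is a reconstruction from the doc comment only, not roger's code; the walk helper is a hypothetical name, and the sketch matches T by exact type (it does not handle f taking an interface type).

```go
package main

import "reflect"

// Do calls f, which must be of the form func(T), on every value of
// type T reachable from x through exported struct fields, pointers,
// interfaces, slices, arrays and map values. Like the original, it
// recurses forever on cyclic data.
func Do(f interface{}, x interface{}) {
	fv := reflect.ValueOf(f)
	ft := fv.Type()
	if ft.Kind() != reflect.Func || ft.NumIn() != 1 || ft.NumOut() != 0 {
		panic("typeapply: f must be of the form func(T)")
	}
	walk(fv, ft.In(0), reflect.ValueOf(x))
}

func walk(fv reflect.Value, want reflect.Type, v reflect.Value) {
	if !v.IsValid() {
		return
	}
	if v.Type() == want {
		fv.Call([]reflect.Value{v})
		// Keep going: members of a T may themselves lead to more Ts.
	}
	switch v.Kind() {
	case reflect.Ptr, reflect.Interface:
		if !v.IsNil() {
			walk(fv, want, v.Elem())
		}
	case reflect.Struct:
		t := v.Type()
		for i := 0; i < v.NumField(); i++ {
			// PkgPath is empty only for exported fields.
			if t.Field(i).PkgPath == "" {
				walk(fv, want, v.Field(i))
			}
		}
	case reflect.Slice, reflect.Array:
		for i := 0; i < v.Len(); i++ {
			walk(fv, want, v.Index(i))
		}
	case reflect.Map:
		for _, k := range v.MapKeys() {
			walk(fv, want, v.MapIndex(k))
		}
	}
}
```

In the filemarshal setting, T would presumably be whatever type marks a struct member as "this lives in a file", so that one call to Do can collect every such member of a message before or after the ordinary encoding step.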
