What's the best way to stream data over multiple HTTP requests?


Sam Ghods

May 9, 2014, 3:57:00 AM
to golan...@googlegroups.com
In a quest to build a file upload proxy (http://goo.gl/AgCMc9), I've run into a situation where my go program needs to take an incoming file upload from a user and stream it in real time to N other HTTP servers. I can't wait until the user finishes sending his file - I need to stream the data to the HTTP servers as it comes in from the user. What I'd really like is an io.Writer interface to http.Client or http.Request, but I can't find the best way to do it.

The only way to send a POST body using the http package seems to be by assigning an io.ReadCloser to a new http.Request object. I could implement a ReadCloser that reads data off the file upload stream and feeds it to the http client (or better yet, pass the original file upload Request.Body directly into the new Request), but this doesn't work as well when I want to duplicate the stream to multiple file servers with throttling and monitoring of the transfers.

The ideal interface that I can imagine is an io.Writer interface to the TCP connections, where I copy a []byte for each HTTP server transfer. But if I do that, do I have to completely stop using the http package and manage the entire http connection myself?

Thanks!

--
Sam

Jesse McNelis

May 9, 2014, 5:33:34 AM
to Sam Ghods, golang-nuts
On Fri, May 9, 2014 at 5:57 PM, Sam Ghods <s...@box.com> wrote:
> The only way to send a POST body using the http package seems to be by
> assigning an io.ReadCloser to a new http.Request object. I could implement a
> ReadCloser that reads data off the file upload stream and feeds it to the
> http client (or better yet, pass the original file upload Request.Body
> directly into the new Request), but this doesn't work as well when I want to
> duplicate the stream to multiple file servers with throttling and monitoring
> of the transfers.

You want,
http://golang.org/pkg/io/#Pipe
Write() to one end of the pipe and give the other end of the pipe to
the http.Request.

If you need to write to many of these you can put the write end of the pipe in a
http://golang.org/pkg/io/#MultiWriter
and write to that.

Sam Ghods

May 9, 2014, 6:14:55 AM
to Jesse McNelis, golang-nuts
Yes, exactly what I was looking for. Crazy how much stuff is in the standard lib. Thanks!
--
Sam

Rui Ueyama

May 9, 2014, 12:51:48 PM
to Sam Ghods, Jesse McNelis, golang-nuts
It's worth noting that PipeReader/PipeWriter are synchronous, and there's no buffering in the pipe. So if you copy traffic through one, upload throughput would be limited by the download speed of the other end, and vice versa.

If you fan out incoming traffic to multiple clients using MultiWriter, throughput would be limited by the slowest client in the group, as there's no buffering in MultiWriter either.



Nicolas Hillegeer

Jun 25, 2014, 8:21:24 AM
to golan...@googlegroups.com, s...@box.com, jes...@jessta.id.au
This could perhaps be alleviated a bit by wrapping the MultiWriter in a bufio.Writer, no? Wrapping the constituent writers in a bufio.Writer might trade yet more memory for less slowdown, but then the flush method couldn't be used directly.

Rui Ueyama

Jun 25, 2014, 2:54:23 PM
to Nicolas Hillegeer, golang-nuts, Sam Ghods, Jesse McNelis
On Wed, Jun 25, 2014 at 5:21 AM, Nicolas Hillegeer <nicolash...@gmail.com> wrote:
This could perhaps be alleviated a bit by wrapping the MultiWriter in a bufio.Writer, no?

I think so, and I'd do that for better network throughput. If you use bufio the memory consumption would be unbounded, but you can write your own buffering Reader/Writer that buffers up to some maximum bytes.

Nicolas Hillegeer

Jun 25, 2014, 6:19:57 PM
to golan...@googlegroups.com, nicolash...@gmail.com, s...@box.com, jes...@jessta.id.au
Actually I've been looking at the bufio package to make sure I wasn't spouting nonsense here. Here's what I found:

1) a bufio.Writer's memory usage is always bounded. As far as I can see, it never resizes its internal buffer. I might've overlooked that though. A default bufio.Writer has a buffer size of 4096 bytes.
2) It simply forwards calls to the underlying writer, in this case the io.MultiWriter. The io.MultiWriter also simply ranges over the underlying writers one by one. If one of them blocks (is slow), everything is blocked (slow).

So using a bufio.Writer only has the advantage that writes to the underlying writer are batched, up to the size of the internal buffer (4096 by default). This is good if you don't want to make a lot of tiny writes (syscalls). Yet it doesn't help in this case: a slow device is a slow device, even with batched writes. Thus a bufio.Writer will only slightly delay the inevitable.

There are a few alternatives, not all equally effective; here's what comes to mind off the top of my head:

1) create a BackgroundWriter, which not only buffers things up like the bufio.Writer, but also writes out its stream to the underlying writer in a separate goroutine. Then wrap each io.Writer in a BackgroundWriter and then create a MultiWriter out of those.
2) create a ConcurrentMultiWriter that queues up a goroutine for each child Writer, and waits until all of them are done. At least then a slow device in the beginning of the child writer slice won't hold up everything.

I believe these approaches can be combined as well. The downside is the buffer micro-management in the BackgroundWriter, which would need multiple buffers (one in use, one being filled) and such, plus the creation of n * 2 goroutines. Though the mantra is that goroutines are very cheap.


If I've made a mistake, which is very likely, please correct me.

Rui Ueyama

Jun 25, 2014, 6:41:41 PM
to Nicolas Hillegeer, golang-nuts, Sam Ghods, Jesse McNelis
Ah, I'm sorry, Nicolas, you are right. bufio's memory consumption is bounded. I was confusing it with bytes.Buffer.

Nicolas Hillegeer

Jun 26, 2014, 5:42:13 AM
to golan...@googlegroups.com, nicolash...@gmail.com, s...@box.com, jes...@jessta.id.au
No problem, I just wanted to be clear to Sam (OP). I'd also like to add that if one were to use something like the BackgroundWriter I suggested, it would alter the semantics of Write (though no more so than a buffered writer): one would only know whether an error occurred after forcefully flushing (which would just wait for the goroutine to signal that it has nothing left to write, and check for errors). It loses a touch of simplicity in the implementation, but that might be recouped through simpler and more performant usage.