Actually I've been looking at the bufio package to make sure I wasn't spouting nonsense here. Here's what I found:
1) a bufio.Writer's memory usage is always bounded. As far as I can see, it never resizes its internal buffer. I might've overlooked that though. A default bufio.Writer has a buffer size of 4096 bytes.
2) It simply forwards calls to the underlying writer once its buffer fills up, in this case to the io.MultiWriter. The io.MultiWriter in turn simply ranges over its child writers one by one. If one of them blocks (is slow), everything is blocked (slow).
So the only advantage of a bufio.Writer is that the writes reaching the underlying writer are at least the size of the internal buffer (4096 bytes by default), apart from whatever is left over at the final Flush. This is good if you don't want to make a lot of tiny writes (syscalls). It doesn't help in this case, though: a slow device is a slow device, even with batched writes. A bufio.Writer will only slightly delay the inevitable.
There are a few alternatives, not all equally effective. Here's what comes to mind off the top of my head:
1) create a BackgroundWriter, which not only buffers things up like the bufio.Writer, but also writes out its stream to the underlying writer in a separate goroutine. Then wrap each io.Writer in a BackgroundWriter and then create a MultiWriter out of those.
2) create a ConcurrentMultiWriter that queues up a goroutine for each child Writer, and waits until all of them are done. At least then a slow device in the beginning of the child writer slice won't hold up everything.
I believe these approaches to be combinable, as well. The downside being the buffer micro-management in the BackgroundWriter, which will need multiple buffers (one in use, one being filled up) and such, and the creation of n * 2 goroutines. Though the mantra is that goroutines are very cheap.
If I've made a mistake, which is very likely, please correct me.