Pattern for asynchronously writing data to files during http requests


Manuel Kiessling

Aug 20, 2012, 6:57:17 AM
to golan...@googlegroups.com
Hi all,

I'm currently implementing an http server application. It obviously serves http requests, but it also needs to write data to the file system.

I'm wondering how I can make the server respond to http requests as quickly as possible, even for requests that result in data being written to the file system. At the same time, I want to ensure that no two goroutines write to the same file in parallel, which would risk corrupting the file's data.

My current approach is to trigger the file-writing within a goroutine in the function that handles the request. In pseudo code:


package main

import "net/http"

func main() {

  writes := make(chan string)

  // A single writer goroutine, so file writes happen one at a time.
  go func() {
    for {
      data := <-writes
      _ = data // write data into file...
    }
  }()

  http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
    theData := "" // Read data from the http request into theData...
    go func() {
      writes <- theData
    }()
    // Respond to the http request...
  })

  http.ListenAndServe(":8080", nil)
}

This way, the response to the request would be served immediately, while the file writing takes place "in the background". File writes would be guaranteed to happen serially, because they are piped through the channel.

Of course, the client sending the request and receiving a response has no guarantee that the file write was successful, because I don't hold back the response until the write finishes; however, that is not important for my use case.

Any flaws in my thought process? Is there a better way?

Regards,
-- 
 Manuel

tomwilde

Aug 20, 2012, 11:37:00 AM
to golan...@googlegroups.com
As far as I know, you need to acquire a write-lock on a file in order to write to it. That's why you have these 'modes' when opening a file.
So parallel writing to the same file is actually impossible. Maybe there are exceptions, though.
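In Go those 'modes' are the flag bits passed to os.OpenFile; a minimal sketch for reference (the file name is only a placeholder):

package main

import "os"

func main() {
  // O_WRONLY|O_CREATE open the file for writing, creating it if needed;
  // O_APPEND makes the OS perform each write at the current end of file.
  f, err := os.OpenFile("data.log", os.O_WRONLY|os.O_CREATE|os.O_APPEND, 0644)
  if err != nil {
    panic(err)
  }
  defer f.Close()
  f.WriteString("hello\n")
}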

andrey mirtchovski

Aug 20, 2012, 2:04:13 PM
to tomwilde, golan...@googlegroups.com
> So parallel writing to the same file is actually impossible.

if you're careful you can avoid data races while spraying a file with
data from multiple goroutines, just make sure that two goroutines
don't execute an overlapping WriteAt. here's an example, a
more-or-less straightforward translation of Plan 9's fcp (fast copy,
useful when the remote filesystem is accessed over a high-latency
connection): http://mirtchovski.com/go/cp

if you want to give it a test you can do:

$ dd if=/dev/urandom of=randtest bs=100m count=1
1+0 records in
1+0 records out
104857600 bytes transferred in 8.604219 secs (12186766 bytes/sec)
$ for i in `seq 1 100`; do
echo -n $i;
GOMAXPROCS=$i time ./cp -w=16 randtest randtest.out;
cmp randtest randtest.out;
if [ $? -gt 0 ]; then echo "bad copy at $GOMAXPROCS"; fi;
done
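
for illustration, a stripped-down sketch of the non-overlapping-WriteAt idea (not the fcp code itself; file name and sizes are made up):

package main

import (
  "os"
  "sync"
)

func main() {
  f, err := os.Create("out.dat") // made-up output file
  if err != nil {
    panic(err)
  }
  defer f.Close()

  const chunk = 64 * 1024 // each goroutine owns its own region of the file
  var wg sync.WaitGroup
  for i := 0; i < 4; i++ {
    wg.Add(1)
    go func(i int) {
      defer wg.Done()
      buf := make([]byte, chunk)
      for j := range buf {
        buf[j] = byte(i)
      }
      // the offsets never overlap, so these concurrent WriteAt calls don't race
      f.WriteAt(buf, int64(i*chunk))
    }(i)
  }
  wg.Wait()
}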

Carlos Castillo

Aug 20, 2012, 2:38:54 PM
to golan...@googlegroups.com
Two thoughts:
  1. As you have it, the purpose of the channel is to provide mutually exclusive access to the file; you could just use a sync.Mutex instead (see the sketch after this list).
  2. Right now, your current model doesn't allow requests to continue until the writes are finished; all you have done is force all file writing to happen one at a time.
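
For the first point, a minimal sketch of what the mutex variant could look like (function and file names are just placeholders):

package main

import (
  "os"
  "sync"
)

var fileMu sync.Mutex

// writeData serializes access to the file with a mutex instead of a channel.
func writeData(f *os.File, data string) error {
  fileMu.Lock()
  defer fileMu.Unlock()
  _, err := f.WriteString(data)
  return err
}

func main() {
  f, err := os.Create("data.log") // placeholder file name
  if err != nil {
    panic(err)
  }
  defer f.Close()
  writeData(f, "some data\n")
}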

To increase concurrency you could use a buffered channel; then the writing goroutines won't block until the buffer is full.

writes := make(chan string, 32) // Buffer of 32 elements

Also, it's probably more idiomatic (but not essential) for your file-writer goroutine to use a for ... range clause (see: http://golang.org/doc/effective_go.html#for and http://golang.org/ref/spec#For_statements). In a web server this might not be as important, since you will probably never want to stop writing to the file.
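
For example, a toy version of such a loop; it only ends when the channel is closed, which a long-running server may never do:

package main

import "fmt"

func main() {
  writes := make(chan string, 32)
  done := make(chan struct{})

  go func() {
    // range keeps receiving until writes is closed and drained
    for data := range writes {
      fmt.Println("would write:", data) // write data into file...
    }
    close(done)
  }()

  writes <- "some data"
  close(writes) // a web server would typically never reach this point
  <-done
}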

roger peppe

Aug 21, 2012, 4:02:41 AM
to Manuel Kiessling, golan...@googlegroups.com
This looks reasonable, but I'm not sure you want to do it this way.
If you are consistently serving requests at a rate faster than the
data can be written to disk, then you could accumulate goroutines
waiting to send to the writer goroutine and eventually run out of memory.

Better, I think, would be to use a buffered channel (choose a
suitably big size) and do the send inline:

package main

import "net/http"

func main() {
  writes := make(chan string, 1000)

  go func() {
    for data := range writes {
      _ = data // write data into file...
    }
  }()

  http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
    theData := "" // Read data from the http request into theData...
    writes <- theData // blocks only when the 1000-element buffer is full
    // Respond to the http request...
  })

  http.ListenAndServe(":8080", nil)
}

You might also speed up disk throughput by
using a bufio.Writer (again with some suitably chosen
buffer size), but this has the disadvantage that
records will not necessarily be written in whole
units, and if that matters to you, you'd want some
logic to make sure that the buffer is flushed occasionally.
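
For example, one way that flush logic might look (just a sketch; the one-second interval and buffer size are arbitrary choices):

package main

import (
  "bufio"
  "os"
  "time"
)

func writer(f *os.File, writes <-chan string) {
  w := bufio.NewWriterSize(f, 64*1024)
  flush := time.NewTicker(1 * time.Second)
  defer flush.Stop()
  for {
    select {
    case data := <-writes:
      w.WriteString(data)
    case <-flush.C:
      // push whatever is buffered out to disk occasionally
      w.Flush()
    }
  }
}

func main() {
  f, err := os.Create("data.log") // placeholder file name
  if err != nil {
    panic(err)
  }
  defer f.Close()
  writes := make(chan string, 1000)
  go writer(f, writes)
  writes <- "some data\n"
  time.Sleep(2 * time.Second) // give the ticker a chance to flush before exiting
}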

Another technique you could use to save disk writes
without using bufio.Writer is to read all available data
and then write it. This would be an improvement only if the data
sent from each http request is relatively small.

func writer(f *os.File, writes <-chan []byte) {
  var buf []byte
  for {
    data := <-writes
    buf = append(buf, data...)
    // Accumulate data from any goroutines
    // that are ready to send it (up to ~256KB), so that we
    // use fewer disk writes.
  drain:
    for len(buf) < 256*1024 {
      select {
      case d := <-writes:
        buf = append(buf, d...)
      default:
        break drain
      }
    }
    f.Write(buf)
    buf = buf[:0]
  }
}

Shuai Lin

Aug 21, 2012, 7:03:40 AM
to golan...@googlegroups.com, Manuel Kiessling
I think using a buffered channel is a reasonable solution. On one hand, a buffered channel can guarantee a quick response to clients. On the other hand, with a big enough buffer, if the send operation ever blocks, i.e. the buffered channel is full, it means that the workload of the server is too heavy.
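
One way to notice that condition, as an extra idea beyond the code above, is a non-blocking send; if the default case fires, the buffer is full and the writer can't keep up (names here are placeholders):

package main

import "log"

var writes = make(chan string, 1000)

// tryQueue reports whether data could be handed to the background writer
// without blocking; false means the buffer is full and the writer is behind.
func tryQueue(data string) bool {
  select {
  case writes <- data:
    return true
  default:
    return false
  }
}

func main() {
  if !tryQueue("some data") {
    log.Println("writer overloaded, dropping data (or fail the request here)")
  }
}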

On Tuesday, August 21, 2012 at 4:02:41 PM UTC+8, rog wrote: