github.com/golang/snappy implements the Snappy compression format. The
snappy.Writer type is:
----
type Writer struct {
// Has unexported fields.
}
func NewWriter(w io.Writer) *Writer
func (w *Writer) Reset(writer io.Writer)
func (w *Writer) Write(p []byte) (n int, errRet error)
----
This API admits no internal buffering; there is no Flush or Close
method. This means that, out of the box, a Writer is suited only to a
few large writes, not to many small ones. In fact, because each Write
call produces at least one framed chunk, the framing overhead can easily
make the resulting bytes on the wire longer than the input bytes,
defeating the whole purpose of using compression.
Instead, just like when using an os.File, if you're making many small
writes, you should use the standard bufio package to wrap the
snappy.Writer.
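To make the overhead concrete, here is a minimal, self-contained sketch. It does not import the real package; instead a hypothetical chunkWriter stands in for the unbuffered snappy.Writer, emitting one frame per Write call: a 4-byte chunk header, a 4-byte checksum placeholder and the payload, left uncompressed. Only the per-Write framing cost is modeled. The hypothetical encodedSizes helper compares 1000 three-byte writes sent directly against the same writes coalesced by a bufio.Writer.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// chunkWriter is a hypothetical stand-in for the unbuffered snappy.Writer.
// Every Write call emits one frame: a 1-byte chunk type, a 3-byte
// little-endian length, a 4-byte checksum placeholder and the payload
// (left uncompressed in this sketch).
type chunkWriter struct {
	dst *bytes.Buffer
}

func (c *chunkWriter) Write(p []byte) (int, error) {
	l := len(p) + 4 // the length covers the checksum plus the payload
	c.dst.Write([]byte{0x01, byte(l), byte(l >> 8), byte(l >> 16)})
	c.dst.Write([]byte{0, 0, 0, 0})
	c.dst.Write(p)
	return len(p), nil
}

// encodedSizes reports how many bytes reach the wire for `writes` calls of
// Write([]byte("hi\n")), first unbuffered and then wrapped in a bufio.Writer.
func encodedSizes(writes int) (unbuffered, buffered int) {
	msg := []byte("hi\n")

	var raw bytes.Buffer
	cw := &chunkWriter{dst: &raw}
	for i := 0; i < writes; i++ {
		cw.Write(msg) // one frame per call
	}

	var out bytes.Buffer
	bw := bufio.NewWriter(&chunkWriter{dst: &out})
	for i := 0; i < writes; i++ {
		bw.Write(msg) // accumulates in bufio's default 4096-byte buffer
	}
	// Flush forwards whatever is still buffered; without it, out stays empty.
	if err := bw.Flush(); err != nil {
		panic(err)
	}
	return raw.Len(), out.Len()
}

func main() {
	u, b := encodedSizes(1000)
	fmt.Println(u, b) // 11000 3008
}
```

With the default buffer size, the 3000 input bytes reach the stand-in as a single Write, so the framing cost is paid once instead of 1000 times.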
At the very least, this is not obvious from the snappy package godoc,
and the easiest 'fix' is to add a note to those docs to suggest using
bufio if needed. Call this Option 0.
In the standard library, the compress/{flate,gzip,zlib} writers'
method sets are all Close + Flush + Reset + Write. The compress/lzw
writer's method set is Close + Write: it is literally an
io.WriteCloser. These writers all have their own internal buffering,
so they are well suited to many small writes, and users are required
to call Close when done.
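For comparison, a short sketch of that standard-library pattern using compress/gzip: many small writes go straight into a gzip.Writer, whose internal buffering absorbs them, and Close must be called to flush the remaining data and write the gzip footer before the stream can be decoded. The gzipRoundTrip helper is hypothetical, written only for this illustration.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"strings"
)

// gzipRoundTrip performs many small writes through a gzip.Writer, calls
// Close, then decompresses the result again.
func gzipRoundTrip(lines []string) (compressed []byte, decompressed string, err error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	for _, line := range lines {
		// Small writes are fine: gzip.Writer buffers internally.
		if _, err := io.WriteString(zw, line); err != nil {
			return nil, "", err
		}
	}
	// Close flushes buffered data and writes the gzip footer (CRC-32 and
	// size). Without it the stream is truncated.
	if err := zw.Close(); err != nil {
		return nil, "", err
	}

	zr, err := gzip.NewReader(bytes.NewReader(buf.Bytes()))
	if err != nil {
		return nil, "", err
	}
	defer zr.Close()
	out, err := io.ReadAll(zr)
	if err != nil {
		return nil, "", err
	}
	return buf.Bytes(), string(out), nil
}

func main() {
	lines := make([]string, 1000)
	for i := range lines {
		lines[i] = "hi\n"
	}
	compressed, out, err := gzipRoundTrip(lines)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(compressed) < 3000, out == strings.Repeat("hi\n", 1000)) // true true
}
```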
https://github.com/golang/snappy/pull/21 adds an internal buffer (and
Flush and Close methods) so that a snappy.Writer is good to use out of
the box, regardless of the frequency and shape of the writes. Call
this Option 1. However, existing callers never call Flush or Close,
since those methods do not exist yet, so any data left in the new
buffer would never reach the underlying io.Writer. This would silently
break existing users of that package, and so I dismissed that
suggestion.
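The breakage is silent because Write still reports success. A minimal sketch, with a bufio.Writer standing in for a hypothetically buffered snappy.Writer and a hypothetical legacyCaller standing in for existing code that has no Flush to call:

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// legacyCaller models existing code written against an unbuffered Writer:
// it writes and returns, with no Flush or Close to call.
func legacyCaller(w io.Writer) error {
	_, err := w.Write([]byte("hello, snappy"))
	return err
}

// bytesReachingDst reports how many bytes arrive at the underlying writer
// when a buffered writer is handed to legacyCaller, optionally followed by
// a Flush that legacy code would never perform.
func bytesReachingDst(flush bool) int {
	var dst bytes.Buffer
	bw := bufio.NewWriter(&dst)
	if err := legacyCaller(bw); err != nil {
		panic(err) // not taken: the write succeeds into bufio's buffer
	}
	if flush {
		if err := bw.Flush(); err != nil {
			panic(err)
		}
	}
	return dst.Len()
}

func main() {
	fmt.Println(bytesReachingDst(false)) // 0: data stranded in the buffer
	fmt.Println(bytesReachingDst(true))  // 13
}
```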
On further thought, though, it is possible to make this internal
buffering opt-in instead of mandatory, so there is no such breakage.
Call this Option 2. There are a number of API possibilities, but one
concrete proposal is:
----
// NewWriter returns a new Writer that compresses to w.
//
// The Writer returned does not buffer writes. There is no need to Flush such a
// Writer.
//
// Deprecated: the Writer returned is not suitable for many small writes, only
// for few large writes. Use NewBufferedWriter instead, which is efficient
// regardless of the frequency and shape of the writes.
func NewWriter(w io.Writer) *Writer

// NewBufferedWriter returns a new Writer that compresses to w, using the
// framing format described at
//
//	https://github.com/google/snappy/blob/master/framing_format.txt
//
// The Writer returned buffers writes. Users must call Flush to guarantee all
// data has been forwarded to the underlying io.Writer.
func NewBufferedWriter(w io.Writer) *Writer

// Flush flushes the Writer to its underlying io.Writer.
func (w *Writer) Flush() error

// Reset and Write are as before.
----
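One way the opt-in design could be structured, as a hypothetical sketch rather than the package's actual implementation: a nil input buffer marks an unbuffered Writer, so NewWriter keeps the write-through behavior and Flush on it is a no-op, while NewBufferedWriter allocates the buffer. The writeFrame helper is an assumed stand-in for compressing and framing a chunk (it writes a 4-byte header and the raw bytes), and the 64 KiB buffer size is also an assumption.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// bufSize is an assumed capacity for the internal buffer.
const bufSize = 65536

// Writer is a hypothetical sketch; a nil ibuf means "unbuffered".
type Writer struct {
	w    io.Writer
	ibuf []byte // pending input, nil for an unbuffered Writer
	err  error  // sticky error from the underlying io.Writer
}

func NewWriter(w io.Writer) *Writer { return &Writer{w: w} }

func NewBufferedWriter(w io.Writer) *Writer {
	return &Writer{w: w, ibuf: make([]byte, 0, bufSize)}
}

// Reset discards any buffered data and any error, and switches to writer.
func (w *Writer) Reset(writer io.Writer) {
	w.w, w.err = writer, nil
	if w.ibuf != nil {
		w.ibuf = w.ibuf[:0]
	}
}

// writeFrame stands in for compressing p and emitting one framed chunk:
// a 1-byte type and 3-byte length, then p uncompressed.
func (w *Writer) writeFrame(p []byte) error {
	n := len(p)
	if _, err := w.w.Write([]byte{0x01, byte(n), byte(n >> 8), byte(n >> 16)}); err != nil {
		return err
	}
	_, err := w.w.Write(p)
	return err
}

func (w *Writer) Write(p []byte) (int, error) {
	if w.err != nil {
		return 0, w.err
	}
	if w.ibuf == nil { // unbuffered: one frame per call, as before
		if w.err = w.writeFrame(p); w.err != nil {
			return 0, w.err
		}
		return len(p), nil
	}
	n := 0
	for len(p) > 0 {
		k := copy(w.ibuf[len(w.ibuf):cap(w.ibuf)], p)
		w.ibuf, p, n = w.ibuf[:len(w.ibuf)+k], p[k:], n+k
		if len(w.ibuf) == cap(w.ibuf) { // buffer full: emit it
			if err := w.Flush(); err != nil {
				return n, err
			}
		}
	}
	return n, nil
}

// Flush frames and forwards any buffered bytes; for an unbuffered Writer
// the buffer is always empty, so this is a no-op.
func (w *Writer) Flush() error {
	if w.err != nil || len(w.ibuf) == 0 {
		return w.err
	}
	if w.err = w.writeFrame(w.ibuf); w.err != nil {
		return w.err
	}
	w.ibuf = w.ibuf[:0]
	return nil
}

// demo writes "hi\n" 1000 times through both kinds of Writer.
func demo() (unbuffered, beforeFlush, afterFlush int) {
	var u, b bytes.Buffer
	uw, bw := NewWriter(&u), NewBufferedWriter(&b)
	for i := 0; i < 1000; i++ {
		uw.Write([]byte("hi\n"))
		bw.Write([]byte("hi\n"))
	}
	beforeFlush = b.Len()
	if err := bw.Flush(); err != nil {
		panic(err)
	}
	return u.Len(), beforeFlush, b.Len()
}

func main() {
	fmt.Println(demo()) // 7000 0 3004
}
```

Keying the behavior off a nil buffer means the unbuffered path needs no extra flag, and calling Flush on an old-style Writer is always safe.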
Note that there is no Close method, only Flush. This is similar to a
bufio.Writer but unlike the other compress/* writers. The Snappy wire
format does not need an explicit EOF marker or trailing checksum, so I
don't think a snappy.Writer needs a Close method in addition to a
Flush method, but I am open to being convinced otherwise. Admittedly,
a Close method would give us a place to hook into a sync.Pool of byte
buffers, if we wanted such a place, but I don't think we do.
For completeness, an alternative to the NewBufferedWriter function is
adding a variadic ... argument to NewWriter so that it would become
sw := snappy.NewWriter(w, snappy.Buffered)
or maybe
sw := snappy.NewWriter(w, snappy.BufferSize(8192))
while existing code would continue to work (unbuffered, without calling Flush):
sw := snappy.NewWriter(w)
but I prefer the separate NewBufferedWriter function. I don't
anticipate wanting any other options, including being able to choose
the internal buffer size. The other compress/* writers in the standard
library don't offer anything similar.
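For comparison, the variadic alternative could be built with functional options roughly as follows. Option, config, Buffered and BufferSize are hypothetical names (the last two echo the snippets above); none of them exist in the snappy package, and the 65536-byte default is an assumption. The sketch only shows that a call with no options still compiles and stays unbuffered.

```go
package main

import (
	"fmt"
	"io"
)

// config holds the settings NewWriter would consult; bufSize 0 means unbuffered.
type config struct {
	bufSize int
}

// Option is a hypothetical functional option for NewWriter.
type Option func(*config)

// Buffered enables buffering with an assumed default size. Being a plain
// func value, it can be passed without parentheses, as in NewWriter(w, Buffered).
func Buffered(c *config) { c.bufSize = 65536 }

// BufferSize enables buffering with an explicit size.
func BufferSize(n int) Option {
	return func(c *config) { c.bufSize = n }
}

// Writer records only its configuration in this sketch.
type Writer struct {
	w   io.Writer
	cfg config
}

// NewWriter accepts zero or more options, so existing call sites such as
// NewWriter(w) keep compiling and get the unbuffered behavior.
func NewWriter(w io.Writer, opts ...Option) *Writer {
	sw := &Writer{w: w}
	for _, opt := range opts {
		opt(&sw.cfg)
	}
	return sw
}

func main() {
	fmt.Println(NewWriter(nil).cfg.bufSize)                   // 0
	fmt.Println(NewWriter(nil, Buffered).cfg.bufSize)         // 65536
	fmt.Println(NewWriter(nil, BufferSize(8192)).cfg.bufSize) // 8192
}
```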
Long story short, I intend to implement Option 2 with the API proposed
above, but as I made the original API mistake, I am open to any
further suggestions.