Preferred way to serialise to []byte


wouter...@publica.duodecim.org

Sep 23, 2012, 10:29:11 PM
to golan...@googlegroups.com

Hello,

I'm serialising a struct into a []byte slice (for base64-encoding into a
cookie, but that's not relevant). There are several ways to do this,
such as formatting the data into a string with fmt and converting that
to []byte, using append to construct the []byte slice, or using a
bytes.Buffer.

I guess using fmt and converting the resulting string to []byte looks
easier than manually constructing the []byte slice, but perhaps this is
the slowest method. I'm not sure how one should weigh the use of strings
vs. []byte slices.

(I'm ignoring solutions such as the gob package here because I'd like
the data to be accessible to other languages.)

I was wondering which method of serialising to (plain) text is the most
idiomatic.

Opinions?



Some sample code:

type TimeStamp struct {
	Stamp time.Time
	Seqno uint16
	Host  []byte
	Hmac  []byte
}


func (ts *TimeStamp) String() string {
	return fmt.Sprintf("%d.%d %d %s",
		ts.Stamp.Unix(), ts.Stamp.Nanosecond(), ts.Seqno, ts.Host)
}


func (ts *TimeStamp) Bytes() []byte {
	// Length 0, capacity 20: unix(10) + '.' + nsec(9).
	// make([]byte, 20) would put 20 zero bytes in front of the digits.
	epoch := make([]byte, 0, 20)
	epoch = strconv.AppendInt(epoch, ts.Stamp.Unix(), 10)
	epoch = append(epoch, '.')
	epoch = strconv.AppendInt(epoch, int64(ts.Stamp.Nanosecond()), 10)

	var buf bytes.Buffer
	buf.Write(epoch)
	buf.WriteByte(0x20) // space separator
	buf.Write(strconv.AppendUint(nil, uint64(ts.Seqno), 10))
	buf.WriteByte(0x20)
	buf.Write(ts.Host)

	return buf.Bytes()
}

Jonathan Pittman

Sep 24, 2012, 12:00:05 AM
to wouter...@publica.duodecim.org, golan...@googlegroups.com
If you want relatively plain text, JSON usually looks rather nice and works with other languages.




Brian Slesinsky

Sep 24, 2012, 12:14:38 AM
to golan...@googlegroups.com, wouter...@publica.duodecim.org
You've combined a few different questions. If you're asking which format to use: JSON is a good choice if you prefer plain text or want to communicate with JavaScript, while protocol buffers work well as a binary format for communicating with other server-side languages.

To write output to a byte stream efficiently, the idiom is to take an io.Writer as a parameter rather than returning a string or []byte. So if you're using fmt then fmt.Fprintf will be better than fmt.Sprintf.

If you want to make your type printable using the fmt package, you have the choice of implementing the Stringer, GoStringer, or Formatter interfaces in the fmt package. For large amounts of data, the Formatter interface will be more efficient since it gives you an io.Writer (the State type implements Writer) and avoids the intermediate conversion to a string.

- Brian

wouter...@publica.duodecim.org

Sep 24, 2012, 11:57:02 AM
to golan...@googlegroups.com, Brian Slesinsky, jonathan.m...@gmail.com

First of all, thanks for your replies!

I'm generating a short UUID-like time stamp, so I think JSON might be
too verbose for this purpose.

I admit my question wasn't very concrete.

The fmt package makes it easy to get a string representation of a struct
– it's basically a one-liner (it feels higher-level).

However, many hash, encryption and encoding (e.g. base64) functions
quite logically take (or return) a []byte slice or buffer.

Some Go packages freely mix strings and []byte slices depending on
convenience, converting back and forth; especially converting fmt string
output to []byte.

My question is if it's ok to convert fmt string to []byte, or would it
be better to stick with []byte if that's what's needed down the line and
emulate fmt formatting by using append/appendInt/buffers directly on
[]byte, avoiding conversions and memory overhead.

Since there is a lot of overlap between strings and []byte (the
functions in their respective modules are pretty similar too), it's not
always clear which to prefer.

Kyle Lemons

Sep 24, 2012, 2:49:48 PM
to wouter...@publica.duodecim.org, golan...@googlegroups.com, Brian Slesinsky, jonathan.m...@gmail.com
Usually, if you are working with []byte, you'll have a buffer or stream into which you can Fprint, which avoids the conversion.




Brian Slesinsky

Sep 24, 2012, 3:23:14 PM
to golan...@googlegroups.com, Brian Slesinsky, jonathan.m...@gmail.com, wouter...@publica.duodecim.org
> I'm generating a short UUID-like time stamp, so I think JSON might be too verbose for this purpose. 

Yes, that's true. JSON would be more appropriate if you had a collection of timestamps; you could store each timestamp in a string literal in the larger document.

> My question is if it's ok to convert fmt string to []byte, or would it
> be better to stick with []byte if that's what's needed down the line and
> emulate fmt formatting by using append/appendInt/buffers directly on
> []byte, avoiding conversions and memory overhead.

In general, rather than appending to []byte, you should append to an io.Writer and let the caller decide what the destination should be; it might be a []byte but it could be a file or socket.
 
> Since there is a lot of overlap between strings and []byte (the
> functions in their respective modules are pretty similar too), it's not
> always clear which to prefer.

There's nothing wrong with converting from string to []byte - just remember that it does a copy. If performance matters to you, you'll want to avoid repeatedly copying the same data, which can result in an O(n^2) algorithm if you have a recursive data structure.

A timestamp converts to constant-size output so it's probably not that bad unless it's used a lot. Appending to an io.Writer will be more important for variable-sized output.

- Brian

wouter...@publica.duodecim.org

Sep 25, 2012, 6:52:15 PM
to golan...@googlegroups.com, Kyle Lemons, Brian Slesinsky
On 24.09.2012 22:23, Brian Slesinsky wrote:
> [...]
>
> A timestamp converts to constant-size output so it's probably not
> that bad unless it's used a lot. Appending to an io.Writer will be
> more important for variable-sized output.
>

Yup. I guess I'd better go and benchmark one or two things... I've
already noticed buffers seem to be significantly slower than []byte
slices for small data.

Thanks for the help guys!
