gob encoding of []byte


John Graham-Cumming

Jan 4, 2012, 5:27:49 AM
to golang-nuts
I was using Wireshark to look at gob-encoded data passing across a TCP
connection. As part of my code I pass a [20]byte (actually the
output of a SHA1 hash) as an identifier, and I noticed that the encoding
isn't as efficient as it could be.

For example, when transmitting
5e:b8:e2:24:2e:eb:4f:54:32:be:ee:c8:b8:73:f1:ab:0a:6b:71:fd
what is actually on the wire is
5e:ff:b8:ff:e2:24:2e:ff:eb:4f:54:32:ff:be:ff:ee:ff:c8:ff:b8:73:ff:f1:ff:ab:0a:6b:71:ff:fd
It appears that each value greater than 7F has been encoded with
an additional byte (e.g. b8 is sent as ff:b8).

Looking into the gob code there's the following comment (which applies
because []byte is really []uint8):

// Unsigned integers have a two-state encoding. If the number is less
// than 128 (0 through 0x7F), its value is written directly.
// Otherwise the value is written in big-endian byte order preceded
// by the byte length, negated.

I can see how this is efficient for uintX where X > 8 but it's
unfortunate for uint8 because it makes the encoding larger than the
input for no (apparent) good reason.

Is there a good way for me to get around this restriction? I'm
thinking that perhaps not using [20]byte but using a string would
actually make my implementation more efficient.

John.

roger peppe

Jan 4, 2012, 5:59:36 AM
to John Graham-Cumming, golang-nuts

you could make the encoding more efficient by using []byte rather than
[20]byte (but then you pay for the length prefix).
byte slices get special treatment, but byte arrays do not.

John Graham-Cumming

Jan 4, 2012, 9:32:53 AM
to golang-nuts
Many thanks. I'll do that.

A small follow up question. I would like to be able to gob.Encode one
of n different structs and send them across the TCP connection and
have the other end gob.Decode any of the n and then determine what it
has received. What's the best way to achieve that?

John.

roger peppe

Jan 4, 2012, 10:07:00 AM
to John Graham-Cumming, golang-nuts

two possible ways:

you could define a struct with a member for
each possible type:

e.g.
    type A struct {
        // ...
    }

    type B struct {
        // ...
    }

    type Any struct {
        A *A
        B *B
        // etc
    }

enc.Encode(&Any{A: &A{...}})

this is the simplest method, and may be the most
efficient. the down side is it's a little bit awkward to go through
each field to work out which one is non-nil.
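
a fuller sketch of this first approach might look like the following. the message types Hello and Data (standing in for A and B), the helper names, and the bytes.Buffer standing in for the TCP connection are all assumptions made for illustration.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Hello and Data are hypothetical message types standing in for A and B.
type Hello struct{ Name string }
type Data struct{ ID []byte }

// Any carries at most one non-nil member; nil pointers are not transmitted.
type Any struct {
	Hello *Hello
	Data  *Data
}

// roundTrip encodes msg and decodes it again. The buffer stands in for
// the network connection.
func roundTrip(msg Any) (Any, error) {
	var network bytes.Buffer
	if err := gob.NewEncoder(&network).Encode(&msg); err != nil {
		return Any{}, err
	}
	var got Any
	err := gob.NewDecoder(&network).Decode(&got)
	return got, err
}

// describe works out which member was received by checking for non-nil.
func describe(m Any) string {
	switch {
	case m.Hello != nil:
		return "Hello from " + m.Hello.Name
	case m.Data != nil:
		return fmt.Sprintf("Data with %d-byte ID", len(m.Data.ID))
	}
	return "empty"
}

func main() {
	got, err := roundTrip(Any{Data: &Data{ID: []byte{1, 2, 3}}})
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	fmt.Println(describe(got))
}
```

the switch in describe is the "go through each field" cost mentioned above; it grows by one case per message type.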

alternatively you could register each type and send an
interface:

gob.Register((*A)(nil))
gob.Register((*B)(nil))
type Any struct {
    X interface{}
}
enc.Encode(Any{&A{...}})

here's a more complete example of the latter:
http://play.golang.org/p/PGqi9Hwx-_

note that an interface value cannot be sent as a top level value,
so i've used a struct with an interface field instead.
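
for readers who can't open the playground link, a sketch of the interface approach could look like this. it is not the linked program: the types Hello and Data, the Envelope wrapper and the helper names are hypothetical, and it registers value types (Hello{}) rather than the pointer types shown above, purely to keep the type switch short.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Hello and Data are hypothetical message types standing in for A and B.
type Hello struct{ Name string }
type Data struct{ ID []byte }

// Envelope wraps the interface value so that it travels as a struct field.
type Envelope struct {
	X interface{}
}

func init() {
	// Every concrete type sent inside the interface must be registered
	// on both the sending and the receiving side.
	gob.Register(Hello{})
	gob.Register(Data{})
}

// roundTrip sends x inside an Envelope and returns what was received.
// The buffer stands in for the network connection.
func roundTrip(x interface{}) (interface{}, error) {
	var network bytes.Buffer
	if err := gob.NewEncoder(&network).Encode(Envelope{X: x}); err != nil {
		return nil, err
	}
	var got Envelope
	if err := gob.NewDecoder(&network).Decode(&got); err != nil {
		return nil, err
	}
	return got.X, nil
}

// describe uses a type switch to find out what arrived.
func describe(x interface{}) string {
	switch v := x.(type) {
	case Hello:
		return "Hello from " + v.Name
	case Data:
		return fmt.Sprintf("Data with %d-byte ID", len(v.ID))
	}
	return "unknown"
}

func main() {
	got, err := roundTrip(Data{ID: []byte{1, 2}})
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	fmt.Println(describe(got))
}
```

compared with the struct-of-pointers version, the receiver gets a type switch instead of a chain of nil checks, at the cost of the type name being sent with each value.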

John Graham-Cumming

Jan 4, 2012, 10:11:15 AM
to golang-nuts
Thanks. The interface method is the most elegant, but I think I'm
going to go with the slightly uglier Any struct style.

John.

Rob 'Commander' Pike

Jan 4, 2012, 10:46:01 AM
to roger peppe, John Graham-Cumming, golang-nuts

On Jan 4, 2012, at 7:07 AM, roger peppe wrote:

> note that an interface value cannot be sent as a top level value,

why do you say this?

-rob

John Asmuth

Jan 4, 2012, 11:18:28 AM
to golan...@googlegroups.com, roger peppe, John Graham-Cumming
Wouldn't the gob encoder just assume you meant whatever is contained by the interface you pass it? You can store an interface *in* an interface... can you?

roger peppe

Jan 4, 2012, 11:30:26 AM
to Rob 'Commander' Pike, John Graham-Cumming, golang-nuts

because that is the case, i think.
for instance, this code fails: http://play.golang.org/p/iR3sM-S8mP
it's issue 2367 (closed).

John Asmuth

Jan 4, 2012, 11:47:57 AM
to golan...@googlegroups.com, roger peppe, John Graham-Cumming


On Wednesday, January 4, 2012 11:18:28 AM UTC-5, John Asmuth wrote:
> Wouldn't the gob encoder just assume you meant whatever is contained by the interface you pass it? You can store an interface *in* an interface... can you?

"You can store an interface..." -> "You can't store an interface..." 

Julian Phillips

Jan 4, 2012, 11:51:18 AM
to roger peppe, Rob 'Commander' Pike, John Graham-Cumming, golang-nuts

But that code doesn't _send_ an interface value.

c.f. http://play.golang.org/p/zAfq83DQeO ...

> it's issue 2367 (closed).

Which is about receiving anything into an interface, not sending
interface values.

--
Julian

Paul Borman

Jan 4, 2012, 11:54:00 AM
to roger peppe, Rob 'Commander' Pike, John Graham-Cumming, golang-nuts
Well, your code shows you can't receive into a top level interface, but you can still send a top level interface, which should not be surprising.

Paul Borman

Jan 4, 2012, 11:54:42 AM
to roger peppe, Rob 'Commander' Pike, John Graham-Cumming, golang-nuts
Oops, that should have been http://play.golang.org/p/7olGK2z6CR
I forgot to hit the Share button.

roger peppe

Jan 4, 2012, 11:59:42 AM
to Julian Phillips, Rob 'Commander' Pike, John Graham-Cumming, golang-nuts

you're right. i thought i'd tried the "send a pointer to an interface" approach,
but it seems not.

Rob 'Commander' Pike

Jan 4, 2012, 12:34:56 PM
to roger peppe, John Graham-Cumming, golang-nuts

I see what you mean. If you used EncodeValue, it would work, but so does a wrapping layer.

-rob

Rob 'Commander' Pike

Jan 4, 2012, 12:36:10 PM
to roger peppe, John Graham-Cumming, golang-nuts
It's not gob that's at fault here, just to be clear. It's the way interfaces work. You can't use Printf to print a "top-level" interface value either; it always extracts the concrete value within.
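
As a sketch of the point: passing an interface value to Encode hands it the concrete value inside, while passing a pointer to the interface variable lets the encoder see the interface type itself. The example below assumes a current Go toolchain, where the encoding/gob package documentation uses this same pointer pattern for both sending and receiving; the release discussed in this thread may behave differently on the receiving side. Shape, Square and roundTrip are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Shape and Square are hypothetical types for illustration.
type Shape interface {
	Area() float64
}

type Square struct{ Side float64 }

func (s Square) Area() float64 { return s.Side * s.Side }

func init() {
	// The concrete type must be registered to travel inside an interface.
	gob.Register(Square{})
}

// roundTrip sends s as an interface value and decodes it into a Shape.
// The buffer stands in for the network connection.
func roundTrip(s Shape) (Shape, error) {
	var network bytes.Buffer
	// Encode(s) would see only the concrete Square; Encode(&s) passes a
	// pointer to the interface variable, so the interface type is kept.
	if err := gob.NewEncoder(&network).Encode(&s); err != nil {
		return nil, err
	}
	var got Shape
	if err := gob.NewDecoder(&network).Decode(&got); err != nil {
		return nil, err
	}
	return got, nil
}

func main() {
	got, err := roundTrip(Square{Side: 3})
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	fmt.Printf("%T with area %v\n", got, got.Area())
}
```

The %T verb in main shows the same effect as the Printf remark above: it reports the concrete type held in the interface, not the interface type.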

-rob

Alexey Borzenkov

Jan 6, 2012, 7:34:48 AM
to roger peppe, John Graham-Cumming, golang-nuts

I wonder if it is not too late to actually fix this at the protocol
level, i.e. why not just send bytes as, well, actual bytes without any
variable-length encoding? Of course it would mean that old programs
cannot talk to new programs, but the actual patch that would be fixing
it is really simple (some tests that check for exact byte
representation need fixing too, though):

diff -r 341889fdcf45 src/pkg/encoding/gob/decode.go
--- a/src/pkg/encoding/gob/decode.go Thu Jan 05 09:44:25 2012 -0800
+++ b/src/pkg/encoding/gob/decode.go Fri Jan 06 16:32:35 2012 +0400
@@ -116,6 +116,14 @@
return x
}

+func (state *decoderState) decodeUint8() (x uint8) {
+ x, err := state.b.ReadByte()
+ if err != nil {
+ error_(err)
+ }
+ return x
+}
+
// decodeInt reads an encoded signed integer from state.r.
// Does not check for overflow.
func (state *decoderState) decodeInt() int64 {
@@ -126,6 +134,10 @@
return int64(x >> 1)
}

+func (state *decoderState) decodeInt8() int8 {
+ return int8(state.decodeUint8())
+}
+
// decOp is the signature of a decoding operator for a given type.
type decOp func(i *decInstr, state *decoderState, p unsafe.Pointer)

@@ -188,12 +200,8 @@
}
p = *(*unsafe.Pointer)(p)
}
- v := state.decodeInt()
- if v < math.MinInt8 || math.MaxInt8 < v {
- error_(i.ovfl)
- } else {
- *(*int8)(p) = int8(v)
- }
+ v := state.decodeInt8()
+ *(*int8)(p) = int8(v)
}

// decUint8 decodes an unsigned integer and stores it as a uint8 through p.
@@ -204,12 +212,8 @@
}
p = *(*unsafe.Pointer)(p)
}
- v := state.decodeUint()
- if math.MaxUint8 < v {
- error_(i.ovfl)
- } else {
- *(*uint8)(p) = uint8(v)
- }
+ v := state.decodeUint8()
+ *(*uint8)(p) = uint8(v)
}

// decInt16 decodes an integer and stores it as an int16 through p.
diff -r 341889fdcf45 src/pkg/encoding/gob/encode.go
--- a/src/pkg/encoding/gob/encode.go Thu Jan 05 09:44:25 2012 -0800
+++ b/src/pkg/encoding/gob/encode.go Fri Jan 06 16:32:35 2012 +0400
@@ -72,6 +72,13 @@
}
}

+func (state *encoderState) encodeUint8(x uint8) {
+ err := state.b.WriteByte(uint8(x))
+ if err != nil {
+ error_(err)
+ }
+}
+
// encodeInt writes an encoded signed integer to state.w.
// The low bit of the encoding says whether to bit complement the (other bits of the)
// uint to recover the int.
@@ -85,6 +92,10 @@
state.encodeUint(uint64(x))
}

+func (state *encoderState) encodeInt8(i int8) {
+ state.encodeUint8(uint8(i))
+}
+
// encOp is the signature of an encoding operator for a given type.
type encOp func(i *encInstr, state *encoderState, p unsafe.Pointer)

@@ -158,19 +169,19 @@

// encInt8 encodes the int8 with address p.
func encInt8(i *encInstr, state *encoderState, p unsafe.Pointer) {
- v := int64(*(*int8)(p))
+ v := *(*int8)(p)
if v != 0 || state.sendZero {
state.update(i)
- state.encodeInt(v)
+ state.encodeInt8(v)
}
}

// encUint8 encodes the uint8 with address p.
func encUint8(i *encInstr, state *encoderState, p unsafe.Pointer) {
- v := uint64(*(*uint8)(p))
+ v := *(*uint8)(p)
if v != 0 || state.sendZero {
state.update(i)
- state.encodeUint(v)
+ state.encodeUint8(v)
}
}

It's not 1.0 yet, so maybe this can be changed?

roger peppe

Jan 6, 2012, 8:14:49 AM
to Alexey Borzenkov, John Graham-Cumming, golang-nuts

one of the nice things about gob is that something encoded as an
integer of one size can be decoded as any other size of integer
(modulo overflows). so we can decode a uint8 into a uint16 for example.
i think that changing the representation of uint8 and int8 could break this.

Aram Hăvărneanu

Jan 6, 2012, 8:18:48 AM
to Alexey Borzenkov, roger peppe, John Graham-Cumming, golang-nuts
Alexey Borzenkov wrote:
> I wonder if it is not too late to actually fix this at the protocol level

I was not aware there's something broken.

--
Aram Hăvărneanu

Alexey Borzenkov

Jan 6, 2012, 9:21:15 AM
to roger peppe, John Graham-Cumming, golang-nuts
On Fri, Jan 6, 2012 at 5:14 PM, roger peppe <rogp...@gmail.com> wrote:
> one of the nice things about gob is that something encoded as an
> integer of one size can be decoded as any other size of integer
> (modulo overflows). so we can decode a uint8 into a uint16 for example.
> i think that changing the representation of uint8 and int8 could break this.

Ah, yes, you're right. Then what about replicating slice efficiency
only to arrays? Something like this:

diff -r fa49a85c5941 src/pkg/encoding/gob/decode.go
--- a/src/pkg/encoding/gob/decode.go Thu Jan 05 18:40:17 2012 -0800
+++ b/src/pkg/encoding/gob/decode.go Fri Jan 06 18:17:50 2012 +0400
@@ -581,6 +581,22 @@
dec.decodeArrayHelper(state, p, elemOp, elemWid, length, elemIndir, ovfl)
}

+// decodeUint8Array decodes a byte array and stores it through p, that is, p points to the 0th element
+// The length is an unsigned integer preceding the elements. Even though the length is redundant
+// (it's part of the type), it's a useful check and is included in the encoding.
+func (dec *Decoder) decodeUint8Array(atyp reflect.Type, state *decoderState, p uintptr, length, indir int) {
+ if indir > 0 {
+ p = allocate(atyp, p, 1) // All but the last level has been allocated by dec.Indirect
+ }
+ if n := state.decodeUint(); n != uint64(length) {
+ errorf("length mismatch in decodeUint8Array")
+ }
+ slice := *(*[]byte)(unsafe.Pointer(&reflect.SliceHeader{Data: p, Len: length, Cap: length}))
+ if _, err := state.b.Read(slice); err != nil {
+ errorf("error decoding [%d]byte: %s", length, err)
+ }
+}
+
// decodeIntoValue is a helper for map decoding. Since maps are decoded using reflection,
// unlike the other items we can't use a pointer directly.
func decodeIntoValue(state *decoderState, op decOp, indir int, v reflect.Value, ovfl error) reflect.Value {
@@ -827,6 +843,13 @@
switch t := typ; t.Kind() {
case reflect.Array:
name = "element of " + name
+ elemKind := t.Elem().Kind()
+ if elemKind == reflect.Uint8 || elemKind == reflect.Int8 {
+ op = func(i *decInstr, state *decoderState, p unsafe.Pointer) {
+ state.dec.decodeUint8Array(t, state, uintptr(p), t.Len(), i.indir)
+ }
+ break
+ }
elemId := dec.wireType[wireId].ArrayT.Elem
elemOp, elemIndir := dec.decOpFor(elemId, t.Elem(), name, inProgress)
ovfl := overflow(name)
@@ -848,7 +871,8 @@

case reflect.Slice:
name = "element of " + name
- if t.Elem().Kind() == reflect.Uint8 {
+ elemKind := t.Elem().Kind()
+ if elemKind == reflect.Uint8 || elemKind == reflect.Int8 {
op = decUint8Slice
break
}
diff -r fa49a85c5941 src/pkg/encoding/gob/encode.go
--- a/src/pkg/encoding/gob/encode.go Thu Jan 05 18:40:17 2012 -0800
+++ b/src/pkg/encoding/gob/encode.go Fri Jan 06 18:17:50 2012 +0400
@@ -393,6 +393,17 @@
enc.freeEncoderState(state)
}

+// encodeUint8Array encodes the byte array whose 0th element is at p
+func (enc *Encoder) encodeUint8Array(b *bytes.Buffer, p uintptr, length int) {
+ slice := *(*[]byte)(unsafe.Pointer(&reflect.SliceHeader{Data: p, Len: length, Cap: length}))
+ state := enc.newEncoderState(b)
+ state.fieldnum = -1
+ state.sendZero = true
+ state.encodeUint(uint64(length))
+ state.b.Write(slice)
+ enc.freeEncoderState(state)
+}
+
// encodeReflectValue is a helper for maps. It encodes the value v.
func encodeReflectValue(state *encoderState, v reflect.Value, op encOp, indir int) {
for i := 0; i < indir && v.IsValid(); i++ {
@@ -562,7 +573,8 @@
// Special cases
switch t := typ; t.Kind() {
case reflect.Slice:
- if t.Elem().Kind() == reflect.Uint8 {
+ elemKind := t.Elem().Kind()
+ if elemKind == reflect.Uint8 || elemKind == reflect.Int8 {
op = encUint8Array
break
}
@@ -578,6 +590,14 @@
}
case reflect.Array:
// True arrays have size in the type.
+ elemKind := t.Elem().Kind()
+ if elemKind == reflect.Uint8 || elemKind == reflect.Int8 {
+ op = func(i *encInstr, state *encoderState, p unsafe.Pointer) {
+ state.update(i)
+ state.enc.encodeUint8Array(state.b, uintptr(p), t.Len())
+ }
+ break
+ }
elemOp, indir := enc.encOpFor(t.Elem(), inProgress)
op = func(i *encInstr, state *encoderState, p unsafe.Pointer) {
state.update(i)
diff -r fa49a85c5941 src/pkg/encoding/gob/encoder_test.go
--- a/src/pkg/encoding/gob/encoder_test.go Thu Jan 05 18:40:17 2012 -0800
+++ b/src/pkg/encoding/gob/encoder_test.go Fri Jan 06 18:17:50 2012 +0400
@@ -233,11 +233,12 @@
type Type5 struct {
A [3]string
B [3]byte
+ C [3]int8
}
type Type6 struct {
A [2]string // can't hold t5.a
}
- t5 := Type5{[3]string{"hello", ",", "world"}, [3]byte{1, 2, 3}}
+ t5 := Type5{[3]string{"hello", ",", "world"}, [3]byte{1, 128, 255}, [3]int8{1, -1, -128}}
var t5p Type5
if err := encAndDec(t5, &t5p); err != nil {
t.Error(err)

Or is it also desirable to be able to decode [n]uint8 into [n]uint16
or [n]uint32? (I don't know if it even does that though)

Alexey Borzenkov

Jan 6, 2012, 9:31:48 AM
to Aram Hăvărneanu, roger peppe, John Graham-Cumming, golang-nuts
On Fri, Jan 6, 2012 at 5:18 PM, Aram Hăvărneanu <ara...@mgk.ro> wrote:
> Alexey Borzenkov wrote:
>> I wonder if it is not too late to actually fix this at the protocol level
>
> I was not aware there's something broken.

It might not be broken, but it sure is inefficient. Unless there's a
reason for using more than one byte per element in byte arrays (when
elements already always fit into one byte completely), and if we can
make it use one byte per element for byte arrays (as is currently done
for slices), why not fix it?

roger peppe

Jan 6, 2012, 9:46:12 AM
to Alexey Borzenkov, John Graham-Cumming, golang-nuts
> Or is it also desirable to be able to decode [n]uint8 into [n]uint16
> or [n]uint32? (I don't know if it even does that though)

it does do that now: http://play.golang.org/p/faB8SgBKis

interestingly, trying to decode []byte into []uint causes a panic,
which is a bug (i just reported it as issue 2662)

i don't think this is a huge issue - the code works, and there's a
partial workaround for the efficiency issue (use []byte not [n]byte).

Alexey Borzenkov

Jan 6, 2012, 10:03:15 AM
to roger peppe, John Graham-Cumming, golang-nuts

Oh, you're right. I missed that there are type compatibility checks,
as well as that there are basic types for []byte and string, which
breaks after my patch. :) Looks like it's working as intended right
now...
