encoding/binary and strings


pavolstartrek

Sep 2, 2011, 5:46:06 AM
to golang-nuts
Hello,

I have a question: why does encoding/binary not support strings? It
should be simple to save a string as int64(len(s)) followed by []byte(s).
Reading is the same in reverse.

I made a little patch for this and it works well for me, as far as I can
tell.

Is the problem that no one is interested in such functionality, that no
one has time to implement it (in which case I can do it), or am I missing
some important caveats?

thx

Dave Cheney

Sep 2, 2011, 6:19:34 AM
to pavolstartrek, golang-nuts
I think it is more complex than that, for example, what byte ordering is going to be applied to your int64 length?

Cheers

Dave


Nigel Tao

Sep 2, 2011, 6:26:49 AM
to Dave Cheney, pavolstartrek, golang-nuts
On 2 September 2011 20:19, Dave Cheney <da...@cheney.net> wrote:
> I think it is more complex than that, for example, what byte ordering is going to be applied to your int64 length?

Well, since it's encoding/binary, I'm guessing that he's already
either specifying BigEndian or LittleEndian.

A better question, I think, is why int64 and not int32, uint64, or
uint32? If I was going to pick one, though, I'd use the varint
encoding, instead of a fixed width encoding:
http://code.google.com/p/snappy-go/source/browse/varint/varint.go
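
To illustrate the varint idea, here is a minimal sketch that is not code from the thread. It assumes a current Go release, where encoding/binary itself provides PutUvarint and ReadUvarint. The helper names writeString, readString and roundTrip, and the 1 MiB limit, are made up for this example.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// writeString writes s as an unsigned varint length followed by the raw bytes.
func writeString(w io.Writer, s string) error {
	var hdr [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(hdr[:], uint64(len(s)))
	if _, err := w.Write(hdr[:n]); err != nil {
		return err
	}
	_, err := io.WriteString(w, s)
	return err
}

// readString reads a string produced by writeString. maxLen is a
// hypothetical cap that rejects absurd lengths before allocating.
func readString(r *bufio.Reader, maxLen uint64) (string, error) {
	n, err := binary.ReadUvarint(r)
	if err != nil {
		return "", err
	}
	if n > maxLen {
		return "", fmt.Errorf("string length %d exceeds limit %d", n, maxLen)
	}
	buf := make([]byte, n)
	if _, err := io.ReadFull(r, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

// roundTrip encodes s, reports the encoded size, and decodes it again.
func roundTrip(s string) (string, int, error) {
	var buf bytes.Buffer
	if err := writeString(&buf, s); err != nil {
		return "", 0, err
	}
	size := buf.Len()
	got, err := readString(bufio.NewReader(&buf), 1<<20)
	return got, size, err
}

func main() {
	got, size, err := roundTrip("hello")
	fmt.Println(got, size, err)
}
```

With this scheme a short string costs one byte of overhead instead of eight, at the price of a prefix whose width depends on the value.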

pavolstartrek

Sep 2, 2011, 7:32:46 AM
to golang-nuts
It is simple. encoding/binary has

func Write(w io.Writer, order ByteOrder, data interface{}) os.Error

so the existing implementation already requires the endianness to be
specified, and the length would simply be encoded with the given ByteOrder.
That is completely in line with the existing implementation. The package
already provides functionality to write/read int, int64, etc.: all the
simple types.
I cannot see a problem here.

The same applies to the choice of int64. It does not need to be int64;
it could be plain int, since len(s) returns int. encoding/binary already
contains functionality to store int, so nothing changes here. As far as I
know, int is int32 on 32-bit systems and int64 on 64-bit systems. The size
of int is calculated by the reflection package, so again no issue here.
If I use int64 it is a little more platform agnostic, but since
encoding/binary itself is not platform agnostic by design (the size of int
is calculated at run time), it does not matter.

Extending encoding/binary to store strings could later be extended to
store slices of a particular type, or even slices of interface{}, or maps.

\s


pavolstartrek

Sep 2, 2011, 7:36:06 AM
to golang-nuts
It can be int, as len(s) returns int. int64 is a little more architecture
agnostic from my point of view, but I have no objections to int.

The encoding/binary package already contains functionality to store int,
so there is no need to reinvent the wheel.

/s


roger peppe

Sep 2, 2011, 8:48:53 AM
to pavolstartrek, golang-nuts
On 2 September 2011 12:36, pavolstartrek <pavols...@gmail.com> wrote:
> it can be int as len(s) returns int. Int64 is little more architecture
> agnostic from my point of view, but i have no objections to int.
>
> encoding/binary package already contains functionality to store int,
> so no need to reinvent wheel.

actually it does not - encoding/binary has only functionality to store
fixed sizes (for example int16, uint32) but not int.
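
The fixed-size restriction can be checked directly. This is a small sketch added for illustration, assuming a current Go release; encodable is a hypothetical helper name.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// encodable reports whether binary.Write accepts the value v.
func encodable(v interface{}) bool {
	var buf bytes.Buffer
	return binary.Write(&buf, binary.BigEndian, v) == nil
}

func main() {
	// int64 has a fixed width, so it is accepted and has a known size.
	fmt.Println(encodable(int64(42)), binary.Size(int64(42)))
	// Plain int has a platform-dependent width and is rejected;
	// binary.Size reports -1 for values it cannot encode.
	fmt.Println(encodable(int(42)), binary.Size(int(42)))
	// Strings are rejected for the same reason.
	fmt.Println(encodable("hi"), binary.Size("hi"))
}
```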

it's for that reason that it's not a good idea to store strings indexes - they
are not fixed size values.

as for nigel's suggestion to use varint, i don't think that would be
in the spirit of the encoding/binary package which currently
uses all fixed-size values - important for some applications,
for instance when writing random-access data.
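
A sketch of the random-access point, added for illustration and not taken from the thread: when every record has the same encoded size, record i starts at offset i*size and can be read without scanning what precedes it. The record type and the helpers readRecord and lookup are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// record is a made-up fixed-size record: 4 + 8 = 12 encoded bytes.
type record struct {
	ID    uint32
	Score int64
}

// readRecord reads the i-th record by seeking straight to its offset.
func readRecord(r io.ReaderAt, i int) (record, error) {
	var rec record
	size := int64(binary.Size(rec))
	sr := io.NewSectionReader(r, int64(i)*size, size)
	err := binary.Read(sr, binary.LittleEndian, &rec)
	return rec, err
}

// lookup encodes recs back to back and then fetches record i.
func lookup(recs []record, i int) (record, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, recs); err != nil {
		return record{}, err
	}
	return readRecord(bytes.NewReader(buf.Bytes()), i)
}

func main() {
	rec, err := lookup([]record{{1, 10}, {2, 20}, {3, 30}}, 2)
	fmt.Println(rec, err)
}
```

A variable-length field such as a string or a varint would break the offset arithmetic, because the position of record i would then depend on the contents of records 0 through i-1.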

it might be nice to have a package that supported extensible
non-type-tagged data encoding, but i'm not sure that encoding/binary is the
right place for that.

BTW, what did you do about binary.TotalSize ?

pavolstartrek

Sep 2, 2011, 9:21:54 AM
to golang-nuts
You are right, int is not there. That is the reason why I cast it to
int64.

On Sep 2, 2:48 pm, roger peppe <rogpe...@gmail.com> wrote:
> On 2 September 2011 12:36, pavolstartrek <pavolstart...@gmail.com> wrote:
>
> > it can be int as len(s) returns int. Int64 is little more architecture
> > agnostic from my point of view, but i have no objections to int.
>
> > encoding/binary package already contains functionality to store int,
> > so no need to reinvent wheel.
>
> actually it does not - encoding/binary has only functionality to store
> fixed sizes (for example int16, uint32) but not int.
>
> it's for that reason that it's not a good idea to store strings indexes - they
> are not fixed size values.

I do not understand this. The length of the string is just control
information.

Write extension:
// w is an io.Writer, s is a string
binary.Write(w, binary.BigEndian, int64(len(s)))
w.Write([]byte(s))

Read extension:
// r is an io.Reader
var l int64
binary.Read(r, binary.BigEndian, &l)
text := make([]byte, l)
io.ReadFull(r, text) // r.Read alone may return fewer than l bytes
return string(text)

Of course, the usual boilerplate (error checking, etc.) is missing from
these snippets.
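
For reference, a runnable version of the scheme described above might look like the following. It is a sketch written against current Go (error instead of os.Error), not the actual patch. The function names and the maxStringLen cap are assumptions made for the example.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// maxStringLen is a hypothetical upper bound, so that a corrupt or
// hostile length prefix cannot trigger a huge allocation.
const maxStringLen = 1 << 20

// writeString writes int64(len(s)) in the given byte order, then the bytes of s.
func writeString(w io.Writer, order binary.ByteOrder, s string) error {
	if err := binary.Write(w, order, int64(len(s))); err != nil {
		return err
	}
	_, err := io.WriteString(w, s)
	return err
}

// readString reads a string written by writeString.
func readString(r io.Reader, order binary.ByteOrder) (string, error) {
	var n int64
	if err := binary.Read(r, order, &n); err != nil {
		return "", err
	}
	if n < 0 || n > maxStringLen {
		return "", errors.New("invalid string length")
	}
	buf := make([]byte, n)
	// io.ReadFull keeps reading until buf is full or an error occurs.
	if _, err := io.ReadFull(r, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

// roundTrip encodes s into a buffer and decodes it again.
func roundTrip(s string) (string, error) {
	var buf bytes.Buffer
	if err := writeString(&buf, binary.BigEndian, s); err != nil {
		return "", err
	}
	return readString(&buf, binary.BigEndian)
}

func main() {
	got, err := roundTrip("hello, gophers")
	fmt.Println(got, err)
}
```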

>
> as for nigel's suggestion to use varint, i don't think that would be
> in the spirit of the encoding/binary package which currently
> uses all fixed-size values - important for some applications,
> for instance when writing random-access data.

I do not understand this. What is the idea? If strings were
implemented as null-terminated, would that be OK?
By my understanding, a string is fixed size: internally it is an immutable
array of bytes. If it is not fixed size, then a struct is not either (you
iterate over its fields). And if not a struct, then a slice definitely is
variable size.
If slices of fixed-size elements are implemented, why is there no
implementation for map[fixed-size]fixed-size?
Anyway, the suggested string format is at least stable.

I'm just curious why there is such a constraint when the implementation is
so simple. Maybe I do not understand the difference between fixed size
and variable size.

>
> it might be nice to have a package that supported extensible
> non-type-tagged data encoding, but i'm not sure that encoding/binary is the
> right place for that.
>
> BTW, what did you do about binary.TotalSize ?

I simply add len(s) + sizeof(int64). TotalSize is used only to
calculate the required buffer size, so this complies with it.
Right now I am implementing a small database system, so I created this
extended version of binary and it works perfectly. My
modifications work for strings in structs and slices too, without
problems (so far).
It is not a big issue for me to have my own package; I'm just curious
why it should not be implemented in the standard package.

Russ Cox

Sep 2, 2011, 1:15:56 PM
to pavolstartrek, golang-nuts
On Fri, Sep 2, 2011 at 05:46, pavolstartrek <pavols...@gmail.com> wrote:
> Why encoding/binary does not support strings?

Because there is not one obvious encoding,
as evidenced by the remainder of this thread.

Russ

Martin Charles

Jun 16, 2015, 9:47:35 PM
to golan...@googlegroups.com
Why do we need to worry about how to store the length of the string? Why not simply null terminate it like C does?

Dan Kortschak

Jun 16, 2015, 9:51:34 PM
to Martin Charles, golan...@googlegroups.com
On Tue, 2015-06-16 at 18:47 -0700, Martin Charles wrote:
> Why do we need to worry about how to store the length of the string?
> Why not simply null terminate it like C does?
>
Say you are using encoding/binary to deal with stream data from an
untrusted source and someone sends a stream with no 0.

Martin Charles

Jun 16, 2015, 10:53:58 PM
to Dan Kortschak, golan...@googlegroups.com

In that case I think using something like the json package's RawMessage API would be appropriate.

Dan Kortschak

Jun 16, 2015, 11:03:00 PM
to Martin Charles, golan...@googlegroups.com
On Wed, 2015-06-17 at 02:53 +0000, Martin Charles wrote:
> In that case I think using something like the json packages rawmessage
> api would be appropriate.

And crafted packets? The whole point of not using zero terminated
strings is that it rules out a class of bugs and attacks.
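
To make the failure mode concrete, here is a sketch added for illustration of what a reader of zero-terminated strings has to guard against. readCString, parse, errTooLong and the limits are hypothetical names and values.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

var errTooLong = errors.New("no terminator within limit")

// readCString reads bytes up to a zero terminator. It gives up once more
// than max payload bytes have arrived without one, and it treats end of
// input before the terminator as an error.
func readCString(r io.ByteReader, max int) (string, error) {
	var out []byte
	for {
		b, err := r.ReadByte()
		if err == io.EOF {
			return "", io.ErrUnexpectedEOF
		}
		if err != nil {
			return "", err
		}
		if b == 0 {
			return string(out), nil
		}
		if len(out) == max {
			return "", errTooLong
		}
		out = append(out, b)
	}
}

// parse runs readCString over an in-memory input.
func parse(input string, max int) (string, error) {
	return readCString(strings.NewReader(input), max)
}

func main() {
	fmt.Println(parse("abc\x00rest", 16)) // well-formed input
	fmt.Println(parse("abc", 16))         // stream ends with no zero byte
	fmt.Println(parse("aaaaaaaa", 4))     // sender never sends a terminator
}
```

Without the limit and the end-of-input check, the reader would keep buffering for as long as the sender keeps supplying non-zero bytes.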

Rob Pike

Jun 16, 2015, 11:49:05 PM
to Dan Kortschak, Martin Charles, golan...@googlegroups.com
That, and the fact that strlen is implemented as a MOV instruction (O(1)) rather than a loop (O(n)).

-rob
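
The cost difference can be sketched as follows. This illustration is not from the thread, and cStrlen is a hypothetical stand-in for C's strlen. A Go string carries its length, so len(s) simply reads it, while a zero-terminated string must be scanned.

```go
package main

import "fmt"

// cStrlen mimics C's strlen on a byte slice: it scans for the first zero
// byte, so its cost grows with the length of the string.
func cStrlen(buf []byte) int {
	for i, b := range buf {
		if b == 0 {
			return i
		}
	}
	// No terminator found. C would keep reading past the buffer;
	// here the scan stops at the end of the slice.
	return len(buf)
}

func main() {
	s := "gopher"
	fmt.Println(len(s))                            // stored length, constant time
	fmt.Println(cStrlen([]byte("gopher\x00junk"))) // linear scan
}
```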


