optional bytes vs repeated bytes


Saptarshi

Aug 24, 2009, 10:51:37 AM
to Protocol Buffers
Hello,
Suppose I would like to store a type that could be a sequence of raw
bytes, so

message ...{

optional bytes sdata1=1; //A
repeated bytes sdata2=2; //B
optional BYT sdata3=3; //C
}

message BYT{
optional uint32 length=1;
optional bytes data=2;
}
Now I have an unsigned char* array which I wish to store in the
message.

My first approach was using (B), add_sdata2(array[i],i) (or something
similar), but this costs 3 bytes per byte stored.

I then tried option (A), storing the entire set of data in sdata1
(which is actually a string, according to the generated protobuf
header files). But when it comes to reading it, how do I know the
number of bytes stored in the string? Suppose my data looks like
0x00,0x00,0x00, what will be the length?

I am currently using option (C).

Have I missed something? Is there a better approach?
Thank you in advance
Saptarshi

jasonh

Aug 24, 2009, 2:46:26 PM
to Protocol Buffers


On Aug 24, 7:51 am, Saptarshi <saptarshi.g...@gmail.com> wrote:
> Hello,
> Suppose I would like to store a type that could be a sequence of raw
> bytes, so
>
>     message ...{
>
>             optional bytes sdata1=1; //A
>             repeated bytes sdata2=2;  //B
>            optional BYT sdata3=3; //C
>
> }
>
> message BYT{
> optional uint32 length=1;
> optional bytes data=2;}
>
> Now I have an unsigned char* array which I wish to store in the
> message.
>
> My first approach was using (B), add_sdata2(array[i],i) (or something
> similar), but this costs 3 bytes per byte stored.

This shouldn't be 3 bytes per byte, unless you are storing each byte
individually as its own element. If you only have a single byte array,
you should add the entire thing as one repeated element. Also, with
that add_sdata2() call you are passing the index as the length of the
string. Given const char* array[] = { "foo", "bar" }; you want
add_sdata2(array[i], strlen(array[i])). For binary data that may
contain NUL bytes, pass the known length instead of strlen().
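
To make the "3 bytes per byte" figure concrete, here is a minimal sketch of the size arithmetic, assuming a field number below 16 so that the tag fits in one byte. The helper names (VarintSize, BytesFieldSize, OnePerByte, SingleElement) are made up for illustration and are not part of the protobuf API.

```cpp
#include <cassert>
#include <cstddef>

// Number of bytes needed to encode `value` as a varint
// (7 payload bits per output byte).
std::size_t VarintSize(std::size_t value) {
  std::size_t n = 1;
  while (value >= 0x80) {
    value >>= 7;
    ++n;
  }
  return n;
}

// Wire size of one `bytes` element: tag + varint length + payload.
// Assumption: field numbers 1..15, whose tag occupies a single byte.
std::size_t BytesFieldSize(int field_number, std::size_t payload_len) {
  assert(field_number >= 1 && field_number < 16);
  return 1 + VarintSize(payload_len) + payload_len;
}

// Cost of adding every byte as its own repeated element (field 2).
std::size_t OnePerByte(std::size_t n) {
  return n * BytesFieldSize(2, 1);
}

// Cost of adding the whole array as a single element (field 2).
std::size_t SingleElement(std::size_t n) {
  return BytesFieldSize(2, n);
}
```

Under these assumptions, 100 bytes stored one per element cost 300 bytes on the wire, while the same 100 bytes stored as a single element cost 102.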

>
> I then tried option (A), storing the entire set of data in sdata1
> (which is actually a string, according to the generated protobuf
> header files). But when it comes to reading it, how do I know the
> number of bytes stored in the string? Suppose my data looks like
> 0x00,0x00,0x00, what will be the length?
>
> I am currently using option (c).

You should use option (a) or (b); (c) adds extra overhead. Here is the
wire format for each of the options:

(a): <1-byte tag + wire type (00001010 for tag 1, length-delimited)>
<varint length><raw bytes of sdata1>
If your data contains three null characters, then you'll get
<0x0a><0x03><0x00><0x00><0x00>
When parsing, the size of the data will just be
msg.sdata1().size() == 3.

(b): has the same wire format as (a), except that you can encode
multiple byte arrays with the same tag. I.e., if you had:
const char* array[] = { "foo", "bar", "quux" };
You could add each to the repeated bytes field, and each string would
get encoded to the wire with the format above:
<0x12><0x03>"foo"<0x12><0x03>"bar"<0x12><0x04>"quux"

(c): The wire format for nested messages already encodes the size, so
you are adding extra bytes of overhead by encoding the length
separately. You now have:
[tag for BYT field][BYT message length][tag for length][varint length]
[tag for data][length of data][data]
Or for three null characters:
<0x1a><0x07><0x08><0x03><0x12><0x03><0x00><0x00><0x00>
That is nine bytes on the wire, versus five for option (a).
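
The byte sequences above can be reproduced by hand. The following is only a sketch that encodes the three layouts manually with std::string. It does not use the protobuf library, and every function name in it is hypothetical. The field numbers 1, 2 and 3 for sdata1, sdata2 and sdata3 are taken from the original message definition.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Append `value` as a base-128 varint, low-order 7-bit group first.
void AppendVarint(std::string* out, std::uint64_t value) {
  while (value >= 0x80) {
    out->push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<char>(value));
}

// Append a field key: (field_number << 3) | wire_type.
void AppendTag(std::string* out, int field_number, int wire_type) {
  assert(field_number >= 1 && wire_type >= 0 && wire_type <= 5);
  AppendVarint(out, (static_cast<std::uint64_t>(field_number) << 3) |
                        static_cast<std::uint64_t>(wire_type));
}

// Append a length-delimited field (wire type 2): key, length, payload.
void AppendBytesField(std::string* out, int field_number,
                      const std::string& payload) {
  AppendTag(out, field_number, 2);
  AppendVarint(out, payload.size());
  out->append(payload);
}

// (a) optional bytes sdata1 = 1
std::string EncodeOptionA(const std::string& data) {
  std::string out;
  AppendBytesField(&out, 1, data);
  return out;
}

// (b) repeated bytes sdata2 = 2: one key/length/payload triple per element.
std::string EncodeOptionB(const std::vector<std::string>& items) {
  std::string out;
  for (const std::string& item : items) AppendBytesField(&out, 2, item);
  return out;
}

// (c) optional BYT sdata3 = 3, where BYT holds length = 1 and data = 2.
std::string EncodeOptionC(const std::string& data) {
  std::string byt;
  AppendTag(&byt, 1, 0);            // BYT.length, varint wire type
  AppendVarint(&byt, data.size());
  AppendBytesField(&byt, 2, data);  // BYT.data
  std::string out;
  AppendBytesField(&out, 3, byt);   // a nested message is length-delimited too
  return out;
}
```

For three zero bytes, EncodeOptionA yields 5 bytes and EncodeOptionC yields 9, which matches the sequences listed above.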

Hope that helps,
Jason

Kenton Varda

Aug 24, 2009, 3:21:30 PM
to Saptarshi, Protocol Buffers
In C++, "bytes" fields are stored using std::string.  This class has a size() method which returns the length of the string.  So, the length of sdata1 is:
  message.sdata1().size()
So you want to use option (A).

You would use option B if you wanted to store multiple independent byte strings, each one containing an arbitrary number of bytes.  There is never a reason to use option C.

Saptarshi Guha

Aug 24, 2009, 3:58:31 PM
to Kenton Varda, Protocol Buffers
Hello,
Thank you! Being new to C++, I had no idea about the size() method. I
was thinking of calling strlen() on c_str(), which would have given me
the wrong number of bytes.
Nice and clean now.
Regards
Saptarshi

Kenton Varda

Aug 24, 2009, 4:09:38 PM
to sg...@purdue.edu, Protocol Buffers
Yes, unlike C-style char*, C++'s std::string can store strings that contain NUL bytes.  You should never use .c_str() in this case -- use .data() and .size() instead.
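
A small sketch of the difference, using only the standard library. The helper names (FromBuffer, ToBuffer, CStringLength) are invented for this illustration. The std::string here stands in for what an accessor such as message.sdata1() returns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Build a std::string from a raw buffer with an explicit length,
// so embedded NUL bytes are preserved.
std::string FromBuffer(const unsigned char* buf, std::size_t len) {
  assert(buf != nullptr || len == 0);
  if (len == 0) return std::string();
  return std::string(reinterpret_cast<const char*>(buf), len);
}

// Copy the bytes back out using data() and size().
std::vector<unsigned char> ToBuffer(const std::string& s) {
  const unsigned char* p =
      reinterpret_cast<const unsigned char*>(s.data());
  return std::vector<unsigned char>(p, p + s.size());
}

// What strlen() on c_str() reports: it stops at the first NUL byte.
std::size_t CStringLength(const std::string& s) {
  return std::strlen(s.c_str());
}
```

With the buffer 0x00,0x00,0x00 from the original question, size() reports 3 while the strlen()-based length is 0.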

Sapsi

Aug 24, 2009, 6:18:48 PM
to Kenton Varda, sg...@purdue.edu, Protocol Buffers
Thanks, I had no idea about the data method. 
Much to learn.
Regards
Saptarshi 

On Aug 24, 2009, at 4:09 PM, Kenton Varda <ken...@google.com> wrote:

> You should never use .c_str() in this case -- use .data() and .size() instead.