question about SerializeToString output for cpp

mistlike

Jun 11, 2010, 2:30:12 AM
to Protocol Buffers
bool SerializeToString(string* output) const;: serializes the message
and stores the bytes in the given string. Note that the bytes are
binary, not text; we only use the string class as a convenient
container.


The output string stores binary data, so can it contain "\0"?
For example, I want to get its length with output->size() or
strlen(output->c_str()).
I see many "\n" bytes in the binary data, but I am not sure whether it
can also contain "\0"; if it can, that may be a problem for storing it
in a string.

Jason Hsueh

Jun 11, 2010, 11:54:41 AM
to mistlike, Protocol Buffers
Yes, '\0' may appear in the binary format. output->size() will return the correct result: a C++ string can store null characters without any issue. However, strlen(output->c_str()) and other calls that assume null-terminated C strings will not, because they stop at the first '\0'.
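
The difference between the two length calls can be seen without protobuf at all. The following is a minimal sketch: the byte values are made up for illustration and only stand in for what SerializeToString might write, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Stand-in for serialized output. The bytes are invented for illustration;
// what matters is that they contain an embedded '\0' and a '\n'.
std::string MakeSampleBytes() {
  const char raw[] = {'\x08', '\x00', '\x12', '\x01', '\n'};
  // The (pointer, length) constructor copies every byte, including '\0'.
  std::string bytes(raw, sizeof(raw));
  assert(bytes.size() == sizeof(raw));
  return bytes;
}

// Length as std::string tracks it: counts every byte.
std::size_t RealLength(const std::string& bytes) {
  return bytes.size();
}

// Length as a C string function sees it: stops at the first '\0'.
std::size_t CStringLength(const std::string& bytes) {
  return std::strlen(bytes.c_str());
}
```

With these sample bytes, RealLength reports 5 while CStringLength reports 1, so any code that passes the buffer on should use data() together with size() rather than treating it as a C string.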


--
You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.
To post to this group, send email to prot...@googlegroups.com.
To unsubscribe from this group, send email to protobuf+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.


mistlike

Jun 11, 2010, 1:01:23 PM
to Protocol Buffers
Thanks. I am trying to use Protocol Buffers with Hadoop C++ Pipes.
However, the C++ Pipes interface may be a problem, because the input
to a Hadoop mapper must be line-oriented, with "\n" separating each
item of mapper input.
I am trying the following change:

The data fed into Hadoop is escaped as follows:
"\n" --> "\\n"
"\0" -->"\\0"
"\\" -->"\\\\"

Then, when the data reaches the mapper, I apply the reverse change and
use the Protocol Buffers ParseFromArray to parse the buffer.

I am not sure whether there is a better solution for using Protocol
Buffers with Hadoop C++ Pipes.



Henner Zeller

Jun 11, 2010, 1:18:46 PM
to mistlike, Protocol Buffers
I have no idea about Hadoop Pipes, but wouldn't it be easier to
just add an implementation of the pipes that is length-delimited instead
of trying to munge the binary data you're sending? You'd probably
lose a lot of speed if you edited the binary data afterwards.
After all, this is what the available source code for Hadoop is good for :)
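
To make the length-delimited idea concrete, here is a rough sketch of one possible framing over plain C++ streams. It is not Hadoop Pipes code: the 4-byte big-endian header, the size cap, and the names WriteFrame, ReadFrame, FrameToString, and kMaxFrameBytes are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical upper bound so a corrupt header cannot trigger a huge allocation.
const std::uint32_t kMaxFrameBytes = 64u * 1024u * 1024u;

// Writes a 4-byte big-endian length followed by the payload bytes, untouched.
void WriteFrame(std::ostream& out, const std::string& payload) {
  assert(payload.size() <= kMaxFrameBytes);
  const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
  const char header[4] = {
      static_cast<char>((n >> 24) & 0xFF), static_cast<char>((n >> 16) & 0xFF),
      static_cast<char>((n >> 8) & 0xFF), static_cast<char>(n & 0xFF)};
  out.write(header, 4);
  out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
}

// Reads one frame into *payload. Returns false on end of input, on a
// truncated frame, or when the header exceeds kMaxFrameBytes.
bool ReadFrame(std::istream& in, std::string* payload) {
  unsigned char header[4];
  if (!in.read(reinterpret_cast<char*>(header), 4)) return false;
  const std::uint32_t n = (static_cast<std::uint32_t>(header[0]) << 24) |
                          (static_cast<std::uint32_t>(header[1]) << 16) |
                          (static_cast<std::uint32_t>(header[2]) << 8) |
                          static_cast<std::uint32_t>(header[3]);
  if (n > kMaxFrameBytes) return false;
  payload->assign(n, '\0');
  if (n > 0 && !in.read(&(*payload)[0], static_cast<std::streamsize>(n))) {
    return false;
  }
  return true;
}

// Convenience helper: returns the framed bytes for a single payload.
std::string FrameToString(const std::string& payload) {
  std::ostringstream out;
  WriteFrame(out, payload);
  return out.str();
}
```

Because the reader knows how many bytes to expect, the payload may contain '\n' and '\0' freely and no per-byte escaping is needed. The streams must be opened in binary mode for this to work on real files or pipes.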
