Issue 689 in protobuf: Repeated fields do not serialize the 'size' into the file. Hence deserialize of repeated expects an unending sequence of repeated fields

993 views
Skip to first unread message

prot...@googlecode.com

unread,
Feb 17, 2015, 10:12:35 PM2/17/15
to prot...@googlegroups.com
Status: New
Owner: liu...@google.com
Labels: Type-Defect Priority-Medium

New issue 689 by narayan....@gmail.com: Repeated fields do not serialize
the 'size' into the file. Hence deserialize of repeated expects an unending
sequence of repeated fields
https://code.google.com/p/protobuf/issues/detail?id=689

What steps will reproduce the problem?
1. Create a MyStudent.proto with below contents
-----------------------------------------------
1 option optimize_for=SPEED;
2
3 package My;
4
5 message Course {
6 required string courseName = 1;
7 required int32 courseMarks = 2;
8 };
9
10 message Student {
11 required string rollNo = 1;
12 required string studentName = 2;
13 repeated Course Courses = 3;
14 };
15
16 enum RequestType {
17 ReadStudentRequest_e = 0;
18 WriteStudentRequest_e = 1;
19 DeleteStudentRequest_e = 3;
20 SetMarksRequest_e = 4;
21 GetMarksRequest_e = 5;
22 DeleteMarksRequest_e = 6;
23 };
24
25 enum ResponseType {
26 RequestSuccessful_e = 0;
27 RequestFailed_e = 1;
28 };
29
30 message Request
31 {
32 required RequestType requestType = 1; // The request type
33 optional Student student = 2; // For the use of
WriteStudentRequest_e
34 optional string rollNo = 3; // For the use of
ReadStudentRequest_e, DeleteStudentRequest_e,
35 //
SetMarksRequest_e, GetMarksRequest_e, DeleteMarksRequest_e
36 optional string courseName = 4; // For the use of
GetMarksRequest_e, SetMarksRequest_e, DeleteMarksRequest_e
37 optional int32 courseMarks = 5; // For the use of
SetMarksRequest_e
38 };
39
40 message Response
41 {
42 required RequestType requestType = 1; // The request type
43 required ResponseType responseType = 2; // The response type
44 optional string responseString = 3; // The response type
45 optional Student student = 4; // For the use of
ReadStudentRequest_e
46 optional string rollNo = 5; // For the use of
WriteStudentRequest_e, DeleteStudentRequest_e,
47 //
SetMarksRequest_e, GetMarksRequest_e, DeleteMarksRequest_e
48 optional string courseName = 6; // For the use of
GetMarksRequest_e, SetMarksRequest_e, DeleteMarksRequest_e
49 optional int32 courseMarks = 7; // For the use of
GetMarksRequest_e
50 };
-----------------------------------------------

2. Compile it:
-----------------------------------------------
$ protoc --cpp_out=$PWD Student.proto
-----------------------------------------------

3. In the file MyStudent.pb.cc, what we see is:
-----------------------------------------------
...
686 ::google::protobuf::uint8* Student::SerializeWithCachedSizesToArray(
687 ::google::protobuf::uint8* target) const {
688 // required string rollNo = 1;
689 if (has_rollno()) {
690 ::google::protobuf::internal::WireFormat::VerifyUTF8String(
691 this->rollno().data(), this->rollno().length(),
692 ::google::protobuf::internal::WireFormat::SERIALIZE);
693 target =
694 ::google::protobuf::internal::WireFormatLite::WriteStringToArray(
695 1, this->rollno(), target);
696 }
697
698 // required string studentName = 2;
699 if (has_studentname()) {
700 ::google::protobuf::internal::WireFormat::VerifyUTF8String(
701 this->studentname().data(), this->studentname().length(),
702 ::google::protobuf::internal::WireFormat::SERIALIZE);
703 target =
704 ::google::protobuf::internal::WireFormatLite::WriteStringToArray(
705 2, this->studentname(), target);
706 }
707
708 // repeated .My.Course Courses = 3;
709 for (int i = 0; i < this->courses_size(); i++) {
710 target = ::google::protobuf::internal::WireFormatLite::
711 WriteMessageNoVirtualToArray(
712 3, this->courses(i), target);
713 }
714
715 if (!unknown_fields().empty()) {
716 target
= ::google::protobuf::internal::WireFormat::SerializeUnknownFieldsToArray(
717 unknown_fields(), target);
718 }
719 return target;
720 }
...
-----------------------------------------------

What is the expected output? What do you see instead?
-----------------------------------------------
Notice that in line 708 of the above code-snippet from MyStudent.pb.cc, we
see that it writes only contents, but does not write the 'size' before
writing the contents. This makes subsequent 'parse' unsafe since it will
read infinite number of repeated elements (till end of file / array).
-----------------------------------------------

What version of the product are you using? On what operating system?
-----------------------------------------------
I am using the protobuf-devel version that comes with Fedora 21. Here is a
snippet from 'yum info protobuf':
Name : protobuf-devel
Arch : x86_64
Version : 2.5.0
Release : 11.fc21
Size : 963 k
Repo : installed
From repo : fedora
Summary : Protocol Buffers C++ headers and libraries
URL : http://code.google.com/p/protobuf/
License : BSD
Description : This package contains Protocol Buffers compiler for all
languages and
: C++ headers and libraries
-----------------------------------------------

Please provide any additional information below.
-----------------------------------------------
- N/A -

--
You received this message because this project is configured to send all
issue notifications to this address.
You may adjust your notification preferences at:
https://code.google.com/hosting/settings

prot...@googlecode.com

unread,
Mar 10, 2015, 5:11:00 PM3/10/15
to prot...@googlegroups.com
Updates:
Status: WorkingAsIntended
Owner: xiaof...@google.com

Comment #1 on issue 689 by xiaof...@google.com: Repeated fields do not
serialize the 'size' into the file. Hence deserialize of repeated expects
an unending sequence of repeated fields
https://code.google.com/p/protobuf/issues/detail?id=689

This is by design. Protobuf fields are serialized as <tag, value> pairs.
The parsing routine will check the tag to decide whether a value belongs to
a repeated field and act accordingly. There could be a potentially infinite
number of such <tag, value> pairs in an input stream, but in practice, the
size of a message is usually already known before parsing. For example,
when you send a protobuf message over the wire, you'll first send the size
of the data before serialized bytes of the message.

prot...@googlecode.com

unread,
Mar 11, 2015, 9:33:16 AM3/11/15
to prot...@googlegroups.com

Comment #2 on issue 689 by narayan....@gmail.com: Repeated fields do not
serialize the 'size' into the file. Hence deserialize of repeated expects
an unending sequence of repeated fields
https://code.google.com/p/protobuf/issues/detail?id=689

Thank you for the response. This is what I am doing - i.e., adding the size
before sending the protobuf on the wire (file descriptor of a file or
socket) - then serializing-to-array and parsing-from-array.

Since there are many protobuf message structures like this in my college
project, I ended up writing a template methods as follows (where T is the
type of the protobuf object and My::Size itself is another protobuf object
with a single int64 field in it, named size).

The request here is : Can we do something better?

---------------------
The template function:
---------------------
namespace My
{
///
/// @brief
/// Handy template to read a protobuf message object
/// along with its byte size from a file descriptor.
///
/// @param[out] rObj
/// Reference to the protobuf object.
///
/// @param[in] iFd
/// The file descriptor.
///
template <class T> void ReadProtobufWithSizeFromFileDescriptor(T &rObj,
int iFd)
{
My::Size objSize;

objSize.set_size(0);

char *pByteArray = new char[objSize.ByteSize()];

if (!pByteArray)
{
ERRORprintf("Unable allocate memory.");
MY_THROW("Allocate Error.");
}

int iResult = read(iFd, pByteArray, objSize.ByteSize());

if (iResult < objSize.ByteSize())
{
ERRORprintf("Unable to read from file.");
delete [] pByteArray;
MY_THROW("File Read Error.");
}

// Get the object size first.
bool bResult = objSize.ParseFromArray(pByteArray,
objSize.ByteSize());
if (!bResult)
{
ERRORprintf("Unable to read from file.");
MY_THROW("File Read Error.");
}

delete [] pByteArray;

pByteArray = new char[objSize.size()];

if (!pByteArray)
{
ERRORprintf("Unable allocate memory.");
MY_THROW("Allocate Error.");
}

iResult = read(iFd, pByteArray, objSize.size());

if (iResult < objSize.size())
{
ERRORprintf("Unable to read from file.");
delete [] pByteArray;
MY_THROW("File Read Error.");
}

bResult = rObj.ParseFromArray(pByteArray, objSize.size());

if (!bResult)
{
ERRORprintf("Unable to read from file.");
delete [] pByteArray;
MY_THROW("File Read Error.");
}

delete [] pByteArray;
}

///
/// @brief
/// Handy template to read a protobuf message object
/// along with its byte size from a file descriptor.
///
/// @param[out] rObj
/// Reference to the protobuf object.
///
/// @param[in] iFd
/// The file descriptor.
///
template <class T> void WriteProtobufWithSizeToFileDescriptor(T &rObj,
int iFd)
{
My::Size objSize;

objSize.set_size(rObj.ByteSize());

char *pByteArray = new char[objSize.ByteSize()];

if (!pByteArray)
{
ERRORprintf("Unable allocate memory.");
MY_THROW("Allocate Error.");
}

// Write the object size first.
bool bResult = objSize.SerializeToArray(pByteArray,
objSize.ByteSize());

if (!bResult)
{
ERRORprintf("Unable to write to file.");
MY_THROW("File Write Error.");
}

int iResult = write(iFd, pByteArray, objSize.ByteSize());

if (iResult < objSize.ByteSize())
{
ERRORprintf("Unable to write to file.");
delete [] pByteArray;
MY_THROW("File Write Error.");
}

delete [] pByteArray;

pByteArray = new char[objSize.size()];

if (!pByteArray)
{
ERRORprintf("Unable allocate memory.");
MY_THROW("Allocate Error.");
}

bResult = rObj.SerializeToArray(pByteArray, objSize.size());

if (!bResult)
{
ERRORprintf("Unable to write to file.");
delete [] pByteArray;
MY_THROW("File Write Error.");
}

iResult = write(iFd, pByteArray, objSize.size());

if (iResult < objSize.size())
{
ERRORprintf("Unable to write to file.");
delete [] pByteArray;
MY_THROW("File Write Error.");
}

delete [] pByteArray;
}
}

--------------------------

prot...@googlecode.com

unread,
Mar 11, 2015, 7:43:19 PM3/11/15
to prot...@googlegroups.com

Comment #3 on issue 689 by dwa...@gmail.com: Repeated fields do not
serialize the 'size' into the file. Hence deserialize of repeated expects
an unending sequence of repeated fields
https://code.google.com/p/protobuf/issues/detail?id=689

It sounds like you're looking for the Message.writeDelimitedTo and
Message.parseDelimitedFrom functions. Those are available in Java, but not
in C++ (see issue 472).

prot...@googlecode.com

unread,
Mar 11, 2015, 9:18:49 PM3/11/15
to prot...@googlegroups.com

Comment #4 on issue 689 by narayan....@gmail.com: Repeated fields do not
serialize the 'size' into the file. Hence deserialize of repeated expects
an unending sequence of repeated fields
https://code.google.com/p/protobuf/issues/detail?id=689

Looks likeit.
Reply all
Reply to author
Forward
0 new messages