Question about parsing Protocol Buffers


ode

Mar 23, 2009, 4:06:49 AM
to Protocol Buffers
Hi,

I'm going to send protocol buffers as HTTP POST data. It seems
SerializeToString can be used to generate the binary string, but what if
the data is very large? Is all of the data serialized in memory?

The following is the proto file I defined. It is used for file
upload.

/////////////////////////////////////
package fileupload;

message Range {
  required uint64 start = 1;
  required uint32 len = 2;
}

message Block {
  required Range r = 1;
  required bytes blk_hash = 2;
  required bytes blk_data = 3;
}

message DifferUpload {
  repeated Block blk = 1;
}

/////////////////////////////////////

Any solutions?

Thanks in advance

ode

Mar 23, 2009, 4:38:04 AM
to Protocol Buffers
There is an API in C++, ParseFromIstream, but is there a similar API in
Python?

Kenton Varda

Mar 23, 2009, 2:14:20 PM
to ode, Protocol Buffers
On Mon, Mar 23, 2009 at 1:38 AM, ode <fuj...@gmail.com> wrote:

There is an API in C++, ParseFromIstream, but is there a similar API in
Python?

No, there's no Python equivalent right now.

But, the parsed objects are bigger than the original serialized data, so if the original serialized data can't fit in memory, then the parsed objects definitely can't.  In general, protocol buffers are designed to encode small to medium-sized messages, generally less than 1MB (usually much less).  If your data is larger than that, you should split it up into multiple small messages and devise some higher-level container format to wrap them so you can parse one at a time.
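One possible shape for such a container format is a length prefix in front of each serialized message. The sketch below is only an illustration, not part of the protobuf library: the names write_framed and read_framed and the 4-byte big-endian prefix are assumptions, and each payload is treated as opaque bytes (in practice it would be the output of SerializeToString for one small message, such as a single Block). It also assumes a buffered, blocking stream whose read(n) returns fewer than n bytes only at end of stream.

```python
import io
import struct


def write_framed(out, payload):
    """Write one message as a 4-byte big-endian length followed by its bytes."""
    out.write(struct.pack(">I", len(payload)))
    out.write(payload)


def read_framed(inp):
    """Yield serialized messages one at a time until the stream ends."""
    while True:
        prefix = inp.read(4)
        if not prefix:
            return  # clean end of stream
        if len(prefix) < 4:
            raise EOFError("truncated length prefix")
        (size,) = struct.unpack(">I", prefix)
        payload = inp.read(size)
        if len(payload) < size:
            raise EOFError("truncated message body")
        yield payload


if __name__ == "__main__":
    buf = io.BytesIO()
    for part in (b"first", b"second"):
        write_framed(buf, part)
    buf.seek(0)
    for payload in read_framed(buf):
        # A real receiver would call ParseFromString(payload) on a fresh
        # message object here, so only one message is in memory at a time.
        print(len(payload))
```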

In your case, you might try separating the messages from the payload.  That is, remove the blk_data field from Block, and instead write all of the data to the stream *after* the DifferUpload message.  Then on the receiving end, you can parse the whole protocol message first and then use it to write the data directly to the final destination as you read it.
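A rough sketch of that layout, under stated assumptions: the stream carries a 4-byte length, then the serialized header message, then the raw block data in the same order as the blocks in the header. The function names, the length prefix, and the chunk size are all hypothetical. So that the example runs without generated protobuf code, demo_parse_header is a stand-in that decodes packed (start, len) pairs; in real use it would parse a DifferUpload (with blk_data removed) and return each block's range.

```python
import io
import struct

CHUNK_SIZE = 64 * 1024  # copy buffer size; an arbitrary choice


def read_exact(inp, size):
    """Read exactly `size` bytes or raise EOFError."""
    data = inp.read(size)
    if len(data) != size:
        raise EOFError("stream ended early")
    return data


def send_upload(out, header_bytes, block_sources):
    """Write the length-prefixed header, then stream each block's raw data."""
    out.write(struct.pack(">I", len(header_bytes)))
    out.write(header_bytes)
    for src in block_sources:
        for chunk in iter(lambda: src.read(CHUNK_SIZE), b""):
            out.write(chunk)


def receive_upload(inp, parse_header, dest):
    """Parse the header first, then copy each block to its offset in `dest`."""
    (header_len,) = struct.unpack(">I", read_exact(inp, 4))
    ranges = parse_header(read_exact(inp, header_len))  # [(start, length), ...]
    for start, length in ranges:
        dest.seek(start)
        remaining = length
        while remaining:
            chunk = inp.read(min(CHUNK_SIZE, remaining))
            if not chunk:
                raise EOFError("stream ended inside block data")
            dest.write(chunk)
            remaining -= len(chunk)
    return ranges


def demo_parse_header(data):
    """Stand-in for protobuf parsing: decode packed (uint64 start, uint32 len)."""
    return [struct.unpack_from(">QI", data, off) for off in range(0, len(data), 12)]
```

With this arrangement only the small header and one chunk of payload are held in memory at any time; the block data never passes through a message object.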

Jim Sermersheim

Mar 23, 2009, 4:33:30 PM
to Protocol Buffers
I was thinking about this limitation last week and wondered whether it would be feasible to add a new value type, IOStream. In code, one would just get and set input/output streams. In Java, Message.writeTo would alternate between streaming the in-memory object data and the referenced input stream(s) to the output stream. toByte* and toString would, of course, still be subject to heap problems with really large data.

Jim

