Hello,

I'm using the official proto3 C++ project. My organization is interested in using protocol buffers to exchange messages between services. We solve physics simulation problems and deal with a mix of structured metadata and large amounts of numerical data (on the order of 1-10 GB). I ran some quick tests to investigate the feasibility of doing this with protobuf:

message ByteContainer {
  string name = 1;
  bytes payload = 2;
  string other_data = 3;
}

What I found was surprising. Here are the relative timings for a bytes payload of 800 million bytes (a rough sketch of the timing loop follows the list):
- resizing a std::vector<uint8_t> to 800,000,000 elements: 416 ms
- memcpy of an initialized char* (named buffer) of the same size into that vector: 190 ms
- byte_container.set_payload(buffer, length): 1004 ms
- serializing the protobuf: 2000 ms
- deserializing the protobuf: 1800 ms
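For reference, the timing loop looked roughly like the sketch below. This is not the exact benchmark code: the generated header name byte_container.pb.h, the use of std::chrono, and the buffer initialization are my assumptions.

#include <chrono>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <vector>

#include "byte_container.pb.h"  // assumed name of the generated header

int main() {
  const size_t kSize = 800000000;             // ~800 MB payload
  char* buffer = new char[kSize];             // pre-initialized source data
  std::memset(buffer, 0x42, kSize);

  auto t0 = std::chrono::steady_clock::now();
  std::vector<uint8_t> vec;
  vec.resize(kSize);                          // resize: ~416 ms
  auto t1 = std::chrono::steady_clock::now();
  std::memcpy(vec.data(), buffer, kSize);     // memcpy: ~190 ms
  auto t2 = std::chrono::steady_clock::now();

  ByteContainer byte_container;
  byte_container.set_payload(buffer, kSize);  // copies into the field: ~1004 ms
  auto t3 = std::chrono::steady_clock::now();

  std::ofstream out("payload.bin", std::ios::binary);
  byte_container.SerializeToOstream(&out);    // serialize: ~2000 ms
  auto t4 = std::chrono::steady_clock::now();

  // Milliseconds between consecutive time points give the numbers above, e.g.
  // std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count().
  delete[] buffer;
  return 0;
}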
I understand that protobufs are not intended for messages of this scale (the documentation warns to keep messages under 1 MB), and that protobuf must use some custom memory allocation that is optimized in a different direction. Still, I think that for bytes payloads it is reasonable to expect performance on the same order of magnitude as memcpy. This is the case with Avro (although we really, really don't like the Avro C++ API).

Is this possible to fix in the proto library? If not for the general 'bytes' type, what if we add a tag like:

bytes payload = 2 [huge];
Thanks!

Mohamed Koubaa
Software Developer
ANSYS Inc
Hi Feng,

I was using SerializeToOstream and ParseFromCodedStream. I had to call SetTotalBytesLimit, which is not required by ParseFromArray. Does this mean that I can use a bytes field larger than INT_MAX with ParseFromArray?
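To compare the two parse paths concretely, here is a rough sketch. ByteContainer is the message from my first mail; the wire-buffer handling is assumed, and the two-argument SetTotalBytesLimit matches older protobuf releases (newer ones take a single limit argument).

#include <climits>
#include <string>

#include <google/protobuf/io/coded_stream.h>
#include <google/protobuf/io/zero_copy_stream_impl_lite.h>

#include "byte_container.pb.h"

// Coded-stream path: the default 64 MB total-bytes limit must be raised
// before parsing a multi-GB message.
bool ParseViaCodedStream(const std::string& wire, ByteContainer* msg) {
  google::protobuf::io::ArrayInputStream raw(wire.data(),
                                             static_cast<int>(wire.size()));
  google::protobuf::io::CodedInputStream coded(&raw);
  coded.SetTotalBytesLimit(INT_MAX, INT_MAX);
  return msg->ParseFromCodedStream(&coded);
}

// Array path: no explicit limit call, but the size parameter is an int, so
// the serialized message still cannot exceed INT_MAX bytes.
bool ParseViaArray(const std::string& wire, ByteContainer* msg) {
  return msg->ParseFromArray(wire.data(), static_cast<int>(wire.size()));
}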
Using SerializeToArray and ParseFromArray, the performance has improved: serializing is at 700 ms, and deserializing went down to 561 ms. That is the same order of magnitude as memcpy, which is a lot better.

I tried with a ~2 GB byte array to quickly estimate the scaling, and fortunately it looks to be linear. I wonder if the assignment step (set_payload) can also be made closer to memcpy. Here are the numbers at ~2 GB (a rough sketch of the array round trip follows the list):
- resize vector: 1050 ms
- memcpy: 560 ms
- assigning the protobuf (set_payload): ~2500 ms
- serializing the protobuf: ~1500 ms
- deserializing the protobuf: ~1800 ms
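The round trip behind those numbers was essentially the following sketch. ByteSizeLong is the newer name (older releases only have ByteSize), and the scratch buffer handling is an assumption.

#include <cstddef>
#include <vector>

#include "byte_container.pb.h"

// Serialize into and parse back out of a caller-owned flat buffer,
// skipping the ostream and coded-stream layers entirely.
void RoundTrip(const ByteContainer& in, ByteContainer* out) {
  const size_t n = in.ByteSizeLong();                     // ByteSize() on older releases
  std::vector<char> wire(n);
  in.SerializeToArray(wire.data(), static_cast<int>(n));  // ~1500 ms at ~2 GB
  out->ParseFromArray(wire.data(), static_cast<int>(n));  // ~1800 ms at ~2 GB
}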
It would be great to have C++11 move semantics for this in a future version of the library.

I think that would be better than the aliasing option you mention, because aliasing requires careful management of the lifetime of the memory being aliased.
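As an aside, the generated C++ accessors already include set_allocated_payload and release_payload for this field, which transfer ownership of a heap-allocated std::string instead of copying the bytes. A minimal sketch (hedged: ownership semantics differ when the message lives on an arena):

#include <memory>
#include <string>

#include "byte_container.pb.h"

// Hand the field a heap string that was filled in place, instead of copying
// a buffer into the message. The message takes ownership of the pointer.
void FillWithoutCopy(std::unique_ptr<std::string> owned, ByteContainer* msg) {
  msg->set_allocated_payload(owned.release());
}

// Reading side: release_payload() hands the bytes back to the caller,
// again without copying them.
std::unique_ptr<std::string> TakePayload(ByteContainer* msg) {
  return std::unique_ptr<std::string>(msg->release_payload());
}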