Storing protocol buffers: binary vs. text

Tom Swirly

Dec 10, 2011, 2:05:39 PM

to prot...@googlegroups.com

Hello, proto-people. I suspect some of you know me already...

I'm just finishing a moderately large project that makes extensive use of protocol buffers as the storage format for data files in a consumer desktop application. (The protos are working extremely well, of course, and I've built a really slick object-oriented persistence mechanism on top of them, but that's for another day.)

I have a flag that lets me store the protocol buffers either as text (using the Print* and Parse* methods from google::protobuf::TextFormat) or in the serialized binary format. It's of course much easier to keep them as text while I'm developing, and since the files are pretty tiny (though there are a lot of them), I'm thinking of keeping them as text even for the first public release.

But this got me thinking. If I see a file I haven't seen before that might be either a binary proto or a text proto, why can't I try to parse it as text, and then if that fails, as binary?
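
A minimal sketch of that text-then-binary fallback, in C++ to match the `google::protobuf::TextFormat` calls above. The `Format` enum and the `ParseEither` name are hypothetical, and the two parsers are passed in as callbacks only so the sketch compiles without generated message classes. In a real program they would presumably wrap `TextFormat::ParseFromString(data, &msg)` and `msg.ParseFromString(data)`.

```cpp
#include <functional>
#include <string>

// Which parser accepted the input (hypothetical enum for this sketch).
enum class Format { kText, kBinary, kNeither };

// Try the text parser first and fall back to the binary parser.
// Assumption: the callbacks wrap TextFormat::ParseFromString and
// Message::ParseFromString, both of which clear the message before
// parsing, so a failed text attempt leaves no stray fields behind.
Format ParseEither(const std::string& data,
                   const std::function<bool(const std::string&)>& parse_text,
                   const std::function<bool(const std::string&)>& parse_binary) {
  if (parse_text(data)) return Format::kText;
  if (parse_binary(data)) return Format::kBinary;
  return Format::kNeither;
}
```

Trying text first is the cheaper order here: a binary file is expected to trip the text tokenizer quickly, whereas the binary parser can sometimes accept short ASCII input.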

Yes, yes, this is somewhat dubious in spirit. Nothing in the proto definition rules out the binary form of one protocol buffer also being a valid text form of another.

And there's certainly the case of the "empty file", which could be either the text form or the binary form of the default protocol buffer. But in that case I don't care, since both readings give the same message.

But practically speaking, I don't see how this could fail. If I try to read a binary file as text, then between the wire types and my field numbers (which are all less than 32), the text parser has to run into an unprintable byte very soon and stop with an error...

Am I right? It's not a big deal if not...

Christopher Head

Dec 11, 2011, 2:47:57 PM

to prot...@googlegroups.com

-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

If you're worried, just spec your format such that when you write a
binary-format file, you shove a magic unprintable signature on the
front. Now when you want to load data, try to parse it as binary first
(looking for the signature) and, if it doesn't appear, then try to
parse as text. Guaranteed no collisions.
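
A minimal sketch of the signature idea, assuming a made-up 5-byte prefix (`kBinaryMagic`, along with `WrapBinary` and `UnwrapBinary`, are hypothetical names). The leading 0x89 byte is not printable ASCII, and TextFormat output starts with a field name, a `[`, or nothing at all, so a text file should never begin with it.

```cpp
#include <string>

// Hypothetical signature: 0x89, "PBF", NUL. Built with an explicit length
// because the literal contains an embedded NUL byte.
const std::string kBinaryMagic("\x89PBF\x00", 5);

// Prefix serialized binary data with the signature before writing it out.
std::string WrapBinary(const std::string& serialized) {
  return kBinaryMagic + serialized;
}

// If the data starts with the signature, strip it, store the binary payload,
// and return true. Otherwise return false; the caller should then hand the
// whole buffer to the text parser.
bool UnwrapBinary(const std::string& data, std::string* payload) {
  if (data.compare(0, kBinaryMagic.size(), kBinaryMagic) == 0) {
    *payload = data.substr(kBinaryMagic.size());
    return true;
  }
  return false;
}
```

An empty file has no signature, so it goes to the text parser and comes back as the default message, which matches the "I don't care" case from the original question.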

Chris

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (GNU/Linux)

iEYEAREDAAYFAk7lCPAACgkQXUF6hOTGP7feyQCggg0jHPq495NPQXCVR8Rhw7C3
vmgAn0TiXDiYw1eW+6i14NB0j6RYtEcT
=h7/A
-----END PGP SIGNATURE-----

Daniel Wright

Dec 11, 2011, 4:12:07 PM

to prot...@googlegroups.com

The main concern with text format is that its backward and forward compatibility is nowhere near as good as the binary format's. For example, what happens if you release your program and then, in a future update, want to remove or rename a field? The new code would have no trouble reading existing binary data, because the binary format identifies fields by number, not by name. But if the existing data was in text format, which identifies fields by name, it would be a problem.
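
A toy model of that difference (not the protobuf library itself): a "schema" maps field numbers to names, a binary reader looks fields up by number, and a text reader looks them up by name. `Schema`, `Result`, `ReadBinaryField`, and `ReadTextField` are all hypothetical. The one library behavior assumed here is that the binary parser keeps unknown field numbers as unknown fields rather than failing, while TextFormat's parser rejects unknown field names by default.

```cpp
#include <map>
#include <string>

// Toy schema: field number -> field name.
using Schema = std::map<int, std::string>;

enum class Result { kParsed, kSkipped, kError };

// Binary: the file stores the field number. A number the schema no longer
// has (e.g. a removed field) is skipped, so old files still load.
Result ReadBinaryField(const Schema& schema, int number) {
  return schema.count(number) > 0 ? Result::kParsed : Result::kSkipped;
}

// Text: the file stores the field name. A renamed or removed field leaves
// an unknown name, which the toy reader treats as a hard error.
Result ReadTextField(const Schema& schema, const std::string& name) {
  for (const auto& entry : schema) {
    if (entry.second == name) return Result::kParsed;
  }
  return Result::kError;
}
```

Renaming field 1 from `volume` to `gain` leaves old binary files readable (number 1 still exists), while an old text file containing `volume: 5` no longer parses.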

--
You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.
To view this discussion on the web visit https://groups.google.com/d/msg/protobuf/-/yYNE3tPUtcgJ.
To post to this group, send email to prot...@googlegroups.com.
To unsubscribe from this group, send email to protobuf+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.