The serialized protocol buffer data is not self-describing. If you
have a stream of bytes produced by serializing some protocol buffer
type, you must already know which type it is in order to deserialize it.
> I mean, Google seems to have its code base organized well enough that
> you know that there are 12183 .proto files, but if protocol buffers
> start being used outside Google, people will end up with protocol
> buffer files for which the .proto files are simply lost
Then those people will be sad because it will be a difficult task to
recover that data unless the .proto files can be reconstructed.
Nathan
Are there any reflection capabilities?
That is, given a stream that
receives protocol buffer messages, is there any way of reconstructing
the .proto that was used to generate the messages on the stream?
Also, the Python usage seems to involve manually generating code in an
external file. I understand why that is the usual use case, but from
a Python API, I'd expect that I can load a .proto file dynamically or
even put the protocol buffer definition inline in the source code.
Did I just miss the functions to do this? Even a simple workaround
(invoking the external tool behind the scenes) would make development
much easier, since a lot of Python development simply does not involve
any kind of build process.
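One possible shape for that workaround (a sketch only, not part of the official Python API; the helper names `protoc_command` and `load_proto` are made up here, and it assumes `protoc` is on the PATH): shell out to the compiler behind the scenes and import the generated `*_pb2` module.

```python
import importlib
import os
import subprocess
import sys
import tempfile

def protoc_command(proto_path, out_dir):
    """Build the protoc invocation that generates Python code (hypothetical helper)."""
    return [
        "protoc",
        "--proto_path=" + (os.path.dirname(proto_path) or "."),
        "--python_out=" + out_dir,
        proto_path,
    ]

def load_proto(proto_path):
    """Compile proto_path behind the scenes and import the resulting *_pb2 module."""
    out_dir = tempfile.mkdtemp()
    subprocess.check_call(protoc_command(proto_path, out_dir))
    sys.path.insert(0, out_dir)
    module_name = os.path.splitext(os.path.basename(proto_path))[0] + "_pb2"
    return importlib.import_module(module_name)
```

With something like this, `load_proto("addressbook.proto")` would give you the generated module at run time, with no explicit build step in your project.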
I am sorry, I do not intend this to be harsh, but wouldn't this be the
decision of your lab and not Google? Communicating all of the
possible use cases would either completely shut down further
development, given all of the writing that would need to be done, or
would dilute the purpose of protocol buffers, which is to be
simple and usable in any imaginable way. Google already has their use
case; it's up to the users now to find interesting and effective ways
to expand upon that.
You can always fork a project and expand upon protobuf to fit your
needs. Especially since I am sure you or your lab wouldn't be the
only ones to benefit from such use cases.
--
# Curt Micol
I don't see how. A protocol buffer definition is a fairly small
ASCII string, negligible compared to the total size of a typical
protocol buffer stream. For storage purposes, it only needs to be
inserted once at the beginning of the file. For self-describing
streams, it can be inserted every now and then into the stream. All
of this would be optional--the programmer could turn it on or off.
So, I think Google needs to decide whether archival storage and
support for machine learning and data mining are important future use
cases or not. Maybe that would be a good thing to communicate.
As I was saying: you do NOT need to embed the metadata with each
message. Since the .proto file describes the entire stream format,
all you ever need to do is embed the source text for the protocol
buffer definition once, as a string, at the beginning (if you want to
be able to "cut into" a stream, you might also embed it occasionally
in a stream with a sync token).
Yes and no. Google probably is doing machine learning and data mining
on protocol buffer streams, but given the design of protocol buffers
right now, you can't write general purpose machine learning tools that
treat the protocol buffer variables themselves as "columns".
As far as I can tell, all DynamicMessage lets me do is manipulate
messages whose type isn't available in my compilation unit, but the
message type itself still needs to be compiled and linked into my
program somewhere.
> We simply leave it up to your application to decide where to actually put this metadata
This basically means that I cannot build a tool that looks at other
people's protocol buffer message stream and finds the metadata.
How exactly does one construct a message via DescriptorProto?
Suppose I am trying to build a generic receiver that should handle 5
msg types (Foo1-Foo5) but I get the data in serialized form (a char
array). I would like to be able to determine the type dynamically and
instantiate the right Message object. Then I want to use reflection
to iterate through the fields. Is this possible?
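Protocol buffers do not put the type name on the wire, so one common approach (a sketch, not part of the protobuf API; `frame` and `dispatch` are hypothetical names) is for the sender to wrap each payload in a small envelope carrying the type name, while the receiver keeps a registry mapping names to generated message classes:

```python
import struct

def frame(type_name, payload):
    """Prefix serialized message bytes with their type name (hypothetical envelope)."""
    name = type_name.encode("utf-8")
    return struct.pack(">H", len(name)) + name + payload

def dispatch(data, registry):
    """Recover the type name, instantiate the matching class, parse the payload."""
    (name_len,) = struct.unpack(">H", data[:2])
    type_name = data[2:2 + name_len].decode("utf-8")
    msg = registry[type_name]()            # e.g. Foo3()
    msg.ParseFromString(data[2 + name_len:])
    return msg
```

With `registry` mapping each of the five type names to its generated class, `dispatch` returns a fully parsed message, and in the Python API `msg.ListFields()` then yields (FieldDescriptor, value) pairs for the populated fields, which covers the reflection step.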