Conceptually this sounds great, the big question to me is whether this should be implemented as an option in the compiler or as a separate plugin. I haven't taken a thorough look at the patch, but I'd guess it adds a decent amount to the core code generator. I have a preference for the plugin approach, but of course I'm primarily an internal protobuf user, so I'm willing to be convinced otherwise :-) Would using a plugin, possibly even shipped with the standard implementation, make this feature too inconvenient to use? Or is there enough demand for this that it warrants implementing as an option?
Regarding the proposed interfaces: I can imagine some applications where the const refs passed to the visitor methods may be too restrictive - the user may instead want to take ownership of the object. e.g., suppose the stream is a series of requests, and each of the visitor handlers needs to start some asynchronous work. It would be good to hear if users have use cases that don't quite fit into this model (or at least if the existing use cases will work).
I guess the naming is confusing in the example. The Visit is per
field name; but since the type is named the same as the field in this
example, it is confusing.
On Tue, Feb 1, 2011 at 3:17 PM, Jason Hsueh <jas...@google.com> wrote:
> Conceptually this sounds great, the big question to me is whether this should be implemented as an option in the compiler or as a separate plugin. I haven't taken a thorough look at the patch, but I'd guess it adds a decent amount to the core code generator. I have a preference for the plugin approach, but of course I'm primarily an internal protobuf user, so I'm willing to be convinced otherwise :-) Would using a plugin, possibly even shipped with the standard implementation, make this feature too inconvenient to use? Or is there enough demand for this that it warrants implementing as an option?

First of all, note that this feature is off by default. You have to turn it on with the generate_visitors message-level option. The only new code added to the base library is a couple of templates in WireFormatLite, which are of course never instantiated if you don't generate visitor code.

There are a few reasons I prefer to make this part of the base code generator:

- If you look at the patch, you'll see that the code generation for the two Guide classes actually shares a lot with the code generation for MergeFromCodedStream and SerializeWithCachedSizes. To make this a plugin, either we'd have to expose parts of the C++ code generator internals publicly (eww) or we'd have to reproduce a lot of code (also eww).
- The Reader and Writer classes directly use WireFormatLite, which is a private interface.
- It seems clear that this feature is widely desired by open source users. We're not talking about a niche use case here.

> Regarding the proposed interfaces: I can imagine some applications where the const refs passed to the visitor methods may be too restrictive - the user may instead want to take ownership of the object. e.g., suppose the stream is a series of requests, and each of the visitor handlers needs to start some asynchronous work. It would be good to hear if users have use cases that don't quite fit into this model (or at least if the existing use cases will work).

Interesting point. In the Reader case, it's creating new objects, so in theory it ought to be able to hand off ownership to the Visitor it calls. But the Walker is walking an existing object and thus clearly cannot give up ownership. It seems clear that some use cases need const references, which means that the only way we could support ownership passing is by adding another parallel set of methods. I suppose they could have default implementations that delegate to the const reference versions, in which case only people who wanted to optimize for them would need to override them. But I'd like to see that this is really desired first -- it's easy enough to add later.
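The "parallel set of methods with default implementations" idea could look roughly like the sketch below. To be clear, this is a hypothetical illustration, not the generated API from the patch: `Foo`, `VisitFoo`, and `VisitFooOwned` are made-up names standing in for whatever the generator would emit.

```cpp
#include <memory>
#include <string>

// Stand-in for a generated sub-message type.
struct Foo {
  std::string payload;
};

// Sketch of a visitor that offers both a const-ref hook and an
// ownership-passing hook.  The ownership-passing overload delegates to
// the const-ref version by default, so only implementations that want
// to keep the object need to override it.
class Visitor {
 public:
  virtual ~Visitor() = default;

  // Existing interface: the visitor may inspect but not keep the object.
  virtual void VisitFoo(const Foo& /*foo*/) {}

  // Optional interface: the caller hands over ownership.  By default we
  // just look at the object and let it be destroyed, preserving the old
  // const-ref behavior for visitors that don't care.
  virtual void VisitFooOwned(std::unique_ptr<Foo> foo) {
    VisitFoo(*foo);
  }
};

// A visitor that only implements the const-ref hook still sees every
// object, whether the caller passed a reference or ownership.
class CountingVisitor : public Visitor {
 public:
  int count = 0;
  void VisitFoo(const Foo& /*foo*/) override { ++count; }
};
```

Under this shape, a Reader (which constructs fresh objects) could call `VisitFooOwned`, while a Walker over an existing message would only ever call the const-ref `VisitFoo`.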
Also note that my code currently doesn't reuse message objects, but improving it to do so would be straightforward. A Reader could allocate one object of each sub-message type for reuse. But, it seems like that wouldn't play well with ownership-passing.
* It seems to me that this will solve the problem for people who know
statically at compile time what types they need to handle from a
stream, so they can define the "stream type" appropriately. Will users
find themselves running into the case where they need to handle
"generic" messages, and end up needing to "roll their own" stream
support anyway?
I ask this question because I built my own RPC system on top of
protocol buffers, and in this domain it is useful to be able to pass
"unknown" messages around, typically as unparsed byte strings. Hence,
this streams proposal wouldn't be useful to me, so I'm just wondering:
am I an anomaly here, or could it be that many applications will find
themselves needing to handle "any" protocol buffer message in their
streams?
> The Visitor class has two standard implementations: "Writer" and
> "Filler". MyStream::Writer writes the visited fields to a
> CodedOutputStream, using the same wire format as would be used to
> encode MyStream as one big message.
Imagine I wanted a different protocol. E.g., I want something that
checksums each message, or maybe compresses them, etc. Will I need to
subclass MessageType::Visitor for each stream that I want to encode?
Or will I need to change the code generator? Maybe this is an unusual
enough need that the design doesn't need to be flexible enough to
handle this, but it is worth thinking about a little, since features
like being able to detect broken streams and "resume" in the middle
are useful.
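For context on what "the same wire format as one big message" means in the quoted Writer description: each visited field is written as a varint key, `(field_number << 3) | wire_type`, followed by the encoded value. A standalone hand-rolled sketch (not using the library's WireFormatLite, which is private):

```cpp
#include <cstdint>
#include <string>

// Minimal varint encoder, mirroring what CodedOutputStream does:
// 7 bits per byte, high bit set on all but the last byte.
void AppendVarint(uint64_t value, std::string* out) {
  while (value >= 0x80) {
    out->push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<char>(value));
}

// A field key is (field_number << 3) | wire_type; wire type 0 is varint,
// so a varint field is just key + value.
void AppendVarintField(int field_number, uint64_t value, std::string* out) {
  AppendVarint(static_cast<uint64_t>(field_number) << 3, out);
  AppendVarint(value, out);
}
```

For example, field number 1 with value 150 encodes to the three bytes `08 96 01`, exactly as it would inside a regular serialized message.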
Thanks!
Evan
I read this proposal somewhat carefully, and thought about it for a couple days.
This is what I do as well, as does protobuf-socket-rpc:
http://code.google.com/p/protobuf-socket-rpc/source/browse/trunk/proto/rpc.proto
I guess I was thinking that if you already have to do some sort of
"lookup" of the message type that is stored in that byte blob, then
maybe you don't need the streaming extension. For example, you could
just build a library that produces a sequence of byte strings, which
the "user" of the library can then parse appropriately.
I see how you are using it though: it is a friendly wrapper around
this simple "sequence of byte strings" model, that automatically
parses that byte string using the tag and "schema message." This might
be useful for some people.
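The "sequence of byte strings" model described above is essentially length-delimited framing: a varint length prefix followed by the raw payload, which the consumer parses as whatever message type it decides is appropriate. A minimal standalone sketch (hand-rolled varints rather than CodedInputStream/CodedOutputStream, purely illustrative):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Varint length prefix, as WriteVarint32 on a CodedOutputStream would produce.
static void AppendVarint(uint64_t value, std::string* out) {
  while (value >= 0x80) {
    out->push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<char>(value));
}

// Append one length-delimited record to the stream buffer.
void WriteRecord(const std::string& payload, std::string* out) {
  AppendVarint(payload.size(), out);
  out->append(payload);
}

// Split a buffer of length-delimited records back into byte strings.
// Each string can then be handed to ParseFromString on the message
// type the application chooses (e.g. based on a tag seen earlier).
std::vector<std::string> ReadRecords(const std::string& buf) {
  std::vector<std::string> records;
  size_t pos = 0;
  while (pos < buf.size()) {
    uint64_t len = 0;
    int shift = 0;
    while (true) {  // decode the varint length
      unsigned char b = static_cast<unsigned char>(buf[pos++]);
      len |= static_cast<uint64_t>(b & 0x7F) << shift;
      if (!(b & 0x80)) break;
      shift += 7;
    }
    records.push_back(buf.substr(pos, len));
    pos += len;
  }
  return records;
}
```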
> This is somewhat inefficient currently, as it will require an extra
> copy of all those bytes. However, it seems likely that future
> improvements to protocol buffers will allow "bytes" fields to share
> memory with the original buffer, which will eliminate this concern.
Ah cool. I was considering changing my protocol to be two messages:
the first one is the "descriptor" (eg. your CallRequest message), then
the second would be the "body" of the request, which I would then
parse based on the type passed in the CallRequest.
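That two-message shape can be sketched with plain length prefixes. This is only an illustration of the framing, with fixed 32-bit lengths for brevity; `Call` here stands in for a serialized CallRequest header plus the body bytes it describes.

```cpp
#include <cstdint>
#include <string>

// Stand-in for one call on the wire: a serialized "descriptor" message
// (e.g. a CallRequest naming the body's type) followed by the body,
// which is parsed later based on that type.
struct Call {
  std::string header;  // serialized CallRequest
  std::string body;    // serialized request body
};

static void AppendLenPrefixed(const std::string& s, std::string* out) {
  uint32_t n = static_cast<uint32_t>(s.size());
  for (int i = 0; i < 4; ++i)
    out->push_back(static_cast<char>((n >> (8 * i)) & 0xFF));
  out->append(s);
}

static std::string ReadLenPrefixed(const std::string& buf, size_t* pos) {
  uint32_t n = 0;
  for (int i = 0; i < 4; ++i)
    n |= static_cast<uint32_t>(static_cast<unsigned char>(buf[*pos + i])) << (8 * i);
  *pos += 4;
  std::string s = buf.substr(*pos, n);
  *pos += n;
  return s;
}

// First message describes the call; second carries the body.
void WriteCall(const Call& c, std::string* out) {
  AppendLenPrefixed(c.header, out);
  AppendLenPrefixed(c.body, out);
}

Call ReadCall(const std::string& buf, size_t* pos) {
  Call c;
  c.header = ReadLenPrefixed(buf, pos);
  c.body = ReadLenPrefixed(buf, pos);
  return c;
}
```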
> Note that I expect people will generally only "stream" their top-
> level message. Although the proposal allows for streaming sub-
> messages as well, I expect that people will normally want to parse
> them into message objects which are handled whole. So, you only
> have to manually implement the top-level stream, and then you can
> invoke some reflective algorithm from there.
Right, but my concern is that I might want to use this streaming API
to write messages into files. In this case, I might have a file
containing the FooStream and another file containing the BarStream.
I'll have to implement both these ::Writer interfaces, or hack the
code generator to generate it for me. Although now that I think about
this, the implementation of these two APIs will be relatively trivial...
>> features like being able to detect broken streams and "resume" in
>> the middle are useful.
> I'm not sure how this relates. This seems like it should be handled
> at a lower layer, like in the InputStream -- if the connection is
> lost, it can re-establish and resume, without the parser ever
> knowing what happened.
Sorry, just an example of why you might want a different protocol. If
I've streamed 10e9 messages to disk, I don't want this stream to break
if there is some weird corruption in the middle, so I want some
protocol that can "resume" from corruption.
Evan
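One common shape for such a corruption-tolerant protocol, similar in spirit to record-I/O file formats, is to give each record a sync marker and a checksum, so a reader can skip a bad record by scanning forward to the next marker. A sketch only, not part of the proposal; the marker value and the FNV-1a hash here are arbitrary choices (a real format would use a CRC):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// 4-byte sync marker placed before every record.
static const char kSync[4] = {'\xDE', '\xAD', '\xBE', '\xEF'};

// FNV-1a hash as a toy checksum; good enough for a sketch.
static uint32_t ToyChecksum(const std::string& s) {
  uint32_t h = 2166136261u;
  for (unsigned char c : s) { h ^= c; h *= 16777619u; }
  return h;
}

static void AppendU32(uint32_t v, std::string* out) {
  for (int i = 0; i < 4; ++i)
    out->push_back(static_cast<char>((v >> (8 * i)) & 0xFF));
}

static uint32_t ReadU32(const std::string& b, size_t pos) {
  uint32_t v = 0;
  for (int i = 0; i < 4; ++i)
    v |= static_cast<uint32_t>(static_cast<unsigned char>(b[pos + i])) << (8 * i);
  return v;
}

// Record layout: sync(4) | length(4) | checksum(4) | payload.
void WriteRecord(const std::string& payload, std::string* out) {
  out->append(kSync, 4);
  AppendU32(static_cast<uint32_t>(payload.size()), out);
  AppendU32(ToyChecksum(payload), out);
  out->append(payload);
}

// Return every record whose checksum verifies.  On a mismatch (or a
// truncated length), resume by scanning byte-by-byte for the next sync
// marker instead of aborting the whole stream.
std::vector<std::string> ReadValidRecords(const std::string& buf) {
  std::vector<std::string> out;
  size_t pos = 0;
  while (pos + 12 <= buf.size()) {
    if (buf.compare(pos, 4, kSync, 4) != 0) { ++pos; continue; }
    uint32_t len = ReadU32(buf, pos + 4);
    uint32_t sum = ReadU32(buf, pos + 8);
    if (pos + 12 + len > buf.size()) { ++pos; continue; }
    std::string payload = buf.substr(pos + 12, len);
    if (ToyChecksum(payload) == sum) {
      out.push_back(payload);
      pos += 12 + len;
    } else {
      ++pos;  // corrupt record: keep scanning for the next marker
    }
  }
  return out;
}
```

A reader built this way loses only the damaged record, not everything after it.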
This may be a naive question, but wouldn't the format in text_format
be a prime example of another "protocol"? It seems that if you are able
to reuse the visitor to generate the text format, then it would be easily
extendable by others for JSON or the latest encoding of the week. I
look forward to seeing it pushed into the tree.
thanks
-jeff
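Jeff's point can be illustrated with a toy visitor that renders visited fields as text format instead of wire format. The interface below is a hypothetical stand-in for the generated `Visitor` (one Visit method per field of an imagined stream with an `id` and a `name`), not the API from the patch:

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical generated visitor interface for a stream with two
// fields: `id` (int32) and `name` (string).
class Visitor {
 public:
  virtual ~Visitor() = default;
  virtual void VisitId(int32_t id) = 0;
  virtual void VisitName(const std::string& name) = 0;
};

// An alternative "protocol": emit protobuf text format rather than
// wire format.  A JSON emitter would follow exactly the same pattern,
// just with different punctuation in each Visit method.
class TextWriter : public Visitor {
 public:
  void VisitId(int32_t id) override {
    out_ << "id: " << id << "\n";
  }
  void VisitName(const std::string& name) override {
    out_ << "name: \"" << name << "\"\n";
  }
  std::string str() const { return out_.str(); }

 private:
  std::ostringstream out_;
};
```

Visiting `id = 7` then `name = "alice"` would produce the two text-format lines `id: 7` and `name: "alice"`.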
On Feb 8, 2:34 pm, Kenton Varda <ken...@google.com> wrote:
> On Tue, Feb 8, 2011 at 5:47 AM, Evan Jones <ev...@mit.edu> wrote:
>
> > The Visitor class has two standard implementations: "Writer" and
> >> "Filler". MyStream::Writer writes the visited fields to a
> >> CodedOutputStream, using the same wire format as would be used to encode
> >> MyStream as one big message.
>
> > Imagine I wanted a different protocol. Eg. I want something that checksums
> > each message, or maybe compresses them, etc. Will I need to subclass
> > MessageType::Visitor for each stream that I want to encode? Or will I need
> > to change the code generator?
>
> To do these things generically, we'd need to introduce some sort of
> equivalent of Reflection for streams. This certainly seems like it could be
> a useful addition to the family, but I wanted to get the basic functionality
> out there first and then see if this is needed.