Message from discussion New protobuf feature proposal: Generated classes for streaming / visitors

MIME-Version: 1.0
In-Reply-To: <AANLkTinHFPd6XJFWU9S7MrtHPMAChXp5_t4j-40HX...@mail.gmail.com>
References: <AANLkTinHFPd6XJFWU9S7MrtHPMAChXp5_t4j-40HX...@mail.gmail.com>
Date: Tue, 1 Feb 2011 15:17:40 -0800
Message-ID: <AANLkTi=VK7ZEEo5q8JDcUxWn7YSbnnGAdPza0b5qw...@mail.gmail.com>
Subject: Re: New protobuf feature proposal: Generated classes for streaming / visitors
From: Jason Hsueh <jas...@google.com>
To: Kenton Varda <ken...@google.com>
Cc: Protocol Buffers <protobuf@googlegroups.com>,
        Pherl Liu <liuj...@google.com>, Steven Knight <s...@google.com>
Content-Type: multipart/alternative; boundary=001636d34bc82d3c35049b40bdf4
X-System-Of-Record: true

--001636d34bc82d3c35049b40bdf4
Content-Type: text/plain; charset=ISO-8859-1

Conceptually this sounds great. The big question to me is whether this
should be implemented as an option in the compiler or as a separate plugin.
I haven't taken a thorough look at the patch, but I'd guess it adds a decent
amount to the core code generator. I have a preference for the plugin
approach, but of course I'm primarily an internal protobuf user, so I'm
willing to be convinced otherwise :-) Would using a plugin, possibly even
shipped with the standard implementation, make this feature too inconvenient
to use? Or is there enough demand for this that it warrants implementing as
an option?

Regarding the proposed interfaces: I can imagine some applications where the
const refs passed to the visitor methods may be too restrictive, because the
user may instead want to take ownership of the object. For example, suppose
the stream is a series of requests, and each of the visitor handlers needs to
start some asynchronous work; with a const ref, the handler would have to copy
the request to keep it alive after the callback returns. It would be good to
hear if users have use cases that don't quite fit into this model (or at least
if the existing use cases will work).
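
To make the ownership concern concrete, here is a minimal sketch. It is not
taken from the patch: the stub Foo, OwningVisitor, and both queue classes are
hypothetical names used only for illustration, and std::unique_ptr is just one
possible way to express ownership transfer.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a generated message type (hypothetical; a real Foo would be
// generated by protoc).
struct Foo {
  std::string payload;
};

// Visitor shaped like the proposal: the callee only borrows the message.
class BorrowingVisitor {
 public:
  virtual ~BorrowingVisitor() {}
  virtual void VisitFoo(const Foo& foo) = 0;
};

// Hypothetical alternative: the callee receives ownership of the message.
class OwningVisitor {
 public:
  virtual ~OwningVisitor() {}
  virtual void VisitFoo(std::unique_ptr<Foo> foo) = 0;
};

// A handler that defers work must keep each request alive after the callback
// returns. With a const ref, it has to make a copy.
class CopyingQueue : public BorrowingVisitor {
 public:
  void VisitFoo(const Foo& foo) override {
    pending.push_back(std::unique_ptr<Foo>(new Foo(foo)));  // deep copy
  }
  std::vector<std::unique_ptr<Foo>> pending;
};

// With ownership transfer, the same handler just moves the pointer.
class MovingQueue : public OwningVisitor {
 public:
  void VisitFoo(std::unique_ptr<Foo> foo) override {
    pending.push_back(std::move(foo));  // no copy
  }
  std::vector<std::unique_ptr<Foo>> pending;
};

// Usage: the copying handler ends up with a different object than the one the
// caller passed in, while the moving handler keeps the very same object.
inline void DemoOwnership() {
  Foo request{"req-1"};
  CopyingQueue copying;
  copying.VisitFoo(request);
  assert(copying.pending[0].get() != &request);

  std::unique_ptr<Foo> owned(new Foo{"req-2"});
  Foo* raw = owned.get();
  MovingQueue moving;
  moving.VisitFoo(std::move(owned));
  assert(moving.pending[0].get() == raw);
}
```

Whether the extra copy matters depends on message size and request rate, which
is why real use cases would help here.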

On Tue, Feb 1, 2011 at 10:45 AM, Kenton Varda <ken...@google.com> wrote:

> Hello open source protobuf users,
>
> *Background*
>
> Probably the biggest deficiency in the open source protocol buffers
> libraries today is a lack of built-in support for handling streams of
> messages.  True, it's not too hard for users to support it manually, by
> prefixing each message with its size as described here:
>
>
> http://code.google.com/apis/protocolbuffers/docs/techniques.html#streaming
>
> However, this is awkward, and typically requires users to reach into the
> low-level CodedInputStream/CodedOutputStream classes and do a lot of work
> manually.  Furthermore, many users want to handle streams
> of heterogeneous message types.  We tell them to wrap their messages in an
> outer type using the "union" pattern:
>
>   http://code.google.com/apis/protocolbuffers/docs/techniques.html#union
>
> But this is kind of ugly and has unnecessary overhead.
>
> These problems never really came up in our internal usage, because inside
> Google we have an RPC system and other utility code which builds on top of
> protocol buffers and provides appropriate abstraction. While we'd like to
> open source this code, a lot of it is large, somewhat messy, and highly
> interdependent with unrelated parts of our environment, and no one has had
> the time to rewrite it all cleanly (as we did with protocol buffers itself).
>
> *Proposed solution:  Generated Visitors*
>
> I've been wanting to fix this for some time now, but didn't really have a
> good idea how.  CodedInputStream is annoyingly low-level, but I couldn't
> think of a much better interface for reading a stream of messages off the
> wire.
>
> A couple of weeks ago, though, I realized that I had been failing to
> consider how new kinds of code generation could help with this problem.  I
> was trying to think of solutions that would go into the protobuf base
> library, not solutions that were generated by the protocol compiler.
>
> So then it became pretty clear:  A protobuf message definition can also be
> interpreted as a definition for a streaming protocol.  Each field in the
> message is a kind of item in the stream.
>
>   // A stream of Foo and Bar messages, and also strings.
>   message MyStream {
>     // Enables generation of streaming classes.
>     option generate_visitors = true;
>     repeated Foo foo = 1;
>     repeated Bar bar = 2;
>     repeated string baz = 3;
>   }
>
> All we need to do is generate code appropriate for treating MyStream as a
> stream, rather than one big message.
>
> My approach is to generate two interfaces, each with two provided
> implementations.  The interfaces are "Visitor" and "Guide".
> MyStream::Visitor looks like this:
>
>   class MyStream::Visitor {
>    public:
>     virtual ~Visitor();
>
>     virtual void VisitFoo(const Foo& foo);
>     virtual void VisitBar(const Bar& bar);
>     virtual void VisitBaz(const std::string& baz);
>   };
>
> The Visitor class has two standard implementations:  "Writer" and "Filler".
> MyStream::Writer writes the visited fields to a CodedOutputStream, using
> the same wire format as would be used to encode MyStream as one big message.
> MyStream::Filler fills in a MyStream message object with the visited
> values.
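
For readers less familiar with the wire format, a rough sketch of the bytes
such a Writer would presumably emit for two baz items follows. It uses only
the standard protobuf encoding rules (tag = field number << 3 | wire type,
varint length, then payload); the helper names are hypothetical and this is
not code from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Append `value` as a base-128 varint, least significant group first.
inline void AppendVarint(uint64_t value, std::string* out) {
  while (value >= 0x80) {
    out->push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<char>(value));
}

// Append one length-delimited field (wire type 2): tag, length, payload.
// Strings and embedded messages both use this wire type.
inline void AppendLengthDelimited(int field_number, const std::string& payload,
                                  std::string* out) {
  const uint64_t kWireTypeLengthDelimited = 2;
  AppendVarint((static_cast<uint64_t>(field_number) << 3) |
                   kWireTypeLengthDelimited,
               out);
  AppendVarint(payload.size(), out);
  out->append(payload);
}

// Bytes for a two-item stream: baz = "hi", then baz = "yo" (field number 3).
inline std::string EncodeTwoBazItems() {
  std::string out;
  AppendLengthDelimited(3, "hi", &out);
  AppendLengthDelimited(3, "yo", &out);
  return out;
}

// Usage: field 3 with wire type 2 gives the tag byte 0x1A, so each item is
// 0x1A, 0x02, then the two payload bytes.
inline void CheckEncoding() {
  assert(EncodeTwoBazItems() == "\x1A\x02hi\x1A\x02yo");
}
```

Because repeated fields are encoded as independent tagged records, the
concatenation of the items is also a valid encoding of one MyStream message
whose baz field holds both strings, which is what lets the same definition
serve as both a message and a stream.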
>
> Meanwhile, Guides are objects that drive Visitors.
>
>   class MyStream::Guide {
>    public:
>     virtual ~Guide();
>
>     // Call the methods of the visitor on the Guide's data.
>     virtual void Accept(MyStream::Visitor* visitor) = 0;
>
>     // Just fill in a message object directly rather than use a visitor.
>     virtual void Fill(MyStream* message) = 0;
>   };
>
> The two standard implementations of Guide are "Reader" and "Walker".
> MyStream::Reader reads items from a CodedInputStream and passes them to the
> visitor.  MyStream::Walker walks over a MyStream message object and passes
> all the fields to the visitor.
>
> To handle a stream of messages, simply attach a Reader to your own Visitor
> implementation.  Your visitor's methods will then be called as each item is
> parsed, kind of like "SAX" XML parsing, but type-safe.
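
A hand-written approximation of how these pieces might fit together is
sketched below. Everything in it is a stand-in: the real classes would be
generated by protoc, the stub Foo and Bar are hypothetical, Guide::Fill is
omitted for brevity, and the no-op default Visit methods are an assumption
based on the Visitor methods not being declared pure virtual.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Foo { int id; };  // stub message types (hypothetical)
struct Bar { int id; };

// Stand-in for the generated MyStream class and its nested interfaces.
struct MyStream {
  std::vector<Foo> foo;
  std::vector<Bar> bar;
  std::vector<std::string> baz;

  class Visitor {
   public:
    virtual ~Visitor() {}
    // Assumed default: ignore items the subclass does not care about.
    virtual void VisitFoo(const Foo&) {}
    virtual void VisitBar(const Bar&) {}
    virtual void VisitBaz(const std::string&) {}
  };

  class Guide {
   public:
    virtual ~Guide() {}
    virtual void Accept(MyStream::Visitor* visitor) = 0;
  };

  class Walker;
};

// Walker-like guide: drives a visitor from an in-memory message.
class MyStream::Walker : public MyStream::Guide {
 public:
  explicit Walker(const MyStream* message) : message_(message) {}
  void Accept(MyStream::Visitor* visitor) override {
    for (const Foo& f : message_->foo) visitor->VisitFoo(f);
    for (const Bar& b : message_->bar) visitor->VisitBar(b);
    for (const std::string& s : message_->baz) visitor->VisitBaz(s);
  }

 private:
  const MyStream* message_;
};

// User-defined visitor: overrides only the callbacks it needs.
class Summary : public MyStream::Visitor {
 public:
  void VisitFoo(const Foo& foo) override { foo_id_sum += foo.id; }
  void VisitBaz(const std::string& baz) override { text += baz; }
  int foo_id_sum = 0;
  std::string text;
};

// Usage: the same Summary visitor could be driven by a Reader instead of a
// Walker without any change to its code.
inline void DemoWalk() {
  MyStream stream;
  stream.foo.push_back(Foo{1});
  stream.foo.push_back(Foo{2});
  stream.bar.push_back(Bar{7});
  stream.baz.push_back("ab");
  stream.baz.push_back("cd");

  MyStream::Walker walker(&stream);
  Summary summary;
  walker.Accept(&summary);
  assert(summary.foo_id_sum == 3);
  assert(summary.text == "abcd");
}
```

Note that this Walker visits fields grouped by field number, whereas a Reader
would presumably call the visitor in whatever order items appear on the wire.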
>
> *Nonblocking I/O*
>
> The "Reader" type declared above is based on blocking I/O, but many users
> would prefer a non-blocking approach.  I'm less sure how to handle this, but
> my thought was that we could provide a utility class like:
>
>   class NonblockingHelper {
>    public:
>     template <typename MessageType>
>     NonblockingHelper(typename MessageType::Visitor* visitor);
>
>     // Push data into the buffer.  If the data completes any fields,
>     // they will be passed to the underlying visitor.  Any left-over data
>     // is remembered for the next call.
>     void PushData(const void* data, int size);
>   };
>
> With this, you can use whatever non-blocking I/O mechanism you want, and
> just have to push the data into the NonblockingHelper, which will take care
> of calling the Visitor as necessary.
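
The buffering behavior described above can be sketched roughly as follows.
This is a simplified illustration, not the proposed class: PushParser is a
hypothetical name, it assumes every field is length-delimited (true for
MyStream, whose fields are messages and strings), it reports items through a
generic callback instead of a typed Visitor, and it does no error handling.
One side note on the declaration as written: a template parameter that appears
only in `typename MessageType::Visitor*` cannot be deduced, and constructor
templates cannot take explicit template arguments, so a real version would
probably need to make the class itself a template.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

class PushParser {
 public:
  typedef std::function<void(int field_number, const std::string& payload)>
      Callback;

  explicit PushParser(Callback on_field) : on_field_(std::move(on_field)) {}

  // Append data to the buffer, emit every field that is now complete, and
  // keep any trailing partial field for the next call.
  void PushData(const void* data, int size) {
    buffer_.append(static_cast<const char*>(data), static_cast<size_t>(size));
    size_t consumed = 0;
    for (;;) {
      size_t cursor = consumed;
      uint64_t tag = 0;
      uint64_t length = 0;
      if (!ReadVarint(&cursor, &tag) || !ReadVarint(&cursor, &length)) break;
      if (buffer_.size() - cursor < length) break;  // payload not complete yet
      // Assumes wire type 2; the low three bits of the tag are not checked.
      on_field_(static_cast<int>(tag >> 3),
                buffer_.substr(cursor, static_cast<size_t>(length)));
      consumed = cursor + static_cast<size_t>(length);
    }
    buffer_.erase(0, consumed);
  }

 private:
  // Reads a varint starting at *cursor. Returns false if the buffer ends
  // before the varint does (a malformed, over-long varint also returns false).
  bool ReadVarint(size_t* cursor, uint64_t* value) const {
    uint64_t result = 0;
    for (int shift = 0; shift < 64 && *cursor < buffer_.size(); shift += 7) {
      const unsigned char byte =
          static_cast<unsigned char>(buffer_[(*cursor)++]);
      result |= static_cast<uint64_t>(byte & 0x7F) << shift;
      if ((byte & 0x80) == 0) {
        *value = result;
        return true;
      }
    }
    return false;
  }

  Callback on_field_;
  std::string buffer_;
};

// Usage: two baz items ("hi" and "yo", field 3) arrive split mid-payload.
inline void DemoPush() {
  std::vector<std::pair<int, std::string> > seen;
  PushParser parser([&seen](int field, const std::string& payload) {
    seen.push_back(std::make_pair(field, payload));
  });
  parser.PushData("\x1A\x02h", 3);     // first item is still incomplete
  assert(seen.empty());
  parser.PushData("i\x1A\x02yo", 5);   // completes the first, adds the second
  assert(seen.size() == 2);
  assert(seen[0].second == "hi" && seen[1].second == "yo");
}
```

A generated helper would dispatch on the field number to VisitFoo, VisitBar,
or VisitBaz rather than invoking a single callback.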
>
> *C++ implementation*
>
> I've written up a patch implementing this for C++ (not yet including the
> nonblocking part):
>
>   http://codereview.appspot.com/4077052
>
> *Feedback*
>
> What do you think?
>
> I know I'm excited to use this in some of my own side projects (which is
> why I spent my weekend working on it), but before adding this to the
> official implementation we should make sure it is broadly useful.
>

--001636d34bc82d3c35049b40bdf4--