Dealing with multiple messages and .proto files

Mika Raento

Dec 8, 2009, 7:11:06 AM

to Protocol Buffers for Perl/XS

Dear protobuf-perlxs

It's very easy to use the protobuf-perlxs compiler to create one
DynaLoader perl module from one .proto containing one message
definition. Currently (AFAIK) the only way to create modules for
several messages requires creating a project/Makefile.PL/Makefile for
each.

I've hacked my version of the compiler and added a more complicated
Makefile.PL to put a number of messages in a single DynaLoader module
that is then 'use'd by the per-message packages. I'm soliciting
feedback on what the best way to do this would be before I submit any
code for review.

Some background:

DynaLoader works so that 'bootstrap My::Module $VERSION' loads a shared
object (.so) called 'Module', preferably located close to the .pm, and
calls boot_My__Module from that .so (DLL). The boot_My__Module function
can add any symbols it wants to any package, but DynaLoader couples the
name of the .so, the loading package and the bootstrap function. Hence
two different packages cannot use the same .so, and only one bootstrap
function will be called when loading a .so.
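The naming coupling described above can be sketched as follows. This is only an illustration in Python (the real logic lives in DynaLoader and xsubpp); the function names boot_symbol and shared_object_relpath are made up for this sketch, and the 'auto/...' layout is the usual DynaLoader search convention, assumed here rather than taken from the post.

```python
def boot_symbol(package: str) -> str:
    """Return the C bootstrap symbol expected for a Perl package.

    'My::Module' becomes 'boot_My__Module': each '::' turns into '__'.
    """
    return "boot_" + package.replace("::", "__")


def shared_object_relpath(package: str, ext: str = "so") -> str:
    """Return the path, relative to an @INC entry, where the shared object
    for `package` is conventionally looked up (assumed 'auto/' layout)."""
    parts = package.split("::")
    return "/".join(["auto", *parts, f"{parts[-1]}.{ext}"])
```

Because both the file name and the symbol are derived from the one package name passed to 'bootstrap', a second package has no way to ask for the same .so under a different boot function.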

The xsubpp compiler for .xs files creates one boot_X function per
.xs file. Several resulting .c files can be linked into the same .so,
but you can't use DynaLoader to call different boot_X functions. The
ExtUtils::MakeMaker FAQ
(http://search.cpan.org/dist/ExtUtils-MakeMaker/lib/ExtUtils/MakeMaker/FAQ.pod#XS)
proposes to solve this by calling all the boot_X functions from the
boot_X in a designated .xs file and 'use Y'ing that from the other
packages.

The normal protobuf command line interface compiles each .proto file
separately, and the compilers don't know of any other .protos. Hence
it's not possible to know which boot_X functions to call from a
designated module if compiling several .protos at once (for a
single .proto with multiple message definitions we could designate one,
say the first one, and make it responsible for calling the other
boot_Xs). Also, every output file is truncated before it is written,
so several .protos cannot append to one shared output file.

In short: for things to work as easily as possible with DynaLoader and
MakeMaker, we would prefer all the code to be put in one .xs file and
all the per-message modules to load the corresponding package.

My current solution:

I split the message.xs file into three parts: a message-inl.h that
contains all the C++ code, a message.xsh that contains all the XS code
and a message.xs that #includes the message-inl.h and INCLUDE:s the
message.xsh.

I then (in Makefile.PL) iterate over the generated .xsh/-inl.h files,
creating one .xs file that looks like:

#include "message1-inl.h"
#include "message2-inl.h"

MODULE = toplevel
INCLUDE: message1.xsh
INCLUDE: message2.xsh
MODULE = topleveld

Makefile.PL also overwrites the .pm files to do 'use toplevel;' instead
of the DynaLoader things, and removes the generated .xs files.
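The generation step can be sketched roughly as below. This is not the actual Makefile.PL code (which is Perl and was not posted); it is a Python illustration, and the function name toplevel_xs and its parameters are hypothetical. The output follows the listing above, minus its final line, whose purpose is not clear from the post.

```python
from typing import Iterable


def toplevel_xs(module: str, stems: Iterable[str]) -> str:
    """Build the text of one combined .xs file.

    `stems` are the base names of the generated files, so 'message1'
    stands for message1-inl.h (C++ code) and message1.xsh (XS code).
    """
    stems = list(stems)
    # All C++ code comes first, before any XS section starts.
    lines = [f'#include "{stem}-inl.h"' for stem in stems]
    lines.append("")
    # A single MODULE line means xsubpp emits a single boot function.
    lines.append(f"MODULE = {module}")
    lines.extend(f"INCLUDE: {stem}.xsh" for stem in stems)
    return "\n".join(lines) + "\n"
```

Calling toplevel_xs("toplevel", ["message1", "message2"]) yields the two #include lines, a blank line, the MODULE line and the two INCLUDE: lines.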

Questions:

This works and is backward compatible but feels ugly.

One option would be to replace the command line interface of the
protocol buffer compiler, add some parameters to designate the
top-level module and output the necessary .pm and .xs. We'd still need
to split the .xs code because of output file truncation.

The second option is to basically formalize what I've done: do the
top-level module generation in Makefile.PL, but provide a module for
doing all of this. It would be nice to still add some command-line
options to stop generation of the per-message .xs files and get the
correct .pm files.

Is there a third option I haven't thought about?

BTW, there are some changes needed in general to cleanly use several
messages as the custom OutputStream class needs to have a different
name in the different .xs files or be shared from somewhere else.

Regards,

Mika

Dave Bailey

Dec 9, 2009, 10:27:28 PM

to protobu...@googlegroups.com

Hi Mika,

Thanks for bringing this up - comments inline.

I think what you are doing is fairly similar to what I'm doing for a
Makefile.PL and project with multiple message types. In my case, they
all fall under the same package prefix:

Foo::Bar
Foo::Bat
Foo::Baz

So I have a bar.proto, a bat.proto, and a baz.proto. My Makefile.PL
runs protoxs on all of these files and generates Bar.xs, Bat.xs, and
Baz.xs (and .pod and .pm files). Then I have a Foo.xs and
Foo.pm/Foo.pod which I've written by hand. Foo.xs has a BOOT: section
in which I explicitly call the boot_Foo__Bar, boot_Foo__Bat, and
boot_Foo__Baz functions to load up the other modules. Foo.pm just
bootstraps Foo via DynaLoader. In my Perl scripts, I just "use Foo;",
and I actually remove Bar.pm, Bat.pm, and Baz.pm, because I don't use
them at all. Lastly, I have to move Bar.pod, Bat.pod, and Baz.pod
into a Foo/ directory, and I put a Foo.pod in the top level directory
of the project.
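As an illustration of what such a hand-written Foo.xs boils down to, here is a Python sketch that prints the relevant fragment. It is modelled on the pattern in the ExtUtils::MakeMaker FAQ rather than on the actual Foo.xs, which was not posted. The function names are hypothetical, the usual EXTERN.h/perl.h/XSUB.h includes are left out, and the exact call form '(aTHX_ cv)' is an assumption that may differ between Perl versions.

```python
from typing import Iterable


def boot_symbol(package: str) -> str:
    """'Foo::Bar' -> 'boot_Foo__Bar'."""
    return "boot_" + package.replace("::", "__")


def toplevel_boot_xs(toplevel: str, packages: Iterable[str]) -> str:
    """Build the XS fragment that boots every per-message module."""
    symbols = [boot_symbol(p) for p in packages]
    # Declare the other boot functions with C linkage.
    lines = ["#ifdef __cplusplus", 'extern "C" {', "#endif"]
    lines += [f"XS({sym});" for sym in symbols]
    lines += ["#ifdef __cplusplus", "}", "#endif", ""]
    lines.append(f"MODULE = {toplevel}  PACKAGE = {toplevel}")
    lines.append("")
    # The BOOT: section runs once, when the top-level module is bootstrapped.
    lines.append("BOOT:")
    lines += [f"    {sym}(aTHX_ cv);" for sym in symbols]
    return "\n".join(lines) + "\n"
```

For example, toplevel_boot_xs("Foo", ["Foo::Bar", "Foo::Bat", "Foo::Baz"]) produces three XS(...) declarations and three matching calls in the BOOT: section.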

In any case, however it is that we lay out the package, my thought was
that we could get protoxs to take direction from custom options in the
.proto files. We (protobuf-perlxs) have been assigned field number
1001 for doing this (I had asked Kenton Varda about this a while back,
and 1001 is the globally accepted protobuf-perlxs extension number for
custom options). So, under "Custom Options" here:

http://code.google.com/apis/protocolbuffers/docs/proto.html#options

we could define a set of perlxs extensions for FileOptions,
MessageOptions, etc, such that protoxs can figure out from the .proto
files exactly what it needs to do in order to generate the right
output for the project. So, for example, in my case above, I would do
something like:

1) in bar.proto (similarly for baz.proto and bat.proto):

package Foo;

import "perlxs.proto";

option (perlxs.file).pm = false; // the default is true
option (perlxs.file).poddir = "Foo"; // relative to the output directory (default is ".")

message Bar {
...
}

2) in foo.proto (a new file that I would create, which would contain
no message descriptions at all):

package Foo;

import "perlxs.proto";

option (perlxs.file).boot = true; // causes Foo.xs, Foo.pm, and Foo.pod to be generated
option (perlxs.file).makefile_pl = true; // causes a Makefile.PL to be generated

This may be going a little too far, but in any case, I think it should
be possible to use these options to instruct protoxs to generate a
Foo.xs that boots the other message types (which it knows about).
e.g. "protoxs foo.proto bar.proto baz.proto bat.proto" should, in
principle, make it possible to generate a Foo.xs that has all the
right boot_Foo__Blah calls in it. And if we've gone that far, we can
probably even generate a Makefile.PL and who knows what else.

I think this means we would need to distribute a perlxs.proto with
protobuf-perlxs, and have it installed in the right place. The
contents of the file would look something like:

package perlxs;

import "google/protobuf/descriptor.proto";

message PerlXSFileOptions {
  optional bool pm = 1 [default = true];
  optional string poddir = 2 [default = "."];
  optional bool boot = 3 [default = false];
  optional bool makefile_pl = 4 [default = false];
}

// 1001 is the extension number assigned to protobuf-perlxs (see above).
extend google.protobuf.FileOptions {
  optional PerlXSFileOptions file = 1001;
}

Users would not be required to use perlxs.proto, but if they wanted to
do any of the advanced package management, they'd need to import it
into their .proto files.

> BTW, there are some changes needed in general to cleanly use several
> messages as the custom OutputStream class needs to have a different
> name in the different .xs files or be shared from somewhere else.

I had thought protoxs generated an output stream class definition for
each message type, so if you have Foo::Bar and Foo::Baz, Bar.xs will
have a class Foo_Bar_OutputStream (or similar) and Baz.xs will have
Foo_Baz_OutputStream. It's been a while since I looked at this code,
is my recollection incorrect?

In any case, this also makes me wonder if we should install a
libprotobuf-perlxs.so with the protobuf-perlxs package, and have some
things (such as a common, reusable class PerlXS_OutputStream) moved to
that library, rather than creating multiple essentially identical
classes, one for each top-level message type. I can't think of a lot
of other things that would go in that library right now, though.

-dave


Mika Raento

Dec 9, 2009, 11:01:00 PM

to protobu...@googlegroups.com

Thanks for the reply Dave - I'll get back to you next week when I'm
back in the office.

Mika

Mika Raento

Dec 14, 2009, 7:02:09 AM

to protobu...@googlegroups.com

Hi

I'm not yet sure what the best option is. Some thoughts below:

I think having a Foo::Bar package but not being able to 'use Foo::Bar'
is counter-intuitive. I would prefer to have a Foo/Bar.pm that does
'use Foo'.
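A minimal sketch of what such a per-message stub could contain, written as a Python generator for illustration. The helper names per_message_pm and pm_relpath are hypothetical, and the stub body (strict/warnings plus 'use Foo') is an assumption about a reasonable minimum, not what protoxs emits.

```python
def per_message_pm(toplevel: str, package: str) -> str:
    """Return the text of a stub .pm that only pulls in the top-level module."""
    return "\n".join([
        f"package {package};",
        "use strict;",
        "use warnings;",
        f"use {toplevel};",  # the top-level module bootstraps the shared object
        "1;",
    ]) + "\n"


def pm_relpath(package: str) -> str:
    """'Foo::Bar' -> 'Foo/Bar.pm', the path Perl searches for on 'use'."""
    return "/".join(package.split("::")) + ".pm"
```

With this, 'use Foo::Bar' finds Foo/Bar.pm, which in turn loads Foo and so makes the Foo::Bar methods available without a second bootstrap.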

Requiring a single perl 'project' (Makefile.PL) to be in a single
.proto 'package' (on some level, it could be Foo or Foo::Bar) seems a
reasonable choice.

I'm not terribly keen on putting this in as .proto options:
- it limits the reusability of the .proto in different contexts
  (I'm already suffering from optimize_for being an option)
- at least the examples given require a set of options for each
  message, which seems overly verbose
I would rather make the necessary data a command-line parameter to the
compiler.

Generating a toplevel .xs that calls the necessary boot_Foo__Bar
functions is a reasonable choice. However getting a list of all the
messages in the files on the command line will require some code that
doesn't fit well with the way .proto compilation is structured - but
it wouldn't be too bad.
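To make the "list of all the messages" step concrete, here is a rough Python sketch. A real implementation would walk the parsed file descriptors inside the compiler; this one just scans .proto text with regular expressions. It assumes that top-level messages start at column 0, and that a proto package 'Foo' with message 'Bar' maps to the Perl package Foo::Bar. The function name perl_packages is hypothetical.

```python
import re
from typing import Iterable, List

_PACKAGE_RE = re.compile(r"^\s*package\s+([\w.]+)\s*;", re.MULTILINE)
# Only unindented 'message' lines count, so nested messages are skipped.
_MESSAGE_RE = re.compile(r"^message\s+(\w+)\s*\{", re.MULTILINE)


def perl_packages(proto_sources: Iterable[str]) -> List[str]:
    """Return the Perl package name for each top-level message found."""
    result = []
    for text in proto_sources:
        match = _PACKAGE_RE.search(text)
        # 'foo.bar' in .proto terms becomes the 'foo::bar::' Perl prefix.
        prefix = match.group(1).replace(".", "::") + "::" if match else ""
        for name in _MESSAGE_RE.findall(text):
            result.append(prefix + name)
    return result
```

The resulting list is exactly what a generated top-level .xs would need in order to emit one boot call per message.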

Putting the stream class in a shared library makes sense but OTOH if
we make the compiler aware of the module we are creating we can at
least make it easy to only have one copy in any one module which
should be good enough (this may just be a cop-out as I lack the
autoconf-fu to add a shared library to the project).

What think you?

Mika

Dave Bailey

Mar 9, 2010, 7:33:17 PM

to protobu...@googlegroups.com

Hi Mika,

I think you are right, it's better to provide this via command line
parameters. There was also a discussion thread on the main protobuf
group about making optimize_for into a command-line parameter, but it
stalled out:

http://groups.google.com/group/protobuf/browse_thread/thread/d60b87fceefbe731#

Anyway, for now I uploaded protobuf-perlxs 1.0, which has support for
the one option that I needed right away (--perlxs-package, which is
the Perl/XS equivalent of the java_package file-level option). I hope
this doesn't clash too badly with any work you've done on your end - I
added you to the list of committers on the project, so please feel
free to commit any changes you've made and we can sort it out.