Any Python wrappers for the C++ implementation?


Yang Zhang

Dec 1, 2010, 2:22:01 AM
to prot...@googlegroups.com
Has anyone written (a tool for generating) Python wrappers around the
C++ generated code and is willing to share this? I'm looking to do the
same, so this would save me a bit of research time. (It's fine if it's
not a general tool and this is specific to some schema.) Thanks!

Kenton Varda

Dec 1, 2010, 3:04:42 PM
to Yang Zhang, prot...@googlegroups.com
Protobuf 2.4.0 will include an implementation of the Python API that is backed by C++ objects.  The interface is identical to the existing Python API, and you can wrap it around existing C++ objects or have it construct its own.

This code is already in SVN.  Unfortunately the team is somewhat backlogged and we haven't been able to make a lot of progress on an official release.  But it should be a lot easier to get the SVN code working than to write your own.  :)


--
You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.
To post to this group, send email to prot...@googlegroups.com.
To unsubscribe from this group, send email to protobuf+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.


Yang Zhang

Dec 1, 2010, 3:07:00 PM
to Kenton Varda, prot...@googlegroups.com
Thanks Kenton, we'll take a look. Out of curiosity, any ETA on 2.4.0?

Yang Zhang

Dec 1, 2010, 4:54:50 PM
to Kenton Varda, prot...@googlegroups.com
FWIW I'm seeing ~12x and ~7x speed-ups on serialization and parsing,
respectively, for messages in our app (which are ~10KB serialized) -
not too shabby!

$ python sandbox/pbbench.py out.ini # time in seconds per msg serialization
ser: 0.000434461673101
parse: 0.000602062404156

$ PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp python sandbox/pbbench.py out.ini
ser: 2.86788344383e-05
parse: 7.63910810153e-05
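(A quick arithmetic check of those per-message times, not part of the original benchmark script:)

```python
# Per-message times from the two pbbench runs above.
ser_pure, parse_pure = 0.000434461673101, 0.000602062404156  # pure Python
ser_cpp, parse_cpp = 2.86788344383e-05, 7.63910810153e-05    # cpp backend

print("ser speedup: ~%.1fx" % (ser_pure / ser_cpp))      # ~15.1x
print("parse speedup: ~%.1fx" % (parse_pure / parse_cpp))  # ~7.9x
```

(So the ratios actually come out a bit higher than ~12x/~7x; Yang revises the figures later in the thread.)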

Yang

--
Yang Zhang
http://yz.mit.edu/

Kenton Varda

Dec 7, 2010, 10:08:06 PM
to Yang Zhang, prot...@googlegroups.com
Cool.  Serialization and parsing themselves should actually be improved even more than that, but having other Python code around it waters down the numbers.  :)  Also, note that if you explicitly compile C++ versions of your messages and link them into the process, they'll be even faster.  (If you don't, the library falls back to DynamicMessage which is not as fast as generated code.)

As for when 2.4.0 might be released, it's hard to say.  There's a lot of work to do, and we have a new person doing this release so he has to learn the process.  Also, holidays are coming up.  So, I'd guess it will be ready sometime in January.

Yang Zhang

Dec 8, 2010, 12:19:48 AM
to Kenton Varda, prot...@googlegroups.com
On Tue, Dec 7, 2010 at 7:08 PM, Kenton Varda <ken...@google.com> wrote:
> Cool.  Serialization and parsing themselves should actually be improved even
> more than that, but having other Python code around it waters down the
> numbers.  :)

The times are from a minimal microbenchmark using Python's timeit module:

import timeit

nruns = 1000
nwarmups = 100

es = ... # the protobufs

def ser():
    return [e.SerializeToString() for e in es]

def parse(ses):
    for se in ses:
        pb.Email().ParseFromString(se)

t = timeit.Timer(lambda: None)
t.timeit(nwarmups)
print 'noop:', t.timeit(nruns) / nruns

t = timeit.Timer(ser)
t.timeit(nwarmups)
print 'ser:', t.timeit(nruns) / nruns / len(es)

ses = ser()
t = timeit.Timer(lambda: parse(ses))
t.timeit(nwarmups)
print 'parse:', t.timeit(nruns) / nruns / len(es)

print 'msg size:', sum(len(se) for se in ses) / len(ses)

> Also, note that if you explicitly compile C++ versions of your
> messages and link them into the process, they'll be even faster.  (If you
> don't, the library falls back to DynamicMessage which is not as fast as
> generated code.)

I'm trying to decipher that last hint, but having some trouble - what
exactly do you mean / how do I do that? I'm just using protoc
--py_out=... and PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp.

> As for when 2.4.0 might be released, it's hard to say.  There's a lot of
> work to do, and we have a new person doing this release so he has to learn
> the process.  Also, holidays are coming up.  So, I'd guess it will be ready
> sometime in January.

Thanks for the estimate; even a ballpark without commitment is useful.

Kenton Varda

Dec 8, 2010, 12:40:50 AM
to Yang Zhang, prot...@googlegroups.com
On Tue, Dec 7, 2010 at 9:19 PM, Yang Zhang <yangha...@gmail.com> wrote:
> Also, note that if you explicitly compile C++ versions of your
> messages and link them into the process, they'll be even faster.  (If you
> don't, the library falls back to DynamicMessage which is not as fast as
> generated code.)

> I'm trying to decipher that last hint, but having some trouble - what
> exactly do you mean / how do I do that? I'm just using protoc
> --py_out=... and PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp.

I'm not completely sure what I mean, because I don't have much experience with Python C extensions.  Basically I'm saying you should additionally generate C++ code using protoc, then compile that into a C extension (even with no interface), and then load it into your Python process.  Simply having the C++ code for your message types present will make them faster.

Yang Zhang

Dec 8, 2010, 1:49:43 AM
to Kenton Varda, prot...@googlegroups.com

Ah, my understanding now is that:

- Python code ordinarily (without
PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp) uses pure Python
(generated code) to parse/serialize messages.

- Python code *with* PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp uses
generic C++ code that dynamically parses/serializes messages (via
DynamicMessage/reflection), as opposed to using any pre-generated C++
code.

- Python code with PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp actually
also *searches for the symbols for any pre-generated C++ code in the
current process*, and uses them if available instead of
DynamicMessage...? (This is via some global DescriptorPool magic?)

Sounds like pretty weird behavior, but indeed, now I get even faster
processing. The following run shows ~68x and ~13x speedups vs. ~15x
and ~8x before (recomputing from my original measurements gives ~15x
and ~8x, not the ~12x and ~7x I posted earlier - not sure how I got
those; I was probably going off a different set of measurements):

$ PYTHONPATH=build/lib.linux-x86_64-2.6/:$PYTHONPATH
PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp python sandbox/pbbench.py
out.ini
noop: 1.6188621521e-07
ser: 6.39575719833e-06
parse: 4.55250144005e-05
msg size: 10730

This was simple to do. I added a C extension to my setup.py:

<<<
setup(
    ...
    ext_modules=[Extension('podpb',
        sources=['cpp/podpb.c', 'cpp/main.pb.cc'],
        libraries=['protobuf'])],
    ...
)
>>>

Generate the second source file with `protoc --cpp_out=cpp`, and
create the first one to set up an empty Python module:

<<<
#include <Python.h>

static PyMethodDef PodMethods[] = {
    {NULL, NULL, 0, NULL}  /* Sentinel */
};

PyMODINIT_FUNC
initpodpb(void)
{
    PyObject *m;

    m = Py_InitModule("podpb", PodMethods);
    if (m == NULL)
        return;
}
>>>

Now `python setup.py build` should build everything. Just import the
module (podpb in our case) and you're good.

Awesome tip, thanks Kenton. I foresee additions to the documentation
in protobuf's near future.... :)

Evan Goldschmidt

Jan 1, 2017, 6:48:10 PM
to Protocol Buffers, ken...@google.com
Is the advice in this thread, particularly with respect to generating C++ message implementations, still valid for modern versions of the Python protobuf runtime?

Brief spelunking through the Python codebase didn't yield a clear mechanism for how messages are automagically discovered.

Feng Xiao

Jan 3, 2017, 2:48:09 PM
to Evan Goldschmidt, Protocol Buffers, Kenton Varda
On Sun, Jan 1, 2017 at 3:48 PM, Evan Goldschmidt <evan.gol...@gmail.com> wrote:
> Is the advice in this thread, particularly with respect to generating C++ message implementations, still valid for modern versions of the Python protobuf runtime?
>
> Brief spelunking through the Python codebase didn't yield a clear mechanism for how messages are automagically discovered.

Yes, it's still valid for the Python protobuf runtime. The mechanism hasn't changed. The message creation process is delegated to the C++ DescriptorPool/MessageFactory, which has a global registry of all loaded C++ protos.
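(One way to check which backend a given installation actually selected is to query the runtime directly. A minimal sketch; this assumes the `protobuf` package is installed, and note that `api_implementation` is an internal module, so treat it as a debugging aid rather than a stable API:)

```python
from google.protobuf.internal import api_implementation

# Reports the backend the runtime selected: "python", "cpp", or
# (on newer releases) "upb". The PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION
# environment variable, if set before the first protobuf import,
# influences this choice.
print(api_implementation.Type())
```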
 
