--
You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.
To post to this group, send email to prot...@googlegroups.com.
To unsubscribe from this group, send email to protobuf+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.
$ python sandbox/pbbench.py out.ini # time in seconds per msg serialization
ser: 0.000434461673101
parse: 0.000602062404156
$ PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp python sandbox/pbbench.py out.ini
ser: 2.86788344383e-05
parse: 7.63910810153e-05
Yang
--
Yang Zhang
http://yz.mit.edu/
The times are from a minimal microbenchmark using Python's timeit module:
import timeit

nruns = 1000
nwarmups = 100
es = ... # the protobufs

def ser():
    return [e.SerializeToString() for e in es]

def parse(ses):
    for se in ses:
        pb.Email().ParseFromString(se)

t = timeit.Timer(lambda: None)
t.timeit(nwarmups)
print 'noop:', t.timeit(nruns) / nruns

t = timeit.Timer(ser)
t.timeit(nwarmups)
print 'ser:', t.timeit(nruns) / nruns / len(es)

ses = ser()
t = timeit.Timer(lambda: parse(ses))
t.timeit(nwarmups)
print 'parse:', t.timeit(nruns) / nruns / len(es)

print 'msg size:', sum(len(se) for se in ses) / len(ses)
> Also, note that if you explicitly compile C++ versions of your
> messages and link them into the process, they'll be even faster. (If you
> don't, the library falls back to DynamicMessage which is not as fast as
> generated code.)
I'm trying to decipher that last hint, but having some trouble - what
exactly do you mean / how do I do that? I'm just using protoc
--py_out=... and PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp.
> As for when 2.4.0 might be released, it's hard to say. There's a lot of
> work to do, and we have a new person doing this release so he has to learn
> the process. Also, holidays are coming up. So, I'd guess it will be ready
> sometime in January.
Thanks for the estimate; even a ballpark without commitment is useful.
Ah, my understanding now is that:
- Python code ordinarily (without PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp) uses pure Python (generated code) to parse/serialize messages.
- Python code *with* PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp uses generic C++ code that dynamically parses/serializes messages (via DynamicMessage/reflection), as opposed to using any pre-generated C++ code.
- Python code with PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp additionally *searches the current process for the symbols of any pre-generated C++ code*, and uses them if available instead of DynamicMessage...? (This is via some global DescriptorPool magic?)
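To make the selection concrete, here is a minimal sketch of how the first two modes are chosen, assuming only what the thread states: the runtime consults the PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION environment variable, and pure Python is the default when it is unset. (The runtime's internal module `google.protobuf.internal.api_implementation` exposes a `Type()` function reporting the final choice, but that module is internal and may change between versions.)

```python
import os

def requested_implementation():
    """Return the protobuf backend requested via the environment.

    The Python runtime reads this variable at import time; when it is
    unset, the pure-Python implementation is the default.
    """
    return os.environ.get('PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION', 'python')
```

Whether the cpp backend then uses DynamicMessage or pre-generated C++ code is decided later, by the symbol search described in the third bullet.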
Sounds like pretty weird behavior, but indeed, now I get even faster
processing. The following run shows ~68x and ~13x speedups vs. the
earlier ~15x and ~8x (my original speedup calculations should have been
~15x and ~8x, not the ~12x and ~7x I reported; not sure how I got
those, I was probably going off a different set of measurements):
$ PYTHONPATH=build/lib.linux-x86_64-2.6/:$PYTHONPATH \
    PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp python sandbox/pbbench.py out.ini
noop: 1.6188621521e-07
ser: 6.39575719833e-06
parse: 4.55250144005e-05
msg size: 10730
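For the record, the speedup ratios quoted in this thread fall straight out of the per-message timings reported in the runs above (pure Python, cpp via DynamicMessage, and cpp with pre-generated C++ code linked in):

```python
# Timings (seconds per message) copied from the runs in this thread.
pure_ser, pure_parse = 0.000434461673101, 0.000602062404156  # pure Python
refl_ser, refl_parse = 2.86788344383e-05, 7.63910810153e-05  # cpp, DynamicMessage
gen_ser, gen_parse = 6.39575719833e-06, 4.55250144005e-05    # cpp, generated code

print(round(pure_ser / refl_ser), round(pure_parse / refl_parse))  # ~15x, ~8x
print(round(pure_ser / gen_ser), round(pure_parse / gen_parse))    # ~68x, ~13x
```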
This was simple to do. I added a C extension to my setup.py:
<<<
setup(
    ...
    ext_modules=[Extension('podpb',
        sources=['cpp/podpb.c', 'cpp/main.pb.cc'],
        libraries=['protobuf'])],
    ...
)
>>>
Generate the second source file with `protoc --cpp_out=cpp`, and
create the first one to set up an empty Python module:
<<<
#include <Python.h>

static PyMethodDef PodMethods[] = {
    {NULL, NULL, 0, NULL}  /* Sentinel */
};

PyMODINIT_FUNC
initpodpb(void)
{
    PyObject *m;

    m = Py_InitModule("podpb", PodMethods);
    if (m == NULL)
        return;
}
>>>
Now `python setup.py build` should build everything. Just import the
module (podpb in our case) and you're good.
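A minimal sketch of that import order, using the module names from this thread (`podpb` is the C++ extension built above; assuming the .proto file is main.proto, the `protoc --py_out` module would be `main_pb2`). Neither module exists unless you have actually built them, so this guards the imports:

```python
import importlib

def load_fast_messages():
    # Import the C++ extension first so its pre-generated message code
    # is linked into the process, then import the generated Python API.
    try:
        importlib.import_module('podpb')
        return importlib.import_module('main_pb2')
    except ImportError:
        return None  # modules not built in this environment
```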
Awesome tip, thanks Kenton. I foresee additions to the documentation
in protobuf's near future.... :)
Is the advice in this thread, particularly with respect to generating C++ message implementations, still valid for modern versions of the Python protobuf runtime?
Brief spelunking through the Python codebase didn't yield a clear mechanism for how messages are automagically discovered.