Faster way to convert C++ protobuf into a Python protobuf?

Philipp Schrader

Jan 4, 2016, 8:35:27 PM
to Protocol Buffers

Hi all,


I'm wondering whether anyone knows of a faster way to convert a C++ protobuf message into a Python protobuf message.

I'm writing a Python module in C++ that wraps some of our protobuf-related functionality, such as reading our logs.
Right now I serialize the C++ protobuf and then deserialize it in Python, and I'd love to avoid that performance cost.

Here's roughly the code I'm using:


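#define PY_SSIZE_T_CLEAN  // "#" formats below pass lengths as Py_ssize_t
#include <Python.h>
#include <google/protobuf/message.h>
#include <string>

// Python setup code: look up the generated Python class for foo.Message1
const char pb_name[] = "foo.Message1";
PyObject* database_module = PyImport_ImportModule("google.protobuf.symbol_database");
PyObject* database = PyObject_CallMethod(database_module, "Default", nullptr);
PyObject* msg_class = PyObject_CallMethod(database, "GetSymbol", "s", pb_name);
// ...

PyObject* ConvertToPythonPB(const ::google::protobuf::Message& msg) {
  // Serialize the C++ message into a byte string
  std::string serialized_msg = msg.SerializeAsString();

  // Create a new, empty Python message instance
  PyObject* py_msg = PyObject_CallObject(msg_class, nullptr);
  if (py_msg == nullptr) return nullptr;  // Python exception already set

  // Deserialize into the Python object
  PyObject* result = PyObject_CallMethod(
      py_msg, "ParseFromString", "y#", serialized_msg.data(),
      static_cast<Py_ssize_t>(serialized_msg.size()));
  // ...
  if (result == nullptr) {
    Py_DECREF(py_msg);
    return nullptr;
  }
  // ParseFromString() does not return the message itself, so release its
  // return value and hand back the populated message instead
  Py_DECREF(result);
  return py_msg;
}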

I've looked into using the C++ ("cpp") protobuf implementation for Python, but I haven't managed to call into that C++ code directly from my module.

Is there documentation that I'm overlooking? I can't find anything on how to do this more easily.


Thanks,
Phil

Feng Xiao

Jan 4, 2016, 10:06:19 PM
to Philipp Schrader, Protocol Buffers
As far as I know, serializing and parsing again is the best way to transfer proto data across the Python/C++ language boundary. There is no documentation about it, nor any public API for it. You may be able to do better by hacking around the Python C++ implementation, but I presume that's difficult and doesn't actually buy you much: since you still have Python code to run, it may be better to just write the whole thing in C++.

Josh Haberman

Jan 12, 2016, 5:42:07 PM
to Protocol Buffers
Hi Phil,

If you are not doing so already, I would highly recommend using the C++ implementation for Python (instead of the pure-Python one). It will make ParseFromString() in Python much, much faster, which might give you the speed boost you need.

There are possible ways of doing more clever things, but they would require you to use the C++ implementation anyway, and they would make you more dependent on the internals of the Python/C++ library.
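A minimal sketch of requesting that backend, assuming the C++ code embeds the interpreter. (Phil's module is instead imported by Python, in which case the variable has to be exported in the shell before Python starts.) It relies on the `PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION` environment variable, which protobuf releases of this era read at import time, and it only helps if the cpp extension was actually compiled in; newer releases default to a different (upb) backend. Both helper names are hypothetical.

```cpp
#include <cstdlib>  // setenv, getenv (POSIX)
#include <string>

// Environment variable read by google.protobuf when it is first imported.
constexpr const char kImplEnvVar[] = "PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION";

// Hypothetical helper: ask for the C++-backed runtime unless the user already
// chose one (overwrite = 0). Call it before Py_Initialize(), because the
// setting is only read when the embedded interpreter imports google.protobuf.
bool SelectCppProtobufBackend() {
  return setenv(kImplEnvVar, "cpp", /*overwrite=*/0) == 0;
}

// Hypothetical helper: the backend currently requested via the environment,
// or "" if none is set (useful for logging at startup).
std::string RequestedProtobufBackend() {
  const char* value = std::getenv(kImplEnvVar);
  return value != nullptr ? value : "";
}
```

From Python, the internal `google.protobuf.internal.api_implementation.Type()` reports which backend was actually loaded, which is a quick way to confirm the request took effect.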

Philipp Schrader

Jan 14, 2016, 5:46:11 PM
to Protocol Buffers
Hi Josh,

Thanks for the reply. I was worried someone would say as much :)

Also, I am indeed using the C++ implementation. I'm very happy that this is possible!

Thanks
Phil

Emmanuel Decitre

Jan 15, 2016, 12:10:47 PM
to Protocol Buffers
Hi Philipp,

I am exposing the generated C++ classes to my Python code through Cython.
It works pretty well, and it also lets me use the CodedInputStream and GzipInputStream classes.

For this you have to write a couple of .pyx and .pxd files around the generated C++ code.

Cheers
Emmanuel
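One common pattern for reading logs with `CodedInputStream` is to read a varint length prefix and then that many bytes of serialized message. The thread does not describe Phil's log format, so this dependency-free sketch assumes length-delimited records ([varint length][serialized message], repeated). `ReadVarint32` and `SplitDelimitedRecords` are hypothetical helpers that mirror what `CodedInputStream::ReadVarint32` does; they are not protobuf's API.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Decodes one base-128 varint (7 payload bits per byte, least-significant
// group first, high bit set on every byte except the last). Advances `pos`;
// returns std::nullopt on truncated or over-long input.
std::optional<uint32_t> ReadVarint32(std::string_view buf, size_t& pos) {
  uint32_t result = 0;
  for (int shift = 0; shift < 35; shift += 7) {
    if (pos >= buf.size()) return std::nullopt;  // ran out mid-varint
    uint8_t byte = static_cast<uint8_t>(buf[pos++]);
    result |= static_cast<uint32_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return result;  // continuation bit clear: done
  }
  return std::nullopt;  // more than 5 bytes: not a valid 32-bit varint
}

// Assumed log layout: [varint length][serialized message], repeated.
// Returns views into `buf` (no copies), or an empty vector if malformed.
std::vector<std::string_view> SplitDelimitedRecords(std::string_view buf) {
  std::vector<std::string_view> records;
  size_t pos = 0;
  while (pos < buf.size()) {
    std::optional<uint32_t> len = ReadVarint32(buf, pos);
    if (!len || *len > buf.size() - pos) return {};
    records.push_back(buf.substr(pos, *len));
    pos += *len;
  }
  return records;
}
```

Each returned view could then be parsed in C++ with `Message::ParseFromArray(view.data(), static_cast<int>(view.size()))`, or passed to `ParseFromString()` on the Python side.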