Protocol Buffers Python extension in C

Atamurad Hezretkuliyev

Jan 7, 2011, 6:36:37 AM
to Protocol Buffers
Hey there,

We at connex.io use Protocol Buffers quite heavily: 1) as our client-server communication protocol, and 2) to store objects in our key-value database.
While profiling how our Python application processes requests server-side, I discovered that ~30% of CPU time goes to deserializing PB messages, so we looked into how we could speed up PB parsing. After some experimenting and benchmarking, we decided to write a Python C extension.

Currently our basic deserializer module is 17x faster than Google's implementation in pure Python. 

You can look at the benchmark code and results here:

Our plan from here is to develop the currently missing parts (encoder, .proto file parser, code generator, etc.) to make it usable, and to start using it internally as soon as possible. You can follow the above repository for updates.

If anyone is interested in testing, reviewing, or contributing, your help will be appreciated. Any feedback and suggestions are welcome.

Best,
Atamurad

--
connex.io - your smart address book

Atamurad Hezretkuliyev
Co-Founder

+993 67 642 642


Evan Jones

Jan 8, 2011, 8:32:52 AM
to Atamurad Hezretkuliyev, Protocol Buffers
On Jan 7, 2011, at 6:36 , Atamurad Hezretkuliyev wrote:
> Currently our basic deserializer module is 17x faster than Google's
> implementation in pure Python.

The pure python code is pretty slow. However, the repository version
(and the newly released 2.4.0 rc 1?) has C++ code to do
serialization / deserialization. There is no documentation, but the
following thread describes it:

http://groups.google.com/group/protobuf/browse_thread/thread/cfb13cd0a609b1c7/a5ada8791ca3c0ca#a5ada8791ca3c0ca

You may want to test that and see how it turns out. And/or contact
Yang about this, since he was interested in the same problem.

Evan

--
Evan Jones
http://evanjones.ca/

Atamurad Hezretkuliyev

Jan 8, 2011, 1:05:20 PM
to Evan Jones, Protocol Buffers

Following Evan's advice, I ran an experiment using the repository version.
According to my test, Google's C++ parser code is 7.6 times faster than pure Python, which is consistent with Yang's results.

Our C decoder is twice as fast as Google's C++ extension.
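
For anyone who wants to reproduce this kind of comparison on their own messages, here is a minimal sketch of how a speedup ratio between two decoders could be measured with the standard timeit module. It is not the benchmark code from the repository; `speedup`, `slow_decode`, and `fast_decode` are hypothetical names, and the two toy decoders only stand in for real parsers.

```python
import timeit


def speedup(baseline, candidate, payload, number=1000, repeat=5):
    """Return how many times faster `candidate` is than `baseline` on `payload`."""
    # Both decoders must agree on the result, otherwise the timing is meaningless.
    if baseline(payload) != candidate(payload):
        raise ValueError("decoders disagree on the payload")
    # Take the best of several runs to reduce noise from other processes.
    t_base = min(timeit.repeat(lambda: baseline(payload), number=number, repeat=repeat))
    t_cand = min(timeit.repeat(lambda: candidate(payload), number=number, repeat=repeat))
    return t_base / t_cand


# Toy stand-ins (assumptions): replace these with the real parse functions.
def slow_decode(raw):
    return "".join(chr(b) for b in raw)


def fast_decode(raw):
    return raw.decode("latin-1")


if __name__ == "__main__":
    ratio = speedup(slow_decode, fast_decode, b"x" * 1024)
    print("candidate is %.1fx faster than baseline" % ratio)
```

Using the same serialized payload for every decoder and reporting the minimum time per run keeps the ratios comparable across implementations.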

I've pushed the updated code and numbers here:
https://github.com/connexio/cypb

Atamurad

Gregory P. Smith

Jan 8, 2011, 3:14:52 PM
to Atamurad Hezretkuliyev, Evan Jones, Protocol Buffers
On Sat, Jan 8, 2011 at 10:05 AM, Atamurad Hezretkuliyev
<atam...@connex.io> wrote:
>
> Following Evan's advice, I've run experiment using the repository version.
> According to my test, Google's C++ parser code is 7.6 times faster than pure
> python, which is consistent with Yang's results.
>
> Our C decoder is twice faster than google's c++ extension.
>
> I've pushed the updated code and numbers here:
> https://github.com/connexio/cypb
>
> Atamurad

Did you see https://groups.google.com/d/msg/protobuf/z7E80KYJscc/ysCjHHmoraUJ ?

Be sure to compile your .proto into C++ code and link that into an
extension. It should use the generated C++ code and be even faster.

-gps

Atamurad Hezretkuliyev

Feb 1, 2011, 8:02:01 PM
to Evan Jones, Protocol Buffers
I've implemented a code generator so others can test / use our C implementation in Python.

Another new thing is lazy decoding support. Messages are decoded on the fly as attributes are accessed for the first time.
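
To illustrate the idea only (this is a pure-Python sketch, not how cypb does it in C), a lazily decoded message can keep the raw bytes and parse them the first time a field is read. The `LazyMessage` class, the `parse_person` function, and the two-field message layout (field 1 = `id` as a varint, field 2 = `name` as a string) are all assumptions made up for this example.

```python
def read_varint(buf, pos):
    """Decode one base-128 varint starting at `pos`; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def parse_person(raw):
    """Parse a hypothetical message: field 1 = id (varint), field 2 = name (string)."""
    names = {1: "id", 2: "name"}
    fields = {}
    pos = 0
    while pos < len(raw):
        key, pos = read_varint(raw, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:  # varint
            value, pos = read_varint(raw, pos)
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(raw, pos)
            value = raw[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError("unsupported wire type %d" % wire_type)
        if number in names:
            fields[names[number]] = value
    return fields


class LazyMessage:
    """Keeps the serialized bytes and parses them only on first field access."""

    def __init__(self, raw, parse):
        self._raw = raw
        self._parse = parse   # callable: bytes -> dict of field name -> value
        self._fields = None   # filled in on first access

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for message field names.
        if name.startswith("_"):
            raise AttributeError(name)
        if self._fields is None:
            self._fields = self._parse(self._raw)
        try:
            return self._fields[name]
        except KeyError:
            raise AttributeError(name) from None


if __name__ == "__main__":
    msg = LazyMessage(b"\x08\x96\x01\x12\x03Ann", parse_person)  # nothing parsed yet
    print(msg.id, msg.name)  # first access triggers the parse
```

The payoff comes when a message is received, stored, or forwarded without any of its fields being read: in that case the parse never runs at all.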

More info:
http://blog.connex.io/introducing-cypb-improving-the-performance-of

You can get the source here:
https://github.com/connexio/cypb

As always, feedback and contributions are welcome!