ANN: new Python protobuf implementation


David Wilson

Jul 26, 2009, 10:33:07 AM
to Protocol Buffers, d...@botanicus.net
I'm hacking on a protobuf encoder as part of a library to make
handling typed, structured data simpler, cleaner, and less error-
prone.

The project is still being designed (it's really just an experiment),
although it has reached the point where it can convert protobuf's own
descriptor.proto (after conversion to a FileDescriptorSet via
protoc -o) into the target in-memory representation, and dump the
resulting tree to JSON.

It already has some features to differentiate it from the official
distribution:

* The same structure definition method is used to serialize JSON too
(and eventually XML).
* Natural interface, e.g. simply assign a new list of messages over
an old one, rather than slice assignment like the official
distribution.
* Use types other than list for sequences, e.g. array.array('L'),
set, <your-custom-tree-class>, etc.
* Lighter weight runtime (61ms startup vs. 208ms for official
distribution on my MacBook)
* Hopefully better perf, as it builds some structures to speed up
encoding/decoding. I've got more tricks to boost speed, but not ready
to start adding complexity yet.
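
To illustrate the "natural interface" and pluggable sequence types
mentioned above, here is a minimal sketch. It is not py-datakit's
actual API: the Repeated descriptor, the Person class, and its field
names are all hypothetical, invented only to show the idea of plain
assignment replacing a repeated field while a per-field factory picks
the container type.

```python
import json
from array import array


class Repeated:
    """Hypothetical descriptor for a repeated field with a pluggable container."""

    def __init__(self, factory=list):
        # factory builds the container from any iterable, e.g. list, set,
        # or lambda items: array('L', items).
        self.factory = factory

    def __set_name__(self, owner, name):
        self.slot = '_' + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if not hasattr(obj, self.slot):
            setattr(obj, self.slot, self.factory(()))
        return getattr(obj, self.slot)

    def __set__(self, obj, value):
        # Plain assignment replaces the whole sequence; no slice assignment.
        setattr(obj, self.slot, self.factory(value))


class Person:
    """Hypothetical entity with two repeated fields."""
    phone_ids = Repeated(lambda items: array('L', items))
    tags = Repeated(set)

    def to_json(self):
        # The same field definitions drive the JSON dump.
        return json.dumps({
            'phone_ids': list(self.phone_ids),
            'tags': sorted(self.tags),
        }, sort_keys=True)


p = Person()
p.phone_ids = [1, 2, 3]        # stored as array('L', [1, 2, 3])
p.tags = ['b', 'a', 'b']       # stored as a set
print(p.to_json())
```

The descriptor approach keeps the conversion in one place, so the
encoder can rely on each field always holding its declared container
type.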

Things it doesn't do yet:

* Enums (you can just treat them as integers)
* Preserve unrecognized data between load/dump roundtrips (planned)
* Read .proto files (a FileDescriptorSet -> .py conversion tool is
about 50% done)
* Generate .proto files based on its native representation (planned)
* Completely defined type system. Native types are based on
protobuf's types, but I'm going to add some more, and perhaps tweak
the collections support into something more generic that can handle
arbitrary mapping types too.
* Docs are an incoherent pile of concatenated notes.
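
For the planned "preserve unrecognized data" item, one possible
approach is sketched below. This is an assumption about how it could
work, not how the library does or will do it: split the encoded
message into raw top-level fields, keep the bytes of any field number
that isn't recognized, and append them again on dump. The function
names (read_varint, split_fields, load, dump) are hypothetical, and
the deprecated group wire types are not handled.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def split_fields(buf):
    """Yield (field_number, raw_bytes) for each top-level field in buf."""
    pos = 0
    while pos < len(buf):
        start = pos
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 7
        if wire_type == 0:        # varint
            _, pos = read_varint(buf, pos)
        elif wire_type == 1:      # fixed 64-bit
            pos += 8
        elif wire_type == 2:      # length-delimited
            length, pos = read_varint(buf, pos)
            pos += length
        elif wire_type == 5:      # fixed 32-bit
            pos += 4
        else:
            raise ValueError('unsupported wire type %d' % wire_type)
        yield field_number, bytes(buf[start:pos])


def load(buf, known):
    """Separate fields with recognized numbers from raw unknown bytes."""
    fields, unknown = {}, []
    for number, raw in split_fields(buf):
        if number in known:
            fields.setdefault(number, []).append(raw)
        else:
            unknown.append(raw)
    return fields, unknown


def dump(fields, unknown):
    """Re-emit known fields, then append the preserved unknown bytes."""
    known_bytes = b''.join(raw for n in sorted(fields) for raw in fields[n])
    return known_bytes + b''.join(unknown)


# Field 1 = varint 150, field 2 = the string "testing".
data = b'\x08\x96\x01' + b'\x12\x07testing'
fields, unknown = load(data, known={1})
roundtrip = dump(fields, unknown)
```

Unknown fields end up after the known ones, so the output is not
always byte-identical to the input, but protobuf parsers accept fields
in any order.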

The PB support is in addition to the main goal for the library, so
certain features are omitted entirely, such as extensions,
configurable int packing, etc.

Feel free to have a tinker. I'd especially like feedback on the
interface and general performance:

http://code.google.com/p/py-datakit

See tests/encoding_testlib createKind & createEntity, and
pb_encoding.Encoding.encode() / decode().


David