Python C++ wrapping


Jason Paryani

Jul 7, 2013, 5:29:57 AM
to capn...@googlegroups.com
Hi all,
The past few weeks I've been working on a rewrite of my Python wrapping of the C++ capnproto library, and it's finally in a workable state. I've dropped the dependency on Boost::Python, and replaced it with Cython, a tool meant for quick and easy writing of Python C/C++ extension modules. I've also changed my library to use capnproto's dynamic API, so it's going to be decently slower than my initial Boost::Python version. The advantage is that there won't be any compile steps required except for the initial installation of the library. Code to load and use the library looks like so:
import capnp
addressbook = capnp.load('addressbook.capnp')

message = capnp.MallocMessageBuilder()
addressBook = message.initRoot(addressbook.AddressBook)
people = addressBook.initPeople(2)

alice = people[0]
alice.id = 123
alice.name = 'Alice'
alice.email = 'al...@example.com'
alicePhones = alice.initPhones(1)
alicePhones[0].number = "555-1212"
alicePhones[0].type = 'mobile'
alice.employment.school = "MIT"

bob = people[1]
bob.id = 456
bob.name = 'Bob'
bob.email = 'b...@example.com'
bobPhones = bob.initPhones(2)
bobPhones[0].number = "555-4567"
bobPhones[0].type = 'home'
bobPhones[1].number = "555-7654" 
bobPhones[1].type = addressbook.Person.PhoneNumber.Type.WORK
bob.employment.unemployed = None
Essentially, the python API looks almost exactly the same as it did before, except note the `addressbook = capnp.load('addressbook.capnp')` line. Now you are able to dynamically import any capnp specifications at runtime, without any complicated compilation steps.

The code is available at https://github.com/jparyani/capnpc-python-cpp, and please read the README carefully. You absolutely must have the latest version of cython and setuptools installed (`pip install -U cython setuptools`), and for some reason I needed to run `sudo ldconfig` on my Ubuntu 12.10 box before I could link with libcapnp.so properly. Please let me know if you run into any issues, especially if it's an installation issue. I've only been able to test on OSX 10.8 with Python 2.7.4, and Ubuntu 12.10 with Python 2.7.3 so far. Python 3.x is currently not working because of unicode problems, but it shouldn't be too much work to fix it.

Kenton Varda

Jul 7, 2013, 2:29:28 PM
to Jason Paryani, capnproto
Awesome!

Have you done any tests comparing performance between the dynamic API and the old static approach?  It might not be all that bad, compared to the overhead of using Python in the first place.  :)




Jason Paryani

Jul 7, 2013, 3:54:37 PM
to capn...@googlegroups.com, Jason Paryani
Alright you motivated me to do a quick benchmark. The new dynamic version is actually much faster than the old boost python wrapping. My best guess is it's because I'm terrible at using boost-python, and the way I was copying C++ objects constantly was causing slowdowns.

python benchmark_boost.py > /dev/null  1.76s user 0.28s system 99% cpu 2.049 total
python benchmark_dynamic.py > /dev/null  0.70s user 0.23s system 99% cpu 0.933 total

The benchmark is just running the example from examples/example.py 10000 times.
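
For anyone wanting to reproduce this kind of measurement, a repeat-the-example harness might look roughly like the sketch below. It is only an illustration, not the benchmark script from this thread: `build_record` is a hypothetical stand-in workload, and a real run would call the capnp example functions in its place.

```python
import timeit


def build_record():
    # Stand-in workload (an assumption): the real benchmark builds and
    # serializes an address book through the capnp bindings instead.
    people = [{"id": 123, "name": "Alice"}, {"id": 456, "name": "Bob"}]
    return repr(people)


def run_benchmark(func, iterations=10000):
    """Return the total wall-clock seconds spent on `iterations` calls of `func`."""
    return timeit.timeit(func, number=iterations)


iterations = 10000
total = run_benchmark(build_record, iterations)
print("total: %.3fs, per call: %.2f us" % (total, total / iterations * 1e6))
```

Unlike the shell `time` figures above, this measures only the loop itself and leaves out interpreter startup and import time.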

I also went ahead and did a line-by-line profile to get a better idea of what kinds of operations are causing slowdowns (unfortunately it's tough to run a normal profile on the boost python code, because its functions aren't discernible from Python). I've attached the results below in case anyone cares to take a look:

dynamic:
Timer unit: 1e-06 s


File: example.py
Function: writeAddressBook at line 8
Total time: 0.000119 s


Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           @profile
     9                                           def writeAddressBook(fd):
    10         1            7      7.0      5.9      message = capnp.MallocMessageBuilder()
    11         1           10     10.0      8.4      addressBook = message.initRoot(addressbook.AddressBook)
    12         1           21     21.0     17.6      people = addressBook.init('people', 2)
    13
    14         1            1      1.0      0.8      alice = people[0]
    15         1            6      6.0      5.0      alice.id = 123
    16         1            3      3.0      2.5      alice.name = 'Alice'
    17         1            2      2.0      1.7      alice.email = 'al...@example.com'
    18         1            6      6.0      5.0      alicePhones = alice.init('phones', 1)
    19         1            2      2.0      1.7      alicePhones[0].number = "555-1212"
    20         1            2      2.0      1.7      alicePhones[0].type = 'mobile'
    21         1           11     11.0      9.2      alice.employment.school = "MIT"
    22
    23         1            1      1.0      0.8      bob = people[1]
    24         1            3      3.0      2.5      bob.id = 456
    25         1            1      1.0      0.8      bob.name = 'Bob'
    26         1            3      3.0      2.5      bob.email = 'b...@example.com'
    27         1            6      6.0      5.0      bobPhones = bob.init('phones', 2)
    28         1            1      1.0      0.8      bobPhones[0].number = "555-4567"
    29         1            2      2.0      1.7      bobPhones[0].type = 'home'
    30         1            1      1.0      0.8      bobPhones[1].number = "555-7654"
    31         1            3      3.0      2.5      bobPhones[1].type = addressbook.Person.PhoneNumber.Type.WORK
    32         1            6      6.0      5.0      bob.employment.unemployed = None  # This is definitely bad, syntax will change at some point
    33
    34         1           21     21.0     17.6      capnp.writePackedMessageToFd(fd, message)


File: example.py
Function: printAddressBook at line 37
Total time: 0.000125 s


Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    37                                           @profile
    38                                           def printAddressBook(fd):
    39         1           15     15.0     12.0      message = capnp.PackedFdMessageReader(f.fileno())
    40         1            4      4.0      3.2      addressBook = message.getRoot(addressbook.AddressBook)
    41
    42         3           20      6.7     16.0      for person in addressBook.people:
    43         2           24     12.0     19.2          print(person.name, ':', person.email)
    44         5           30      6.0     24.0          for phone in person.phones:
    45         3           12      4.0      9.6              print(phone.type, ':', phone.number)
    46
    47         2            6      3.0      4.8          which = person.employment.which()
    48         2            2      1.0      1.6          print(which)
    49
    50         2            2      1.0      1.6          if which == addressbook.Person.Employment.Which.UNEMPLOYED:
    51         1            1      1.0      0.8              print('unemployed')
    52         1            1      1.0      0.8          elif which == addressbook.Person.Employment.Which.EMPLOYER:
    53                                                       print('employer:', person.employment.employer)
    54         1            1      1.0      0.8          elif which == addressbook.Person.Employment.Which.SCHOOL:
    55         1            5      5.0      4.0              print('student at:', person.employment.school)
    56                                                   elif which == addressbook.Person.Employment.Which.SELF_EMPLOYED:
    57                                                       print('self employed')
    58         2            2      1.0      1.6          print()



boost:
Timer unit: 1e-06 s


File: example.py
Function: writeAddressBook at line 6
Total time: 0.000158 s


Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     6                                           @profile
     7                                           def writeAddressBook(fd):
     8         1           14     14.0      8.9      message = capnp.MallocMessageBuilder()
     9         1           23     23.0     14.6      addressBook = message.initRootAddressBook()
    10         1           11     11.0      7.0      people = addressBook.initPeople(2)
    11
    12         1           10     10.0      6.3      alice = people[0]
    13         1            5      5.0      3.2      alice.id = 123
    14         1            6      6.0      3.8      alice.name = 'Alice'
    15         1            2      2.0      1.3      alice.email = 'al...@example.com'
    16         1            4      4.0      2.5      alicePhones = alice.initPhones(1)
    17         1           12     12.0      7.6      alicePhones[0].number = "555-1212"
    18         1            6      6.0      3.8      alicePhones[0].type = addressbook.Person.PhoneNumber.Type.MOBILE
    19         1            5      5.0      3.2      alice.employment.school = "MIT"
    20
    21         1            3      3.0      1.9      bob = people[1]
    22         1            2      2.0      1.3      bob.id = 456
    23         1            2      2.0      1.3      bob.name = 'Bob'
    24         1            2      2.0      1.3      bob.email = 'b...@example.com'
    25         1            2      2.0      1.3      bobPhones = bob.initPhones(2)
    26         1            4      4.0      2.5      bobPhones[0].number = "555-4567"
    27         1            4      4.0      2.5      bobPhones[0].type = addressbook.Person.PhoneNumber.Type.HOME
    28         1            4      4.0      2.5      bobPhones[1].number = "555-7654"
    29         1            4      4.0      2.5      bobPhones[1].type = addressbook.Person.PhoneNumber.Type.WORK
    30         1            4      4.0      2.5      bob.employment.unemployed = capnp.Void.VOID  # This is definitely bad, syntax will change at some point
    31
    32         1           29     29.0     18.4      capnp.writePackedMessageToFd(fd, message)


File: example.py
Function: printAddressBook at line 35
Total time: 0.000249 s


Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    35                                           @profile
    36                                           def printAddressBook(fd):
    37         1           19     19.0      7.6      message = capnp.PackedFdMessageReader(f.fileno())
    38         1           10     10.0      4.0      addressBook = message.getRootAddressBook()
    39
    40         3           39     13.0     15.7      for person in addressBook.people:
    41         2           30     15.0     12.0          print(person.name, ':', person.email)
    42         5          100     20.0     40.2          for phone in person.phones:
    43         3           20      6.7      8.0              print(phone.type, ':', phone.number)
    44
    45         2           13      6.5      5.2          which = person.employment.which()
    46         2            4      2.0      1.6          print(which)
    47
    48         2            3      1.5      1.2          if which == addressbook.Person.Employment.Which.UNEMPLOYED:
    49         1            1      1.0      0.4              print('unemployed')
    50         1            1      1.0      0.4          elif which == addressbook.Person.Employment.Which.EMPLOYER:
    51                                                       print('employer:', person.employment.employer)
    52         1            1      1.0      0.4          elif which == addressbook.Person.Employment.Which.SCHOOL:
    53         1            6      6.0      2.4              print('student at:', person.employment.school)
    54                                                   elif which == addressbook.Person.Employment.Which.SELF_EMPLOYED:
    55                                                       print('self employed')
    56         2            2      1.0      0.8          print()



Kenton Varda

Jul 7, 2013, 4:47:57 PM
to Jason Paryani, capnproto
Hah, awesome!

How about comparing against protobuf as well?  Be sure to enable the still-experimental C-extension-backed protobuf implementation for a fair fight.

Jason Paryani

Jul 7, 2013, 7:25:13 PM
to capn...@googlegroups.com, Jason Paryani
I started trying to plug into the benchmark suite you have in c++/src/capnp/benchmark, but gave up because it was going to take a fair bit of work. It's something worth doing at some point, but for now I just wrote a very simple benchmark, similar to what I was doing above, and put it in https://github.com/jparyani/capnpc-python-cpp/tree/master/benchmark.

As expected, capnproto is much faster when you take into account serialization and writing to a stream.

$ time python addressbook.proto.py
python addressbook.proto.py  3.70s user 0.39s system 99% cpu 4.101 total

$ export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp; time python addressbook.proto.py
python addressbook.proto.py  1.63s user 0.40s system 99% cpu 2.031 total

$ time python addressbook.capnp.py
python addressbook.capnp.py  0.72s user 0.34s system 99% cpu 1.064 total


I've attached the line profiles as well, just in case anyone's curious for the breakdown.
addressbook.capnp.py.txt
addressbook.proto.cpp.py.txt
addressbook.proto.py.txt

Kenton Varda

Jul 7, 2013, 7:38:48 PM
to Jason Paryani, capnproto
Excellent.

Out of curiosity, if you don't count serialization time, what do you get?  I.e. if you just measure how long it takes to call the accessors to set up an object.  The protobuf implementation is implemented in a very similar way, so I'd expect performance to be similar.

Also note that protobuf serialization time is much faster if you generate the C++ code for the specific message type and link it into the C extension.  But, of course, Python programmers probably don't want to do that.


Jason Paryani

Jul 7, 2013, 8:03:12 PM
to capn...@googlegroups.com, Jason Paryani
Actually, I misspoke before. It's not the serialization making a major difference, as we can see from the line profiles:

    33     10000       128148     12.8     22.8      capnp.writePackedMessageToFd(fd, message)
vs
    29     10000       148150     14.8      9.5      message_string = addressBook.SerializeToString()
    30     10000        41984      4.2      2.7      fd.write(message_string)

The fourth column (Per Hit) is time in microseconds per call.

The biggest differences actually occur in initializing fields:

    13     10000        70373      7.0     12.5      people = addressBook.init('people', 2)
vs
    11     10000       233784     23.4     14.9      alice = addressBook.person.add()
...
    19     10000       121705     12.2      7.8      bob = addressBook.person.add()


That alone accounts for most of the difference. 

Otherwise, almost every other operation is marginally faster across the board. Without diving more into the google protobuf implementation, I can't say what's causing the difference for sure, but if I had to guess they probably have some pure-python glue code to the C++ layer that's causing slowdowns.

Kenton Varda

Jul 7, 2013, 8:20:30 PM
to Jason Paryani, capnproto
That makes sense.  As I recall the guy working on the python-c-extension protobuf implementation told me that object allocation was very slow, I think due to bookkeeping they needed to do to deal with differing ownership paradigms and such.

Speaking of which, does your implementation make sure that the MessageBuilder can't be garbage-collected while references to sub-objects exist?

Anyway, awesome stuff!  What's left to do before declaring this production-ready?

-Kenton

Jason Paryani

Jul 7, 2013, 8:52:35 PM
to capn...@googlegroups.com, Jason Paryani
So, my implementation at the moment does not ensure that the parent object isn't garbage collected. My thinking was that it makes no sense for a user to allocate a MessageBuilder object, init fields, and then throw away the reference to the original MessageBuilder while still wanting to use those fields. The only concern is that not following this (admittedly arbitrary) rule will almost certainly lead to seg-faults and crashing the whole python process, so I'll go ahead and add the references.

Does the same problem apply to MessageReader? I'm getting the feeling it does, unless you're copying behind the scenes.

In general, the only thing keeping it from being production-ready is some testing. I've only run it through some very simple cases, and it seems to hold up fine, but I wouldn't be prepared to call it production-ready.

Kenton Varda

Jul 7, 2013, 9:06:18 PM
to Jason Paryani, capnproto
Yeah, the same applies to MessageReader.  And there I think it's a lot more likely that someone will throw away the top-level object while still holding references to its contents.  For MessageBuilder I agree this is less likely, but I could imagine someone doing it if they were just using Cap'n Proto objects as an in-memory representation (for some reason).  It should probably work (and definitely shouldn't segfault).

For testing, you might consider translating test-util.c++ to Python.

Jason Paryani

Jul 7, 2013, 10:01:21 PM
to capn...@googlegroups.com, Jason Paryani
Ok, I added saving away references of the parent MessageBuilder/Reader objects inside child objects. It had only a negligible effect on speed, and will definitely save some people.
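
The idea of child objects holding on to their parent can be sketched in plain Python as below. This is an illustration of the ownership pattern only, not the library's Cython code: `Message` and `Struct` are hypothetical stand-ins for the MessageBuilder/Reader and the objects derived from it.

```python
import weakref


class Message(object):
    """Hypothetical stand-in for a MessageBuilder that owns the buffer."""

    def __init__(self):
        self.buffer = bytearray(64)

    def init_root(self):
        return Struct(self)


class Struct(object):
    """Hypothetical child view into the parent's buffer."""

    def __init__(self, parent):
        # Strong reference: the parent cannot be collected while this
        # child object is still alive.
        self._parent = parent

    def set_byte(self, index, value):
        self._parent.buffer[index] = value


message = Message()
probe = weakref.ref(message)  # lets us observe whether the parent is alive
root = message.init_root()
del message                   # the user drops the top-level object
root.set_byte(0, 42)          # still safe: the child keeps the parent alive
print(probe() is not None)
```

The cost is one extra reference per child object, which fits the observation above that the change had only a negligible effect on speed.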

Also, I forgot to mention before that I got the library working with Python 3.3, tested on OSX 10.8.

Jason Paryani

Aug 12, 2013, 6:54:49 PM
to capn...@googlegroups.com, Jason Paryani
I've upgraded the library to work with v0.2 of the capnproto C++ library. Also I've uploaded the library to pypi (https://pypi.python.org/pypi/capnp/), so the installation process is now way easier. First make sure you have v0.2rc of the capnproto C++ library installed from http://capnproto.org/capnproto-c++-0.2.0-rc2.tar.gz. Then just run `pip install capnp`, and you should be good to go. This of course assumes you have pip already installed (http://www.pip-installer.org/en/latest/installing.html), and have a relatively new version of setuptools (run `pip install -U setuptools` if it gives you trouble).

Also, one breaking change. I've simplified the codebase quite a lot by removing enum namespaces. What this means is that things like `bobPhones[1].type = addressbook.Person.PhoneNumber.Type.WORK` are now gone, and instead you have to use `bobPhones[1].type = 'work'`. The former method was a holdover from the C++ way of doing it, but I felt it was overly complex to support, as well as being un-pythonic and actually a lot slower than just using string literals (all those dots in addressbook.Person.PhoneNumber.Type.WORK work out to being dictionary lookups).

Kenton Varda

Aug 12, 2013, 10:04:27 PM
to Jason Paryani, capnproto
Nice work!

Agreed that using string names for enums is the way to go.  Since Python is dynamically typed, there's not much advantage to using a symbolic name.  Either way, an exception is going to be raised at runtime.

I'm working on the schema parser API as we speak, then we can make this even better...



Andrew Lutomirski

Aug 12, 2013, 11:03:20 PM
to Jason Paryani, capn...@googlegroups.com
I wonder if it's worth making bobPhones[1].type be a string subclass
that enforces singleton-ness. Otherwise "bobPhones[0].type is
bobPhones[1].type" will be unreliable.

--Andy

Kenton Varda

Aug 12, 2013, 11:31:25 PM
to Andrew Lutomirski, Jason Paryani, capnproto
On Mon, Aug 12, 2013 at 8:03 PM, Andrew Lutomirski <an...@luto.us> wrote:
> I wonder if it's worth making bobPhones[1].type be a string subclass
> that enforces singleton-ness.  Otherwise "bobPhones[0].type is
> bobPhones[1].type" will be unreliable.

Hmm, but why would anyone expect "is" to be reliable in the first place, when operating on values that look like strings?  Wouldn't they just use == to avoid the question?

Jason Paryani

Aug 12, 2013, 11:33:51 PM
to capn...@googlegroups.com, Jason Paryani, an...@luto.us
On Monday, August 12, 2013 8:03:20 PM UTC-7, Andrew Lutomirski wrote:
> I wonder if it's worth making bobPhones[1].type be a string subclass
> that enforces singleton-ness.  Otherwise "bobPhones[0].type is
> bobPhones[1].type" will be unreliable.
>
> --Andy

I definitely see your concern, but I'm more concerned about the weirdness of introducing a new type here. Seeing as I removed the interface for addressing the enums globally (i.e. you can't do addressbook.Person.PhoneNumber.Type.WORK anymore), I don't see it as too big a point of confusion. It's very clear that these fields behave like strings, and in Python you should never use `is` over `==` unless you're 100% sure that the references are supposed to be the same (which is almost never the case with user-generated strings). I see why you'd want to do it for speed, but at the end of the day, I'd have to look up references in some sort of map internally to get this to work (albeit this would be in cython/c++, so possibly decently faster than in python). String literals are actually decently optimized in Python, although the string comparison operation is obviously non-ideal.
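
A small sketch of the `is` versus `==` point, using only plain strings (nothing here is specific to the library; the variable names are made up for illustration):

```python
literal = "work"
built = "".join(["wo", "rk"])  # same characters, constructed at runtime

# Equality compares contents, so it holds regardless of how the string was made.
print(literal == built)

# Identity depends on whether the interpreter happens to share one object,
# which is an implementation detail and should not be relied upon.
print(literal is built)
```

In CPython the second comparison typically comes out false for a string built at runtime, which is the unreliability being discussed.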

Andrew Lutomirski

Aug 12, 2013, 11:51:34 PM
to Kenton Varda, Jason Paryani, capnproto

I suppose I'm thinking that type.VALUE is type.VALUE and I'm expecting enums to work like that even if they're strings.  This may be silly.

--Andy

Jason Paryani

Aug 13, 2013, 12:00:20 AM
to capn...@googlegroups.com, Kenton Varda, Jason Paryani, an...@luto.us
Part of the problem here is that Python just doesn't have a good enum type that's native to the language (although it looks like there will finally be one in Python 3.4 http://www.python.org/dev/peps/pep-0435/). At this point, I'm hesitant to introduce non-standard semantics for some new type for these fields. Instead, now they're just strings with the one slight weirdness being that you'll get a ValueError exception if you set it to an invalid value.
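
The "plain strings, but invalid values raise ValueError" behaviour could be sketched with a property like the one below. This is a hypothetical pure-Python illustration, not the library's implementation; the `PhoneNumber` class and its `_TYPE_NAMES` tuple are invented for the example.

```python
class PhoneNumber(object):
    """Hypothetical struct with a string-valued enum field."""

    _TYPE_NAMES = ("mobile", "home", "work")

    def __init__(self):
        self._type = "mobile"

    @property
    def type(self):
        # Reads hand back an ordinary str, so == comparisons just work.
        return self._type

    @type.setter
    def type(self, name):
        # Writes are checked against the names declared in the schema.
        if name not in self._TYPE_NAMES:
            raise ValueError("invalid enum value: %r" % (name,))
        self._type = name


phone = PhoneNumber()
phone.type = "work"
try:
    phone.type = "pager"
except ValueError as err:
    print(err)
```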

Kenton Varda

Aug 13, 2013, 3:56:14 PM
to Andrew Lutomirski, Jason Paryani, capnproto
On Mon, Aug 12, 2013 at 8:51 PM, Andrew Lutomirski <an...@luto.us> wrote:

> I suppose I'm thinking that type.VALUE is type.VALUE and I'm expecting enums to work like that even if they're strings.  This may be silly.

I thought the point was that there won't actually be any constant called VALUE.  You have to use a string literal.

Andrew Lutomirski

Aug 13, 2013, 5:13:35 PM
to Kenton Varda, Jason Paryani, capnproto
Which is exactly why my complaint is silly. Although, if Python
3.4-style enums are ever supported, the results might be a little bit
odd.

Jason Paryani

Aug 19, 2013, 2:56:11 AM
to capn...@googlegroups.com, Kenton Varda, Jason Paryani, an...@luto.us
Docs are now up at http://jparyani.github.io/capnpc-python-cpp/. Please let me know if you have any thoughts/suggestions.

I haven't added a section detailing the new behavior of enums, since I haven't changed the code yet to do as we discussed. For now, it just throws exceptions on unknown enum fields of all types.

Kenton Varda

Aug 19, 2013, 8:13:39 PM
to Jason Paryani, capnproto
Overall, very nice work!  I can't wait to make this part of the 0.3.0 release announcement.

Some nitpicks:

- The performance numbers seem odd to me.  You say 4x faster than pure-python protobuf, 2x faster than c-extension protobuf.  But, I'm pretty sure the c-extension protobuf is much more than 2x faster than the pure-python protobuf in most cases.  The other problem with numbers like this is that performance numbers vary wildly depending on use case.  So, I'd suggest not mentioning specific numbers unless you want to refer to benchmarks covering a wide variety of cases.

- On a related note, you say "The INFINITY TIMES faster part isn’t so true for python", but it should be just as true for Python as it is for C++.  (It's an obviously questionable claim in both cases.  :) )

- When I do "capnp.load('foo.capnp')", where does it look for the file?  I think we want to search PYTHONPATH.  In fact, I wonder if it would be possible to hook into the module loading mechanism such that people can actually use a regular python import statement to load a capnp schema...

- You write the title "Capnproto" in a lot of places.  For consistency, it should be written "Cap'n Proto".

- The instructions for installing the C++ runtime should really direct people at a release version, not git head as it may be broken at any particular time.  I'd suggest keeping this pointing at the most recent release version that you've tested against.

-Kenton

Jason Paryani

Aug 19, 2013, 8:43:22 PM
to capn...@googlegroups.com, Jason Paryani
On Monday, August 19, 2013 5:13:39 PM UTC-7, Kenton Varda wrote:
> Overall, very nice work!  I can't wait to make this part of the 0.3.0 release announcement.
>
> Some nitpicks:
>
> - The performance numbers seem odd to me.  You say 4x faster than pure-python protobuf, 2x faster than c-extension protobuf.  But, I'm pretty sure the c-extension protobuf is much more than 2x faster than the pure-python protobuf in most cases.  The other problem with numbers like this is that performance numbers vary wildly depending on use case.  So, I'd suggest not mentioning specific numbers unless you want to refer to benchmarks covering a wide variety of cases.

I was a little bit worried putting the numbers in, and you've convinced me to take them out. Benchmarks are always contentious, so I'll just make it a vague "has been benchmarked to be faster than protobuf in certain cases", and point them to the benchmark part of the repo.

> - On a related note, you say "The INFINITY TIMES faster part isn’t so true for python", but it should be just as true for Python as it is for C++.  (It's an obviously questionable claim in both cases.  :) )
 
I think I'm going to leave this in since it's all pretty tongue-in-cheek :)

 
> - When I do "capnp.load('foo.capnp')", where does it look for the file?  I think we want to search PYTHONPATH.  In fact, I wonder if it would be possible to hook into the module loading mechanism such that people can actually use a regular python import statement to load a capnp schema...
 
For now it only looks in the current working directory, or it takes absolute paths. I'll document that, and put PYTHONPATH searching on my TODO, since it should be relatively easy to accomplish.

I'm a little bit leery of monkey patching the import system, but I know for a fact it's definitely possible. I'll probably take a crack at it at some point, but leave it as a highly experimental/not recommended method.
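
The lookup order described in this reply (absolute paths, then the current working directory, then a PYTHONPATH-style search) might be sketched as follows. `find_schema` is a hypothetical helper written for illustration; it is not a function of the library.

```python
import os
import sys


def find_schema(filename, search_paths=None):
    """Return the first existing path for `filename`, or None.

    Hypothetical helper: absolute paths are used as-is, then the current
    directory is tried, then each entry of `search_paths` (default: sys.path).
    """
    if os.path.isabs(filename):
        return filename if os.path.isfile(filename) else None
    extra = sys.path if search_paths is None else search_paths
    for directory in [os.getcwd()] + list(extra):
        # An empty sys.path entry conventionally means the current directory.
        candidate = os.path.join(directory or os.curdir, filename)
        if os.path.isfile(candidate):
            return os.path.abspath(candidate)
    return None


print(find_schema("addressbook.capnp"))
```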

 
> - You write the title "Capnproto" in a lot of places.  For consistency, it should be written "Cap'n Proto".
 
My bad. I'll fix that up. 


> - The instructions for installing the C++ runtime should really direct people at a release version, not git head as it may be broken at any particular time.  I'd suggest keeping this pointing at the most recent release version that you've tested against.

The only reason I'm pointing them at master is because v0.2 of capnproto is too old now. I have code referencing unnamed enums that breaks when compiled against that version. As soon as v0.3 comes out, I'll fix up the docs to point to the release, and in the future I'll try to not push API breaking code changes to master until you've released :)

Thanks for all the suggestions. Feel free to let me know if you find anything else.

Kenton Varda

Aug 19, 2013, 8:47:09 PM
to Jason Paryani, capnproto
All sounds good.



Jason Paryani

Aug 29, 2013, 9:58:13 PM
to capn...@googlegroups.com, Jason Paryani
I've finally started writing tests for the python library. It's pretty fledgling at the moment, but I've been adding to it whenever I have some spare time. You can run them yourself after you've installed the library with `py.test`. If you don't have py.test installed, first run `pip install pytest`.

Also, I've added docstrings to all public functions. As a consequence, the API reference docs have gotten a lot prettier at http://jparyani.github.io/capnpc-python-cpp/capnp.html. I'm sure there's all sorts of typos or confusing bits, so feel free to let me know or fix it up and submit a pull request :)

Jason Paryani

Sep 1, 2013, 4:07:37 AM
to capn...@googlegroups.com, Jason Paryani
I've decided to change the name of the python package to pycapnp, so that there's no confusion with the normal libcapnp C++ library. There's no real convention for this, but it's the way pyzmq named themselves, and that's good enough for me.

That means the urls for pypi and github will change to https://pypi.python.org/pypi/pycapnp and https://github.com/jparyani/pycapnp respectively. The docs have also changed to http://jparyani.github.io/pycapnp/.

Jason Paryani

Sep 1, 2013, 10:32:30 PM
to capn...@googlegroups.com, Jason Paryani
Just a note: make sure you uninstall the old capnp library before installing pycapnp, i.e. `pip uninstall capnp && pip install pycapnp`. If `pip uninstall` doesn't work for you, you'll have to manually delete the directory. An easy way to find where it was installed is by running `python -c 'import capnp; print(capnp.__path__)'`.

Also, there's a few minor naming changes going into the next release, as well as deprecating the old read/write api. A friend of mine pointed out I had misread PEP 8, and the proper naming scheme for functions is supposed to be all lowercase, with underscores. I decided to fix it now, before the release goes any wider. That means the following renames are going in:

_DynamicStructBuilder.asReader -> as_reader
_DynamicStructBuilder.initResizableList -> init_resizable_list
_DynamicStructBuilder.writeTo -> write
_DynamicStructBuilder.writePackedTo -> write_packed
readFrom -> read
readPackedFrom -> read_packed

As well as all the private functions I could find in the library.

Also, the convenience functions for reading/writing from files that Kenton recently added to the python library are now the preferred method. I'm prepending all the old classes/methods with an underscore for now, and you should discontinue using them as soon as possible. They've also been lowercased/underscored where appropriate.

This will all go into effect for the next release, v0.3.12. I'm sorry to anyone this greatly inconveniences, but I think it's better to standardize the naming now, before the official release.

Kenton Varda

Sep 1, 2013, 10:38:29 PM
to Jason Paryani, capnproto
Will you also transform the style of field names?  I would be in favor of it, but this will add some complication since you'll have to build a map for each schema and you'll no longer be able to use the DynamicStruct methods that take a string name.
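
For concreteness, here is a rough sketch of the kind of per-schema map this would need. It is written for Python 3, and `camel_to_snake` and `build_field_map` are hypothetical helper names, not anything that exists in the python library. It also assumes field names are plain camelCase with no runs of capitals.

```python
import re


def camel_to_snake(name):
    """Convert a camelCase name such as 'phoneNumbers' to 'phone_numbers'."""
    # Insert an underscore before every capital letter except a leading one.
    return re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()


def build_field_map(field_names):
    """Map each Python-style name back to the name declared in the schema."""
    mapping = {}
    for original in field_names:
        python_name = camel_to_snake(original)
        if python_name in mapping:
            # Two schema names collapsing to one Python name would be ambiguous.
            raise ValueError('name collision on %r' % python_name)
        mapping[python_name] = original
    return mapping
```

A wrapper would then look up `mapping['phone_numbers']` to get `'phoneNumbers'` before calling the string-based DynamicStruct methods, which is the extra indirection described above.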


--

Jason Paryani

Sep 1, 2013, 10:41:05 PM
to Kenton Varda, capnproto
I'm going to hold off on doing anything to field names. I'm not a big fan of transforming them for users, since it just adds a big point of confusion. 

Jason

Jason Paryani

Sep 2, 2013, 12:41:23 AM
to capn...@googlegroups.com, Kenton Varda
I've added the ability to hook into the python `import` system. I used to be opposed to making it the default way to load .capnp files, but I'm kind of liking how it looks...

import capnp
capnp.add_import_hook()

import addressbook # it will search everywhere in sys.path for a file named addressbook.capnp

The `add_import_hook` function also takes an additional parameter for easily adding paths to the search list. These are prepended to the beginning of sys.path:

import capnp
capnp.add_import_hook(['/usr/local/include/capnp'])

import schema

import schema

I'm not sure the parameter is all that necessary, since you could just as easily add onto sys.path yourself. My thinking was PYTHONPATH won't always equal exactly the same thing as your Cap'n Proto search path, and this will give you a quick and easy way to include extra paths without the sys.path.insert/remove mess.

Kenton Varda

Sep 2, 2013, 2:32:04 AM
to Jason Paryani, capnproto
Awesome!

Two things:
- What happens if add_import_hook is called from multiple files?
- It may be common for people to want to have a schema and a Python module by the same name, e.g. "addressbook.capnp" and "addressbook.py".  I wonder if it makes sense to require the import name to have a "_capnp" suffix or something.  (I'm not sure about this, just throwing it out as an idea.)

Jason Paryani

Sep 2, 2013, 2:49:52 AM
to Kenton Varda, capnproto
On Sun, Sep 1, 2013 at 11:32 PM, Kenton Varda <temp...@gmail.com> wrote:
Awesome!

Two things:
- What happens if add_import_hook is called from multiple files?
 
This should work fine. I call remove_import_hook at the beginning of every call of add_import_hook, so it will go through adding the hook every time. This is a pretty fast operation, so speed isn't a huge concern.

- It may be common for people to want to have a schema and a Python module by the same name, e.g. "addressbook.capnp" and "addressbook.py".  I wonder if it makes sense to require the import name to have a "_capnp" suffix or something.  (I'm not sure about this, just throwing it out as an idea.)


I was wondering about this too. I can't think of a better way to do it, and I do think people will appreciate the namespacing and clarity of importing schemas with a '_capnp' suffix. I'll go ahead and add this.
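
To make the two points above concrete, here is a minimal sketch of how such a hook could be structured. It is not the actual pycapnp code: it targets Python 3's `importlib`, the `SchemaFinder` class and the `schema_path` attribute are made up for illustration, and the loader only records where the .capnp file was found instead of parsing it.

```python
import importlib.abc
import importlib.util
import os
import sys

SUFFIX = '_capnp'


class SchemaFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical finder that maps 'foo_capnp' to a file named 'foo.capnp'."""

    def __init__(self, extra_paths=()):
        self.extra_paths = list(extra_paths)

    def find_spec(self, fullname, path=None, target=None):
        # Bail out immediately for anything without the suffix, so ordinary
        # imports are left alone.
        if not fullname.endswith(SUFFIX):
            return None
        filename = fullname[:-len(SUFFIX)] + '.capnp'
        # Extra paths are searched before sys.path.
        for directory in self.extra_paths + sys.path:
            candidate = os.path.join(directory or '.', filename)
            if os.path.isfile(candidate):
                return importlib.util.spec_from_loader(
                    fullname, self, origin=candidate)
        return None

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        # A real hook would parse the schema here; this sketch only records
        # the path of the file it found.
        module.schema_path = module.__spec__.origin


def remove_import_hook():
    """Drop every SchemaFinder that is currently registered."""
    sys.meta_path[:] = [
        finder for finder in sys.meta_path
        if not isinstance(finder, SchemaFinder)
    ]


def add_import_hook(extra_paths=()):
    """Register the finder; removing first keeps repeated calls harmless."""
    remove_import_hook()
    sys.meta_path.append(SchemaFinder(extra_paths))
```

Because `add_import_hook` always removes any earlier finder first, calling it from several files leaves exactly one hook installed, and the suffix check means `import addressbook` still resolves to a normal `addressbook.py`.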

Kenton Varda

Sep 2, 2013, 2:55:49 AM
to Jason Paryani, capnproto
So, one thing I could see being an issue is, one file in the import chain calls add_import_hook, and then files that depend on it can get away with using the import hook without actually calling add_import_hook themselves.  But then, some day, that first file changes a little bit and stops calling add_import_hook, and suddenly everyone downstream is broken.

I'm not sure if there's any good solution for that.

BTW, is there any good reason not to add the import hook automatically when capnp is imported?  If it's just because you're worried some people won't want their import mechanism messed with, well...  I think they're going to have a hard time avoiding it, since as I described above, all it takes is for one dependency to call it and now it affects everyone, right?  Meanwhile a lot of people may like it better if they don't have to do that extra step.

-Kenton

Jason Paryani

Sep 2, 2013, 3:05:24 AM
to Kenton Varda, capnproto
Ya, I'm actually leaning towards turning it on by default now, and with the '_capnp' check right at the top, it will exit fast and not mess with most imports at all. I was mostly modeling pycapnp after cython and its pyximport system, which requires a pyximport.install() call to hook into the import system. That being said, cython's pyximport is much more invasive, and described as less robust, so they're in a different situation.

Kenton Varda

Sep 2, 2013, 3:14:28 AM
to Jason Paryani, capnproto
Cool.

So I still wonder about searching /usr/[local/]include...  It seems like you either need to do that, or you need to make "/capnp/c++.capnp" available on the PYTHONPATH (and probably other languages' annotation schemas in the future), otherwise most cross-language schemas simply won't be usable from Python.

One thing I'm toying with:  the compiler could just ignore annotations defined in an import that it can't find.  And if an import isn't used for anything except annotations, just silently ignore the import altogether.  I worry about this because it would make it easy to screw up when writing annotations and then get really frustrated when you can't figure out why they aren't showing up in the schema...  but it would also be really useful.

Any thoughts?

-Kenton

Jason Paryani

Sep 2, 2013, 3:28:03 AM
to capn...@googlegroups.com, Jason Paryani
I think I'm ok with the /usr/[local/]include problem. There are now 3 good solutions: add it to your PYTHONPATH, do sys.path hackery, or call capnp.add_import_hook with it as a parameter. While messing with sys.path in a program isn't the greatest style, I'm guessing most python programmers will be familiar with it. Just in case you have no clue what I'm talking about:

import sys
sys.path.append('/usr/local/include')
import schema_capnp
sys.path.pop()
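
If the append/pop pair feels too easy to get wrong, the same idea can be wrapped in a context manager. This is only a sketch for Python 3, and `extra_sys_path` is a made-up name, not part of the library.

```python
import contextlib
import sys


@contextlib.contextmanager
def extra_sys_path(*directories):
    """Temporarily append directories to sys.path, then restore it."""
    original = list(sys.path)
    sys.path.extend(directories)
    try:
        yield
    finally:
        # Restore the snapshot even if the import inside the block fails.
        # Note this also discards any other sys.path changes made inside it.
        sys.path[:] = original
```

Usage would look like `with extra_sys_path('/usr/local/include'): import schema_capnp`, and sys.path is back to its previous contents afterwards even when the import raises.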

Kenton Varda

Sep 2, 2013, 3:43:21 AM
to Jason Paryani, capnproto
OK, I'm happy with that.


Jason Paryani

Sep 2, 2013, 3:49:53 AM
to capn...@googlegroups.com, Jason Paryani
Oh also worth noting, in a default sys.path, the first element in it is an empty string. This means you can make the import an absolute path in the .capnp file, and it should work. I'll add all this info to the docs in a bit.

Kenton Varda

Sep 2, 2013, 3:54:03 AM
to Jason Paryani, capnproto
The empty string is supposed to represent the current directory, right?  You should replace it with "." in the import path you pass to SchemaParser.  Don't search the filesystem root unless people explicitly add "/" to the search path.
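
Something along these lines is what I have in mind, as a sketch only. `schema_search_path` is a hypothetical helper, and the duplicate removal is an extra I threw in, not a requirement.

```python
def schema_search_path(paths):
    """Build a schema search list from sys.path-style entries.

    Empty strings (Python's shorthand for the current directory) become '.',
    and duplicates are dropped while keeping the original order.
    """
    result = []
    for entry in paths:
        normalized = '.' if entry == '' else entry
        if normalized not in result:
            result.append(normalized)
    return result
```

With this, '/' only ends up in the list if someone put it in the input on purpose.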

Jason Paryani

Sep 4, 2013, 1:59:58 AM
to capn...@googlegroups.com
Just a quick announcement of a few nifty features added in the python wrappers:

- Andrew Lutomirski just recently added support for testing a DynamicStruct's concrete type through `isinstance`

- Andrew also added to_bytes/from_bytes, which serialize a Cap'n Proto message to/from a byte string. to_bytes_packed/from_bytes_packed coming soon.

- I've recently added to_dict/from_dict methods, which will serialize to/from a python dictionary.

Kenton Varda

Sep 4, 2013, 1:17:33 PM
to Jason Paryani, capnproto
On Tue, Sep 3, 2013 at 10:59 PM, Jason Paryani <jpar...@gmail.com> wrote:
Just a quick announcement of a few nifty features added in the python wrappers:

- Andrew Lutomirski just recently added support for testing DynamicStructs concrete type through `isinstance`

Nice.
 
- Andrew also added to_bytes/from_bytes, that will serialize a Cap'n Proto message to/from a byte string.

Sweet, looks like it uses the correct parts of the C++ API.  Though I wonder if Python byte strings are guaranteed to be aligned?
 
to_bytes_packed/from_bytes_packed coming soon.

Maybe to_packed_bytes / from_packed_bytes would be more intuitive?
 
- I've recently added to_dict/from_dict methods, which will serialize to/from a python dictionary.

Segfaults for me after a long pause, maybe because I tried it on a recursive type.  You should use DynamicStruct::has to check for null pointers and avoid following them.  (Perhaps "has" should be exposed publicly as well.)

Andrew Lutomirski

Sep 4, 2013, 1:28:38 PM
to Kenton Varda, Jason Paryani, capnproto
On Wed, Sep 4, 2013 at 10:17 AM, Kenton Varda <temp...@gmail.com> wrote:
>
>
>
> On Tue, Sep 3, 2013 at 10:59 PM, Jason Paryani <jpar...@gmail.com> wrote:
>>
>> Just a quick announcement of a few nifty features added in the python
>> wrappers:
>>
>> - Andrew Lutomirski just recently added support for testing DynamicStructs
>> concrete type through `isinstance`
>
>
> Nice.
>
>>
>> - Andrew also added to_bytes/from_bytes, that will serialize a Cap'n Proto
>> message to/from a byte string.
>
>
> Sweet, looks like it uses the correct parts of the C++ API. Though I wonder
> if Python byte strings are guaranteed to be aligned?

Well, crap. If a = 'abcdefghi', then a[1:] is almost certainly not
aligned. I won't fix it yet because...

>
>>
>> to_bytes_packed/from_bytes_packed coming soon.
>
>
> Maybe to_packed_bytes / from_packed_bytes would be more intuitive?


...this will probably have to add code to create a Reader backed by a
temporary buffer, and the same code could be reused to fix the
alignment of an unaligned input buffer.
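
Purely as an illustration of the copy-to-an-aligned-buffer idea, here is how it can be sketched in plain Python 3 with ctypes. The real fix would live on the Cython/C++ side; `buffer_address` and `aligned_copy` are made-up names, and the 8-byte word size is the only Cap'n Proto fact assumed here.

```python
import ctypes

WORD_SIZE = 8  # Cap'n Proto words are 8 bytes


def buffer_address(buf):
    """Return the address of the first byte of a writable, non-empty buffer."""
    return ctypes.addressof(ctypes.c_char.from_buffer(buf))


def aligned_copy(data, word_size=WORD_SIZE):
    """Copy data into a fresh buffer whose first byte is word-aligned."""
    # Over-allocate by one word so an aligned start always fits.
    backing = bytearray(len(data) + word_size)
    pad = (-buffer_address(backing)) % word_size
    view = memoryview(backing)[pad:pad + len(data)]
    view[:] = data
    return view  # the memoryview keeps the backing bytearray alive
```

A reader could then be constructed over the returned view instead of the original string, at the cost of one copy.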

For extra awesomeness, the to/from fd stuff could accept any Python
file-like object. That will be annoying to implement because Cython
is IMO a rather unpleasant language to do fancy things with C++ in.


--Andy

Jason Paryani

Sep 4, 2013, 2:33:54 PM
to capn...@googlegroups.com, Jason Paryani

Segfaults for me after a long pause, maybe because I tried it on a recursive type.  You should use DynamicStruct::has to check for null pointers and avoid following them.  (Perhaps "has" should be exposed publicly as well.)

Alright, fixed. There is a '_has' function exposed at the moment, which I'll probably rename to 'has' in the next release so it's more clearly public. The weird part of that function is that it's not quite what you'd expect: it throws an exception if the field is not a valid field from the struct definition, and returns False if the field is uninitialized. I'll write up a bit in the docs on how it should be used, and how it differs from the `hasattr` built-in.
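
In the meantime, here is a toy model of those semantics and of how to_dict uses them to stop at null pointers. `ToyStruct` is a stand-in written for this post in plain Python 3, not the real wrapper class, and raising ValueError for an unknown field is an arbitrary choice for the sketch.

```python
class ToyStruct:
    """Stand-in for a struct whose schema declares a fixed set of fields."""

    FIELDS = ('name', 'next')

    def __init__(self, **values):
        self._values = values

    def has(self, field):
        # Unknown field: an error, unlike hasattr(), which would return False.
        if field not in self.FIELDS:
            raise ValueError('no such field in schema: %r' % field)
        # Known but unset field: simply False.
        return self._values.get(field) is not None

    def get(self, field):
        return self._values[field]


def to_dict(struct):
    """Convert a struct to a dict, skipping fields that were never set."""
    result = {}
    for field in struct.FIELDS:
        if not struct.has(field):
            continue  # null pointer: do not follow it
        value = struct.get(field)
        result[field] = to_dict(value) if isinstance(value, ToyStruct) else value
    return result
```

Skipping unset fields is what keeps a recursive type like this linked list from being followed forever.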