struggling with importing and packaging cffi-generated modules

_k...@yahoo.com

Apr 14, 2013, 4:35:02 AM
to pytho...@googlegroups.com
Hi group!

I've wrapped a large library using cffi, and I've added a fair amount of C declarations via cdef() and C code via verify(). The declarations and definitions weigh in at about 3000 lines of C. Everything is running nicely so far, but the load times are quite long. On my T2...@1.6GHz loading the module takes some 1.9 seconds (compared to ca. 75 milliseconds to load numpypy). I suspect that the module, when imported, actually looks at all the C code, generates a hash from it, and then looks in __pycache__ to see if it can find an already compiled .so file for that specific hash.
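For reference, the pattern looks roughly like this - heavily abridged, and with made-up names, since the real cdef block alone runs to thousands of lines:

    import cffi

    ffi = cffi.FFI()

    # in reality ~3000 lines of declarations go here
    ffi.cdef("""
        double eval_at(double x);
    """)

    # verify() compiles the C source below into an extension module;
    # as far as I can tell, the result is cached under __pycache__
    lib = ffi.verify("""
        #include <einspline/bspline.h>

        double eval_at(double x)
        {
            /* thin wrapper code on top of the library */
            return x;
        }
    """, libraries=["einspline"])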

If this is the case, is there a shortcut? I'd like to be able to tell cffi to simply go ahead and load the .so from the cache. I'd also like to make a distributable version that packages the ready-made .so and loads it without looking at the C code first. It's not that I want to withhold the C code; I just don't want it scanned more than once.

From my questions you may infer that my knowledge of the inner workings of cffi and distutils is meagre - I could do with a bit of help beyond what's in http://cffi.readthedocs.org/en/release-0.6/, which is a bit too terse for my taste. In return I'd try to present the world with a cffi wrapper for libeinspline, which isn't what you'd call popular, but some may think it a gem :)

Kay

Armin Rigo

Apr 14, 2013, 4:58:29 AM
to pytho...@googlegroups.com
Hi,

On Sun, Apr 14, 2013 at 10:35 AM, <_k...@yahoo.com> wrote:
> my taste. In return I'd try to present the world with a cffi wrapper for
> libeinspline, which isn't what you'd call popular, but some may think it a
> gem :)

Cool! Thanks for mentioning it here.

> I've wrapped a large library using cffi, and I've added a fair amount of C
> declarations via cdef() and C code via verify(). The declarations and
> definitions weigh in at about 3000 lines of C. Everything is running nicely
> so far, but the load times are quite long. On my T2...@1.6GHz loading the
> module takes some 1.9 seconds (compared to ca. 75 milliseconds to load
> numpypy).

We need to fully parse the C code every time in order to build the
"ctype" objects --- including the ctype of all functions. It's a
known issue. One way to speed it up would be to include in the .so
some fast pickle-like data dump out of which we can reload the ctypes.

I think that, at least for large projects, the approach should be: write
the cdef and verify in a separate Python module, which is not
imported at all in normal runs of the program; the program would only
say: ffi = FFI("my_cffi_lib.so"). This would be a clean split between
"compile-time" and "run-time".


A bientôt,

Armin.

_k...@yahoo.com

Apr 15, 2013, 3:43:29 AM
to pytho...@googlegroups.com, ar...@tunes.org


On Sunday, April 14, 2013 at 10:58:29 UTC+2, Armin Rigo wrote:

> We need to fully parse the C code every time in order to build the
> "ctype" objects --- including the ctype of all functions.  It's a
> known issue.  One way to speed it up would be to include in the .so
> some fast pickle-like data dump out of which we can reload the ctypes.

For the interface, parsing the declarations should be sufficient; what's passed to verify shouldn't need to be parsed again. Is that so? And is the hash computed over the cdefs only, or also over the code passed to verify?

Initially I was happy to be able to actually define functions in the code passed to verify; it lets me put a layer of code on top of the library without modifying anything inside the library. But I fear this approach might backfire now and punish me with long load times.
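For example, I have things of this shape in there (signatures quoted from memory, so they may be slightly off):

    lib = ffi.verify("""
        #include <einspline/bspline.h>

        /* convenience layer: evaluate a 1D double-precision spline
           and return the value, instead of writing through a pointer */
        double eval_1d(UBspline_1d_d *spline, double x)
        {
            double y;
            eval_UBspline_1d_d(spline, x, &y);
            return y;
        }
    """, libraries=["einspline"])

together with the eval_1d prototype and a "typedef ... UBspline_1d_d;" opaque declaration in the cdef.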
 
> I think that, at least for large projects, the approach should be: write
> the cdef and verify in a separate Python module, which is not
> imported at all in normal runs of the program; the program would only
> say: ffi = FFI("my_cffi_lib.so").  This would be a clean split between
> "compile-time" and "run-time".

Do you mean that this is actually an option for me right now, or is it something you expect to become possible at some point in the future?
 
Kay

Armin Rigo

May 7, 2013, 4:05:06 AM
to pytho...@googlegroups.com
Hi Kay,

Sorry for the delay in answering this mail:

On Mon, Apr 15, 2013 at 9:43 AM, <_k...@yahoo.com> wrote:
> On Sunday, April 14, 2013 at 10:58:29 UTC+2, Armin Rigo wrote:
>> We need to fully parse the C code every time in order to build the
>> "ctype" objects --- including the ctype of all functions. It's a
>> known issue. One way to speed it up would be to include in the .so
>> some fast pickle-like data dump out of which we can reload the ctypes.
>
> For the interface, parsing the declarations should be sufficient; what's
> passed to verify shouldn't need to be parsed again. Is that so? And is the
> hash computed over the cdefs only, or also over the code passed to verify?

What's passed to verify is *never* parsed: only gcc sees it. It is
hashed, though, to check whether something changed.

> Initially I was happy to be able to actually define functions in the code
> passed to verify; it lets me put a layer of code on top of the library
> without modifying anything inside the library. But I fear this approach
> might backfire now and punish me with long load times.

No, that's fine.

>> I think that, at least for large projects, the approach should be: write
>> the cdef and verify in a separate Python module, which is not
>> imported at all in normal runs of the program; the program would only
>> say: ffi = FFI("my_cffi_lib.so"). This would be a clean split between
>> "compile-time" and "run-time".
>
> Do you mean that this is actually an option for me right now, or is it
> something you expect to become possible at some point in the future?

Sorry for not being clear. This is not something that works now.
It's only an idea about the future.


A bientôt,

Armin.