Experience on Wrapping a Library using CFFI [Long]


Marco Gario

Sep 28, 2016, 4:00:32 AM
to python-cffi
Hi,

In pySMT [1] we talk with different libraries (e.g. MathSAT) using
C-level APIs. We are considering whether to push for a wider support
for CFFI, mostly due to the possibility of using pypy.

I experimented by converting the existing wrapper of MathSAT (that
uses SWIG) into a CFFI in-line ABI wrapper [2]. The experience was
overall positive, but I have some questions/feedback.


Extending the CFFI Lib object
-----------------------------

In the MathSAT API there are some macros that check some very simple
condition (e.g., whether a field is NULL). CFFI does not interpret
these macros, so I wanted to extend the library to mimic them by
re-implementing them in Python. Given the library object (obtained via
dlopen) I wanted to add a method to it (MSAT_ERROR_TERM). The user
should not need to know whether a method is defined in the library or
in Python: they should simply be able to call it (e.g.,
mathsat.MSAT_ERROR_TERM()).

I did not find a nice way to add these methods. Eventually, I had to
create a tiny wrapper around the CFFI library object (MathSATWrapper,
Line 63 of [2]) that implements these methods (Line 94 of [2]).

This wrapper is also useful to perform some simple conversions that
are needed to provide a more Pythonic interface (e.g., handle return
value from C-references -- Line 111 of [2]).

I am wondering if there is any suggested/better way to achieve this
result.
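
For illustration, a minimal sketch of the kind of wrapper described
above. It is not the code from [2]: the stand-in library object, the
make_term function and the "repr" field checked by MSAT_ERROR_TERM are
assumptions, chosen so that the sketch runs without MathSAT or CFFI.

```python
from types import SimpleNamespace


class LibWrapper(object):
    """Forward unknown attributes to the wrapped library object."""

    def __init__(self, lib):
        self._lib = lib

    def __getattr__(self, name):
        # Only called when normal lookup fails, so Python-level
        # "macros" defined on this class take precedence.
        return getattr(self._lib, name)

    def MSAT_ERROR_TERM(self, term):
        # Assumed re-implementation of a NULL-check macro. With real
        # cdata this would compare against ffi.NULL instead of None.
        return term.repr is None


# Stand-in for the object returned by ffi.dlopen(); not a real library.
fake_lib = SimpleNamespace(make_term=lambda: SimpleNamespace(repr=None))

wrapped = LibWrapper(fake_lib)
term = wrapped.make_term()               # resolved on the library
is_error = wrapped.MSAT_ERROR_TERM(term)  # resolved in Python
```

The caller uses the same attribute syntax in both cases, so it cannot
tell which names come from the library and which from Python.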


Default char* encoding for Python 3
-----------------------------------

Many methods that take a char* as input require some annoying
boilerplate in Python 3 (Line 240 of [2]). I am wondering if it would
be possible to define a default encoding for str to char* in Python
3. This would be an option of the FFIBuilder that defaults to the
current behavior if unset, but if set to an encoding (e.g., ascii),
all conversions from str to char* would be done via that encoding.
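
Until such an option exists, the boilerplate can be factored out on
the Python side. The following is only a sketch for Python 3: the
helper encode_text_args and the stand-in fake_c_function are
hypothetical names, and the stand-in merely imitates a char* function
that accepts bytes but not str.

```python
import functools


def encode_text_args(func, encoding="ascii"):
    """Return a version of func that encodes str arguments to bytes."""
    @functools.wraps(func)
    def wrapper(*args):
        converted = [
            arg.encode(encoding) if isinstance(arg, str) else arg
            for arg in args
        ]
        return func(*converted)
    return wrapper


def fake_c_function(name):
    # Stand-in for a C function taking char*: it refuses text and
    # only accepts bytes.
    if not isinstance(name, bytes):
        raise TypeError("expected bytes")
    return len(name)


declare = encode_text_args(fake_c_function)
length = declare("x_1")  # the str is encoded before the call
```

Non-ASCII input raises UnicodeEncodeError at the call site, which is
usually easier to diagnose than a garbled string on the C side.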


ABI vs API
----------

My understanding of the main criticism about ABI usage is its
limitations in dealing with positioning of fields in
structs. However, most of the APIs that we use have very simple
structs, and mostly operate through opaque objects. Are there
other cases that can lead to problems?

The *huge* benefit that I see on using the ABI is that we can
have literally the same code work for Python 3/Python 2, CPython/
PyPy, and OSx/Linux/Win. I've tried with a different wrapper [3],
and I was very excited to see the same code work out-of-the-box
on OSx (I didn't try it on Win).

Where can I find a more detailed discussion of the problems with ABI?


Wrap a return value type into a more complex Python object
----------------------------------------------------------

Many functions of the API of MathSAT return an msat_term struct. There
is a method to serialize this object (msat_term_repr). In the SWIG
wrapper, the msat_term is a Python object that implements __str__ by
calling the function msat_term_repr.

I can clearly create a corresponding class for the CFFI
wrapper. However, I would need to make sure that every function call
that returns an msat_term, wraps the results in my custom python
object. In SWIG terminology, this would be a typemap.

Is there a simple way to achieve this in CFFI? Or is this outside the
scope of CFFI?
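
To make the question concrete, here is a sketch of what a hand-rolled
"typemap" could look like. The Term class, the returns_term helper and
the make_term function are hypothetical, and the stand-in library uses
plain integers as raw terms; only msat_term_repr is a name taken from
the MathSAT API mentioned above, and its real signature may differ.

```python
import functools
from types import SimpleNamespace


class Term(object):
    """Python-side wrapper around a raw term handle."""

    def __init__(self, lib, raw):
        self._lib = lib
        self.raw = raw

    def __str__(self):
        # Delegate serialization to the library, as the SWIG wrapper did.
        return self._lib.msat_term_repr(self.raw)


def returns_term(lib, func):
    """Wrap func so that its raw result comes back as a Term."""
    @functools.wraps(func)
    def wrapper(*args):
        return Term(lib, func(*args))
    return wrapper


# Stand-in library: raw terms are plain integers here.
fake_lib = SimpleNamespace(
    make_term=lambda: 1,
    msat_term_repr=lambda raw: "term#%d" % raw,
)

make_term = returns_term(fake_lib, fake_lib.make_term)
term = make_term()
text = str(term)
```

Each function that returns a term still has to be wrapped explicitly,
which is the part a SWIG typemap does automatically.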


Direct .h parsing and versioning information
--------------------------------------------

It would be nice to be able to deal with the original .h file, instead
of having to copy/paste the relevant information.

I think that using the original .h provides at least 2 benefits.

First, it makes it simple for users to understand which version of the
library you are using, whether all methods are wrapped, etc. From a
developer's point of view, it also simplifies the process of "updating"
the wrapper to a newer version of the library (since a diff of the
.h immediately highlights the changes in the release).

Second, a lot of information like comments is lost in the process,
while it could be particularly useful to maintain/copy/embed it in
the wrapped library.

This is a minor point, but I was wondering if there are ongoing
efforts to build some utilities (maybe on top of cffi, and not
necessarily within) to deal with better parsing of original .h files.



Thank you for the great work!

Cheers,
Marco




Daniel Holth

Sep 28, 2016, 11:46:19 AM
to python-cffi
I have a hobby project, pysdl2-cffi, with a (hacky) builder.py that automatically generates Python code implementing wrapping rules based on the SDL2 library's conventions. It parses the header files with pycparser, the same parser used by cffi. The most special thing about pysdl2-cffi's build chain is that it parses SDL2's own per-function documentation comments and translates them into Python docstrings.


Much more elegant wrapper generators built on top of cffi exist, but I don't recall their names.

An important part of the wrapper generator is that it renders the unspoken conventions of the SDL2 library into Python (first method parameter is usually "self", ownership rules, when a pointer is an "in" or "out" parameter, whether NULL is allowed) and so this generator would not work with some other C library.

Then the SDL2 API is exposed as generated Python modules that wrap the cffi binding, so the user never deals with the ffi object directly. Constants and pure-Python macros can be added to that module with no trouble.

Like you, I've also wrapped picosat just for fun. I took the approach of exposing it as a class, iterating through all properties of the ffi object, and assigning them as methods on the class based on a few simple (and library-specific) rules. https://bitbucket.org/dholth/picosat-cffi/src/tip/picosat/__init__.py
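
A rough sketch of that last approach, not the code from the link: the
stand-in library, its solver_* function names and the prefix rule are
all made up for illustration. How dir() behaves on a real cffi library
object depends on the mode and version, so treat that part as an
assumption too.

```python
from types import SimpleNamespace

# Stand-in for a cffi library object; the handle is a plain dict.
fake_lib = SimpleNamespace(
    solver_init=lambda: {"clauses": 0},
    solver_add=lambda handle, lit: handle.update(
        clauses=handle["clauses"] + 1),
    solver_count=lambda handle: handle["clauses"],
)


class Solver(object):
    def __init__(self):
        self._handle = fake_lib.solver_init()


def _make_method(func):
    # Bind func so that the stored handle is passed as first argument.
    def method(self, *args):
        return func(self._handle, *args)
    return method


PREFIX = "solver_"
for name in dir(fake_lib):
    # Rule: every solver_* function except the constructor becomes a
    # method named after the part that follows the prefix.
    if name.startswith(PREFIX) and name != "solver_init":
        func = getattr(fake_lib, name)
        setattr(Solver, name[len(PREFIX):], _make_method(func))

solver = Solver()
solver.add(3)
solver.add(-4)
count = solver.count()
```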

Armin Rigo

Nov 21, 2016, 4:37:37 AM
to pytho...@googlegroups.com
Hi Marco,

Sorry for not answering you until now.


On 28 September 2016 at 10:00, Marco Gario <marco...@gmail.com> wrote:
> ABI vs API
> ----------
>
> My understanding of the main criticism about ABI usage is its
> limitations in dealing with positioning of fields in
> structs. However, most of the APIs that we use have very simple
> structs, and mostly operate through opaque objects. Are there
> other cases that can lead to problems?
>
> The *huge* benefit that I see on using the ABI is that we can
> have literally the same code work for Python 3/Python 2, CPython/
> PyPy, and OSx/Linux/Win. I've tried with a different wrapper [3],
> and I was very excited to see the same code work out-of-the-box
> on OSx (I didn't try it on Win).
>
> Where can I find a more detailed discussion of the problems with ABI?

See http://cffi.readthedocs.io/en/latest/overview.html#abi-versus-api .

I'm surprised that you see the benefit of ABI to be portability. It
is actually the other way around: the API mode is more portable. You
have to use a C compiler, but that's the only drawback (which exists
mainly on Windows). (Note for example that Python 3 added a "stable
ABI" which means you don't even need to recompile when upgrading the
version of CPython.)

The benefit of the API mode is that it is generally much safer
because it works at the level of C, instead of at the level of the
machine's ABI. This is the main reason why I generally push
for API mode instead of ABI mode.

But, more generally, this is also an answer to some of your questions.
Let's start with the .h file containing lots of small macros doing
some extra checks. If you use the ABI mode you can't use the macros
at all. In the API mode, you can use them as if they were functions.
You don't need to dig inside the .h file to find out what are the real
functions invoked by these macros.

That's also a reason for portability: the authors of the library can
change what is a function and what is a macro, and some internal
details in the .h file, without you needing to adapt---as long as the
changes are done in a way that should be invisible to typical C
programs using the header.

That also gives my answer to the question of copy-pasting the whole
.h. Yes, there are various tools for various use cases that exist
outside the scope of CFFI, but my answer is that there is no general
way to do that. In your case, for example, it wouldn't work when you
have macros in (this version of) the .h file. For that case you
really have to write in the cdef a line that looks like a function
declaration, and this exact line is not in the .h. More precisely, I
mean that if the .h contains something like:

#define foo(a, b, c) ((a) == NULL ? -1 : _internal_foo(a, b, c))
int _internal_foo(T *a, int b, long c);

Then what you'd like to have in API mode is:

cdef("""
int foo(T *a, int b, long c);
""")

Finally, as pointed out by Daniel Holth, usually the CFFI wrapper part
should be hidden from the rest of the Python program. For example, if
you have a lot of opaque pointers, then ideally you shouldn't pass
around cdata pointer objects through the rest of the Python program,
but only wrappers in the form of instances of a class. For example,
if your .h has these lines:

struct foo *create_foo(void);
int length_of_foo(struct foo *);
void destroy_foo(struct foo *);

Then you'd write this in your Python library:

class Foo(object):
    def __init__(self):
        h = lib.create_foo()
        if h == ffi.NULL:
            raise Exception("oups")
        self._handle = ffi.gc(h, lib.destroy_foo)

    def length(self):
        return lib.length_of_foo(self._handle)

Doing that has the advantage of giving a library that feels a bit more
Pythonic; but also, it lets you more easily hide repetitive behavior,
or tweak it later. In the example above, Foo hides the destruction of
the object. You can also transform the result of functions at this
point:

class Foo(object):
    def call_that_returns_msat_term(self, arg):
        return _msat_term(lib.call_that_returns_msat_term(self._handle, arg))

def _msat_term(cdata):
    # or an instance with a __repr__, or anything Python-like
    return (cdata.foo, cdata.bar)

Or, as you mention unicode/bytes:

    def stuff_with_text(self, mytext):
        # e.g. in this library, ascii encoding is fine
        return lib.stuff_with_text(self._handle, mytext.encode('ascii'))

Yes, what I'm describing is more work than a bare dlopen() link: you
need to look at the documentation (or reverse-engineer from the
header) to find a proper C signature for something that may be a
macro; and you need to write three lines of Python code for every C
function. This process is not automatic, but you get a much more
Python-friendly wrapper in the end.

In order to automate this process, you could try to look around: there
are some CFFI wrapper generators that can guess what should be written
from looking at the header, and write automatically the boilerplate I
described above---in specific cases. They always have to assume a lot
of things and certainly won't work when given a random header. It's
likely not worth it unless the C API that you need to expose has
several hundred functions and you can't go with an
"add-them-as-I-need-them" approach. If these conditions are not met,
then you'll spend much more time tweaking the binding generator than
just writing directly N times 3 lines of code...


A bientôt,

Armin.