Experience on Wrapping a Library using CFFI [Long]

Marco Gario

Sep 28, 2016, 4:00:32 AM

to python-cffi

Hi,

In pySMT [1] we talk with different libraries (e.g. MathSAT) using

C-level APIs. We are considering whether to push for a wider support

for CFFI, mostly due to the possibility of using pypy.

I experimented by converting the existing wrapper of MathSAT (that

uses SWIG) into a CFFI in-line ABI wrapper [2]. The experience was

overall positive, but I have some questions/feedback.

Extending the CFFI Lib object

-----------------------------

In the MathSAT API there are some macros that check some very simple

condition (e.g., whether a field is NULL). These macros are not

interpreted by CFFI, and I wanted to extend the library to mimic

their existence by re-implementing them in Python. Given the library

object (obtained via dlopen) I wanted to add a method to it

(MSAT_ERROR_TERM). From the user's perspective, it should not matter

whether the method is a method defined in the library, or a Python

method: the user should simply be able to call it (e.g.,

mathsat.MSAT_ERROR_TERM()).

I did not find a nice way to add these methods. Eventually, I had to

create a tiny wrapper around the CFFI library object (MathSATWrapper

Line 63 of [2]) that implements these methods (Line 94 of [2]).

This wrapper is also useful to perform some simple conversions that

are needed to provide a more Pythonic interface (e.g., handle return

value from C-references -- Line 111 of [2]).

I am wondering if there is any suggested/better way to achieve this

result.
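
For readers following along, here is a minimal sketch of the wrapper idea
described above. It is not the code from [2]. The class name LibWrapper, the
stand-in library object and its make_term function are all hypothetical, and
the macro is assumed to test a field named repr against NULL.

```python
from types import SimpleNamespace


class LibWrapper(object):
    """Expose a CFFI-style lib object plus a few macros rewritten in Python."""

    def __init__(self, lib, null):
        self._lib = lib
        self._null = null  # would be ffi.NULL when wrapping a real library

    def __getattr__(self, name):
        # Called only when normal lookup fails, so names defined on the
        # wrapper win and everything else is forwarded to the library.
        return getattr(self._lib, name)

    def MSAT_ERROR_TERM(self, term):
        # Assumed macro semantics: a term is an error if its field is NULL.
        return term.repr == self._null


# Stand-in for the object returned by ffi.dlopen(); not the real MathSAT API.
fake_lib = SimpleNamespace(
    make_term=lambda ok: SimpleNamespace(repr=object() if ok else None),
)
mathsat = LibWrapper(fake_lib, null=None)

good = mathsat.make_term(True)   # forwarded to the wrapped object
bad = mathsat.make_term(False)
print(mathsat.MSAT_ERROR_TERM(good), mathsat.MSAT_ERROR_TERM(bad))
```

With this shape the caller cannot tell whether a name comes from the C
library or from Python, which is the property asked for above.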

Default char* encoding for Python 3

-----------------------------------

Many methods that take a char* as input require some annoying

boilerplate in Python 3 (Line 240 of [2]). I am wondering if it would

be possible to define a default encoding for str to char* in Python

3. This would be an option of the FFIBuilder that defaults to the

current behavior if unset, but if set to an encoding (e.g., ascii),

then it assumes that all conversions from str to char* are to be done

via ascii encoding.
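
Until such an option exists, the conversion can be factored out on the
Python side. The following is only a sketch for Python 3: encode_str_args is
a hypothetical helper, not part of CFFI, and the wrapped function is a
stand-in for a C function that takes a char*.

```python
import functools


def encode_str_args(func, encoding="ascii"):
    """Return a wrapper around func that encodes str arguments to bytes."""

    @functools.wraps(func)
    def wrapper(*args):
        # Only str arguments are touched; bytes, ints, cdata pass through.
        converted = [
            a.encode(encoding) if isinstance(a, str) else a for a in args
        ]
        return func(*converted)

    return wrapper


# Stand-in for a C function taking (env, char *name); it echoes its input.
def fake_declare(env, name):
    return (env, name)


declare = encode_str_args(fake_declare)
print(declare(1, "x"))
```

A non-ASCII string raises UnicodeEncodeError at the call site, which is
arguably preferable to silently passing wrongly encoded bytes to C.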

ABI vs API

----------

My understanding of the main criticism about ABI usage, is its

limitations in dealing with positioning of fields in

structs. However, most of the APIs that we use have very simple

structs, and mostly operate through opaque objects. Are there

Daniel Holth

Sep 28, 2016, 11:46:19 AM

to python-cffi

I have a hobby project, pysdl2-cffi, which has a (hacky) builder.py that automatically generates Python code implementing wrapping rules based on the SDL2 library's conventions. It parses the header files with pycparser, the same parser used by cffi. The most special thing about pysdl2-cffi's build chain is that it parses SDL2's own per-function documentation comments and translates them into Python docstrings.

https://bitbucket.org/dholth/pysdl2-cffi/src/tip/builder/?at=default

There are much more elegant wrapper generators built on top of cffi, but I don't recall their names.

An important part of the wrapper generator is that it renders the unspoken conventions of the SDL2 library into Python (first method parameter is usually "self", ownership rules, when a pointer is an "in" or "out" parameter, whether NULL is allowed) and so this generator would not work with some other C library.

Then the SDL2 API is exposed as generated Python modules that wrap the cffi binding, so the user never deals with the ffi object directly. Constants and pure-Python macros can be added to that module with no trouble.

Like you, I've also wrapped picosat just for fun. I took the approach of exposing it as a class, iterating through all properties of the ffi object, and assigning them as methods on the class based on a few simple (and library-specific) rules. https://bitbucket.org/dholth/picosat-cffi/src/tip/picosat/__init__.py
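
A rough sketch of that last approach, not the actual picosat-cffi code: the
helper bind_methods, the demo_ prefix and the stand-in library object are all
hypothetical, and the only rule applied here is "strip the prefix and pass
the handle as the first argument".

```python
from types import SimpleNamespace


def bind_methods(cls, lib, prefix):
    """Attach every callable lib.<prefix>xyz to cls as a method xyz."""
    for name in dir(lib):
        if not name.startswith(prefix):
            continue
        func = getattr(lib, name)
        if not callable(func):
            continue  # skip constants and other non-function attributes

        # _func is bound as a default so each method keeps its own function.
        def method(self, *args, _func=func):
            return _func(self._handle, *args)

        short = name[len(prefix):]
        method.__name__ = short
        setattr(cls, short, method)
    return cls


# Stand-in for a cffi lib object; the "handle" is just a Python list here.
fake_lib = SimpleNamespace(
    demo_push=lambda handle, value: handle.append(value),
    demo_count=lambda handle: len(handle),
    DEMO_VERSION=3,
)


class Solver(object):
    def __init__(self):
        self._handle = []


bind_methods(Solver, fake_lib, "demo_")

s = Solver()
s.push(3)
s.push(-1)
print(s.count())
```

Real libraries need more rules than this (ownership, out-parameters, NULL
checks), which is why such code tends to stay library-specific.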

Armin Rigo

Nov 21, 2016, 4:37:37 AM

to pytho...@googlegroups.com

Hi Marco,

Sorry for not answering you until now.

On 28 September 2016 at 10:00, Marco Gario <marco...@gmail.com> wrote:
> ABI vs API
> ----------
>
> My understanding of the main criticism about ABI usage, is its
> limitations in dealing with positioning of fields in
> structs. However, most of the APIs that we use have very simple
> structs, and mostly operate through opaque objects. Are there
> other cases that can lead to problems?
>
> The *huge* benefit that I see on using the ABI is that we can
> have literally the same code work for Python 3/Python 2, CPython/
> PyPy, and OSx/Linux/Win. I've tried with a different wrapper [3],
> and I was very excited to see the same code work out-of-the-box
> on OSx (I didn't try it on Win).
>
> Where can I find a more detailed discussion of the problems with ABI?

See http://cffi.readthedocs.io/en/latest/overview.html#abi-versus-api .

I'm surprised that you see the benefit of ABI to be portability. It
is actually the other way around: the API mode is more portable. You
have to use a C compiler, but that's the only drawback (which exists
mainly on Windows). (Note for example that Python 3 added a "stable
ABI" which means you don't even need to recompile when upgrading the
version of CPython.)

The benefits of the API mode are that it is generally much safer
because it works at the level of C, instead of at the level of the
machine's ABI. This fact is the main reason why I generally push
for API mode instead of ABI mode.

But, more generally, this is also an answer to some of your questions.
Let's start with the .h file containing lots of small macros doing
some extra checks. If you use the ABI mode you can't use the macros
at all. In the API mode, you can use them as if they were functions.
You don't need to dig inside the .h file to find out what are the real
functions invoked by these macros.

That's also a reason for portability: the authors of the library can
change what is a function and what is a macro, and some internal
details in the .h file, without you needing to adapt---as long as the
changes are done in a way that should be invisible to typical C
programs using the header.

That also gives my answer to the question of copy-pasting the whole
.h. Yes, there are various tools for various use cases that exist
outside the scope of CFFI, but my answer is that there is no general
way to do that. In your case, for example, it wouldn't work when you
have macros in (this version of) the .h file. For that case you
really have to write in the cdef a line that looks like a function
declaration, and this exact line is not in the .h. More precisely, I
mean that if the .h contains something like:

    #define foo(a, b, c) ((a) == NULL ? -1 : _internal_foo(a, b, c))
    int _internal_foo(T *a, int b, long c);

Then what you'd like to have in API mode is:

    cdef("""
        int foo(T *a, int b, long c);
    """)

Finally, as pointed out by Daniel Holth, usually the CFFI wrapper part
should be hidden from the rest of the Python program. For example, if
you have a lot of opaque pointers, then ideally you shouldn't pass
around cdata pointer objects through the rest of the Python program,
but only wrappers in the form of instances of a class. For example,
if your .h has these lines:

    struct foo *create_foo(void);
    int length_of_foo(struct foo *);
    void destroy_foo(struct foo *);

Then you'd write this in your Python library:

    class Foo(object):
        def __init__(self):
            h = lib.create_foo()
            if h == ffi.NULL:
                raise Exception("oups")
            self._handle = ffi.gc(h, lib.destroy_foo)

        def length(self):
            return lib.length_of_foo(self._handle)

Doing that has the advantage of giving a library that feels a bit more
Pythonic; but also, it lets you more easily hide repetitive behavior,
or tweak it later. In the example above, Foo hides the destruction of
the object. You can also transform the result of functions at this
point:

    class Foo(object):
        def call_that_returns_msat_term(self, arg):
            return _msat_term(lib.call_that_returns_msat_term(self._handle, arg))

    def _msat_term(cdata):
        # or an instance with a __repr__, or anything Python-like
        return (cdata.foo, cdata.bar)

Or, as you mention unicode/bytes:

        def stuff_with_text(self, mytext):
            # e.g. in this library, ascii encoding is fine
            return lib.stuff_with_text(self._handle, mytext.encode('ascii'))

Yes, what I'm describing is more work than a bare dlopen() link: you
need to look at the documentation (or reverse-engineer from the
header) to find a proper C signature for something that may be a
macro; and you need to write three lines of Python code for every C
function. This process is not automatic, but you get a much more
Python-friendly wrapper in the end.

In order to automate this process, you could try to look around: there
are some CFFI wrapper generators that can guess what should be written
from looking at the header, and automatically write the boilerplate I
described above---in specific cases. They always have to assume a lot
of things and certainly won't work when given a random header. It's
likely not worth it unless the C API that you need to expose has
several hundred functions and you can't go with an
"add-them-as-I-need-them" approach. If these conditions are not met,
then you'll spend much more time tweaking the binding generator than
just writing directly N times 3 lines of code...

A bientôt,

Armin.
