Wrapper for setjmp/longjmp

Alex Orange

Jun 20, 2015, 11:08:41 PM
to pytho...@googlegroups.com
I'm working on a cffi wrapper for the Lua C API. I've run into a problem that's likely common to any cross-language exception handling. Basically I need to setjmp in the C environment before I call the function in question. A similar situation exists for libpng and, I'm sure, for any other C library that uses setjmp/longjmp to handle exception-like events. I imagine there might be other things that could use such a wrapper as well.

Assuming that such an automatic wrapping system doesn't exist I have two ideas of how this could work:

1. My first idea, a separate wrapper construct something like the following:

>>> wrapper = SomeWrapperThing()
>>> wrapper.call(lib.my_func, arg1, arg2)  # a wrapping of lib.my_func(arg1, arg2)

The wrapper would prepare arguments (do whatever needs to be python aware), do whatever the wrapper does before a function call, call the function (by pointer), do whatever the wrapper does after a function call, and do the python work to convert results back to python.

Advantages:
You don't have to call a given function with the wrapper every time

Disadvantages:
You have to prepare all the arguments in a generic fashion (can't be compiled in unless you do NxM wrappings with N wrappers and M functions)
Have to call the function with a function pointer, not the best for optimizing out extra levels of functions

2. A somewhat cleaner idea. Some extra arguments to cdef for pre/post call wrappings which apply to all functions in that cdef. This would allow some functions to be wrapped and others not even in the same library.

Advantages:
The wrapping "configurations" are available at compile time so more optimizations are possible

Disadvantages:
Wrapped functions always have to be called with the wrapper


If you have a suggestion as to how to do this with cffi as it is, I'm looking for a solution that doesn't involve writing a new wrapper for every function and putting said wrappers in the set_source call.

Alex

Armin Rigo

Jun 21, 2015, 4:11:51 AM
to pytho...@googlegroups.com
Hi Alex,

On 21 June 2015 at 05:08, Alex Orange <crazy...@gmail.com> wrote:
> I'm working on a cffi wrapper for the Lua C API. I've run into a problem
> that's likely common to any cross-language exception handling. Basically I
> need to setjmp in the C environment before I call the function in question.

I solved it manually in one use case: see
https://bitbucket.org/pypy/stmgc/raw/default/c8/test/support.py .
With some C macros it is not a high burden to define one C wrapper for
every function in the set_source() part. If you want it more
automated, you can instead have in Python a list of functions to
protect, and add the correct lines to the C part programmatically from
Python---this is potentially more flexible than C macros. There are
probably semi-involved solutions to write the function declarations
only once, and have them automatically put in the cdef() under a
different name and the C wrapper generated.

I think it's the simplest choice, assuming that automatically
repeating the wrapper functions over and over doesn't start to give
concerns about the total size of the produced .so. It is also the
most flexible solution, as you define the C wrappers as you need them.
I think trying to add arguments to the cdef() to ask it to wrap the
functions for you is always going to be a 75%-solution that doesn't
cover all use cases, so in the "spirit" of CFFI, better do it
explicitly instead.

(From a performance perspective, these extra small wrappers cost
nothing. If in doubt, declare them as "static inline" functions. The
point is that the C compiler should be able to inline them inside the
automatically-generated CFFI wrappers, which are generated after the
set_source() code in the same C file.)


A bientôt,

Armin.

Alex Orange

Jun 21, 2015, 2:32:55 PM
to pytho...@googlegroups.com, ar...@tunes.org
Hi Armin,

Thanks for the reply. After looking at your solution I realize what you're saying about it being more complicated than I'd thought. I forgot about the need to condition the execution of the function based on the return value of setjmp. I am aware of the solution you mention and it's what I'm specifically looking to avoid. I have a ton of functions and don't want to have to write a wrapper function for each one (i.e. the function declaration details and then the curly braces and the macro and the inner function call).

Given that I did not author CFFI I don't want to assert too much about the spirit of CFFI. That being said, what I find most refreshing about CFFI, compared to other Python extension methods, is that it requires the least amount (none as far as I can see) of repetitive/meaningless coding. In CFFI all I do is include the necessary headers, describe the types I plan to use, and give declarations for the functions I need to call. Writing a wrapper for each of a hundred functions that I want to wrap seems counter to this.

Upon further thinking, what I realize is that the only thing I can't do from the python side is setjmp/longjmp. Perhaps the best solution would be to register jmp_buf's with exceptions in python, CFFI would then be in charge of calling setjmp, and if setjmp returned with anything other than 0 it would use that value to create the corresponding exception and raise it. This would have the benefit of providing a nice clean way of handling exceptions, the alternative being passing in a pointer for the normal results and then returning a value that would indicate success/failure. Also, this would seem to be a 1:1 mapping of concepts, setjmp/longjmp provide alternative routes of execution and exceptions are the corresponding means of taking alternative execution paths in python. Finally, a form of exception would be provided by CFFI that would provide a means for callbacks to do longjmps. If a callback allowed such an exception to propagate "back to C", the jmp_buf enclosed in the exception would then be longjmp'd to with the return value also provided by the exception.

If you have any suggestions of how to perfect what I'm suggesting I'd love to hear them. I'm going to try to add this functionality to the CPython side of CFFI (since I've worked with CPython extensions before) to test the concept.

Alex

Armin Rigo

Jun 21, 2015, 4:23:37 PM
to pytho...@googlegroups.com
Hi Alex,

On 21 June 2015 at 20:32, Alex Orange <crazy...@gmail.com> wrote:
> Given that I did not author CFFI I don't want to assert too much about the
> spirit of CFFI. That being said, what I find most refreshing about CFFI,
> compared to other Python extension methods, is that it requires the least
> amount (none as far as I can see) of repetitive/meaningless coding. In CFFI
> all I do is include the necessary headers, describe the types I plan to use,
> and give declarations for the functions I need to call. Writing a wrapper
> for each of a hundred functions that I want to wrap seems counter to this.

I think you missed my point: in my small example I wrote the functions
manually because there are only a few of them; but for your use case,
I described how you can automate it. It's mostly a matter of writing
Python code that itself writes the repetitive C declarations. In
effect, you'd be using Python as a more powerful preprocessor than C's
native one. You don't have to write manually the hundreds of wrappers
needed: you generate them with string manipulations.
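
A minimal sketch of that string-manipulation approach, assuming the signatures are listed by hand as tuples. `FUNCS`, the `protected_` prefix, the `jumped` out-parameter and the global `guard_buf` are illustrative names, not part of CFFI or of any library; the library's own header and the error hook that calls longjmp(guard_buf, 1) are left out. Note that setjmp() sits in the same C function that calls the protected function, so the jmp_buf is still valid when the jump happens.

```python
# Sketch: generate setjmp-guarded C wrappers from a list of signatures.
# FUNCS, guard_buf and the protected_ prefix are illustrative assumptions.
FUNCS = [
    # (return type, function name, [(argument type, argument name), ...])
    ("int", "f1", [("int", "a0")]),
    ("long", "f2", [("long", "a0"), ("void *", "a1")]),
    ("void", "f3", []),
]


def signature(ret, name, args):
    """Return the C prototype of the wrapper, without a trailing semicolon."""
    params = ["%s %s" % (tp, arg) for tp, arg in args] + ["int *jumped"]
    return "%s protected_%s(%s)" % (ret, name, ", ".join(params))


def wrapper(ret, name, args):
    """Return C source for one wrapper; setjmp() stays in the calling frame."""
    call = "%s(%s)" % (name, ", ".join(arg for _, arg in args))
    if ret == "void":
        on_jump, on_ok = "return;", "%s;" % call
    else:
        # Assumes a scalar or pointer return type, for which 0 is a valid value.
        on_jump, on_ok = "return 0;", "return %s;" % call
    lines = [
        "static " + signature(ret, name, args),
        "{",
        "    if (setjmp(guard_buf) != 0) {",
        "        *jumped = 1;  /* reached via longjmp(guard_buf, ...) */",
        "        " + on_jump,
        "    }",
        "    *jumped = 0;",
        "    " + on_ok,
        "}",
    ]
    return "\n".join(lines) + "\n"


# Text for set_source(): the jump buffer first, then one wrapper per function.
# (The #include of the wrapped library's header is omitted in this sketch.)
c_source = "#include <setjmp.h>\nstatic jmp_buf guard_buf;\n\n" + "\n".join(
    wrapper(*f) for f in FUNCS)
# Text for cdef(): only the wrappers are exposed to Python.
cdef_source = "".join(signature(*f) + ";\n" for f in FUNCS)

print(cdef_source)
```

The two strings would then be passed to ffi.set_source() and ffi.cdef() respectively, so each signature is written only once, in the Python list.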

> Upon further thinking, what I realize is that the only thing I can't do from
> the python side is setjmp/longjmp. Perhaps the best solution would be to
> register jmp_buf's with exceptions in python, CFFI would then be in charge
> of calling setjmp, and if setjmp returned with anything other than 0 it
> would use that value to create the corresponding exception and raise it.

This doesn't work: setjmp() itself is a "special function" in C that
you can't call via CFFI. Indeed, setjmp() sets up a jmp_buf that is
only valid as long as the C function *containing the call to setjmp*
did not return. So it has to be used in a C function that itself
calls more functions (and these called functions may directly or
indirectly call longjmp()). It's not possible to write some wrapper
that calls setjmp(), in CFFI or in any C program: the jmp_buf would
become immediately invalid as soon as this wrapper returns.


A bientôt,

Armin.

Alex Orange

Jun 24, 2015, 8:38:37 PM
to pytho...@googlegroups.com, ar...@tunes.org
On Sunday, June 21, 2015 at 2:23:37 PM UTC-6, Armin Rigo wrote:
> Hi Alex,
>
> On 21 June 2015 at 20:32, Alex Orange <crazy...@gmail.com> wrote:
> > Given that I did not author CFFI I don't want to assert too much about the
> > spirit of CFFI. That being said, what I find most refreshing about CFFI,
> > compared to other Python extension methods, is that it requires the least
> > amount (none as far as I can see) of repetitive/meaningless coding. In CFFI
> > all I do is include the necessary headers, describe the types I plan to use,
> > and give declarations for the functions I need to call. Writing a wrapper
> > for each of a hundred functions that I want to wrap seems counter to this.
>
> I think you missed my point: in my small example I wrote the functions
> manually because there are only a few of them; but for your use case,
> I described how you can automate it. It's mostly a matter of writing
> Python code that itself writes the repetitive C declarations. In
> effect, you'd be using Python as a more powerful preprocessor than C's
> native one. You don't have to write manually the hundreds of wrappers
> needed: you generate them with string manipulations.

If I understand correctly, you're suggesting that I write some python code that will process my C declarations and wrap them (or however you want to call it) with some code that will then end up in set_source. That's something I don't want to write, unless CFFI (or some standard part of python) can parse my list of declarations and give me a list of function declarations with names and arguments in a way that I don't have to parse.
 
> > Upon further thinking, what I realize is that the only thing I can't do from
> > the python side is setjmp/longjmp. Perhaps the best solution would be to
> > register jmp_buf's with exceptions in python, CFFI would then be in charge
> > of calling setjmp, and if setjmp returned with anything other than 0 it
> > would use that value to create the corresponding exception and raise it.
>
> This doesn't work: setjmp() itself is a "special function" in C that
> you can't call via CFFI. Indeed, setjmp() sets up a jmp_buf that is
> only valid as long as the C function *containing the call to setjmp*
> did not return. So it has to be used in a C function that itself
> calls more functions (and these called functions may directly or
> indirectly call longjmp()). It's not possible to write some wrapper
> that calls setjmp(), in CFFI or in any C program: the jmp_buf would
> become immediately invalid as soon as this wrapper returns.

I may be miscommunicating my exact plan here. Let me write the code for CPython and then you can see what I mean and critique it.

> A bientôt,
>
> Armin.

Armin Rigo

Jun 26, 2015, 6:24:34 AM
to pytho...@googlegroups.com
Hi Alex,

On 25 June 2015 at 02:38, Alex Orange <crazy...@gmail.com> wrote:
> If I understand you're suggesting that I write some python code that will
> process my C declarations and wrap the declarations or however you want to
> call it w/ some code that will then end up in set_source. That's something I
> don't want to write, unless CFFI (or some standard part of python) can parse
> my list of declarations and give me a list of function declarations with
> names and arguments in a way that I don't have to parse.

That's right. There are various ways to do that. The one I can
present in a few lines requires peeking inside the internals of CFFI:

from cffi import FFI

ffi = FFI()
ffi.cdef("int f1(int); long f2(long, void *);")

# _parser._declarations is a CFFI internal and may change between versions
for key, modeltype in ffi._parser._declarations.items():
    if key.startswith('function '):
        funcname = key.split()[1]
        args = [tp.get_c_name('a%d' % i) for i, tp in enumerate(modeltype.args)]
        print(args)
        decl = '%s(%s)' % (funcname, ', '.join(args))
        print(modeltype.result.get_c_name(decl))

Here, "modeltype" is of some internal class that you should normally
not see directly. It's not the first case though where I learn about
a potential reason to peek. Maybe we can discuss having a way
to access the same information more officially.


A bientôt,

Armin.

Alex Orange

Jun 28, 2015, 3:01:10 PM
to pytho...@googlegroups.com
Ok, that is an interesting idea. I'd have to see how it fits into the whole process because I thought you had to do the set_source before you could do the cdef.

Anyway, I implemented half of what I'm suggesting. My fork: https://bitbucket.org/CrazyCasta/cffi/commits/8cbeed427476a5cc10dfdee9734a86fb475c3412 provides a way of calling setjmp before the call and then converting a longjmp, if one is triggered, into a specific exception. Here is some code to test:

(ve) $ cat test_build.py
from cffi import FFI

ffi = FFI()
ffi.set_source("setjmp",
               """
               #include <setjmp.h>
               jmp_buf my_jmp_buf;
               jmp_buf *my_jmp_buf_p = &my_jmp_buf;
               jmp_buf my_jmp_buf2;
               jmp_buf *my_jmp_buf_p2 = &my_jmp_buf2;

               void my_longjmp() {
                   longjmp(my_jmp_buf, 1);
               }
               void my_longjmp2() {
                   longjmp(my_jmp_buf2, 1);
               }
               """, extra_compile_args=["-g"])

types = """
typedef ... jmp_buf;
static const jmp_buf *my_jmp_buf_p;
static const jmp_buf *my_jmp_buf_p2;
void my_longjmp();
void my_longjmp2();
"""

ffi.cdef(types)

ffi.compile()
(ve) $ cat test_run.py
from setjmp import ffi, lib

class MyException(Exception):
    pass

class MyException2(Exception):
    pass

ffi._exception_setjmp_map = {lib.my_jmp_buf_p: MyException,
                             lib.my_jmp_buf_p2: MyException2}

print "About to call longjmp, expect more messages, if not test has failed."

try:
    lib.my_longjmp()
    print "No exception thrown, fail"
except MyException:
    print "Caught MyException, pass"
except:
    print "Caught some other exception, fail"

print "About to call longjmp2, expect more messages, if not test has failed."

try:
    lib.my_longjmp2()
    print "No exception thrown, fail"
except MyException2:
    print "Caught MyException2, pass"
except:
    print "Caught some other exception, fail"
(ve) $ python test_build.py
(ve) $ python test_run.py
About to call longjmp, expect more messages, if not test has failed.
setjmp caught
Caught MyException, pass
About to call longjmp2, expect more messages, if not test has failed.
setjmp caught
Caught MyException2, pass

Notes:
* I've only tested this with python2.7, it might well be broken with python3.x and unless pypy has artificial intelligence it won't work at all.
* This is a demonstration of the idea, I'm not suggesting that the API w/ this _exception_setjmp_map attribute be used. Also, things like where code is placed, style, etc I'm open to suggestions.
* I haven't tested this on Windows; I noticed some ifdefs relating to WIN32 that would probably need to be handled.
* The code changes only somewhat affect ffi's that don't include a jmp_buf declaration and this could be extended to make it exactly the same as it previously was if there are no jmp_buf's (to avoid performance degradation if that's a concern).
* This doesn't include the throw exception in callback->longjmp in C. Hopefully you'll believe that that's at least a bit easier than handling the setjmp code.

Alex

Alex Orange

Jun 28, 2015, 3:04:46 PM
to pytho...@googlegroups.com
Oh, and I forgot to mention. I specifically dropped the support for passing the return value of setjmp to the exception because apparently according to C99 you can't store the result of a setjmp in a variable.



Armin Rigo

Jun 29, 2015, 3:51:20 AM
to pytho...@googlegroups.com
Hi Alex,

On 28 June 2015 at 21:01, Alex Orange <crazy...@gmail.com> wrote:
> * This is a demonstration of the idea, I'm not suggesting that the API w/
> this _exception_setjmp_map attribute be used. Also, things like where code
> is placed, style, etc I'm open to suggestions.

Just to be clear, this looks cool but is not going to be accepted for
CFFI, as it is very much a special case. I'm more open to discussions
about introspecting what functions/constants/variables/types are
declared in a cdef(). It could be used either on the same ffi.cdef()
as in my example (yes, it can be called before or after set_source()),
or on a completely separate "ffi2=FFI();ffi2.cdef(substring)" where we
pass a substring containing only the functions we're interested in.


A bientôt,

Armin.

Alex Orange

Jun 29, 2015, 12:09:06 PM
to pytho...@googlegroups.com, ar...@tunes.org
Hi Armin,
 
Well, I respect your right to choose what does or doesn't go into cffi, but could you explain to me why this is different from errno? It seems that you support errno because some function that you might like to call might change the value of errno, which could be a problem for python, along with the fact that going C->python->C might invalidate the value of errno in the meantime. That seems to be the same problem as setjmp/longjmp. It seems that cffi is exceptionally close to supporting all of C99 without the need for wrappers in set_source. The only cases I'm aware of that require wrappers in set_source are setjmp/longjmp, variable argument callbacks and, unless I'm mistaken, functions with a va_list argument (as opposed to a ... argument), which you can't call. My point being that offering all that C99 has to offer (being a bit arbitrary using C99, but it's a C spec w/ the std readily available) doesn't seem like it would add too much to the cffi API while making it "complete". The only thing that would require wrappers then would be certain user defined macros, given that you can do whatever you like w/ a macro and it is quite conceivable to make a macro that can't be wrapped by a function call. However, since such macros are not part of the C standard, I think everyone could agree that lack of support for them is quite understandable.

Alex

Armin Rigo

Jun 29, 2015, 1:04:24 PM
to pytho...@googlegroups.com
Hi Alex,

On 29 June 2015 at 18:09, Alex Orange <crazy...@gmail.com> wrote:
> Well, I respect your right to choose what does or doesn't go into cffi, but
> could you explain to me why this is different from errno?

Right, you have a point I suppose. One difference is that in practice
libraries that use errno, on Posix, far outnumber libraries that
require setjmp/longjmp.

Ignoring that, the issue I have is that there are several ways to do
setjmp/longjmp calls, and they require different support. In your
test you are assuming that the function must be called with the
jmp_buf pointer set in some global variable, and the library (or
callbacks) will call longjmp() with this global variable. That's one
way, but a poor one in the presence of multithreading, if I understand
correctly. Other libraries might have an interface where the "jmp_buf
*" needs to be stored in some data structure that the called function
can access, or even directly passed as a "jmp_buf *" argument. And
maybe there are even more variants.

It would be simple if we had a way to do "x = ffi.new_jmp_buf()"
directly from Python code, but that's close to impossible to do
reliably, in each of CPython and PyPy.

(...after a bit more thoughts...)

An alternative that might work generally would be with an intermediate
Python function: "ffi.call_with_jmp_buf(func)", where "func" is a
Python function which is called with the "jmp_buf *". The Python
function can then store the "jmp_buf *", or pass it around, as
needed---as long as it is clear that it will stop being a valid
pointer when that Python function returns.

Maybe it should be "ffi.call_with_jmp_buf(ExceptionClass, func)" and
raise the given exception if a longjmp() occurs; or maybe just always
raise "LongJmpOccurred".

Note that this solution is not perfect in CPython, because it will
leak reference counts...


A bientôt,

Armin.

Armin Rigo

Jun 29, 2015, 1:19:14 PM
to pytho...@googlegroups.com
Hi again,

On 29 June 2015 at 19:03, Armin Rigo <ar...@tunes.org> wrote:
> Note that this solution is not perfect in CPython, because it will
> leak reference counts...

Ah, found a different approach: assume that we only need to pass a
*pointer* to the jmp_buf, which seems to be reasonable, and never the
actual content. In that case we can store the pointer, or prepare it as
an argument, before the actual content of the jmp_buf is initialized.
You'd do:

x = ffi.new("jmp_buf") # contents initially zero
lib.glob = x # for example, store as a global ptr
res = ffi.setjmp_and_call(x, c_func, *args)

or, if say "c_func" expects to receive the "jmp_buf *" as argument:

x = ffi.new("jmp_buf")
res = ffi.setjmp_and_call(x, c_func, x)

or if it should be in a field called "on_err" of a "struct foo" argument:

p = ffi.new("struct foo *")
# fill p...
x = ffi.new("jmp_buf")
p.on_err = x
res = ffi.setjmp_and_call(x, c_func, p)

Note in each of these examples how "x" is used in two places. We can
discuss details, like what occurs exactly after a longjmp() (is
raising LongJmpOccurred enough?). At least it is nicely isolated and
independent of the ABI-or-API mode. Does it seem like a reasonable
approach?
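
The intended behaviour can be modelled in plain Python, as a rough sketch. Nothing here touches cffi or the real setjmp/longjmp: `JmpBuf`, `longjmp`, `c_func` and this `setjmp_and_call` are stand-ins invented for illustration, and an internal exception plays the role of the non-local jump. The model only captures two points from the proposal: the buffer can be handed around before it is armed, and it is valid only for the duration of the call.

```python
# Pure-Python model of the proposed semantics; it does not use cffi or real
# setjmp/longjmp. JmpBuf, longjmp and setjmp_and_call are stand-in names.
class LongJmpOccurred(Exception):
    """Raised to the caller when the modelled C code jumps out."""


class _Jump(BaseException):
    """Internal signal standing in for the non-local jump itself."""

    def __init__(self, buf):
        BaseException.__init__(self)
        self.buf = buf


class JmpBuf(object):
    """Stand-in for ffi.new("jmp_buf"): a handle that exists before arming."""

    def __init__(self):
        self.armed = False


def longjmp(buf):
    """Model of C longjmp(): only legal while the buffer is armed."""
    if not buf.armed:
        raise RuntimeError("longjmp() to a jmp_buf that is not active")
    raise _Jump(buf)


def setjmp_and_call(buf, func, *args):
    """Arm buf, call func(*args), and disarm buf again on every exit path."""
    buf.armed = True
    try:
        return func(*args)
    except _Jump as jump:
        if jump.buf is not buf:
            raise  # the jump targets an enclosing setjmp_and_call()
        raise LongJmpOccurred()
    finally:
        buf.armed = False


def c_func(buf, value):
    """Stand-in for a C function that bails out on negative input."""
    if value < 0:
        longjmp(buf)
    return value * 2


x = JmpBuf()
result = setjmp_and_call(x, c_func, x, 21)  # normal return path
try:
    setjmp_and_call(x, c_func, x, -1)       # the callee jumps out
    outcome = "returned"
except LongJmpOccurred:
    outcome = "jumped"
print(result)
print(outcome)
```

In this model a longjmp() outside of setjmp_and_call() is reported as an error; in C it would be undefined behaviour, which is why the buffer must never be used after the call has finished.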


A bientôt,

Armin.