Accessing Cython functions/variables from C


Jeroen Demeyer

Sep 27, 2017, 7:59:34 AM
to cython-users
Hello,

I'm writing about a problem which came up several times for me while
writing Cython code, in particular with my cysignals package. The issue
is variables (or functions, but I'll focus on variables) which are
defined by Cython and should be accessed from C code which is compiled
with the same module.

I'm talking about

cdef int myvar
cdef extern from "helper.c":
    some_function_which_needs_myvar()

where some_function_which_needs_myvar() is a function (or macro, or
something more complicated) implemented in helper.c which needs access
to "myvar". This is in a .pxd file, so "myvar" is actually imported by
Cython at runtime. This .pxd file would be installed somewhere in
site-packages, so that other Cython packages can use it.

Moreover, I want to do this with good performance: I need this in
performance-critical functions. I don't want to make user code more
complicated (no NumPy-style import_array()). Finally, I want to support
both C and C++.

As far as I know, there is no good way to do this currently in Cython. I
have proposed two pull requests which could help implement this, but
both have been rejected:

1. https://github.com/cython/cython/pull/483

2. https://github.com/cython/cython/pull/1650

It seems that these pull requests were rejected because I wasn't doing
things the proper way and there are better ways to solve my problem. But
so far, I haven't seen a single actual working example which solves this
problem. It would be good to have an "officially sanctioned" solution to
this problem, in the sense that it could be documented and added to the
Cython testsuite.

Any ideas?


Thanks,
Jeroen.

Stefan Behnel

Sep 27, 2017, 9:06:04 AM
to cython...@googlegroups.com
Hi Jeroen!

Jeroen Demeyer wrote on 27.09.2017 at 13:59:
> I'm writing about a problem which came up several times for me while
> writing Cython code, in particular with my cysignals package. The issue is
> variables (or functions, but I'll focus on variables) which are defined by
> Cython and should be accessed from C code which is compiled with the same
> module.
>
> I'm talking about
>
> cdef int myvar
> cdef extern from "helper.c":
>     some_function_which_needs_myvar()
>
> where some_function_which_needs_myvar() is a function (or macro, or
> something more complicated) implemented in helper.c which needs access to
> "myvar". This is in a .pxd file, so "myvar" is actually imported by Cython
> at runtime. This .pxd file would be installed somewhere in site-packages,
> such that other Cython packages can use it.
>
> Moreover, I want to do this with good performance: I need this in
> performance-critical functions. I don't want to make user code more
> complicated (no NumPy-style import_array()). Finally, I want to support
> both C and C++.

I can see at least two possible approaches.

You could pass a pointer to "myvar" explicitly through some registry
function and have the helper code remember it. It's basically an initial
global configuration step, not uncommon with C libraries. They often use
some predefined struct type for that.

You could also declare "myvar" as "public" and link against "helper.o".
That would allow both modules to use their respective symbols. If helper.c
comes from an external package, however, which seems to be the case for
cysignals, that would require the package to know about the user code, and
that's a bad design. Dependencies should always be unidirectional.

I think I would go for the first solution. Maybe with an exported inline
function in a .pxd file that does all the necessary initialisation in one
call, and receives the (static) setup struct as argument.
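
A minimal sketch of the registry approach in plain C, assuming a setup struct with a single field. The names helper_setup, helper_init and helper_config are hypothetical and are not part of cysignals or Cython; only some_function_which_needs_myvar comes from the example above, and giving it an int return value is an assumption made for this sketch.

```c
#include <stddef.h>

/* Hypothetical setup struct; the field name is illustrative only. */
typedef struct {
    int *myvar;   /* address of the variable owned by the Cython module */
} helper_setup;

/* Remembered configuration. It is static, so every translation unit
   that includes this code keeps its own copy. */
static const helper_setup *helper_config = NULL;

/* One-time registration, meant to be called during module initialisation. */
static void helper_init(const helper_setup *setup)
{
    helper_config = setup;
}

/* Reads the registered variable; returns -1 if nothing was registered. */
static int some_function_which_needs_myvar(void)
{
    if (helper_config == NULL || helper_config->myvar == NULL)
        return -1;
    return *helper_config->myvar;
}
```

The Cython side would call helper_init() once with the address of a static struct holding &myvar. That call is the explicit setup step this approach requires.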

Stefan

Jeroen Demeyer

Sep 27, 2017, 9:26:01 AM
to cython...@googlegroups.com
On 2017-09-27 15:06, Stefan Behnel wrote:
> You could pass a pointer to "myvar" explicitly through some registry
> function and have the helper code remember it. It's basically an initial
> global configuration step, not uncommon with C libraries. They often use
> some predefined struct type for that.

So, that's the "user code needs to change" option, analogous to NumPy C
API functions requiring import_array(). I don't find this acceptable.

It's especially unacceptable because the dependency on my .pxd file
could be transitive (user package X cimports package Y which cimports Z
which is the problematic package. So then X would need to initialize
something related to Z even if it's not directly using Z).

One way to solve this in Cython would be to allow defining extra
initialization steps in a .pxd file. Allow putting code in a .pxd file
which will be executed when a module using that .pxd file is
initialized. This is by the way what motivated PR #483 since .pxi files
*can* include code.

> You could also declare "myvar" as "public" and link against "helper.o".

This isn't a complete solution: how do you do the 'link against
"helper.o"' part using only stuff that you can put in a .pxd file?

Second, I find it an ugly solution because it requires an additional
file (you need a helper.h file with declarations for "cdef extern from
'helper.h'"). There is also a potential issue of performance since
functions can't be inline. Although, as you mentioned on #1654, this
might be solved with LTO.

> If helper.c
> comes from an external package, however, which seems to be the case for
> cysignals, that would require the package to know about the user code, and
> that's a bad design. Dependencies should always be unidirectional.

I think you are understanding this wrong. Both the .pxd file and the
helper.c file are from the same package. I don't consider the
relationship between the .pxd file and the .c file as a dependency, it's
more like one thing which needs to be split in two pieces for technical
reasons.

The point is that the .pxd and .c files should be able to be *used*
(really: cimported) from a different package.

Stefan Behnel

Sep 27, 2017, 9:47:35 AM
to cython...@googlegroups.com
Jeroen Demeyer wrote on 27.09.2017 at 15:25:
> On 2017-09-27 15:06, Stefan Behnel wrote:
>> You could pass a pointer to "myvar" explicitly through some registry
>> function and have the helper code remember it. It's basically an initial
>> global configuration step, not uncommon with C libraries. They often use
>> some predefined struct type for that.
>
> So, that's the "user code needs to change" option, analogous to NumPy C API
> functions requiring import_array(). I don't find this acceptable.

If the alternative is "magic", then I personally prefer "simplicity". ;)


> It's especially unacceptable because the dependency on my .pxd file could
> be transitive (user package X cimports package Y which cimports Z which is
> the problematic package. So then X would need to initialize something
> related to Z even if it's not directly using Z).

Not if the packages provide actual modules that get imported transitively.
Then the initialisation would happen correctly at module init time of the
intermediate module, and X wouldn't have to know about it at all.


> The point is that the .pxd and .c files should be able to be *used*
> (really: cimported) from a different package.

What is the reason why you cannot dump the implementation of your package
into a normal extension module, and pass around normal pointers?

Stefan

Jeroen Demeyer

Sep 27, 2017, 4:14:07 PM
to cython...@googlegroups.com
On 2017-09-27 15:47, Stefan Behnel wrote:
> Jeroen Demeyer wrote on 27.09.2017 at 15:25:
>> On 2017-09-27 15:06, Stefan Behnel wrote:
>>> You could pass a pointer to "myvar" explicitly through some registry
>>> function and have the helper code remember it. It's basically an initial
>>> global configuration step, not uncommon with C libraries. They often use
>>> some predefined struct type for that.
>>
>> So, that's the "user code needs to change" option, analogous to NumPy C API
>> functions requiring import_array(). I don't find this acceptable.
>
> If the alternative is "magic", then I personally prefer "simplicity". ;)

The "simplicity" should be on the user side and the "magic" on the
implementation side. The whole point of Cython is that it has lots of
*magic* to make it *simple* for others to write Python C extensions. I'm
arguing for just a little bit more magic to solve my use case. And the
rejected PRs #483 and #1654 aren't even that magical, they are actually
pretty simple conceptually.

> Not if the packages provide actual modules that get imported transitively.
> Then the initialisation would happen correctly at module init time of the
> intermediate module, and X wouldn't have to know about it at all.

As I mentioned many times, this is all happening in .pxd files, which do
get cimported transitively.

>> The point is that the .pxd and .c files should be able to be *used*
>> (really: cimported) from a different package.
>
> What is the reason why you cannot dump the implementation of your package
> into a normal extension module, and pass around normal pointers?

First of all, this question is beside the point. I think that accessing
Cython-defined variables from C is a valid use case which doesn't need
further justification.

But since you asked: cysignals specifically calls setjmp(). Calls to
setjmp() must be done without introducing an extra stack frame. So in
practice this means that it must be called from a macro, and it cannot
be in an external module.
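
The stack-frame constraint can be illustrated with a small C sketch. The names GUARD_ENTER, guard_env, abort_computation and run_guarded are hypothetical and much simpler than the real cysignals macros. The point is that the macro expands inside the caller, so setjmp() records a frame that is still live when longjmp() is called. If setjmp() were wrapped in an ordinary function instead, that function would already have returned by then.

```c
#include <setjmp.h>

/* Hypothetical jump buffer shared by the guard macro and the abort path. */
static jmp_buf guard_env;

/* Expands in the caller, so setjmp() records the caller's own frame.
   Evaluates to 0 on the initial call and to nonzero after a longjmp(). */
#define GUARD_ENTER() setjmp(guard_env)

/* Stand-in for C code that aborts by jumping back to the guard. */
static void abort_computation(void)
{
    longjmp(guard_env, 1);
}

/* Returns 1 if the computation was aborted, 0 if it ran to completion. */
static int run_guarded(int should_abort)
{
    if (GUARD_ENTER())
        return 1;              /* reached via longjmp() */
    if (should_abort)
        abort_computation();   /* does not return */
    return 0;
}
```

Calling run_guarded(0) takes the normal path, while run_guarded(1) comes back through the setjmp() site.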

Stefan Behnel

Sep 27, 2017, 4:31:39 PM
to cython...@googlegroups.com
Jeroen Demeyer wrote on 27.09.2017 at 22:14:
> cysignals specifically calls setjmp(). Calls to
> setjmp() must be done without introducing an extra stack frame. So in
> practice this means that it must be called from a macro, and it cannot be
> in an external module.

I think that gets us back to the point where it would be nice to add
language support for setjmp() to Cython. Similar to how we map C++
exceptions to Python exceptions, or so. Otherwise, it seems difficult to
write non-leaky code in Cython that uses setjmp().

Stefan

Robert Bradshaw

Sep 28, 2017, 8:03:45 PM
to cython...@googlegroups.com
First of all, thanks, Jeroen, for your patience with me especially in
pursuing this. Cython code that depends on C code that depends on
Cython code isn't possible to do cleanly right now.

However, I do really like your implicit late includes idea. It places
includes in the correct spot based on the intrinsic requirements of
the included file (or, at least, what we've told Cython about it). I
haven't looked at the code yet, but I'm not too worried about that.
Excellent proposal!

- Robert


(As an aside, it would be good to handle signals (and C++ exceptions
for that matter) in a builtin way with try..except..finally syntax,
but no one's working on that right now.)

Stefan Behnel

Sep 29, 2017, 1:47:56 AM
to cython...@googlegroups.com
On 29 September 2017 02:03:12 CEST, Robert Bradshaw wrote:
>(As an aside, it would be good to handle signals (and C++ exceptions
>for that matter) in a builtin way with try..except..finally syntax,
>but no one's working on that right now.)

Regarding this bit, wouldn't it be enough to declare external C functions that call longjmp() with "except longjmp", let Cython issue a setjmp() for them, and raise a "LongjmpIssued" Python exception for it? That seems trivial to implement, but I can't say if it catches all use cases.

Stefan

Robert Bradshaw

Sep 29, 2017, 2:01:42 AM
to cython...@googlegroups.com
This would let one catch (hijack) longjumps, but I'm not sure that's
what one would want.

The end goal is to catch (and handle) signals which is a better fit
for try...except. E.g.

try:
    ...
except cython.signal.SIGINT:
    # would set up a setjmp at the opening of the try block with
    # a signal handler that longjmps to it which triggers a goto the except block
    ...
finally:
    # catches all signals?

Stefan Behnel

Sep 29, 2017, 2:43:59 AM
to cython...@googlegroups.com
Robert Bradshaw wrote on 29.09.2017 at 08:01:
> On Thu, Sep 28, 2017 at 10:47 PM, Stefan Behnel wrote:
>> On 29 September 2017 02:03:12 CEST, Robert Bradshaw wrote:
>>> (As an aside, it would be good to handle signals (and C++ exceptions
>>> for that matter) in a builtin way with try..except..finally syntax,
>>> but no one's working on that right now.)
>>
>> Regarding this bit, wouldn't it be enough to declare external C functions that call longjmp() with "except longjmp", let Cython issue a setjmp() for them, and raise a "LongjmpIssued" Python exception for it? That seems trivial to implement, but I can't say if it catches all use cases.
>
> This would let one catch (hijack) longjumps, but I'm not sure that's
> what one would want.

It would probably be good enough for talking to C libraries that use
longjmp for error handling, such as Lua.


> The end goal is to catch (and handle) signals which is a better fit
> for try...except. E.g.
>
> try:
>     ...
> except cython.signal.SIGINT:
>     # would set up a setjmp at the opening of the try block with
>     # a signal handler that longjmps to it which triggers a goto the except block
>     ...
> finally:
>     # catches all signals?

I can't see how that should work. How should this interact with threading
or subinterpreters?

Signals can occur entirely apart from regular execution. It seems insane to
jump back into the normal code path directly from a signal handler. The
fact that CPython calls functions from signal handlers is not due to a lack
of goto, it's actually a good design.

Stefan

Robert Bradshaw

Sep 29, 2017, 4:00:39 AM
to cython...@googlegroups.com

Stefan Behnel

Sep 29, 2017, 5:38:51 AM
to cython...@googlegroups.com
Robert Bradshaw wrote on 29.09.2017 at 10:00:
I read through the code a little. Correct me if I'm wrong, but it's
essentially a replacement for PyErr_CheckSignals(), with the intention to
be faster because it uses inlined code, shared flags and counters. Looking
at the source of PyErr_CheckSignals() showed that it already uses a
(private) global atomic interrupt flag "is_tripped" to speed things up
here. The main difference is the C call overhead and the atomic flag access
in CPython. I trust Jeroen that his benchmarks have shown that that's still
too high in some cases, probably in long running tight loops.

cysignals also seems to replace all signal handlers, which suggests to me
that it can't be integrated into Cython as a general feature. It's
something that can be done at an application level, but not at a level as
low as a programming language that users commonly write libraries with.
That, IMHO, puts it out of scope for integration into Cython, even though I
understand that doing something fast requires intercepting the signals.

Regarding Cython support for signals, I'm really not convinced that
try-except is the right construct for signal handling, simply because they
are not linked to the code currently being executed.

The main problem that Cython has is that it's not clear when and how often
to check for signals. CPython has a complete infrastructure for signal
handling and uses a reasonable heuristic for signal checking based on
runtime byte code execution. That is much better than what Cython could do,
because a static compiler cannot really estimate how much "time" has passed
since the last check or if a loop is going to execute long enough to merit
injecting a signal check into its body.

Thus, it's actually most efficient to let users sprinkle their code with
explicit signal checks, but that is also easy to forget. We could make it
a tiny bit harder to forget by providing a Cython global "check_signals()" as an
alias that is "just there", but I don't see much we could (or should) do on
top of that. CPython pretty much does things right, and I can fully
understand it if they do not want to expose the internals of
PyErr_CheckSignals().
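
To make the "explicit signal checks" idea concrete, here is a minimal C sketch of a polled interrupt flag. It assumes a handler that only sets a flag and a loop that checks it every 1024 iterations. The names interrupt_pending, on_interrupt, check_interrupt and interruptible_sum are hypothetical. This is not how PyErr_CheckSignals() or cysignals is implemented; it only shows the general shape of a cheap inline check.

```c
#include <signal.h>

/* Set by the handler, polled by the main code. */
static volatile sig_atomic_t interrupt_pending = 0;

/* Signal handler: does nothing except record that a signal arrived. */
static void on_interrupt(int signum)
{
    (void)signum;
    interrupt_pending = 1;
}

/* Hypothetical cheap check: returns 1 once per pending interrupt, else 0. */
static int check_interrupt(void)
{
    if (!interrupt_pending)
        return 0;
    interrupt_pending = 0;
    return 1;
}

/* Sums 0..n-1, polling every 1024 iterations; returns -1 if interrupted. */
static long interruptible_sum(long n)
{
    long total = 0;
    long i;
    for (i = 0; i < n; i++) {
        if ((i & 1023) == 0 && check_interrupt())
            return -1;
        total += i;
    }
    return total;
}
```

The handler would be installed with signal(SIGINT, on_interrupt). How often to poll is exactly the judgment call that is hard for a static compiler to make on the user's behalf.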

Stefan

Robert Bradshaw

Sep 29, 2017, 8:27:26 AM
to cython...@googlegroups.com
I was mostly thinking about sig_on/sig_off. A more appropriate
construct would be a with statement (perhaps taking a condition that
would turn it into a no-op if false). Language support could also add
overloading (aka optional arguments) rather than requiring, say,
sig_str for sig_on-with-parameter.

Stefan Behnel

Sep 29, 2017, 9:01:30 AM
to cython...@googlegroups.com
Robert Bradshaw wrote on 29.09.2017 at 14:26:
I agree that a context manager is an appropriate construct. It provides a
well-defined and safe scope for replacing the signal handlers, and it's
even local to a function. That allows using a (volatile) "local" variable
to flag that a signal has occurred, which is very fast to check. And it
screams for language support.
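
What such a scope would have to do on entry and exit can be sketched in plain C. The names scope_enter, scope_exit, scope_handler, scope_interrupted and saved_handler are hypothetical. The flag is a file-level static rather than a true local, because a plain C handler has no way to reach a caller's local variable. The sketch is neither nestable nor thread-safe.

```c
#include <signal.h>

/* Flag set by the temporary handler while the scope is active. */
static volatile sig_atomic_t scope_interrupted = 0;

/* Handler that was installed before the scope was entered. */
static void (*saved_handler)(int) = SIG_DFL;

static void scope_handler(int signum)
{
    (void)signum;
    scope_interrupted = 1;
}

/* Entry: install the temporary handler and remember the previous one.
   Returns 0 on success, -1 if the handler could not be installed. */
static int scope_enter(void)
{
    scope_interrupted = 0;
    saved_handler = signal(SIGINT, scope_handler);
    return saved_handler == SIG_ERR ? -1 : 0;
}

/* Exit: restore the previous handler.
   Returns 1 if SIGINT arrived while the scope was active, else 0. */
static int scope_exit(void)
{
    signal(SIGINT, saved_handler);
    return scope_interrupted ? 1 : 0;
}
```

A language-level construct would generate the enter/exit pair around the block body and turn a nonzero result from the exit step into a Python exception.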

One problem: safely replacing the signal handlers around thread context
switches (nogil, Python code execution, etc.) will be difficult, as will
basically any temporary signal setup in multithreaded environments. Any
idea how to handle this? How does cysignals deal with this?

Stefan

Jeroen Demeyer

Sep 29, 2017, 10:03:04 AM
to cython...@googlegroups.com
On 2017-09-29 15:01, Stefan Behnel wrote:
> One problem, safely replacing the signal handlers around thread context
> switches (nogil, Python code execution, etc.) will be difficult, or
> basically any temporary signal setup in multithreaded environments. Any
> idea how to handle this? How does cysignals deal with this?

It actually doesn't handle this. It's a long-standing cysignals issue
that it doesn't handle threads. I have some ideas to make it work, but
it's a classical "pick 2 out of 3": speed, correctness, portability.

Jeroen Demeyer

Sep 29, 2017, 10:11:08 AM
to cython...@googlegroups.com
On 2017-09-29 11:38, Stefan Behnel wrote:
> I read through the code a little. Correct me if I'm wrong, but it's
> essentially a replacement for PyErr_CheckSignals(), with the intention to
> be faster because it uses inlined code, shared flags and counters. Looking
> at the source of PyErr_CheckSignals() showed that it already uses a
> (private) global atomic interrupt flag "is_tripped" to speed things up
> here. The main difference is the C call overhead and the atomic flag access
> in CPython. I trust Jeroen that his benchmarks have shown that that's still
> too high in some cases, probably in long running tight loops.

cysignals has two ways to deal with signals: there is sig_check() which
is essentially analogous to PyErr_CheckSignals() but there is also
sig_on()/sig_off() which uses a setjmp()/longjmp() mechanism.

> cysignals also seems to replace all signal handlers, which suggests to me
> that it can't be integrated into Cython as a general feature.

Well, that's not a defining feature of cysignals. Remember that this was
initially designed for SageMath, where this made sense. Now that it is a
separate project, I also think that automatically replacing the signal
handlers is not appropriate. Maybe that is something to change in
cysignals 2.0.

> The main problem that Cython has is that it's not clear when and how often
> to check for signals.

In many cases, the fact that Cython does *NOT* check for signals is
actually a feature! It makes it possible to write signal-safe code
because you know that your code cannot arbitrarily be interrupted by a
signal.

My favorite example is this one:
try:
    result = os.fork()  # or any other os() call
except KeyboardInterrupt:
    # Did the fork() succeed or not? There is no way to find out!

Jeroen Demeyer

Sep 29, 2017, 10:16:27 AM
to cython...@googlegroups.com
On 2017-09-29 14:26, Robert Bradshaw wrote:
> I was mostly thinking about sig_on/sig_off. A more appropriate
> construct would be a with statement

I agree here. The thing with try/except is that it only makes sense if
you typically want to catch the exception. But in most cases, you just
want the KeyboardInterrupt to be raised.

In case it's not obvious: the "with signals" (or whatever you want to
call it) wouldn't be a normal Python with statement. It would require
specific Cython language support, analogous to "with gil".