I'm trying to wrap a proprietary, closed-source C++ library using
Cython. The library essentially fails to work when used through a Cython
wrapper, but works when compiling as a standalone executable with the
same compiler flags.
I.e., I have a function "foomain" with some C++ code which attempts to
make a certain network connection. If I call foomain() from main() in a
C++ program, things work. If I call foomain() from Cython, compiled in
C++ mode, then things don't work (no connection is made).
So what are the possible differences in the runtime environment you can
think of?
The first thing I had to do was this:
    import sys, ctypes
    sys.setdlopenflags(sys.getdlopenflags() | ctypes.RTLD_GLOBAL)
This fixed something else (a function mysteriously returning a NULL
pointer when it shouldn't -- as I don't have the sources I don't know
why, really). But apparently it didn't fix everything.
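
For reference, a minimal sketch of how that flag change can be scoped. The flags only affect extension modules loaded after the call, so it has to run before the wrapper is first imported. The helper name dlopen_flags and the module name foowrapper are hypothetical, not taken from the actual project.

```python
import contextlib
import ctypes
import sys


@contextlib.contextmanager
def dlopen_flags(extra):
    """Temporarily OR `extra` into the flags CPython passes to dlopen()."""
    old = sys.getdlopenflags()
    sys.setdlopenflags(old | extra)
    try:
        yield
    finally:
        # Restore the previous flags so unrelated extension modules
        # imported later are loaded the usual way.
        sys.setdlopenflags(old)


# Hypothetical usage; `foowrapper` stands in for the Cython module:
#
#     with dlopen_flags(ctypes.RTLD_GLOBAL):
#         import foowrapper
```

Restoring the flags afterwards is a design choice, not a requirement: it keeps RTLD_GLOBAL limited to the one module that needs it.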
I've tried both with and without creating a thread using the threading
module (so that a GIL is set up), and both with and without "with nogil"
around the call to foomain().
Dag Sverre
Got an idea just after writing this email (an instance of [1]).
Apparently it all works OK when linking with the static version of the
library (which, in a rare instance of being lucky, appears to be
compiled with -fPIC).
So my problem is solved, but I'm still puzzled about what the
differences in linkage could do to transparently break the behaviour of
the library.
Wrapping dynamic libraries has often been a pain in the past (e.g. when
wrapping various scientific codes like LAPACK). I wish I understood in
what ways Python loading a .so differs from a program loading a .so
(once RTLD_GLOBAL has been set).
Dag Sverre
It might be useful to trace it with strace.
> So what are the possible differences in the runtime environment you can
> think of?
>
> The first thing I had to do was this:
>
> import sys, ctypes
> sys.setdlopenflags(sys.getdlopenflags()|ctypes.RTLD_GLOBAL)
>
> This fixed something else (a function mysteriously returning a NULL pointer
> when it shouldn't -- as I don't have the sources I don't know why, really).
> But apparently it didn't fix everything.
>
> I've tried both with and without creating a thread using the threading
> module (so that a GIL is set up), and both with and without "with nogil"
> around the call to foomain().
>
>
> Dag Sverre
--
vitja.
Can you run "ldd" on the working executable and on the failing extension
module to find any differences in library loading order? Can you
run "nm" on your proprietary library to find out whether it is invoking
"dlopen"?
--
Lisandro Dalcin
There's no difference in loading order (except for the libraries loaded
extra because of Python). But 'dlopen' is indeed present in one of the
shared libraries (it's a collection of 10 or so, of which I load 5 or so).
How does dlopen behave differently under Python, once RTLD_GLOBAL is on?
I need to work a bit further with the static one before looking further
into this. Thanks for the hint!
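
The two checks discussed above (ldd for the load order, nm for a reference to dlopen) could be scripted roughly as follows. This is only a sketch: it assumes GNU binutils "nm" and "ldd" are on the PATH, and the function names are made up for illustration.

```python
import subprocess


def undefined_symbols(nm_output):
    """Return the symbol names marked 'U' (undefined) in `nm` output."""
    names = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Undefined entries have no address column: "U dlopen@GLIBC_2.2.5"
        if len(fields) == 2 and fields[0] == "U":
            # Drop a symbol-version suffix such as "@GLIBC_2.2.5".
            names.add(fields[1].split("@")[0])
    return names


def references_dlopen(library_path):
    """True if the shared library has an undefined reference to dlopen."""
    out = subprocess.run(
        ["nm", "-D", "--undefined-only", library_path],
        check=True, capture_output=True, text=True,
    ).stdout
    return "dlopen" in undefined_symbols(out)


def ldd_lines(path):
    """Return the lines printed by ldd for an executable or .so, in order."""
    out = subprocess.run(
        ["ldd", path], check=True, capture_output=True, text=True
    ).stdout
    return [line.strip() for line in out.splitlines()]
```

Comparing ldd_lines() for the standalone executable with ldd_lines() for the extension module gives the same information as reading the two ldd listings side by side.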
Dag Sverre
Well, these extra libraries could be the root of your issues. Remember
that the dynamic linker does have a global namespace for symbol
resolution. You could add RTLD_DEEPBIND (it is integer 0x8 on Linux)
to the mode flags to try to work around the problem (assuming this is
actually the cause of your issues).
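
A sketch of what adding that flag might look like from Python. The os module only defines RTLD_DEEPBIND on platforms that provide it, so the 0x8 fallback below is an assumption that holds for glibc on Linux only; deepbind_flags and foowrapper are hypothetical names.

```python
import os
import sys

# Use the platform's constant where available; otherwise fall back to
# the glibc/Linux value 0x8 (an assumption, not portable).
RTLD_DEEPBIND = getattr(os, "RTLD_DEEPBIND", 0x8)


def deepbind_flags(base=None):
    """Return dlopen flags with RTLD_GLOBAL and RTLD_DEEPBIND both set."""
    if base is None:
        base = sys.getdlopenflags()
    return base | os.RTLD_GLOBAL | RTLD_DEEPBIND


# Hypothetical usage, before the first import of the wrapper module:
#
#     sys.setdlopenflags(deepbind_flags())
#     import foowrapper
```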
> But 'dlopen' is indeed present in one of the
> shared libraries (it's a collection of 10 or so, of which I load 5 or so).
>
It could be a good idea to run your working program and Python under
"strace" to figure out what exact libraries the proprietary code is
loading.
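
One possible way to do that comparison, as a sketch: run each program under strace and keep only the shared objects that were opened successfully. It assumes strace is installed and that its output uses the usual open/openat line format; the function names are made up for illustration.

```python
import re
import subprocess

# Matches e.g.  openat(AT_FDCWD, "/usr/lib/libfoo.so.1", O_RDONLY|O_CLOEXEC) = 3
_OPEN_RE = re.compile(r'\bopen(?:at)?\(.*?"([^"]+)".*\)\s*=\s*(-?\d+)')
# Path ends in ".so", optionally followed by version numbers (".so.6").
_SO_RE = re.compile(r"\.so(?:\.\d+)*$")


def opened_shared_objects(strace_output):
    """Return paths of shared objects opened successfully, in order."""
    paths = []
    for line in strace_output.splitlines():
        m = _OPEN_RE.search(line)
        # A negative return value means the open failed (e.g. ENOENT).
        if m and int(m.group(2)) >= 0 and _SO_RE.search(m.group(1)):
            paths.append(m.group(1))
    return paths


def trace_library_loads(argv):
    """Run argv under strace and return the shared objects it opened."""
    proc = subprocess.run(
        ["strace", "-f", "-e", "trace=file", *argv],
        capture_output=True, text=True,
    )
    # strace writes its trace to stderr by default.
    return opened_shared_objects(proc.stderr)
```

Calling trace_library_loads() once for the standalone executable and once for a python command that imports the wrapper, then diffing the two lists, shows which libraries are loaded in one case only.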
>
> How does dlopen behave differently under Python, once RTLD_GLOBAL is on?
>
As I said before, Python loads other libraries before loading yours,
which could be problematic.