Re: [cython-users] Compile multiple Cython objects into single shared library


Robert Bradshaw

Oct 1, 2012, 1:19:17 PM
to cython...@googlegroups.com
On Mon, Oct 1, 2012 at 7:37 AM, Michael F. Peintinger
<michael.p...@googlemail.com> wrote:
> I have managed to compile and link the files into a shared library with a
> makefile.
> When I run the code I get a "NameError" and the module cannot be found.
>
> I tried various import/cimport statements with and without the path. If I
> grep the .so file I even find the module
> cyaiccm.cyaiccm.CYAICCM but the import fails.
>
> Creating a master Cython file and including the other files works but that
> looks pretty messy...
>
> Is it possible to compile multiple Cython objects into one shared library
> without including them into one master C file?

Is there any specific reason you want to do this? In general, Python
expects a 1:1 correspondence between modules and .so files; this is
like asking if you could have one .py(c) file corresponding to several
different modules. (It probably could be done via some import hacking,
but I don't know that it'd be very nice...)

- Robert

Bradley Froehle

Oct 1, 2012, 1:23:48 PM
to cython...@googlegroups.com
I assume you mean, "Can multiple modules live in one shared library?"  The answer to that question is no.  It's not so much a limitation of Cython as it is a limitation of Python's extension mechanism.  In general, when you run "import PACKAGE.MODULE", Python looks for PACKAGE/MODULE.so, dynamically loads the object (via dlopen) and runs the function "initMODULE".

In theory you could have several such init functions in one library, but in practice there isn't a reliable way to tell Python about this fact.
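[Editor's note: the module-to-file mapping described above can be inspected from Python itself. A small sketch (on Python 3 the init hook is named PyInit_MODULE rather than initMODULE):]

```python
import importlib.machinery

# Python resolves "import PACKAGE.MODULE" to a file named
# PACKAGE/MODULE<suffix>, dlopen()s it, and calls its init function
# (initMODULE on Python 2, PyInit_MODULE on Python 3).
# These are the extension-module suffixes the importer will accept:
for suffix in importlib.machinery.EXTENSION_SUFFIXES:
    print(suffix)
```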

-Brad

Gabriel Jacobo

Oct 1, 2012, 1:37:36 PM
to cython...@googlegroups.com
2012/10/1 Michael F. Peintinger <michael.p...@googlemail.com>

> I have managed to compile and link the files into a shared library with a makefile.
> When I run the code I get a "NameError" and the module cannot be found.
>
> I tried various import/cimport statements with and without the path. If I grep the .so file I even find the module
> cyaiccm.cyaiccm.CYAICCM but the import fails.
>
> Creating a master Cython file and including the other files works but that looks pretty messy...
>
> Is it possible to compile multiple Cython objects into one shared library without including them into one master C file?



You can use the "bare" mode of my game engine build tool to do something along those lines (though it will compile the entire Python interpreter along with your code), or at least to get ideas on what needs to be done to make it work. As far as I could figure out, you have to patch the interpreter, because the design philosophy allows only a shallow structure for builtins and a 1:1 correspondence between .so files and modules.

See schafer.py (also the stuff inside modules and patches may be useful) here: https://bitbucket.org/gabomdq/ignifuga/src/8a1ba01f70ea/tools


--
Gabriel.

Chris Barker

Oct 1, 2012, 2:59:56 PM
to cython...@googlegroups.com
On Mon, Oct 1, 2012 at 10:23 AM, Bradley Froehle <brad.f...@gmail.com> wrote:
> I assume you mean, "can multiple modules live in one shared library." The
> answer to that question is no.

Actually -- that is not correct. You can create a package as a single
C extension, with multiple modules in it.

In fact, there is not much to it; it's a couple of lines of code in the
initMODULE function of the extension.

However, Cython doesn't know how -- so you'd need to do a touch of
post-Cython-pre-compilation processing to do it.

I posted example code (in straight C) for this a good while back (a
couple years at least) on this list. If you can't find that let me
know and I'll try to dig it out again.
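[Editor's note: the trick described here -- one init function registering several modules -- can be simulated in pure Python by putting entries into sys.modules directly. All names below are invented for the sketch; a real C extension would call Py_InitModule / PyModule_Create instead of types.ModuleType:]

```python
import sys
import types

# Simulate what a single extension's init function could do: create the
# package module plus a sub-module and register both in sys.modules, so
# that "import mypkg.sub" later succeeds without any mypkg/sub.so on disk.
pkg = types.ModuleType("mypkg")
sub = types.ModuleType("mypkg.sub")
sub.answer = 42
pkg.sub = sub
sys.modules["mypkg"] = pkg
sys.modules["mypkg.sub"] = sub

import mypkg.sub  # resolved straight from sys.modules
print(mypkg.sub.answer)  # prints 42
```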

On the other hand, I haven't done it for years, because I've decided
there is no point. Almost all my Cython modules have Python interfaces
anyway, so having a single Python package that imports the separate
Cython modules in various ways works fine -- having 5 .so files rather
than 1 really makes no difference to anyone.

-Chris
--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris....@noaa.gov

Ralf Schmitt

Oct 2, 2012, 5:04:03 AM
to cython...@googlegroups.com
Robert Bradshaw <robe...@gmail.com> writes:

> Is there any specific reason you want to do this? In general, Python
> expects a 1:1 correspondence between modules and .so files; this is
> like asking if you could have one .py(c) file corresponding to several
> different modules. (It probably could be done via some import hacking,
> but I don't know that it'd be very nice...)

apipkg (http://pypi.python.org/pypi/apipkg/) provides functionality to
do exactly that. IMHO it's actually quite nice.

--
cheers
ralf

Chris Barker

Oct 2, 2012, 11:57:27 AM
to cython...@googlegroups.com
On Tue, Oct 2, 2012 at 2:04 AM, Ralf Schmitt <ra...@systemexit.de> wrote:
> apipkg (http://pypi.python.org/pypi/apipkg/) provides functionality to
> do exactly that. IMHO it's actually quite nice.

This looks like it builds up a package from a bunch of modules to give
your users simpler imports. It does not, however, appear to put
multiple compiled modules into one *.so (or *.pyd), which I think is
what the OP was looking for.

-Chris

Robert Bradshaw

Oct 10, 2012, 2:47:06 PM
to cython...@googlegroups.com
On Wed, Oct 10, 2012 at 3:17 AM, Michael F. Peintinger
<michael.p...@googlemail.com> wrote:
> Hi Robert,
> I develop a quantum chemical code in Cython with C++ Extensions. The reason
> why I would like to do this is that it would be nice only to copy one shared
> library
> and have the complete program.
> This also makes uninstall much easier...
> But if it is not possible without serious import hacking I'll just live with
> it :-)

Yeah, you could work around it, but I'm sure the pain is much higher
than having to manage a single directory rather than a single file.

Stefan Behnel

Oct 10, 2012, 3:20:24 PM
to cython...@googlegroups.com
Robert Bradshaw, 01.10.2012 19:19:
> On Mon, Oct 1, 2012 at 7:37 AM, Michael F. Peintinger wrote:
>> Is it possible to compile multiple Cython objects into one shared library
>> without including them into one master C file?
>
> Is there any specific reason you want to do this? In general, Python
> expects a 1:1 correspondence between modules and .so files; this is
> like asking if you could have one .py(c) file corresponding to several
> different modules. (It probably could be done via some import hacking,
> but I don't know that it'd be very nice...)

I'd actually like to do this for lxml at some point. When it is built
statically (the normal case on perpetually broken platforms like Windows
and MacOS-X), the two main modules lxml.etree and lxml.objectify both end
up with their own copy of libxml2, libxslt and libiconv. It still works
because lxml.objectify only uses stateless code from these libraries, so
the whole state is kept in the copy inside of lxml.etree. But it definitely
wastes a lot of space and presents a somewhat fragile setup.

The static build would be way cleaner if both were in one shared library
and thus shared the same copy of the dependencies.

Stefan

Chris Barker

Oct 11, 2012, 6:06:45 PM
to cython...@googlegroups.com
On Wed, Oct 10, 2012 at 12:20 PM, Stefan Behnel <stef...@behnel.de> wrote:

> I'd actually like to do this for lxml at some point. When it is built
> statically (the normal case on perpetually broken platforms like Windows
> and MacOS-X), the two main modules lxml.etree and lxml.objectify both end
> up with their own copy of libxml2, libxslt and libiconv. It still works
> because lxml.objectify only uses stateless code from these libraries, so
> the whole state is kept in the copy inside of lxml.etree. But it definitely
> wastes a lot of space and presents a somewhat fragile setup.
>
> The static build would be way cleaner if both were in one shared library
> and thus shared the same copy of the dependencies.

A few thoughts:

1) you really can build a compiled package with multiple modules in it
-- it wouldn't take a whole lot of hacking to do that -- just a
hand-written package file and a bit of monkey-patching of the
Cython-generated modules (to get rid of the init_module functions).

2) However -- maybe you don't need to do that. We have a similar
situation, with a bunch of our (C++) code that is used by a handful of
Cython modules. The first thing we did that worked was to compile the
whole pile in with each Cython module -- it works, but as you say, with
lots of code duplication, so:

- on OS-X (and linux?) we found that if you built the code into one
module, the rest of them could use it just fine, as long as the "main"
one was imported first -- pretty easy to enforce with the package
__init__.py

- on Windows -- no such luck -- it appears the Windows linker wants to
make sure the dll you are linking against has what it needs at link
time, not waiting for run time. And the dll built by distutils for a
module extension doesn't export all the symbols, so linking with that
didn't work either. What we've ended up doing is building a dll of our
C++ code, and linking all the individual modules against that -- it
works fine, and wasn't that much of a pain. Why not make a dll, ship
it, and use that?

I don't know that Windows and OS-X are "broken", but they are different
-- for us, the shared lib option doesn't work well on the Mac, as it
hard-codes the paths to the lib at build time (yes, you can use the
right tool to move them, but...). Windows, on the other hand, looks in
the same dir for a dll, so it can be moved around.

So different methods for different platforms...

I've attached an example of putting multiple modules in one extension
(this one hand-written a LONG time ago...)

In this case, the actual code is in two separate C files, but the
init* functions are defined here, and then there is a main init*
function that builds up the package.

You can see how you could hack Cython-generated code to be used like
this. I haven't used this in years (and quite a few versions of Python
ago), so who knows if it still works.
Attachment: TAP_ext.c

Stefan Behnel

Oct 12, 2012, 1:08:53 AM
to cython...@googlegroups.com
Chris Barker, 12.10.2012 00:06:
> On Wed, Oct 10, 2012 at 12:20 PM, Stefan Behnel wrote:
>
>> I'd actually like to do this for lxml at some point. When it is built
>> statically (the normal case on perpetually broken platforms like Windows
>> and MacOS-X), the two main modules lxml.etree and lxml.objectify both end
>> up with their own copy of libxml2, libxslt and libiconv. It still works
>> because lxml.objectify only uses stateless code from these libraries, so
>> the whole state is kept in the copy inside of lxml.etree. But it definitely
>> wastes a lot of space and presents a somewhat fragile setup.
>>
>> The static build would be way cleaner if both were in one shared library
>> and thus shared the same copy of the dependencies.
>
> A few thoughts:
>
> 1) you really can build a compiled package with multiple modules in it
> -- it wouldn't take a whole lot of hacking to do that -- just a
> hand-written package file and a bit of monkey-patching of the
> cython-generated modules (to get rid of the init_module.

I know, thanks (also for digging out the example). It still works to
register multiple extensions from a single library in today's CPython, we
do that in the cython_freeze utility.


> 2) However -- maybe you don't need to do that. We have a similar
> situation, with a bunch of our (C++) code that is used my a handful of
> Cython modules. The first thing we did that worked is compile the
> whole pile in with each Cython module -- works, but as you say, lots
> of code duplication

That's only the minor part of the problem. The main issue is that each
module uses its own truly separate code and you are just lucky when the
different copies don't depend on global static C variables for the state
that they need to share. libxml2 has a couple of configuration options that
are global and some of them really live in static C variables. Anything
that depends on those would stop working the same way in the different
modules once an option gets modified in one. It's not a problem in lxml
because lxml.objectify hands all work over to lxml.etree anyway that
depends on global state (it only works on heap data structures), but it's
still a bit fragile because you have to know how these things work.
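[Editor's note: the per-copy-global-state problem can be illustrated in pure Python by loading one source file twice, so each "module" gets its own globals. All names below are invented for the sketch:]

```python
import importlib.util
import os
import tempfile

# Simulate statically linking the same library into two extension
# modules: load one source file twice under two names, giving each
# "module" its own copy of the library's global state.
src = "option = 'default'\ndef get_option():\n    return option\n"
path = os.path.join(tempfile.mkdtemp(), "libsim.py")
with open(path, "w") as f:
    f.write(src)

def load_copy(name):
    spec = importlib.util.spec_from_file_location(name, path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return mod

copy_a = load_copy("libsim_a")
copy_b = load_copy("libsim_b")
copy_a.option = "modified"    # configure the "library" through copy A
print(copy_a.get_option())    # the change is visible here...
print(copy_b.get_option())    # ...but copy B still sees 'default'
```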

If you allocate everything on the heap (which isn't unlikely, especially in
C++), that state is shared just fine: passing around pointers to data
structures that everyone has the same idea about is safe, regardless of
where that idea came from.


> - on OS-X (and linux?) we found that if you built the code into one
> module, the rest of them could use it just fine, as long as the "main"
> one was imported first -- pretty easy to enforce with the package
> __init__.py

Right. Flat namespaces. Just lovely.

However, doesn't the static linker strip out public symbols after resolving
them? So, what I assume you did is you linked everything at the .o level,
right? Not against an existing static library file? So far, I've preferred
not digging into the external CMMI library builds too deeply.


> - on Windows -- no such luck -- it appears the Windows linker wants to
> make sure the dll you are linking against has what it needs at link
> time, not waiting for run time. And the dll build by distutils for a
> module extension doesn't export all the symbols, so linking with that
> didn't work either. What we've ended up doing is building a dll of our
> C++ code, and linking all the individual modules against that -- it
> works fine, and wasn't that much of a pain. Why not make a dll and
> ship and use that?

Well, yes, I could actually use the normal Windows DLLs of the dependencies
that others have already provided. I think that might make life a lot
easier for users.

Sadly, I don't currently have a Windows system, let alone a properly
configured Windows system that can build lxml, so I depend on Windows users
to figure the details out for themselves and send me a patch for the build
mechanism. It seems from my previous experience that there are not so many
of them that are a) able to figure something like this out and b) send a
patch after doing it. My guess is that it's usually more than just a patch
on their side.


> I don't know that Windows and OS-X are "broken"

Windows in the sense that you can't target it with a source build at all
and even the available 'standard' compilers have their own, erm,
particularities.

MacOS-X in the sense that it's really badly maintained by Apple (with
horribly outdated system libraries) and that it seems to have problems with
distinguishing shared libraries as soon as a copy exists in the default
system library folders. I've had users report really weird behaviour for
lxml when it's built against a different libxml2 installation than what's
in /usr/lib, even though it reports that the correct library is being used.

Plus, both are commercial platforms, which makes it costly to support them
at all.


> but they are different
> -- for us, the shared lib option doesn' t work well on the Mac, as it
> hard-codes the paths to the lib at build time (yes, you can use the
> right tool to move them, but...) Windows, ont he other hand, looks at
> the saem dir for a dll, so it can be moved around.

Yes, I faintly remember that from my old Win95 days. (Actually, I still
have a licensed Win95 install CD lying around, maybe I should get a VM up
and running and start trying things out - a matter of time, though...)


> So different methods for different platforms...

Which definitely complicates things - although I can't say that the current
build setup in lxml isn't complicated already.


> I've attached an example of putting multipel modules in one extension
> (this one hand written a LONG time ago...)
>
> In this case, the actually code is in two separate c file, but the
> init* functions are defined here, and then there is a main init*
> function that builds up the package.
>
> You can see how you could hack Cython-generated code to be used like
> this. I haven't used this in years (and quite a few version ago of
> Python), so who knows if it still works.

Yes, I think it would be cool if Cython could just compile a set of
extensions (or a whole package) into one shared library. Could just be an
option to cythonize().

Stefan

Chris Barker

Oct 12, 2012, 6:33:14 PM
to cython...@googlegroups.com
On Oct 11, 2012, at 10:08 PM, Stefan Behnel <stef...@behnel.de> wrote:

Chris Barker, 12.10.2012 00:06:

> I know, thanks (also for digging out the example). It still works to
> register multiple extensions from a single library in today's CPython, we
> do that in the cython_freeze utility.

Good to know -- and there may be code there to leverage -- or at least look at.

> That's only the minor part of the problem. The main issue is that each

> However, doesn't the static linker strip out public symbols after resolving
> them?

Could be -- I have no idea.

> So, what I assume you did is you linked everything at the .o level,
> right? Not against an existing static library file?

Exactly.

> So far, I've preferred
> not digging into the external CMMI library builds too deeply.

I can understand that. Might be worth a try though -- perhaps the linker does leave the symbols intact.

>> - on Windows -- no such luck -- it

> Well, yes, I could actually use the normal Windows DLLs of the dependencies
> that others have already provided. I think that might make life a lot
> easier for users.

That often is the way to go with Windows libs.

> to figure the details out for themselves and send me a patch for the build
> mechanism. It seems from my previous experience that there are not so many
> of them that are a) able to figure something like this out and b) send a
> patch after doing it. My guess is that it's usually more than just a patch
> on their side.

Right -- as we see, it may take a whole different approach -- you'd need someone who knows what they are doing.

>> I don't know that Windows and OS-X are "broken"

> Windows in the sense that you can't target it with a source build at all

What does that mean?

> MacOS-X in the sense that it's really badly maintained by Apple (with
> horribly outdated system libraries) and that it seems to have problems with
> distinguishing shared libraries as soon as a copy exists in the default
> system library folders.

Hmm -- that would be a disaster. I haven't seen that, though maybe I didn't identify it as the problem when something odd happened.

> Plus, both are commercial platforms, which makes it costly to support them
> at all.

True: Macs aren't cheap -- you could try a hackintosh...

>> but they are different
>> -- for us, the shared lib option doesn'

> Yes, I think it would be cool if Cython could just compile a set of
> extensions (or a whole package) into one shared library. Could just be an
> option to cythonize().

That would be nice -- and maybe a good way to solve your issues as well.

-Chris