Chris Barker, 12.10.2012 00:06:
> On Wed, Oct 10, 2012 at 12:20 PM, Stefan Behnel wrote:
>
>> I'd actually like to do this for lxml at some point. When it is built
>> statically (the normal case on perpetually broken platforms like Windows
>> and MacOS-X), the two main modules lxml.etree and lxml.objectify both end
>> up with their own copy of libxml2, libxslt and libiconv. It still works
>> because lxml.objectify only uses stateless code from these libraries, so
>> the whole state is kept in the copy inside of lxml.etree. But it definitely
>> wastes a lot of space and presents a somewhat fragile setup.
>>
>> The static build would be way cleaner if both were in one shared library
>> and thus shared the same copy of the dependencies.
>
> A few thoughts:
>
> 1) you really can build a compiled package with multiple modules in it
> -- it wouldn't take a whole lot of hacking to do that -- just a
> hand-written package file and a bit of monkey-patching of the
> cython-generated modules (to get rid of the init_module).
I know, thanks (also for digging out the example). It still works to
register multiple extensions from a single library in today's CPython, we
do that in the cython_freeze utility.
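
For illustration, a minimal sketch of what "several modules registered in one
binary" looks like from the Python side. It assumes the mechanism meant here
is CPython's init table (the one PyImport_AppendInittab() adds to in the C
API); the helper name describe_module is made up for this example. Modules
compiled into the interpreter itself are registered that way, so they have
no file of their own:

```python
import importlib
import sys


def describe_module(name):
    """Report whether a module is compiled into the interpreter binary."""
    module = importlib.import_module(name)
    return {
        "name": name,
        # True if the module is registered in the interpreter's own init table
        "builtin": name in sys.builtin_module_names,
        # Built-in modules are not loaded from a separate file
        "file": getattr(module, "__file__", None),
    }


info = describe_module("sys")
print(info)
```

A single shared library with several init functions would behave much the
same way once those functions are registered: one binary, many importable
modules.
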
> 2) However -- maybe you don't need to do that. We have a similar
> situation, with a bunch of our (C++) code that is used by a handful of
> Cython modules. The first thing we did that worked is compile the
> whole pile in with each Cython module -- works, but as you say, lots
> of code duplication
That's only the minor part of the problem. The main issue is that each
module uses its own truly separate code and you are just lucky when the
different copies don't depend on global static C variables for the state
that they need to share. libxml2 has a couple of configuration options that
are global and some of them really live in static C variables. Anything
that depends on those would stop working the same way in the different
modules once an option gets modified in one. It's not a problem in lxml
because lxml.objectify hands all work that depends on global state over to
lxml.etree anyway (objectify itself only works on heap data structures), but
it's still a bit fragile because you have to know how these things work.
If you allocate everything on the heap (which isn't unlikely, especially in
C++), that would obviously be shared just fine because passing pointers
around to data structures that everyone has the same idea about is
obviously safe, regardless of where that idea came from.
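
To make the difference concrete, here is a rough Python analogy (not lxml or
libxml2 code). The "library" source, the option name and the helper
load_private_copy are all invented for this sketch; each module object stands
in for one statically linked copy of the library with its own static
variables:

```python
import types

# Hypothetical stand-in for a C library with one global option and one
# function that only touches the data structure it is given.
LIBRARY_SOURCE = '''
_substitute_entities = False      # plays the role of a static C variable

def set_substitute_entities(flag):
    global _substitute_entities
    _substitute_entities = flag

def get_substitute_entities():
    return _substitute_entities

def append_child(node, child):
    # Works purely on the structure passed in, no global state involved.
    node.setdefault("children", []).append(child)
    return node
'''


def load_private_copy(name):
    """Build an independent module object from the same source, mimicking
    one private copy of the library per extension module."""
    module = types.ModuleType(name)
    exec(LIBRARY_SOURCE, module.__dict__)
    return module


copy_in_etree = load_private_copy("copy_in_etree")
copy_in_objectify = load_private_copy("copy_in_objectify")

# Changing the option in one copy is invisible to the other copy.
copy_in_etree.set_substitute_entities(True)
option_states = (copy_in_etree.get_substitute_entities(),
                 copy_in_objectify.get_substitute_entities())

# A heap object created through one copy can be handed to the other.
tree = copy_in_etree.append_child({"tag": "root"}, {"tag": "a"})
tree = copy_in_objectify.append_child(tree, {"tag": "b"})
child_tags = [child["tag"] for child in tree["children"]]

print(option_states, child_tags)
```

The option diverges between the two copies, while the tree passed between
them stays consistent, which is the situation described above.
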
> - on OS-X (and Linux?) we found that if you built the code into one
> module, the rest of them could use it just fine, as long as the "main"
> one was imported first -- pretty easy to enforce with the package
> __init__.py
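
A sketch of how a package __init__.py might enforce that order. The function
names are invented for this example, and the stdlib modules at the bottom
only stand in for real extension modules. On Linux, as far as I know, CPython
does not load extension modules with RTLD_GLOBAL by default, so the "main"
module's symbols would additionally have to be exported, which is what
sys.setdlopenflags() is for:

```python
import importlib
import sys
from contextlib import contextmanager


@contextmanager
def global_symbol_visibility():
    """Temporarily ask dlopen() to export the symbols of newly loaded
    extension modules. Does nothing where dlopen flags are unavailable."""
    if not hasattr(sys, "setdlopenflags"):   # e.g. on Windows
        yield
        return
    import os
    old_flags = sys.getdlopenflags()
    sys.setdlopenflags(old_flags | os.RTLD_GLOBAL)
    try:
        yield
    finally:
        # Restore the previous flags so later imports are unaffected.
        sys.setdlopenflags(old_flags)


def import_in_order(main_module, dependent_modules):
    """Import the module that carries the shared code first, then the rest."""
    with global_symbol_visibility():
        main = importlib.import_module(main_module)
    others = [importlib.import_module(name) for name in dependent_modules]
    return main, others


# Stand-in names; a real package would list its own extension modules here.
main, others = import_in_order("json", ["json.decoder"])
```

Whether exporting symbols globally is acceptable depends on the package,
since it can also lead to symbol clashes with unrelated extensions.
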
Right. Flat namespaces. Just lovely.
However, doesn't the static linker strip out public symbols after resolving
them? So, what I assume you did is you linked everything at the .o level,
right? Not against an existing static library file? So far, I've preferred
not digging into the external CMMI library builds too deeply.
> - on Windows -- no such luck -- it appears the Windows linker wants to
> make sure the dll you are linking against has what it needs at link
> time, not waiting for run time. And the dll built by distutils for a
> module extension doesn't export all the symbols, so linking with that
> didn't work either. What we've ended up doing is building a dll of our
> C++ code, and linking all the individual modules against that -- it
> works fine, and wasn't that much of a pain. Why not make a dll and
> ship and use that?
Well, yes, I could actually use the normal Windows DLLs of the dependencies
that others have already provided. I think that might make life a lot
easier for users.
Sadly, I don't currently have a Windows system, let alone a properly
configured Windows system that can build lxml, so I depend on Windows users
to figure the details out for themselves and send me a patch for the build
mechanism. It seems from my previous experience that there are not so many
of them who a) are able to figure something like this out and b) send a
patch after doing it. My guess is that it's usually more than just a patch
on their side.
> I don't know that Windows and OS-X are "broken"
Windows in the sense that you can't target it with a source build at all
and even the available 'standard' compilers have their own, erm,
particularities.
MacOS-X in the sense that it's really badly maintained by Apple (with
horribly outdated system libraries) and that it seems to have problems with
distinguishing shared libraries as soon as a copy exists in the default
system library folders. I've had users report really weird behaviour for
lxml when it's built against a different libxml2 installation than what's
in /usr/lib, even though it reports that the correct library is being used.
Plus, both are commercial platforms, which makes it costly to support them
at all.
> but they are different
> -- for us, the shared lib option doesn't work well on the Mac, as it
> hard-codes the paths to the lib at build time (yes, you can use the
> right tool to move them, but...). Windows, on the other hand, looks in
> the same dir for a dll, so it can be moved around.
Yes, I faintly remember that from my old Win95 days. (Actually, I still
have a licensed Win95 install CD lying around, maybe I should get a VM up
and running and start trying things out - a matter of time, though...)
> So different methods for different platforms...
Which definitely complicates things - although I can't say that the current
build setup in lxml isn't complicated already.
> I've attached an example of putting multiple modules in one extension
> (this one hand written a LONG time ago...)
>
> In this case, the actual code is in two separate C files, but the
> init* functions are defined here, and then there is a main init*
> function that builds up the package.
>
> You can see how you could hack Cython-generated code to be used like
> this. I haven't used this in years (and quite a few versions of
> Python ago), so who knows if it still works.
Yes, I think it would be cool if Cython could just compile a set of
extensions (or a whole package) into one shared library. Could just be an
option to cythonize().
Stefan