Self-contained Python packages with external C++ dependencies

Jariullah Safi

Oct 8, 2018, 6:02:00 PM
to scikit-build
Hi, 

Apologies for asking what may be obvious or already answered in the documentation (my perusal did not indicate so but I wasn't thorough ... sorry). I'm trying to figure out if scikit-build will solve all of my problems!

I've recently started writing Python extensions in C++ (using the excellent pybind11). One snag I've hit is that of external dependencies. As an example, one of my open source projects (https://github.com/safijari/pyOpenKarto) needs libboost_thread.so.

I'm trying to figure out the best way to include this file in the Python wheel that I build for my package, such that when the wheel is installed (virtualenv or otherwise) the .so file is found correctly by the package.

Is there a way for me to accomplish this easily using scikit-build? I can sort of do it by hand, but that feels like pulling teeth, and with so many other packages that can do this (e.g. opencv) I feel like there must be a nicer way.

Thanks :)

Omar Padron

Oct 8, 2018, 6:50:30 PM
to Jariullah Safi, scikit-build
Hello!

With scikit-build, we're mostly trying to solve the problem of building and bundling Python packages that use compiled Python extensions, preferably with an approach that is much less of a hassle for maintainers (compared to, e.g., distutils).  There are certainly features we support that facilitate packaging, but there are some packaging concerns that are beyond the scope of what scikit-build is trying to accomplish, imo.

I think your use case is an example of these concerns.  Indeed, how external dependencies are linked into python extensions is as much a question of packaging policy as it is one of application assembly.  You have many choices, and scikit-build tries to avoid being opinionated on how you should choose to link in external dependencies, among other concerns.

I have a few ideas on some example policies that have various pros and cons, but for the most general and portable solution, I'm pretty sure you'd have to take an approach similar to what is done in Anaconda.  That is, you would need to manage your loader's runtime path to ensure that the right versions of your external dependencies are loaded.

Fun fact: Python wheels don't need to contain any Python packages at all!  We've taken advantage of this in the past to use wheels as a generic packaging facility to portably distribute applications, like CMake, without having to tie them to a Python package.  Whatever approach you choose, consider packaging libboost_thread into its own wheel.  That way, you have a copy you can link to and share with any other Python wheels that have libboost_thread as an external dependency.

I wish we could solve *all* your problems, but packaging policy is especially tricky! :)  I think even now, there is still a healthy amount of debate on how to best approach the issue on the distutils mailing list, and I think we'd rather see what comes out of that before committing to a particular approach.

I hope this helps!

 -- Omar

--
You received this message because you are subscribed to the Google Groups "scikit-build" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scikit-build...@googlegroups.com.
To post to this group, send email to scikit...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/scikit-build/d233cc57-af3b-407e-9528-1474bbbed0e4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
Omar Padron
Senior R&D Engineer
Kitware Inc.

Jariullah Safi

Oct 8, 2018, 7:55:09 PM
to scikit-build
Thank you very much for the detailed response Omar. The context is very helpful.

When you say look at anaconda, do you mean I should look at how conda is used for packaging/installation or did you mean something else?

One of the saving graces of my current dilemma is that I don't quite need portability. My target architecture/OS is fixed. The only requirement I have is that the wheel should be self-contained, so that it works in a fresh virtual environment on the target platform/OS. I guess what I'm ultimately trying to understand perhaps has less to do with scikit-build, but if you could kindly answer the following two questions, that would be immensely helpful:

1) Is there a way for me (e.g. using CMake) to automatically discover and copy over the shared object files used by my Python extension? This way I can include them in package_data and at least ship them with the wheel.
2) Is there a way for me to force the python module to search for the shared object files in the virtualenv site-packages folder (or wherever they get copied I guess) without modifying the LD_LIBRARY_PATH and running ldconfig? I sort of have a hack for this involving the imp module but it's ... really really dirty.
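
For question 1, one possible starting point on Linux is to ask `ldd` which shared libraries the built extension resolves to. The sketch below is only an illustration under stated assumptions: `parse_ldd_output` and `shared_library_dependencies` are hypothetical helper names (not part of scikit-build or CMake), and `ldd` is assumed to be available on the build machine.

```python
import subprocess


def parse_ldd_output(text):
    """Map each library name in `ldd` output to its resolved path.

    Libraries reported as "not found" map to None. Lines without "=>"
    (for example the vdso or the dynamic loader itself) are skipped.
    """
    resolved = {}
    for line in text.splitlines():
        if "=>" not in line:
            continue
        name, _, target = line.partition("=>")
        target = target.strip()
        if target.startswith("not found"):
            resolved[name.strip()] = None
            continue
        # Drop the trailing load address, e.g. "(0x00007f...)".
        path = target.split("(", 1)[0].strip()
        resolved[name.strip()] = path or None
    return resolved


def shared_library_dependencies(extension_path):
    """Run `ldd` on a built extension module and parse the result."""
    completed = subprocess.run(
        ["ldd", extension_path],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return parse_ldd_output(completed.stdout)
```

The resolved paths could then be copied next to the extension and listed in package_data. Whether the loader finds them there at import time is a separate matter, which is what question 2 is about.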

Thank you so much again :)

Omar Padron

Oct 8, 2018, 9:20:49 PM
to Jariullah Safi, scikit-build
Sure thing!  And thanks for taking a look at scikit-build.

In referencing Anaconda, I wanted to point out that when building your own module with external dependencies:

 1 - You will need to decide how to link in those external dependencies.
 2 - You will need to decide what requirements (if any) your package will demand from your environment so that its external dependencies are resolved correctly.
 3 - Most importantly, points 1 and 2 are somewhat coupled.  The choices you make regarding one will have implications for those regarding the other.

Anaconda is an example of one way you can make these choices, but there are others.  Based on your requirements and trying to avoid extra requirements on your environment, one possible approach would be to build your extension module and *statically* link in your external dependencies (libboost_thread, in this case).  Prepackaged libraries are not usually built for this, so you might have to compile libboost_thread yourself.  The idea is to compile with -fPIC, but collect the objects into a static .a archive, and link against that when building your extension module.  Your module will then have all its code (including whatever parts of libboost_thread it uses) contained in the one .so, so there wouldn't be any external libboost_thread symbols to resolve when Python imports it, and no need to fiddle with LD_LIBRARY_PATH.  Since all the code is in one object, virtualenv should "just work", and your code should happily run without problem as long as you keep to the same architecture/OS.

The above suggestion has some considerable limitations, but then your requirements are also pretty constrained.  More flexible options await if you're willing to relax some of these constraints, such as if you're willing to manage your LD_LIBRARY_PATH and/or relax your definition of "self-contained".
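
As a rough sketch of how the static-linking route might be wired into a scikit-build project, the helper below assembles CMake arguments that could be passed through `setup(cmake_args=...)`. It assumes the project's CMakeLists.txt locates Boost with `find_package(Boost ...)`; the function name `static_boost_cmake_args` and the example path are hypothetical, and the thread itself does not prescribe these flags.

```python
def static_boost_cmake_args(boost_root=None):
    """Build CMake arguments that ask FindBoost for static libraries.

    Assumption: the project uses find_package(Boost ...), so the
    Boost_USE_STATIC_LIBS and BOOST_ROOT hints are honoured.
    """
    args = [
        # Ask FindBoost to pick the static (.a) Boost libraries.
        "-DBoost_USE_STATIC_LIBS=ON",
        # Build this project's own targets as position-independent code.
        # The Boost archive itself must already have been compiled
        # with -fPIC; this flag does not change that.
        "-DCMAKE_POSITION_INDEPENDENT_CODE=ON",
    ]
    if boost_root:
        # Point FindBoost at a self-compiled Boost (hypothetical location).
        args.append("-DBOOST_ROOT=" + boost_root)
    return args


# Possible use in setup.py (requires scikit-build to be installed):
#
#     from skbuild import setup
#     setup(name="mypackage", cmake_args=static_boost_cmake_args("/opt/boost-pic"))
```

Keeping the flags in one small function makes it easy to switch between a system Boost and a self-compiled one without editing the CMake files.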

Regarding your specific questions:

 1 - For the majority of cases, CMake provides facilities by which you can find installed libraries on your system.  However, I hesitate to suggest copying external libraries directly into wheels.  Even if that would work for you, you would still have to set your LD_LIBRARY_PATH before running any applications.  I'm pretty sure this would work, but it seems like a lot of hassle since you're sticking to one architecture/OS.  If it were me, I would just leave the library out entirely and accept the fact that my module is going to load something outside of my virtualenv.

 2 - This is actually a really interesting idea.  There's no end to the potential for magic features (and shot feet) with the right imp module hackery.  With the Linux loader, however, you're pretty limited, since you're trying to dig into internals that are well beneath the level of the Python interpreter.  I think the only thing you can really do is temporarily modify the environment of the running interpreter before the import proper, and then (preferably) restore the environment after.  I suspect your "dirty" hack is probably doing something similar?
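
The "modify, then restore" pattern described in point 2 could look roughly like the context manager below. This is a minimal sketch, not anyone's actual hack: `prepended_env_path` is a hypothetical name. One caveat that I believe applies with glibc: the dynamic loader reads LD_LIBRARY_PATH when the process starts, so a change made this way is seen by child processes but may not influence an import in the already-running interpreter.

```python
import os
from contextlib import contextmanager


@contextmanager
def prepended_env_path(name, directory):
    """Temporarily prepend `directory` to the path-like variable `name`.

    The previous value (or its absence) is restored on exit, even if
    the body of the `with` block raises.
    """
    previous = os.environ.get(name)  # None if the variable was unset
    parts = [directory] + ([previous] if previous else [])
    os.environ[name] = os.pathsep.join(parts)
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous


# Possible use: launch a child process that sees the extra directory.
#
#     import subprocess, sys
#     with prepended_env_path("LD_LIBRARY_PATH", "/path/to/bundled/libs"):
#         subprocess.run([sys.executable, "-c", "import my_extension"])
```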

 -- Omar


Matt McCormick

Oct 8, 2018, 10:04:13 PM
to Omar Padron, jans...@gmail.com, scikit-build
Hi,

As Omar said, wheels do not officially support native shared libraries, especially libraries shared between packages. It takes some work to use them across packages.


Building the dependencies as a static library and linking them into the C extension module is one good option.


Another option is to build the dependency as a shared library, then install it with CMake next to the C extension module in the wheel. In your CMake configuration, install the library in the same location as the C extension module relative to the Python site-packages directory.

On Windows, the library can be found if it is in the same directory.

On macOS, run delocate-wheel on the wheel after scikit-build generates it, and the C extension will find the library relative to its location.


On Linux, run auditwheel on the wheel after scikit-build generates it, and the C extension will find the library relative to its location.
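
The per-platform post-processing steps above can be sketched as a small helper that picks the repair tool for the current platform. The function names `repair_command` and `repair_wheel` and the example paths are hypothetical; the sketch assumes `auditwheel` or `delocate-wheel` is installed and on PATH, and uses only their `-w` (output wheel directory) option.

```python
import subprocess
import sys


def repair_command(wheel_path, output_dir, platform=sys.platform):
    """Return the command that bundles shared libraries into a wheel.

    Returns None where no repair tool is run (e.g. Windows, where the
    library only needs to sit in the same directory as the extension).
    """
    if platform.startswith("linux"):
        return ["auditwheel", "repair", wheel_path, "-w", output_dir]
    if platform == "darwin":
        return ["delocate-wheel", "-w", output_dir, wheel_path]
    return None


def repair_wheel(wheel_path, output_dir):
    """Run the repair tool for this platform, if there is one."""
    command = repair_command(wheel_path, output_dir)
    if command is not None:
        subprocess.run(command, check=True)
    return command


# Possible use after `python setup.py bdist_wheel` (paths are examples):
#
#     repair_wheel("dist/mypackage-0.1-cp36-cp36m-linux_x86_64.whl", "wheelhouse")
```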



HTH,
Matt

Omar Padron

Oct 8, 2018, 11:35:53 PM
to Matt McCormick, jans...@gmail.com, scikit-build
Neato!  Those are definitely more flexible solutions that work with shared objects.  I'd go with one of Matt's suggestions if they work out for you.

 -- Omar

Jariullah Safi

Oct 9, 2018, 12:20:21 AM
to Omar Padron, Matt McCormick, scikit-build
Auditwheel looked interesting but I was running into a problem getting it to work on Ubuntu 18.04, something about tooling being too new or some such thing. I'll set up a 16.04 or 14.04 VM and try to use that as my build environment and see if that works.

Thank you both for taking the time to answer my questions. It has been very helpful :)

Matthew McCormick

Oct 9, 2018, 1:06:23 AM
to jans...@gmail.com, omar....@kitware.com, Matt McCormick, scikit-build
Glad the discussion has been helpful!

Try using dockcross/manylinux-x64 to build the wheels and run auditwheel. It will make your wheels more compatible with other Linux distributions, because the wheels are built against the older glibc and libstdc++ versions in that image.

