[Python-Dev] Safely importing zip files with C extensions


Vinay Sajip

Mar 27, 2013, 3:38:03 PM
to pytho...@python.org
> This quote is here to stop GMane complaining that I'm top-posting. Ignore.

I've already posted this to distutils-sig, but thought that it might be of
interest to readers here as it relates to importing C extensions ...

zipimport is great, but there can be issues importing software that contains
C extensions. But the new wheel format (PEP 427) may give us a better way of
importing zip files containing C extensions. Since wheels are .zip files, they
can sometimes be used to provide functionality without needing to be installed.
But whereas .zip files contain no convention for indicating compatibility with
a particular Python, wheels do contain this compatibility information. Thus, it
is possible to check if a wheel can be directly imported from, and the wheel
support in distlib allows you to take advantage of this using the mount() and
unmount() methods. When you mount a wheel, its absolute path name is added to
sys.path, allowing the Python code in it to be imported. (A DistlibException is
raised if the wheel isn't compatible with the Python which calls the mount()
method.)
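
The compatibility information mentioned above is visible in the wheel's file name, which PEP 427 defines as `{distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl`. The following is a minimal sketch of splitting a name into those parts; `parse_wheel_filename` and `WheelName` are hypothetical helpers, not part of distlib, and the sketch does not expand compressed tag sets such as `py2.py3`.

```python
from collections import namedtuple

# Hypothetical container for the fields PEP 427 puts in a wheel file name.
WheelName = namedtuple(
    "WheelName", "distribution version build python abi platform"
)


def parse_wheel_filename(filename):
    """Split a PEP 427 wheel file name into its dash-separated fields."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None  # the build tag is optional
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError("unexpected number of fields in %r" % filename)
    return WheelName(dist, version, build, py, abi, plat)


info = parse_wheel_filename("simplejson-3.1.2-cp27-none-linux_x86_64.whl")
print(info.python, info.abi, info.platform)
```

A caller would compare these tags with the tags of the running interpreter before deciding whether the archive may be imported from directly.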

You don't need mount() just to add the wheel's name to sys.path, or to import
pure-Python wheels, of course. But the mount() method goes further than just
enabling Python imports - any C extensions in the wheel are also made available
for import. For this to be possible, the wheel has to be built with additional
metadata about extensions - a JSON file called EXTENSIONS which serialises an
extension mapping dictionary. This maps extension module names to the names in
the wheel of the shared libraries which implement those modules.
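
As a rough illustration of the mapping described above, the sketch below builds a small archive in memory and reads a JSON file that maps module names to shared-library paths. The location of EXTENSIONS inside the archive and the helper `read_extension_map` are assumptions made for the example; the post does not say where the file is stored.

```python
import io
import json
import zipfile

# Assumed member names; the real layout inside a wheel may differ.
METADATA_NAME = "simplejson-3.1.2.dist-info/EXTENSIONS"
LIBRARY_NAME = "simplejson/_speedups.so"

# Build a stand-in archive in memory with a placeholder "library".
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(LIBRARY_NAME, b"placeholder, not a real shared library")
    zf.writestr(METADATA_NAME, json.dumps({"simplejson._speedups": LIBRARY_NAME}))


def read_extension_map(zf, metadata_name):
    """Return {module name: archive path of the shared library}."""
    with zf.open(metadata_name) as f:
        return json.loads(f.read().decode("utf-8"))


with zipfile.ZipFile(buf) as zf:
    extensions = read_extension_map(zf, METADATA_NAME)
    # Every library named in the mapping should exist in the archive.
    missing = [p for p in extensions.values() if p not in zf.namelist()]

print(extensions, missing)
```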

Running unmount() on the wheel removes its absolute pathname from sys.path and
makes its C extensions, if any, also unavailable for import.

Wheels built with the new "distil" tool contain the EXTENSIONS metadata, so can
be mounted complete with C extensions:

$ distil download -d /tmp simplejson
Downloading simplejson-3.1.2.tar.gz to /tmp/simplejson-3.1.2
63KB @ 73 KB/s 100 % Done: 00:00:00
Unpacking ... done.
$ distil package --fo=wh -d /tmp /tmp/simplejson-3.1.2/
The following packages were built:
/tmp/simplejson-3.1.2-cp27-none-linux_x86_64.whl
$ python
Python 2.7.2+ (default, Jul 20 2012, 22:15:08)
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from distlib.wheel import Wheel
>>> w = Wheel('/tmp/simplejson-3.1.2-cp27-none-linux_x86_64.whl')
>>> w.mount()
>>> import simplejson._speedups
>>> dir(simplejson._speedups)
['__doc__', '__file__', '__loader__', '__name__', '__package__',
'encode_basestring_ascii', 'make_encoder', 'make_scanner', 'scanstring']
>>> simplejson._speedups.__file__
'/home/vinay/.distlib/dylib-cache/simplejson/_speedups.so'
>>>

Does anyone see any problems with this approach to importing C extensions from
zip files?

Regards,

Vinay Sajip

_______________________________________________
Python-Dev mailing list
Pytho...@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/dev-python%2Bgarchive-30976%40googlegroups.com

Amaury Forgeot d'Arc

Mar 27, 2013, 4:13:27 PM
to Vinay Sajip, pytho...@python.org
2013/3/27 Vinay Sajip <vinay...@yahoo.co.uk>

> When you mount a wheel, its absolute path name is added to
> sys.path, allowing the Python code in it to be imported.

Better: just put the wheel path to sys.path
    sys.path.append('/tmp/simplejson-3.1.2-cp27-none-linux_x86_64.whl')
and let a sys.path_hook entry do the job.

Such a WheelImporter could even inherit from zipimporter, plus the magic required for C extensions.

It avoids the mount/unmount methods; only the usual sys.path operations are necessary from the user.
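
A minimal sketch of the dispatch part of this idea, assuming a hypothetical `WheelImporter` class: it subclasses `zipimport.zipimporter`, accepts only paths ending in `.whl`, and is registered in `sys.path_hooks`. The "magic required for C extensions" is deliberately left out, and the demo archive contains only a pure-Python module with a made-up name.

```python
import os
import sys
import tempfile
import zipfile
import zipimport


class WheelImporter(zipimport.zipimporter):
    """Hypothetical importer: a zipimporter that only claims .whl paths."""

    def __init__(self, path):
        if not path.endswith(".whl"):
            # Raising ImportError tells the import system to try the next hook.
            raise ImportError("not a wheel: %r" % (path,))
        super().__init__(path)


# Create a throwaway archive holding one pure-Python module.
tmpdir = tempfile.mkdtemp()
wheel_path = os.path.join(tmpdir, "demo-0.1-py3-none-any.whl")
with zipfile.ZipFile(wheel_path, "w") as zf:
    zf.writestr("wheel_demo_mod.py", "ANSWER = 42\n")

# Register the hook ahead of the stock zipimport hook, then use sys.path as usual.
sys.path_hooks.insert(0, WheelImporter)
sys.path_importer_cache.pop(wheel_path, None)
sys.path.append(wheel_path)

import wheel_demo_mod

print(wheel_demo_mod.ANSWER, type(wheel_demo_mod.__loader__).__name__)
```

Because the stock zipimport hook already handles any zip archive on sys.path, a hook like this only earns its place once it adds the compatibility check and the extension handling discussed in the thread.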

--
Amaury Forgeot d'Arc

Stefan Behnel

Mar 27, 2013, 4:34:19 PM
to pytho...@python.org
Vinay Sajip, 27.03.2013 20:38:
> >>> w = Wheel('/tmp/simplejson-3.1.2-cp27-none-linux_x86_64.whl')
> >>> w.mount()
> >>> import simplejson._speedups
> >>> dir(simplejson._speedups)
> ['__doc__', '__file__', '__loader__', '__name__', '__package__',
> 'encode_basestring_ascii', 'make_encoder', 'make_scanner', 'scanstring']
> >>> simplejson._speedups.__file__
> '/home/vinay/.distlib/dylib-cache/simplejson/_speedups.so'

I've always hated this setuptools misfeature of copying C extensions from
an installed archive into a user directory, one for each user. At least
during normal installation, they should be properly unpacked into normal
shared library files in the file system.

Whether it then makes sense to special case one-shot trial imports like the
above without installation is a bit of a different question, but I don't
see a compelling reason for adding complexity here. It's not really an
important use case.

Stefan

Vinay Sajip

Mar 27, 2013, 4:41:05 PM
to pytho...@python.org
Amaury Forgeot d'Arc <amauryfa <at> gmail.com> writes:


> Better: just put the wheel path to sys.path
>     sys.path.append('/tmp/simplejson-3.1.2-cp27-none-linux_x86_64.whl')
> and let a sys.path_hook entry do the job.

That's what the mount() actually does - adds the wheel to a registry that an
import hook uses. You also need a place to check that the wheel being mounted
is compatible with the Python doing the mounting - I'm not sure what the import
hook should do if there is a compatibility problem with the wheel (is it clear
that it should always raise an ImportError? Or ignore the wheel, which seems
wrong? Or do something else?)

Daniel Holth

Mar 27, 2013, 4:49:59 PM
to Vinay Sajip, Jim Fulton, Python-Dev
Jim Fulton is right that weird failures are a characteristic of zipped
eggs, so one of the #1 requests for setuptools is how to prohibit
zipping from ever happening. This is an important reason why wheel is
billed as an installation format -- fewer users with pitchforks. It's
very cool that it works though. Debugging is slightly easier than it
was in the old days because pdb can now read the source code from the
zip.

An unzipped wheel as a directory with the same name as the wheel would
be a more reliable solution that might be interesting to work with. It
would work the same as an egg unless you had important files in the
.data/ directory (currently mostly used for console scripts and include
files). However it was always confusing to have more than one kind
(zipped, unzipped) of egg.

Vinay Sajip

Mar 27, 2013, 4:59:11 PM
to pytho...@python.org
Stefan Behnel <stefan_ml <at> behnel.de> writes:


> I've always hated this setuptools misfeature of copying C extensions from
> an installed archive into a user directory, one for each user. At least
> during normal installation, they should be properly unpacked into normal
> shared library files in the file system.

The user directory location is not a key part of the functionality, it could
just as well be a shared location across all users. And this is an option for
specific scenarios, not a general substitute for installing the wheel (which
unpacks everything into FHS-style locations). A lot of people use virtual envs,
which are per-user anyway. I'm not suggesting this is a good idea for system-wide
deployments of software.

> Whether it then makes sense to special case one-shot trial imports like the
> above without installation is a bit of a different question, but I don't
> see a compelling reason for adding complexity here. It's not really an
> important use case.

Well, my post was to elicit some comment about the usefulness of the feature,
so fair enough. It doesn't seem especially complex though, unless I've missed
something.

Regards,

Vinay Sajip

Vinay Sajip

Mar 27, 2013, 5:04:40 PM
to pytho...@python.org
Daniel Holth <dholth <at> gmail.com> writes:

> zipping from ever happening. This is an important reason why wheel is
> billed as an installation format -- fewer users with pitchforks. It's
> very cool that it works though. Debugging is slightly easier than it
> was in the old days because pdb can now read the source code from the
> zip.

Well, it's just an experiment, and I was soliciting comments because I'm not as
familiar with the issues as some others are. Distlib is still only at version
0.1.1, and the mount()/unmount() functionality is not set in stone :-)

Regards,

Vinay Sajip


Bradley M. Froehle

Mar 27, 2013, 5:19:39 PM
to pytho...@python.org, Vinay Sajip
On Wed, Mar 27, 2013 at 1:13 PM, Amaury Forgeot d'Arc <amau...@gmail.com> wrote:
> 2013/3/27 Vinay Sajip <vinay...@yahoo.co.uk>
>> When you mount a wheel, its absolute path name is added to
>> sys.path, allowing the Python code in it to be imported.
>
> Better: just put the wheel path to sys.path
>     sys.path.append('/tmp/simplejson-3.1.2-cp27-none-linux_x86_64.whl')
> and let a sys.path_hook entry do the job.
>
> Such a WheelImporter could even inherit from zipimporter, plus the magic
> required for C extensions.

I implemented just such a path hook -- zipimporter plus the magic required
for C extensions -- as a challenge to myself to learn more about the Python
import mechanisms.

See https://github.com/bfroehle/pydzipimport.

Cheers,
Brad

PJ Eby

Mar 28, 2013, 12:10:32 AM
to Bradley M. Froehle, Vinay Sajip, pytho...@python.org
On Wed, Mar 27, 2013 at 5:19 PM, Bradley M. Froehle
<brad.f...@gmail.com> wrote:
> I implemented just such a path hook ---- zipimporter plus the magic required
> for C extensions --- as a challenge to myself to learn more about the Python
> import mechanisms.
>
> See https://github.com/bfroehle/pydzipimport.

FYI, there appears to be a bug for Windows with packages: you're using
'/__init__' in a couple places that should actually be
os.sep+'__init__'.
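
A small sketch of the portability point, assuming paths are built by string concatenation; the thread does not show the affected code, so `package_init_stem` is a hypothetical helper.

```python
import os


def package_init_stem(package_path):
    """Build the stem of a package's __init__ path with the platform separator."""
    # A hard-coded '/' + '__init__' would not match paths that use '\\' on Windows.
    return package_path + os.sep + "__init__"


stem = package_init_stem("simplejson")
print(stem)
```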

This does seem like a good way to address the issue, for those rare
situations where this would be a good idea.

The zipped .egg approach was originally intended for user-managed
plugin directories for certain types of extensible platforms, where
"download a file and stick it in the plugins directory" is a
low-effort way to install plugins, without having to build a lot of
specialized install capability.

As Jim has pointed out, though, this doesn't generalize well to a
full-blown packaging system.

Technically, you can blame Bob Ippolito for this, since he's the one
who talked me into using eggs to install Python libraries in general,
not just as a plugin packaging mechanism. ;-)

That being said, *unpacked* egg, er, wheels, are still a great way to
meet all of the "different apps needing different versions" use cases
(without needing one venv per app), and nowadays the existence of
automated installer tools means that using one to install a plugin for
a low-tech plugin system is not a big deal, as long as that tool
supports the simple unpacked wheel scenario. So I wholeheartedly
support some kind of mount/unmount or "require"-type mechanism for
finding plugins. pkg_resources even has an API for handling simple
dynamic plugin dependency resolution scenarios:

http://peak.telecommunity.com/DevCenter/PkgResources#locating-plugins

It'd be a good idea if distlib provides a similar feature, or at least
the APIs upon which apps or frameworks can implement such features.

(Historical note for those who weren't around back then: easy_install
wasn't even an *idea* until well after eggs were created; the original
idea was just that people would build plugins and libraries as eggs
and manually drop them in directories, where a plugin support library
would discover them and add them to sys.path as needed. And Bob and I
also considered a sort of "update site" mechanism ala Eclipse, with a
library to let apps fetch plugins. But as soon as eggs existed and
PyPI allowed uploads, it was kind of an obvious follow-up to make an
installation tool as a kind of "technology demonstration".... which
promptly became a monster. The full story with all its twists and
turns can also be found here:
http://mail.python.org/pipermail/python-dev/2006-April/064145.html )

Thomas Heller

Mar 28, 2013, 10:44:08 AM
to pytho...@python.org
On 27.03.2013 20:38, Vinay Sajip wrote:
>> This quote is here to stop GMane complaining that I'm top-posting. Ignore.
>
> I've already posted this to distutils-sig, but thought that it might be of
> interest to readers here as it relates to importing C extensions ...
>
> zipimport is great, but there can be issues importing software that contains
> C extensions. But the new wheel format (PEP 427) may give us a better way of
> importing zip files containing C extensions. Since wheels are .zip files, they
> can sometimes be used to provide functionality without needing to be installed.
> But whereas .zip files contain no convention for indicating compatibility with
> a particular Python, wheels do contain this compatibility information. Thus, it
> is possible to check if a wheel can be directly imported from, and the wheel
> support in distlib allows you to take advantage of this using the mount() and
> unmount() methods. When you mount a wheel, its absolute path name is added to
> sys.path, allowing the Python code in it to be imported. (A DistlibException is
> raised if the wheel isn't compatible with the Python which calls the mount()
> method.)

The zip-file itself could support importing compiled extensions when it
contains a python-wrapper module that unpacks the .so/.dll file
somewhere, and finally calls imp.load_dynamic() to import it and replace
itself.
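
A minimal sketch of the two steps such a wrapper would perform, under stated assumptions: `extract_member` and `load_extension` are hypothetical helpers, and the loading step uses importlib instead of imp.load_dynamic() because the imp module was removed in Python 3.12. The demo exercises only the extraction step, since loading needs a real shared library built for the running interpreter.

```python
import importlib.util
import os
import sys
import tempfile
import zipfile


def extract_member(archive_path, member, cache_dir):
    """Copy one archive member into cache_dir and return the extracted path."""
    with zipfile.ZipFile(archive_path) as zf:
        return zf.extract(member, cache_dir)


def load_extension(fullname, library_path):
    """Load a shared library from disk and register it as module `fullname`."""
    spec = importlib.util.spec_from_file_location(fullname, library_path)
    if spec is None:
        raise ImportError("no loader for %r" % (library_path,))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # Replacing the sys.modules entry is how the wrapper would "replace itself".
    sys.modules[fullname] = module
    return module


# Demo of the extraction step with placeholder bytes instead of a real library.
work_dir = tempfile.mkdtemp()
archive = os.path.join(work_dir, "demo.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("simplejson/_speedups.so", b"placeholder bytes")

extracted = extract_member(
    archive, "simplejson/_speedups.so", os.path.join(work_dir, "cache")
)
print(extracted)
```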

Thomas

Brett Cannon

Mar 28, 2013, 12:09:26 PM
to Thomas Heller, python-dev
On Thu, Mar 28, 2013 at 10:44 AM, Thomas Heller <the...@ctypes.org> wrote:
> On 27.03.2013 20:38, Vinay Sajip wrote:
>
>>> This quote is here to stop GMane complaining that I'm top-posting. Ignore.
>>
>> I've already posted this to distutils-sig, but thought that it might be of
>> interest to readers here as it relates to importing C extensions ...
>>
>> zipimport is great, but there can be issues importing software that contains
>> C extensions. But the new wheel format (PEP 427) may give us a better way of
>> importing zip files containing C extensions. Since wheels are .zip files, they
>> can sometimes be used to provide functionality without needing to be installed.
>> But whereas .zip files contain no convention for indicating compatibility with
>> a particular Python, wheels do contain this compatibility information. Thus, it
>> is possible to check if a wheel can be directly imported from, and the wheel
>> support in distlib allows you to take advantage of this using the mount() and
>> unmount() methods. When you mount a wheel, its absolute path name is added to
>> sys.path, allowing the Python code in it to be imported. (A DistlibException is
>> raised if the wheel isn't compatible with the Python which calls the mount()
>> method.)
>
> The zip-file itself could support importing compiled extensions when it
> contains a python-wrapper module that unpacks the .so/.dll file
> somewhere, and finally calls imp.load_dynamic() to import it and replace
> itself.

Which must be done carefully to prevent a security issue. It shouldn't be unzipped anywhere but into a directory only writable by the process. 
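
One way to approximate this with the standard library is sketched below, as an assumption rather than a recommendation from the thread: `tempfile.mkdtemp()` creates a directory accessible only to the creating user, which is the closest portable stand-in for "only writable by the process". The helper names and the file name are hypothetical.

```python
import os
import stat
import tempfile


def make_private_cache_dir(prefix="ext-cache-"):
    """Create a fresh directory that only the current user can access."""
    return tempfile.mkdtemp(prefix=prefix)


def write_private_file(directory, name, data):
    """Write data to a new file, refusing to reuse a file that already exists."""
    path = os.path.join(directory, name)
    # O_EXCL makes the open fail if something was planted at this path earlier.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o700)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


cache_dir = make_private_cache_dir()
lib_path = write_private_file(cache_dir, "_speedups.so", b"placeholder bytes")
mode = stat.S_IMODE(os.stat(cache_dir).st_mode)
print(oct(mode), lib_path)
```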

Christian Heimes

Mar 28, 2013, 1:12:05 PM
to Brett Cannon, Thomas Heller, python-dev
On 28.03.2013 17:09, Brett Cannon wrote:
> Which must be done carefully to prevent a security issue. It shouldn't
> be unzipped anywhere but into a directory only writable by the process.

Cleanup is going to be tricky or even impossible. Windows locks loaded
DLLs and therefore prevents their removal. It's possible to unload DLLs
but I don't know the implications.
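
A best-effort cleanup along these lines could be sketched as follows, assuming the extracted files live in a dedicated temporary directory. `remove_cache_dir` is a hypothetical helper; it ignores deletion errors, so a library that the operating system still has locked is simply left behind.

```python
import atexit
import os
import shutil
import tempfile


def remove_cache_dir(path):
    """Try to delete path; return True if it is gone afterwards."""
    # ignore_errors=True: a locked file must not turn interpreter exit into a crash.
    shutil.rmtree(path, ignore_errors=True)
    return not os.path.exists(path)


cache_dir = tempfile.mkdtemp(prefix="ext-cache-")
# Run the cleanup when the interpreter exits.
atexit.register(remove_cache_dir, cache_dir)

# Direct call on a second, unused directory to show the return value.
scratch_dir = tempfile.mkdtemp(prefix="ext-cache-")
removed = remove_cache_dir(scratch_dir)
print(removed)
```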

Gregory P. Smith

Mar 28, 2013, 9:06:35 PM
to Brett Cannon, Thomas Heller, python-dev
Once http://sourceware.org/bugzilla/show_bug.cgi?id=11767 is implemented and available in libc, no extraction of .so's should be needed (they will likely need to be stored uncompressed in the .zip file for that though).

Thomas Heller

Mar 29, 2013, 8:00:29 AM
to pytho...@python.org
For Windows there is already code that does it:

http://www.py2exe.org/index.cgi/Hacks/ZipExtImporter

This page is not up-to-date, but it describes the idea and the
implementation. The code currently is 32-bit only and for Python 2
but that probably can be fixed.

It is based on Joachim Bauch's MemoryModule:
https://github.com/fancycode/MemoryModule