Announcement: Pip 10 is coming, and will move all internal APIs

Paul Moore

Oct 20, 2017, 9:22:04 AM
to Distutils, pypa-dev
We're in the process of starting to plan for a release of pip (the
long-awaited pip 10). We're likely still a month or two away from a
release, but now is the time for people to start ensuring that
everything works for them. One key change in the new version will be
that all of the internal APIs of pip will no longer be available, so
any code that currently calls functions in the "pip" namespace will
break. Calling pip's internal APIs has never been supported, and
always carried a risk of such breakage, so projects doing so should,
in theory, be prepared for such things. However, reality is not always
that simple, and we are aware that people will need time to deal with
the implications.

Just in case it's not clear, simply finding where the internal APIs
have moved to and calling them under the new names is *not* what
people should do. We can't stop people calling the internal APIs,
obviously, but the idea of this change is to give people the incentive
to find a supported approach, not just to annoy people who are doing
things we don't want them to ;-)

So please - if you're calling pip's internals in your code, take the
opportunity *now* to check out the in-development version of pip, and
ensure your project will still work when pip 10 is released.

And many thanks to anyone else who helps by testing out the new
version, as well :-)

Thanks,
Paul

Paul Moore

Oct 20, 2017, 9:30:27 AM
to Matthew Brett, Distutils, pypa-dev
On 20 October 2017 at 14:26, Matthew Brett <matthe...@gmail.com> wrote:
> Thanks for the heads-up.
>
> Will y'all be doing a PyPI pre-release so we can test with `pip
> install --pre -U pip`?

We've not yet decided on that. Traditionally I don't think we have
done so, but I'm inclined to think it's a good idea. It might not be
until noticeably closer to the release, though...

Paul

Paul Moore

Oct 20, 2017, 10:20:03 AM
to Jannis Gebauer, Distutils, pypa-dev
On 20 October 2017 at 14:55, Jannis Gebauer <ja....@me.com> wrote:
> Thanks for the heads-up, Paul.
>
> I’m currently using `pip.get_installed_distributions` and as far as I can
> see that has moved into `_internal`, too:
> https://github.com/pypa/pip/blob/master/src/pip/_internal/utils/misc.py#L333
>
> Any recommendations?

See https://github.com/pypa/pip/pull/4743

Unfortunately, the "latest" docs build doesn't seem to reflect this (I
don't know why).

I guess you probably want something from pkg_resources? Depends
precisely what you're trying to do, I guess.
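
As a rough illustration of the pkg_resources route, the sketch below
lists installed distributions without touching pip's internals. The
function name installed_distributions is hypothetical, and the
importlib.metadata fallback is an assumption added for environments
where setuptools (and therefore pkg_resources) is not installed; it is
not something suggested in this thread.

```python
def installed_distributions():
    """Return {project name: version} for the current environment."""
    try:
        # pkg_resources ships with setuptools and may be absent.
        import pkg_resources
    except ImportError:
        # Assumption: fall back to the standard library (Python 3.8+).
        from importlib import metadata

        return {
            dist.metadata.get("Name"): dist.version
            for dist in metadata.distributions()
            if dist.metadata.get("Name")
        }
    # working_set holds the distributions visible on sys.path.
    return {dist.project_name: dist.version for dist in pkg_resources.working_set}


if __name__ == "__main__":
    for name, version in sorted(installed_distributions().items()):
        print(name, version)
```

Whether this is enough depends on what the calling code did with the
result of pip.get_installed_distributions; the old function's filtering
options are not reproduced here.
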
Paul

Noah Kantrowitz

Oct 20, 2017, 1:42:49 PM
to Paul Moore, Distutils, pypa-dev
So as someone on the tooling side, is there any kind of install dry-run yet? I've got https://github.com/poise/poise-python/blob/master/lib/poise_python/resources/python_package.rb#L34-L78 which touches a toooon of internals. Basically I need a way to know exactly what versions `pip install` would have used in a given situation without actually changing the system. Happy for a better solution!

--Noah

Donald Stufft

Oct 20, 2017, 2:34:33 PM
to Paul Moore, Matthew Brett, Distutils, pypa-dev
> until noticeably closer to the release, though…
>

I used to cut pre-releases for pip, and after a while I gave up on doing them because it felt like nobody ever actually reported any issues with them anyway; it wasn’t until we cut the final release that we started finding bugs. I don’t have any problem with us starting to issue them again, though, and seeing if we start catching issues earlier this time.


Noah Kantrowitz

Oct 20, 2017, 2:41:51 PM
to xoviat, Paul Moore, pypa-dev, Distutils
Installing to a temp dir is really not an option for automated tooling (if nothing else, it takes way too long). `pip list --outdated` already gets fairly close to this without installing anything (I suspect you can actually get a lot closer than you think), but it calculates for all packages (read: is slow) and doesn't give a good way to restrict things (hence that hack-y script, which is a modified version of the pip list code). This is 100% a hard requirement for config management systems and, if not fixed in pip, will require continued use of internal APIs. I would recommend just making pip list take a set of install-compatible names/version patterns and apply that as a filter, in a similar way to what I've done there.
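
For illustration only, here is a minimal sketch of doing that filtering on the caller's side with the supported CLI. It assumes the JSON emitted by `pip list --outdated --format=json` is a list of objects with "name", "version" and "latest_version" keys; the helper names (normalize, filter_outdated, outdated) are hypothetical. It filters by name only, not by version pattern, and it does nothing about the speed problem, since pip still checks every installed package.

```python
import json
import re
import subprocess
import sys


def normalize(name):
    """Normalize a project name so 'Foo_Bar' and 'foo-bar' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def filter_outdated(report, wanted):
    """Keep only the entries of a parsed report whose name is in `wanted`."""
    wanted_names = {normalize(name) for name in wanted}
    return {
        entry["name"]: (entry["version"], entry["latest_version"])
        for entry in report
        if normalize(entry["name"]) in wanted_names
    }


def outdated(wanted):
    """Ask pip (in a subprocess) what is outdated, then filter the answer."""
    output = subprocess.check_output(
        [sys.executable, "-m", "pip", "list", "--outdated", "--format=json"]
    )
    return filter_outdated(json.loads(output), wanted)
```

A call such as outdated(["requests"]) would return a mapping of name to (installed, latest) for that package if pip reports it as outdated, and an empty dict otherwise.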

--Noah

> On Oct 20, 2017, at 11:35 AM, xoviat <xov...@gmail.com> wrote:
>
> There's no dry-run functionality that I know of so far. However, you could use the following:
>
> pip install --prefix=tmpdir
>
> This command is actually about the same speed as a proper implementation, because we can't actually know what we're installing until we build the requirements.

Noah Kantrowitz

Oct 20, 2017, 2:49:07 PM
to xoviat, Paul Moore, pypa-dev, Distutils
While I understand that pip itself has to be very careful about edge cases and all the pathological things you can do in setup.py, as a higher-level tooling author my priorities are on the happy path UX and speed is a big factor there. So yes, using PackageFinder is potentially inaccurate, but it's also _usually_ accurate :) Anyways, if there is true concern that finder-based approaches are too risky, probably don't offer it in the pip list output.

--Noah

> On Oct 20, 2017, at 11:43 AM, xoviat <xov...@gmail.com> wrote:
>
> A correct dry-run implementation will do about the same amount of work as installing to a temporary directory right now. In the future, that could be optimized, but any patch to the finder doesn't actually detect the requirements correctly (as they're not necessarily known until after the wheels are built).

Doug Hellmann

Oct 20, 2017, 4:55:47 PM
to pypa-dev
Excerpts from Paul Moore's message of 2017-10-20 14:22:03 +0100:
What do you think about posting to http://blog.python.org to try
to get more attention for this?

Doug

Paul Moore

Oct 20, 2017, 5:27:57 PM
to Doug Hellmann, pypa-dev
It's OK with me if someone wants to do that.
Paul

Richard Jones

Oct 20, 2017, 6:53:44 PM
to Paul Moore, Distutils, pypa-dev
Hiya Paul,

There's a bunch of tooling out there using pip's internals to extend pip's functionality. Could you please provide some reasoning as to why they're all going to be broken at pip 10, and possibly some guidance on how to get that functionality back?


Cheers,

     Richard

Paul Moore

Oct 21, 2017, 6:03:23 AM
to Richard Jones, Distutils, pypa-dev
On 20 October 2017 at 23:53, Richard Jones <r1char...@gmail.com> wrote:
> Hiya Paul,
>
> There's a bunch of tooling out there using pip's internals to extend
> pip's functionality. Could you please provide some reasoning as to why
> they're all going to be broken at pip 10, and possibly some guidance on how
> to get that functionality back?

Hi Richard,
There was a change to the pip docs that clarified the status of pip's
internal code. The PR for that is at
https://github.com/pypa/pip/pull/4743 but unfortunately it appears
that the docs build has been failing so it hasn't yet made it to the
formal pip docs site.

To summarise, pip has *never* supported the use of its internal APIs
by external code. Over time, we've had a steady trickle of people
raising issues when their code broke because of doing so, and it
usually turned out to be because they violated assumptions made by the
pip code - such as that it's running in a single-threaded application,
or it has sole control over the logging subsystem, or even just that
you can run your own code after calling a pip API. We've explained
this position regularly on those issues, but as is typical, people
don't manage to find similar issues when raising new ones, so we spent
a lot of time repeating ourselves.

Coming up to pip 10, there's been a *lot* of internal work going on,
and it's fairly likely that a decent chunk of code using pip's
internal APIs will break anyway. We don't document internals changes,
so we faced the possibility of an extended period of people raising
issues saying "you broke my code" and us having no better response
than "you shouldn't do that", which would likely hinder adoption of
pip 10, and cause problems for the ecosystem as a whole. Rather than
do this, we decided to make a clean compatibility break, where we
could send out a clear message - "everything's moved, if that matters
to you, then you were using unsupported functionality and you should
find a better way". The breakage is still there (and certainly we made
it affect more people, as there are no doubt some people who would
have survived the pip 10 release unscathed if we hadn't done this) but
at least it's clearly defined and contained.

As to alternatives, we don't have all the answers here but I can offer
the following suggestions:

1. There are a number of external packages that provide functionality
equivalent to what pip does - packaging, wheel, distlib, pkg_resources
are the ones I'm aware of. These are designed as libraries and so *do*
provide supported APIs.
2. We're making a strong push to standardise *all* of the external
interfaces that pip uses, so you should be far more able to write your
own code if that's necessary, without worrying that it'll work
differently than pip does, or that things will suddenly change and
break your code.
3. You can call pip as a subprocess - that's always been supported and
will remain so. We've added some automation-friendly features there
(such as json output format for "pip list") and we'd be happy to add
more if people want to submit PRs.

Likely the biggest problems will be for people who call into the pip
resolver and build APIs, as I don't know of any alternatives out
there. But they were *definitely* breaking anyway, as we've made major
changes to that code (and will be making more).

Also, I should note that we didn't take this decision lightly. We
don't have any particular objection in principle to having a supported
stable pip API, it's just that we don't have anything even close to
the resources needed to define a supported API, maintain it with
acceptable backward compatibility guarantees, and support users who
will inevitably be using it in unexpected and creative ways (we don't
even have the resources to support the limited use of pip's internals
that we see today). Also, pip was never designed originally as a
library, so we *would* have to design that API from scratch. As I
alluded to above, the existing internals code makes some strong
assumptions about how it's called - assumptions that would be
unacceptable in library code, but are fine in an application's
internal code.

Paul

PS People who want to, can of course hunt out the new equivalents of
the code they were using, and just switch. It's not like we can stop
them. But the new names make it clear that they shouldn't be doing
this, so there's an obvious warning there.
PPS Please disregard xoviat's response. This is something we've been
considering for a while, and most definitely not a spur of the moment
decision. It's unfortunate that he was the one most immediately
affected by the change when we made it, but that was just bad timing
(we didn't suddenly do this just because "someone complained").

Nick Coghlan

Oct 21, 2017, 7:15:19 AM
to Paul Moore, Richard Jones, pypa-dev, Distutils
On 21 October 2017 at 20:03, Paul Moore <p.f....@gmail.com> wrote:
> Likely the biggest problems will be for people who call into the pip
> resolver and build APIs, as I don't know of any alternatives out
> there. But they were *definitely* breaking anyway, as we've made major
> changes to that code (and will be making more).
>
> Also, I should note that we didn't take this decision lightly. We
> don't have any particular objection in principle to having a supported
> stable pip API, it's just that we don't have anything even close to
> the resources needed to define a supported API, maintain it with
> acceptable backward compatibility guarantees, and support users who
> will inevitably be using it in unexpected and creative ways (we don't
> even have the resources to support the limited use of pip's internals
> that we see today). Also, pip was never designed originally as a
> library, so we *would* have to design that API from scratch. As I
> alluded to above, the existing internals code makes some strong
> assumptions about how it's called - assumptions that would be
> unacceptable in library code, but are fine in an application's
> internal code.

(Note: this is entirely speculative, and I have no idea how hard it would be, so please read it as the question it's intended to be)

Do you know if there are any key APIs (like installation) that could be turned into wrappers around pip CLI calls in order to mitigate some of the impact?

The reason I ask is because it's unlikely anyone else is going to understand how to emulate the previous functionality better than the pip devs would, and if there's an API for those particular invocations, then they can be covered directly by pip's test suite.

Plus if there are previous API capabilities that *can't* currently be emulated via the CLI, then the pip devs are the folks in the best position to add the necessary CLI enhancements (such as the ones Noah asked about for doing a more selective `pip list`).

If that's an approach you might be amenable to, then a 10.0 pre-release could be a good time to solicit PRs from folks that were using particular APIs and would be prepared to invest time in defining comparable CLI wrappers for them.

Cheers,
Nick.

--
Nick Coghlan   |   ncog...@gmail.com   |   Brisbane, Australia

Richard Jones

Oct 21, 2017, 7:43:34 AM
to Paul Moore, Distutils, pypa-dev
Thanks for writing that detailed explanation, Paul (and all your other hard work!)


     Richard

Paul Moore

Oct 21, 2017, 8:34:39 AM
to Nick Coghlan, Richard Jones, pypa-dev, Distutils
On 21 October 2017 at 12:15, Nick Coghlan <ncog...@gmail.com> wrote:
> (Note: this is entirely speculative, and I have no idea how hard it would
> be, so please read it as the question it's intended to be)

No problem - I don't know myself how hard some of this would be, either ;-)

> Do you know if there are any key APIs (like installation) that could be turned
> into wrappers around pip CLI calls in order to mitigate some of the impact?

The obvious one is pip.main(). That's the one a lot of people use, but
it's easily replaceable by a simple subprocess call. That's actually
one of the reasons this was so frustrating to us - the bug reports we
got were often from people doing things they didn't need to, that they
could handle trivially via a supported approach.
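
A minimal sketch of that replacement, assuming the calling code only
needs pip's exit status back. The helper names pip_command and run_pip
are hypothetical; the only pip interface relied on is running
"python -m pip" with the same arguments you would type on the command
line.

```python
import subprocess
import sys


def pip_command(*args):
    """Build the argv that runs pip under the current interpreter."""
    return [sys.executable, "-m", "pip", *args]


def run_pip(*args):
    """Run pip in a subprocess and return its exit code.

    This stands in for the old in-process pip.main(list_of_args) call.
    """
    return subprocess.call(pip_command(*args))
```

For example, run_pip("install", "--upgrade", "some-package") (with
"some-package" a placeholder name) returns 0 on success. Using
sys.executable rather than a bare "pip" keeps the install tied to the
interpreter that is running the calling code.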

Otherwise, no. We've had little or no feedback on the tracker from
people using more complex internals, so our working assumption has
been there's very little that can't be handled via the CLI or existing
packages. Feedback so far from this mail hints that maybe we were
wrong, but it's still hard to know if it's one or two key projects, or
a whole range of people that we've yet to hear from. I'm pretty sure,
for example, that pipenv uses internals, either directly or via one of
their dependencies, but we've not seen any feedback from them yet.

> The reason I ask is because it's unlikely anyone else is going to understand
> how to emulate the previous functionality better than the pip devs would,
> and if there's an API for those particular invocations, then they can be
> covered directly by pip's test suite.

Understood. We understand *how*, but don't know what is needed. One of
the points of all this is to tease out such requirements. We'd hope to
get them addressed by including them into *other* packages like
distlib or packaging, but that's mostly just a matter of where the API
goes, not what is needed. It does help us avoid tying fixes to pip's
release cycle, though.

It's also worth noting that the pip devs are possibly way too deep
into how pip does things, rather than what the standards say. Getting
others to implement libraries based on the published standards would
help immensely to ensure that we're not falling into the
"implementation defined" trap of the community being stuck with having
to use pip because no-one else knows how to do what pip does.

> Plus if there are previous API capabilities that *can't* currently be
> emulated via the CLI, then the pip devs are the folks in the best position
> to add the necessary CLI enhancements (such as the ones Noah asked about for
> doing a more selective `pip list`).

Oh, sure - apart from the aforementioned "pip.main()", most
capabilities of the internal API are *not* easily emulated by the CLI.

But 3rd party libraries are just as much an option. Remember, the
issue here isn't so much about designing an API as about exposing (and
therefore locking in stone) internal implementation details of how pip
does things. And the pip devs are arguably in the *worst* position to
handle that option, precisely because they know so much about how pip
does things.

> If that's an approach you might be amenable to, then a 10.0 pre-release
> could be a good time to solicit PRs from folks that were using particular
> APIs and would be prepared to invest time in defining comparable CLI
> wrappers for them.

Well, I get your point here, but the implication of this is that we
have to design and build an API before we can release pip 10. Calling
it CLI wrappers (and implementing it that way) doesn't do anything to
ease the work needed on design, or the maintenance burden of providing
stability. We're already getting a lot of pressure to release pip 10,
and trying to do that would push it way further off.

To be blunt, no-one on the pip team is really interested in trying to
provide or support a stable API at this point in time. We have our
hands full, as everyone is aware, and this is way down all our
priorities. Community-submitted PRs would help, but there's still work
in agreeing a design, reviewing those PRs, and maintaining them
(exposing details from the internal APIs limits how much we can change
those internals later, which is something we have to consider).

What would be ideal would be for the community to build
standards-based libraries that satisfied the needs that people
currently handle by using pip internals. Heck, we could even consider
vendoring such libraries inside of pip and saving ourselves a bunch of
complexity ;-) The packaging, pkg_resources and distlib libraries are
examples of this happening. But it doesn't seem to be a popular route,
unfortunately. And those particular libraries are all maintained by
PyPA members, so don't really count as examples of the community
getting involved...

The most likely alternative solution would be to revert the internal
API moves. If that happened, I'd still prefer to lose pip.main, as
that's the major pain point for us and the easiest for users to solve,
but we could put the rest back. But it would be naive to assume that
as a result people like Noah would be unaffected - they'd actually be
worse off, as the internal APIs they use would probably break anyway
(the installer/resolver stuff has been heavily modified for pip 10)
but not all would do so in obvious ways.

Beyond this, I don't know.

Paul

Donald Stufft

Oct 21, 2017, 2:26:29 PM
to Brett Cannon, xoviat, Distutils, pypa-dev, Richard Jones


On Oct 21, 2017, at 2:15 PM, Brett Cannon <br...@python.org> wrote:

> as long as the module isn't already imported it's fine.

Negative imports get cached too don’t they?

Nick Coghlan

Oct 21, 2017, 10:30:46 PM
to xoviat, Paul Moore, Distutils, pypa-dev, Richard Jones
On 22 October 2017 at 04:03, xoviat <xov...@gmail.com> wrote:
> Nick:
>
> That's generally a good idea, but one significant problem that can occur is that the Python import system will cache certain libraries, people will run "pip install," and then they will expect such libraries to be available. I don't even know exactly how the caching for the import system works, so I don't want to go and make claims about it that may be incorrect (maybe you do?). However, it is important to keep that in mind when considering an API.

Yep, since we switched to the implementation that uses fewer stat calls, you need to call `importlib.invalidate_caches()` to be sure your current process will see anything you just installed. However, for modules you haven't previously imported, that's independent of how you installed them - the problem is that the granularity on the import system's automated cache invalidation depends on the granularity of the filesystem's mtime records for directories, so you either have to sleep for a few seconds (since the mtime resolution on filesystems like FAT & FAT32 is only a couple of seconds), or else force the cache invalidation.

The problem with the sys.modules cache and already imported libraries is different: for those, you either need to use `importlib.reload(existing_module)` to force an in-place update of the existing namespace, or else `del sys.modules[existing_module.__name__]` to force creation of a new module without affecting old references to it.
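
The three calls mentioned above can be seen side by side in a small self-contained sketch. It does not involve pip at all: a throwaway module written to a temporary directory stands in for "something that was just installed". The module name and the function demo_cache_handling are hypothetical, and bytecode writing is switched off here purely as an assumption to keep the demo independent of .pyc timestamps.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_NAME = "hot_reload_demo_mod"  # hypothetical throwaway module


def demo_cache_handling():
    old_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # keep stale .pyc files out of the picture
    with tempfile.TemporaryDirectory() as tmp:
        sys.path.insert(0, tmp)
        try:
            # Looking for the module before it exists lets the path finder
            # cache the (still empty) directory listing.
            try:
                importlib.import_module(MODULE_NAME)
            except ImportError:
                pass

            source = Path(tmp) / (MODULE_NAME + ".py")
            source.write_text("VALUE = 1\n")
            # Force the finders to re-read the directory instead of relying
            # on the directory mtime having visibly changed.
            importlib.invalidate_caches()
            mod = importlib.import_module(MODULE_NAME)
            first = mod.VALUE

            # Already imported: reload() updates the existing namespace.
            source.write_text("VALUE = 2\n")
            importlib.reload(mod)
            reloaded = mod.VALUE

            # Dropping the sys.modules entry yields a brand new module
            # object; the old reference `mod` is left untouched.
            del sys.modules[MODULE_NAME]
            fresh = importlib.import_module(MODULE_NAME)
            return first, reloaded, fresh is not mod
        finally:
            sys.path.remove(tmp)
            sys.modules.pop(MODULE_NAME, None)
            sys.dont_write_bytecode = old_flag
```

Note that this only covers a single pure-Python module with no other references to it, which is the easy case.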

It's those subtleties that keep "Restart all Python processes if you expect them to see components you just installed" in place as our default guidance: it's the only option that's guaranteed to work in all cases. While hot reloading can be made to work, it has assorted caveats and restrictions that you need to learn in order to do it correctly and reliably (many of them are actually pretty similar to the self-imposed restrictions needed to make lazy loading work reliably).

However, none of that impacts the question of whether `pip.main()` runs code in the current process or implicitly runs it in a subprocess - `pip` doesn't import the modules it installs either way, so it will all look the same as far as the import system is concerned.

Donald Stufft

Oct 22, 2017, 2:10:09 AM
to Nick Coghlan, xoviat, Paul Moore, Distutils, pypa-dev, Richard Jones

On Oct 21, 2017, at 10:30 PM, Nick Coghlan <ncog...@gmail.com> wrote:

> However, none of that impacts the question of whether `pip.main()` runs code in the current process or implicitly runs it in a subprocess - `pip` doesn't import the modules it installs either way, so it will all look the same as far as the import system is concerned.


That of course also ignores things like “foo.py optionally imports bar.py if it is available”, with something like:

try:
    import bar
except ImportError:
    bar = None


If you then did ``import foo``, noticed the optional features powered by bar weren’t added, you would also have to reload ``foo`` (and track down any references to it!). Reloading modules in Python is hard :/


The additional thing here is that the import system isn’t the only cache at play either— pkg_resources builds a cache of installed packages at import time too and that has to get invalidated somehow as well. This took pip like 3 or 4 versions to get right because we tried to detect what version of pip was installed _after_ we installed all the versions in order to stop doing the outdated warning with ``pip install -U pip`` and I’m still not entirely convinced we’re not erroneously spitting out the warning in some cases.

The end result of all of this is unless you’re really really careful and design things just right and know exactly how not only yourself, but all your dependencies are written (and track how they change in different versions!) and also how things that use _you_ are going to import you and your dependencies… it’s basically playing whack a mole and is almost never worth the effort.

Matthew Brett

Apr 14, 2018, 4:57:47 PM
to Elvis Stansvik, Paul Moore, Distutils, pypa-dev, Richard Jones
Hi,

On Sun, Oct 22, 2017 at 8:52 AM, Elvis Stansvik
<elvis.s...@orexplore.com> wrote:
> 2017-10-21 14:34 GMT+02:00 Paul Moore <p.f....@gmail.com>:
>> On 21 October 2017 at 12:15, Nick Coghlan <ncog...@gmail.com> wrote:
>>> (Note: this is entirely speculative, and I have no idea how hard it would
>>> be, so please read it as the question it's intended to be)
>>
>> No problem - I don't know myself how hard some of this would be, either ;-)
>>
>>> Do you know if there are any key APIs (like installation) that could be turned
>>> into wrappers around pip CLI calls in order to mitigate some of the impact?
>>
>> The obvious one is pip.main(). That's the one a lot of people use, but
>> it's easily replaceable by a simple subprocess call. That's actually
>> one of the reasons this was so frustrating to us - the bug reports we
>> got were often from people doing things they didn't need to, that they
>> could handle trivially via a supported approach.
>>
>> Otherwise, no. We've had little or no feedback on the tracker from
>> people using more complex internals, so our working assumption has
>> been there's very little that can't be handled via the CLI or existing
>> packages. Feedback so far from this mail hints that maybe we were
>> wrong, but it's still hard to know if it's one or two key projects, or
>> a whole range of people that we've yet to hear from. I'm pretty sure,
>> for example, that pipenv uses internals, either directly or via one of
>> their dependencies, but we've not seen any feedback from them yet.
>
> Another one that immediately comes to mind is pip-tools [1], which I
> think is quite widely used.
>
> But I just checked, and they filed a "check out how to deal with pip
> 10" issue two days ago [2] (I'm guessing in response to this thread).

Now pip 10 is out, of course, I discover that I've lost the
implementation of `get_supported` in pip.pep425tags. It's my fault
for not remembering I had used it.

Is the suggestion to use the `_internal` import, or carry a copy of
the pep425tags code myself, that I have to keep up to date with the
internal pip copy? That would also involve me copying the `glibc`
part of the code. I see that the `wheel` package has an old copy of
that code too, which doesn't deal with manylinux wheels. You
probably saw that `pip-tools` ended up vendoring the whole of pip9
[1].

Cheers,

Matthew

[1] https://github.com/jazzband/pip-tools/tree/master/piptools/_vendored/pip

Donald Stufft

Apr 14, 2018, 5:31:44 PM
to Matthew Brett, Elvis Stansvik, Paul Moore, Distutils, pypa-dev, Richard Jones

On Apr 14, 2018, at 4:57 PM, Matthew Brett <matthe...@gmail.com> wrote:

> Is the suggestion to use the `_internal` import, or carry a copy of
> the pep425tags code myself, that I have to keep up to date with the
> internal pip copy? That would also involve me copying the `glibc`
> part of the code. I see that the `wheel` package has an old copy of
> that code too, which doesn't deal with manylinux wheels. You
> probably saw that `pip-tools` ended up vendoring the whole of pip9
> [1].

The best solution is to figure out what APIs people need, and either add them to packaging and have pip consume that as well as anyone else, or make a new library for the same.

If that’s unacceptable, vendoring or version pinning is probably the best option.

Nick Coghlan

Apr 14, 2018, 11:24:18 PM
to Donald Stufft, Matthew Brett, Elvis Stansvik, Paul Moore, Distutils, pypa-dev, Richard Jones
I think there are going to be at least two steps involved for most
projects affected by the API change:

1. The quick fix to add pip 10 compatibility (which is likely a matter
of "copy the code you need into the project that needs it")
2. The technical debt reduction to reduce code duplication (which is
likely a matter of "add the required APIs to the 'packaging' project")

Step 2 is the step that the pip internals refactoring is designed to
encourage, as we believe a lot of tool developers were just using
pip's internal APIs rather than filing RFEs and submitting PRs to help
guide the evolution of the stable APIs in packaging in a use-case
driven manner.

FWIW, `pipenv` is still on "Step 1" at the moment (and has a
lot of internal refactoring of its own ahead of it before it will
really move on to step 2).

Dan Ryan

Apr 15, 2018, 2:57:01 AM
to Nick Coghlan, Donald Stufft, Matthew Brett, Elvis Stansvik, Paul Moore, Distutils, pypa-dev, Richard Jones
FWIW we are using quite a bit of the internal API. My plan was to go over this once we cut over to the new Warehouse URIs. Of note might be the fact that pip-tools is a core dependency we bundle in pipenv, and the current maintainer is a pipenv maintainer as well. For our specific case we have made sizeable changes to the dependency resolution stack, and bundling allows us to patch freely.

I don’t know that we are a good example though; we are doing significantly more with pip internals than the average project.

-dan

Dan Ryan // pipenv maintainer
gh: @techalchemy