New Python pipeline is open for business!

Benjy Weinberger

Mar 24, 2017, 6:28:07 PM
to pants-devel
Hi all, 

I just merged https://github.com/pantsbuild/pants/pull/4316, which enables the new Python pipeline in the pants repo.

The old pipeline will be deprecated very soon, so please try the new one out in your repo as soon as you can (see that pull request for how to enable it). Your users will benefit from a much more snappy Python experience.

If you have any custom tasks based on `PythonTask` you will need to rewrite them in the new pipeline, but this is very straightforward.

Meanwhile, as you develop in the pants repo, please be alert for any issues due to the new pipeline.

Happy Friday, 
~ Benjy


Benjy Weinberger

Mar 24, 2017, 7:50:12 PM
to pants-devel
In case you're curious, the main improvement is that the new pipeline is a proper product pipeline, where the main products are "selected interpreter", "python sources" and "3rdparty requirements".  The new pipeline manages the source and requirement pexes separately, and virtually "merges" them before execution, using pex's efficient PEX_PATH mechanism.
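
To illustrate (with hypothetical file and module names), the runtime merge amounts to something like this, though the pipeline sets it all up internally:

  $ PEX_PATH=requirements.pex PEX_MODULE=myapp.main ./sources.pex

PEX_PATH adds the contents of requirements.pex to sys.path when sources.pex runs, so the two pexes never have to be physically combined into one file.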

As a result, if you only modify Python sources but the external requirements don't change, only the source pex needs rebuilding. This is much faster than the old pipeline's need to expensively rebuild a single combined sources-and-requirements pex.

The new pipeline also uses normal Pants artifact caching, and its invalidation is more fine-grained.

The overall effect of all these changes is a much snappier Python experience!


Stu Hood

Mar 30, 2017, 10:08:56 PM
to Benjy Weinberger, pants-devel
Twitter's entire python test suite passed with the new backend: looking really solid.

Congratulations, and thank you!

Benjy Weinberger

Mar 31, 2017, 12:43:16 AM
to Stu Hood, pants-devel
Wow! That's huge!!  How much massaging did it take?

Mateo Rodriguez

Mar 31, 2017, 12:51:45 AM
to Benjy Weinberger, Stu Hood, pants-devel
Wow!

Twitter moving onto it so quickly is a strong endorsement of the work, which I already assumed to be excellent.
We are _very_ interested in consuming this as well, I hope to carve out the time to dig in again soon.

Thanks for tackling such a big project, Benjy. The Python backend was such a long-running TODO; it is hard to believe how far it has come.

- Mateo

Stu Hood

Apr 1, 2017, 7:17:39 PM
to Benjy Weinberger, pants-devel
Soo, it looks like one of our internal plugins was wrapping the PytestRun task, and using the old classname to do so. Consequently, our clean run was not actually using the new backend at all.

From scanning through the logs, it looks like we have a fair amount of undeclared dependency work (and thus possibly a few cycles) to shake out before we'll be able to uncover any more substantive issues.

Very sorry for the false report! But enabling the new pipeline by default upstream would probably still make perfect sense, so that users with newer repos or fewer targets can immediately gain the benefits.

Benjamin Yates

Apr 13, 2017, 9:40:04 AM
to Pants Developers

> The old pipeline will be deprecated very soon, so please try the new one out in your repo as soon as you can (see that pull request for how to enable it). Your users will benefit from a much more snappy Python experience.


The new pipeline indicates that it only supports running pytest in the fast mode.  How would I run tests that have different pytest configurations without the slow mode?  For example, I host multiple Django projects in a monorepo, and in the fast mode I can't run the global test suite since everything runs within the scope of a single interpreter/session.

Benjy Weinberger

Apr 13, 2017, 7:20:24 PM
to Benjamin Yates, Pants Developers
You would have to run them in separate pants invocations. Is that infeasible? It's what one has to do today for JVM.
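
For example (hypothetical target paths):

  ./pants test tests/python/project_a::
  ./pants test tests/python/project_b::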

What's an example of a different configuration? E.g., different dependencies?

Benjamin Yates

Apr 18, 2017, 11:53:13 AM
to Pants Developers, benj...@rqdq.com
It seems consistent with large-scale monorepo tooling.  I should be able to run tests in e.g. a common library, and I shouldn't need to have intimate knowledge of every individual subproject to do it.

In this case, I have generic library code, and also some Django projects.  Django has its own testing peculiarities, but there is a plugin to make it work with Pytest, and it works fine, except that once Django is "started" it can't be torn down (afaik) and even if it could be torn down, I'm not sure that would still fit into the pytest/unittest wrapper.

So despite being slow, running each python_test target in its own interpreter is a viable solution.  Some of that performance hit can also be made up for with sharding/multicore.

Also, I'm not having any luck getting py coverage to work in recent builds, seemed to work fine earlier in the 1.3 series.  Is that expected or possibly a configuration issue of mine?

Thanks

Benjy Weinberger

Apr 18, 2017, 2:51:32 PM
to Benjamin Yates, Pants Developers
On Tue, Apr 18, 2017 at 8:53 AM, Benjamin Yates <benj...@rqdq.com> wrote:
> It seems consistent with large-scale monorepo tooling.  I should be able to run tests in e.g. a common library, and I shouldn't need to have intimate knowledge of every individual subproject to do it.

> In this case, I have generic library code, and also some Django projects.  Django has its own testing peculiarities, but there is a plugin to make it work with Pytest, and it works fine, except that once Django is "started" it can't be torn down (afaik) and even if it could be torn down, I'm not sure that would still fit into the pytest/unittest wrapper.

So the issue is not that you have colliding dependencies (e.g., incompatible versions of 3rdparty deps) but that everything runs in a single pytest invocation? Because that, specifically, wouldn't be hard to change.  The old "nofast" mode actually created a new chroot for each test target, including resolving all its 3rdparty requirements. So it was horribly slow, but also completely isolated. But it sounds like possibly your requirements are less severe?
 

> So despite being slow, running each python_test target in its own interpreter is a viable solution.  Some of that performance hit can also be made up for with sharding/multicore.

> Also, I'm not having any luck getting py coverage to work in recent builds, seemed to work fine earlier in the 1.3 series.  Is that expected or possibly a configuration issue of mine?

How are you specifying that config? We upgraded to non-prehistoric versions of pytest, pytest-cov and coverage, which necessitated a change in how those options work on our end. 

Benjamin Yates

Apr 18, 2017, 3:07:47 PM
to Pants Developers, benj...@rqdq.com


> So the issue is not that you have colliding dependencies (e.g., incompatible versions of 3rdparty deps) but that everything runs in a single pytest invocation? Because that, specifically, wouldn't be hard to change.  The old "nofast" mode actually created a new chroot for each test target, including resolving all its 3rdparty requirements. So it was horribly slow, but also completely isolated. But it sounds like possibly your requirements are less severe?

In my case, the entire monorepo is required to use the same 3rdparty libs.  For development (i.e. virtualenv) one can just install 3rdparty/requirements.txt and run any code in the system.  So no, the chroot-level isolation is not necessary to solve my issue. Having new interpreter contexts would be sufficient.  A potential downside would be masking deficient dependency lists, but that seems to be the case with the fast-mode method now.

> Also, I'm not having any luck getting py coverage to work in recent builds, seemed to work fine earlier in the 1.3 series.  Is that expected or possibly a configuration issue of mine?

> How are you specifying that config? We upgraded to non-prehistoric versions of pytest, pytest-cov and coverage, which necessitated a change in how those options work on our end.

If nothing strikes you as obvious, I will compare my working/non-working branches again and start a new thread once I'm more confident about the presence of a problem.

Benjy Weinberger

Apr 18, 2017, 3:14:04 PM
to Benjamin Yates, Pants Developers
Well, one obvious thing is that you don't specify the 'path:' or 'module:' prefixes any more. Just list directories and/or packages (under new coverage versions they must be directories, not individual .py files).
 

Benjamin Yates

Apr 18, 2017, 3:20:08 PM
to Benjy Weinberger, Pants Developers
On 4/18/2017 3:13 PM, Benjy Weinberger wrote:
> Well, one obvious thing is that you don't specify the 'path:' or
> 'module:' prefixes any more. Just list directories and/or packages
> (under new coverage versions they must be directories, not individual
> .py files).

I've followed the layouts of other pants projects, so I have
/src/python/{{ owned-python-modules-begin-here }}
and
/tests/python/{{ tests, but not necessarily parallel to /src/python }}

What would I pass to specify everything in e.g. /src/python/* ?

Also, is there documentation on the /tests/python/ tree and how it is
mapped into the interpreter scope? It seems to be trivial to make a
test module/dir that shadows modules of the same name in the /src/python/.

Benjy Weinberger

Apr 18, 2017, 3:42:13 PM
to Benjamin Yates, Pants Developers
If you specify nothing, Pants maps tests/python structure to src/python structure, e.g., tests/python/foo/bar is assumed to cover src/python/foo/bar.

To specify a dir explicitly, just provide it (absolute path, or relative to the buildroot) as the value to `--coverage` in the `test.pytest` scope.
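
For example (hypothetical paths):

  ./pants test.pytest --coverage=src/python/myproj tests/python/myproj::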

Benjy Weinberger

Apr 20, 2017, 2:43:25 PM
to Benjamin Yates, Pants Developers
Update: `--no-fast` now works again in the new pytest task.  But it doesn't support conflicting dependencies, just running each target in its own pytest invocation.  All targets in play must still be able to select an interpreter and resolve requirements together.
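
For example (hypothetical target spec):

  ./pants test.pytest --no-fast tests/python/myproj::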

Nadav Samet

May 17, 2017, 7:50:12 PM
to Pants Developers, benj...@rqdq.com
The new pipeline is really awesome - things are way faster than what they used to be. One improvement that I'd like to have is the ability to test/build pexes for multiple incompatible targets in a single pants run. It looks like it should be possible to teach "SelectInterpreter" to divide the set of root targets into disjoint groups such that each group is compatible. Instead of producing a single interpreter, SelectInterpreter would select an interpreter for each group. Then subsequent tasks would do their thing based on that grouping. How does it sound? Probably too late for 1.3.0, but I'd be happy to hack on it and send a PR in the hope it gets into the 1.4.x series.
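
A rough sketch of the grouping step I have in mind (all names here are invented for illustration, not actual Pants API):

  def compatible(t1, t2):
    # Hypothetical predicate: two targets are compatible if their
    # interpreter-compatibility constraints intersect.
    return bool(set(t1.compatibility) & set(t2.compatibility))

  def partition_by_compatibility(root_targets):
    # Greedily split targets into groups whose members are mutually
    # compatible; an interpreter would then be selected per group.
    groups = []
    for target in root_targets:
      for group in groups:
        if all(compatible(target, member) for member in group):
          group.append(target)
          break
      else:
        groups.append([target])
    return groups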

Benjy Weinberger

May 18, 2017, 2:10:23 AM
to Nadav Samet, Pants Developers, Benjamin Yates
Hey Nadav,

Yes - this is on my mental TODO list.

Your sketch is correct - PytestRun can examine its options in its prepare() method, and use those to request global interpreter/deps/sources products, or per-target-root ones (it might be overkill to attempt to group those into compatible subsets), depending on what the relevant option (we can overload `--fast` for this) is set to.  Then SelectInterpreter, ResolveRequirements and GatherSources can act accordingly. 
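
To make that concrete, here's a minimal sketch of the prepare() shape (the product names are placeholders I'm inventing for illustration, not the final strings):

  from pants.task.task import Task

  class PytestRun(Task):
    @classmethod
    def prepare(cls, options, round_manager):
      super(PytestRun, cls).prepare(options, round_manager)
      if options.fast:
        # One shared interpreter and requirements pex for all roots.
        round_manager.require_data('python_interpreter')
        round_manager.require_data('python_requirements_pex')
      else:
        # Assumed per-target-root variants of the same products.
        round_manager.require_data('python_interpreter_per_target_root')
        round_manager.require_data('python_requirements_pex_per_target_root')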

A PR for 1.4.x would be awesome! Will be happy to review.

~ Benjy



Nadav Samet

May 18, 2017, 12:13:48 PM
to Benjy Weinberger, Pants Developers, Benjamin Yates
Besides PytestRun, there are lint, setup-py and a few other custom tasks I developed for internal use. It would be great if all of these would work by default when there are incompatible targets. Should each of them have a `--fast` mode, off by default, that affects the behavior of SelectInterpreter and co? Or maybe SelectInterpreter could make a best effort to find a single interpreter, but fall back to per-target-root if that's impossible, and then the individual high-level tasks could decide whether to fail or proceed based on whether a single interpreter was identified?
--
-Nadav

Benjy Weinberger

May 18, 2017, 12:49:07 PM
to Nadav Samet, Pants Developers, Benjamin Yates
Lint, SetupPy and Binary operate target-by-target anyway, and Repl makes no sense in the case of incompatible targets. So I think it's really just PytestRun that has this issue, no?  What do your custom tasks do?

Nadav Samet

May 18, 2017, 1:30:38 PM
to Benjy Weinberger, Pants Developers, Benjamin Yates
We have a custom lint and a custom task that builds and publishes wheels - all of which work target-by-target, but rely on getting the right interpreter per target. I'll start hacking on this in the next few days.
--
-Nadav

Benjy Weinberger

May 18, 2017, 1:46:47 PM
to Nadav Samet, Pants Developers, Benjamin Yates
Right, but they can hard-code the requirement for per-target-root interpreters/requirements/sources; they don't need any fancy logic to figure out which product to ask for, as PytestRun would.

Benjy Weinberger

May 18, 2017, 1:48:34 PM
to Nadav Samet, Pants Developers, Benjamin Yates
Basically, the most straightforward model is, I think: tasks ask for the interpreter/source/requirements products they need (per-target-root or global) and the tasks that know how to produce these produce either, or both, as requested.

Nadav Samet

May 18, 2017, 1:57:41 PM
to Benjy Weinberger, Pants Developers, Benjamin Yates
Yes - that makes sense.
--
-Nadav