rules_pyz: Another set of Python rules that attempts to work with PyPI

Evan Jones

Jan 29, 2018, 2:40:36 PM
to bazel-discuss
I started a previous thread about Bazel's native Python rules not totally working with PyPI packages [1]. I mentioned that we've been experimenting with our own set of rules, so I figured I would share them, in case it helps with the discussion. These rules build an executable .zip for every Python target, very similar to Pex. It just takes much less time since it doesn't do as much, doesn't generate .pyc files, and doesn't compress the zip. The advantage is that since there is a single directory tree of Python files (and native code), there are no problems with namespace packages.
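
As a rough illustration of the idea (not the actual rules_pyz implementation): Python can execute any zip file that has a `__main__.py` at its root, so an uncompressed executable archive can be sketched with the standard library alone. The function name `build_pyz` and the tiny `greeting` package are made up for this example.

```python
import os
import subprocess
import sys
import tempfile
import zipfile


def build_pyz(output_path, files, main_source):
    """Write an uncompressed zip that `python output_path` can execute.

    files: dict mapping archive-relative paths to source text.
    main_source: text of the __main__.py entry point.
    """
    # ZIP_STORED skips compression, trading archive size for build speed.
    with zipfile.ZipFile(output_path, "w", zipfile.ZIP_STORED) as zf:
        for arcname, source in sorted(files.items()):
            zf.writestr(arcname, source)
        zf.writestr("__main__.py", main_source)
    return output_path


if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "hello.pyz")
    build_pyz(
        out,
        {
            "greeting/__init__.py": "",
            "greeting/text.py": "MESSAGE = 'hello from a zip'\n",
        },
        "from greeting.text import MESSAGE\nprint(MESSAGE)\n",
    )
    # Python puts the zip itself on sys.path and runs its __main__.py.
    print(subprocess.check_output([sys.executable, out]).decode().strip())
```

Because everything lives in one archive with one directory tree, imports resolve the same way they would in a single site-packages directory.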


The rules also contain a .bzl generator inspired by bazel-deps [2] that outputs rules that reference PyPI wheels. The advantage is that the generated build should be reproducible: it does not depend on executing tools outside of Bazel's sandbox.
It also attempts to generate select statements for Linux and Mac OS X native wheels. This is definitely gross and fragile and likely to break, but it works for our code.
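
To make the select idea concrete, here is a minimal sketch of a generator that emits one rule per package and picks a wheel by platform. The rule name `pyz_library` comes up later in this thread, but the `wheels` attribute, the `@pypi_...` repository names, and the `//platforms:...` config_setting labels are placeholders invented for illustration. They are not the real rules_pyz output.

```python
def render_library(name, wheels_by_platform):
    """Return Starlark text for one package.

    wheels_by_platform maps "linux", "osx", or "any" to a wheel label.
    A pure-Python package has only an "any" entry and needs no select().
    """
    if set(wheels_by_platform) == {"any"}:
        wheels = '["%s"]' % wheels_by_platform["any"]
    else:
        # Hypothetical config_setting labels; a real build must define them.
        conditions = {"linux": "//platforms:linux", "osx": "//platforms:osx"}
        unknown = set(wheels_by_platform) - set(conditions)
        if unknown:
            raise ValueError("unsupported platforms: %s" % sorted(unknown))
        lines = [
            '        "%s": ["%s"],' % (conditions[platform], label)
            for platform, label in sorted(wheels_by_platform.items())
        ]
        wheels = "select({\n%s\n    })" % "\n".join(lines)
    return 'pyz_library(\n    name = "%s",\n    wheels = %s,\n)\n' % (name, wheels)


if __name__ == "__main__":
    print(render_library("six", {"any": "@pypi_six//file"}))
    print(render_library("psycopg2", {
        "linux": "@pypi_psycopg2_linux//file",
        "osx": "@pypi_psycopg2_osx//file",
    }))
```

Hard-coding two platforms like this mirrors the "Linux and Mac only" limitation described above, which is exactly why the approach is fragile.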

I'm not sure I would totally recommend that others use them, but I am happy to help if you try!

Evan



Oleg Tsarev

Jan 29, 2018, 5:45:45 PM
to bazel-discuss
Hi! 

I took a brief look, and it looks very promising.

Can you quickly answer one question?

How hard would it be to teach the tool to install packages for a set of platforms, not only for one?
We need this to build binaries with C extensions (psycopg2, numpy, pandas, PyYAML) that run on OS X (for testing, for instance) and on Linux (built from OS X) for packaging with rules_docker.

We solved this problem by forking the native rules_python.

There is a proof of concept that is fully compatible with the native rules_python / rules_docker.

If rules_pyz is compatible with rules_docker in our case (that is, if we are able to build a Docker-compatible py_binary on OS X), we will use your solution instead of our fork.

By the way, we have a tool called "panda", similar to Gazelle for Go, which generates BUILD files for Python based on Python imports.
We are preparing to publish the tool as open source, and if rules_pyz is suitable for rules_docker, we can switch from our fork of rules_python to rules_pyz and switch our "panda" BUILD file generator to rules_pyz before publishing.

With kind regards, Oleg

Oleg Tsarev

Jan 29, 2018, 5:49:59 PM
to bazel-discuss
Aha, I looked more deeply.

It seems like we just need to:
1. generate several platform-specific native.http_file rules for platform-specific wheels (like pypi_psycopg2_any_linux... and pypi_psycopg2_darwin...)
2. implement a special workspace rule that produces pypi_psycopg2 from pypi_psycopg2_any_linux and pypi_psycopg2_darwin, like we already do in our fork

Evan Jones

Jan 29, 2018, 5:56:00 PM
to bazel-discuss
Hello! Wow, thank you for looking so quickly and carefully. It sounds like you figured it out, but I will add a platform-specific dependency to the example repo. For details:

Cross platform packages are a challenge. Here is our current approach, which is very flawed:


* No cross-compilation: For now, if you build on OS X, you get a binary that runs on OS X. (This is also a place where Pex is more sophisticated: it supports cross-platform Pexes.) If you want to build a Docker container, this means you need to run bazel build (target) on Linux. It should be possible to support cross-compilation, but I don't understand how this works in Bazel. It seems easier to not worry about this for now.

* Binary wheels must exist on PyPI: My pip_generate tool looks for platform-specific wheels. If it finds them, it tries to find other platforms. This is pretty hard coded to just Mac and Linux for now.

* If wheels don't exist on PyPI: It uses pip wheel to save them to some directory. The intention is you will upload them somewhere (e.g. a Google Cloud Storage bucket, S3?), so they can be downloaded by future builds. For some targets, this means I need to manually run the generate tool on Linux and Mac and manually edit the output.


It sounds like you figured it out though!


And yes, the general approach is VERY similar to https://github.com/bazelbuild/rules_python/pull/61

I need to look at that code more carefully: It would be nice if we could have a single, high quality tool, rather than a bunch of separate hacks. :(


I also admit that it is inconvenient for tools that build Python targets to be written in Go. This is partly just because I like writing in Go at the moment. I think ideally Python rules should not have a Go dependency.

Oleg Tsarev

Jan 29, 2018, 6:03:20 PM
to bazel-discuss


Of course, we understand the limitation. In general this will work only for packages that have wheels pre-compiled for the target platforms.
That is perfectly acceptable for us; we are ready to build platform-specific wheels ourselves (for instance, for PyYAML).

In general, the default behavior would stay as it is now, and special flags would enable switching between platform wheels and aggregating wheels as I described.

For this we need:
- an extra option to generate dependencies for several platforms (like --platform=manylinux1_x86_64,macosx_10_9_intel)
- detection of packages that REQUIRE compilation for a platform but DO NOT have platform-specific wheels (reported as an error)
- a way to use vendored wheels (stored in a repository). This seems perfectly doable: we just need to tell bazel-bin/external/com_bluecore_rules_pyz/pypi/pip_generate_wrapper about a local directory with pre-built wheels.
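
The second requirement (report an error when a package has no wheel for a requested platform) could be sketched as follows. It relies on the wheel filename convention from PEP 427, where the platform tag is the last dash-separated field. The function names and the dictionary input are hypothetical, and matching tags by exact string is a simplification: a real tool would also need to know, for example, that an older macOS tag can satisfy a newer target.

```python
def wheel_platforms(filename):
    """Return the platform tags from a wheel filename (PEP 427 naming)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %r" % filename)
    # name-version[-build]-python-abi-platform.whl: the platform tag is last,
    # and compressed tag sets are separated by dots.
    return filename[:-len(".whl")].split("-")[-1].split(".")


def missing_platforms(wheel_filenames, wanted):
    """Return the wanted platform tags that no wheel in the list covers."""
    available = set()
    for filename in wheel_filenames:
        available.update(wheel_platforms(filename))
    if "any" in available:
        return []  # a pure-Python wheel works on every platform
    return [platform for platform in wanted if platform not in available]


def check(packages, wanted):
    """packages maps a package name to its wheel filenames; returns errors."""
    errors = []
    for name, filenames in sorted(packages.items()):
        missing = missing_platforms(filenames, wanted)
        if missing:
            errors.append("%s: no wheel for %s" % (name, ", ".join(missing)))
    return errors


if __name__ == "__main__":
    wanted = ["manylinux1_x86_64", "macosx_10_9_intel"]
    # Illustrative filenames only; they are not taken from a real index query.
    packages = {
        "six": ["six-1.11.0-py2.py3-none-any.whl"],
        "demo_ext": ["demo_ext-1.0-cp27-cp27mu-manylinux1_x86_64.whl"],
    }
    for error in check(packages, wanted):
        print(error)
```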

If it's OK with you:
1. we will test your solution for our case with Docker
2. we will send a PR to your repository to support building for a list of platforms

This PR will allow users to run tests on OS X and AT THE SAME TIME use these binaries with the py_image/py3_image rules from the native rules_docker.

Thank you very much for sharing your work,
Oleg

Evan Jones

Jan 29, 2018, 6:13:39 PM
to bazel-discuss
This would be amazing! I'm happy to accept pull requests.

I'll also attempt to better document how pip_generate works. It already does PART of what you want (see the -wheelDir and -wheelURLPrefix flags), but it is ... very hacky about how it uses them. I apologize in advance! This part is pretty ugly. I think the pyz_* rules themselves are in "better" shape than the pip_generate tool. It may be best to take the tools from that other PR and get them to produce output in the format expected by the pyz_* rules, if they work better?



The output of pip_generate_wrapper --help can be slightly helpful at the moment, but I'll attempt to write some better documentation tomorrow.

Oleg Tsarev

Jan 29, 2018, 6:20:27 PM
to Evan Jones, bazel-discuss
We will look deeper. 
In particular, I am worried about how the IDEA debugger will work (right now we create a separate virtualenv for that, and the debugger works).

Your documentation is OK in general.

I missed the following point in your message:

Binary wheels must exist on PyPI: My pip_generate tool looks for platform-specific wheels. If it finds them, it tries to find other platforms. This is pretty hard coded to just Mac and Linux for now.

If wheels don't exist on PyPI: It uses pip wheel to save them to some directory. The intention is you will upload them somewhere (e.g. a Google Cloud Storage bucket, S3?), so they can be downloaded by future builds. For some targets, this means I need to manually run the generate tool on Linux and Mac and manually edit the output.

It seems like that would be enough for our case: we can store platform-specific wheels in our repository (using Git LFS, for instance) or in a special "vendor" repository with blobs (using Git LFS, to keep pre-push/pre-commit hooks clean for internal usage).


I will test this behavior for our case tomorrow. If everything is fine, I will think again about the missing binary wheels; we probably DON'T need any patches.
And your community contribution motivates me to complete the work on the "panda" BUILD file generator and contribute it to your repo as an out-of-the-box solution.

After that, rules_pyz will be very close to rules_go (the single remaining difference is that downloading a specific version of Python for the platform needs to be implemented, and then rules_pyz will be as cool as rules_go).
 

--
You received this message because you are subscribed to a topic in the Google Groups "bazel-discuss" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/bazel-discuss/oXz0o6B9tAI/unsubscribe.
To unsubscribe from this group and all its topics, send an email to bazel-discuss+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-discuss/49bd3e17-fc26-4498-89de-43f9913e0407%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Evan Jones

Jan 29, 2018, 7:20:11 PM
to Oleg Tsarev, bazel-discuss
For the debugger: You can unzip the generated zip into a new directory, and try executing the directory or __main__.py. For example, this should work:

mkdir my_target
cd my_target
unzip ../bazel-bin/path/my_target
cd ..
python my_target

That makes my_target contain the entire contents of the generated zip, which is really just a "plain" PYTHONPATH with all the files packed into it. __main__.py has a bit of "magic" to make the zip/directory executable, and remove the system site-packages directories from sys.path, but you don't even need to use that if it isn't useful.
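
The sys.path cleanup mentioned above could look roughly like the following. This is a guess at the general technique, not the actual `__main__.py` that rules_pyz generates: the function name `without_site_packages` is made up, and treating any path component named `site-packages` or `dist-packages` as a system directory is an assumption.

```python
import os
import sys


def without_site_packages(paths):
    """Return paths with system package directories filtered out."""
    kept = []
    for entry in paths:
        parts = os.path.normpath(entry).split(os.sep)
        # Drop the directory itself and anything nested inside it (e.g. eggs).
        if "site-packages" in parts or "dist-packages" in parts:
            continue
        kept.append(entry)
    return kept


if __name__ == "__main__":
    # A generated entry point would do this before importing the application,
    # so that only code packed next to __main__.py can be imported.
    sys.path[:] = without_site_packages(sys.path)
    print("\n".join(sys.path))
```

Skipping this step when debugging from an unzipped directory is harmless as long as nothing in the system site-packages shadows a packed dependency.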


Replies inline below:


On Mon, Jan 29, 2018 at 6:19 PM, Oleg Tsarev <ol...@oleg.sh> wrote:
It seems like that would be enough for our case: we can store platform-specific wheels in our repository (using Git LFS, for instance) or in a special "vendor" repository with blobs (using Git LFS, to keep pre-push/pre-commit hooks clean for internal usage).

I will test this behavior for our case tomorrow. If everything is fine, I will think again about the missing binary wheels; we probably DON'T need any patches.
And your community contribution motivates me to complete the work on the "panda" BUILD file generator and contribute it to your repo as an out-of-the-box solution.

I think this should be possible! If you can generate the right output into pypi_rules.bzl, I *think* the pyz rules should work correctly. At the moment, pip_generate won't output the exact right pyz_libraries or http_archives, but it should be close.

 
After that, rules_pyz will be very close to rules_go (the single remaining difference is that downloading a specific version of Python for the platform needs to be implemented, and then rules_pyz will be as cool as rules_go).

Ha! I hope it can be useful! I will warn you this has not been used outside of our organization, so it is highly likely there are bugs! I'm happy to try and help fix them though.

Evan

Marcel Hlopko

Jan 30, 2018, 11:54:51 AM
to Evan Jones, lbe...@google.com, Oleg Tsarev, bazel-discuss

Marcel Hlopko | Software Engineer | hlo...@google.com | 

Google Germany GmbH | Erika-Mann-Str. 33  | 80636 München | Germany | Geschäftsführer: Paul Manicle, Halimah DeLaine Prado | Registergericht und -nummer: Hamburg, HRB 86891

Alexey Dudko

Jan 30, 2018, 2:38:38 PM
to bazel-discuss
Hi Evan!

My colleague Oleg Tsarev gave me a link to this discussion. That's an interesting approach for working with distributed Python binaries!

Since my main concern is managing external dependencies, I mainly focused on that part of your solution.

What I really liked is that you have a separate Bazel tool that generates the .bzl file from requirements.txt, where the Skylark extension is just a single rule that provides wheel files. I think this approach is more flexible and stable than the way it is implemented in rules_python using repository rules. I still run into trouble when the requirements.txt file is updated or the code in rules_python is changed.
However, the implementation of the generator looks a bit hacky, and it is partially implemented in Go, as you mentioned above. But I think if we combine it with some ideas from my PR-61, we can have a great implementation.

So from the Go code you run "pip wheel --verbose --requirement <file> --wheel-dir <dir>", then parse URLs from stdout and generate pypi_rules.bzl. I suggest using "pip download" and providing labels for the downloaded wheels instead of parsed URLs. I think this approach is more reliable, and it will also allow using local wheel files that do not exist in the PyPI index.
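
A sketch of what the "pip download" suggestion might look like for several platforms, using options that exist in pip (`--requirement`, `--dest`, `--platform`, `--only-binary`). The helper name and the one-directory-per-platform layout are hypothetical, and the platform tags are the ones mentioned earlier in the thread. Note that pip requires `--only-binary=:all:` (or `--no-deps`) when `--platform` is given, so this only fetches prebuilt wheels and never builds from source.

```python
import os


def pip_download_commands(requirements, dest_root, platforms):
    """Return one `pip download` argv per target platform."""
    commands = []
    for platform in platforms:
        commands.append([
            "pip", "download",
            "--requirement", requirements,
            "--dest", os.path.join(dest_root, platform),
            "--platform", platform,
            # pip insists on binary-only downloads when --platform is set.
            "--only-binary=:all:",
        ])
    return commands


if __name__ == "__main__":
    for argv in pip_download_commands(
            "requirements.txt", "wheels",
            ["manylinux1_x86_64", "macosx_10_9_intel"]):
        # Pass argv to subprocess.check_call to actually run the download.
        print(" ".join(argv))
```

The downloaded files could then be referenced by label from the generated .bzl instead of by URLs scraped from pip's output.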

It is really nice to see a different approach to solving the same problems. Thank you for sharing it!

If you do not mind, I will use some of your ideas in rules_python :-)

Cheers, 
Alexey

Evan Jones

Jan 31, 2018, 9:34:02 AM
to Alexey Dudko, bazel-discuss
Thanks for the feedback, and I hope it was helpful! I basically agree with all your comments:

* pip_generate is *very* hacky. It was not ... well designed. It grew organically as we got more of our PyPI packages working. In our code base, the output still needs some manual massaging.

* It should not be in Go: I agree. I started this as a prototype just to understand the problem better. It is weird that a Python tool depends on Go code, and a "real" implementation probably shouldn't (no matter what my personal opinions about Go or Python may be)

* Thanks for the suggestion with pip download: I'll try it, although I believe that has the disadvantage that it will *not* build wheels, right? Currently, the way we use the generator is as follows:


1. Someone runs pip_generate, pointing it at a local directory.
2. It downloads/builds all the wheels, and generates the .bzl
3. We upload the directory of wheels to a Google Cloud Storage bucket.
4. The generated .bzl is checked in.
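
Steps 2 to 4 can be sketched as a small script that hashes every wheel in the local directory and writes the rules that future builds use to fetch them. This is not pip_generate itself: `urls` and `sha256` are attributes of Bazel's `http_file` repository rule, while the repository naming scheme and the function names are assumptions made for the example.

```python
import hashlib
import os
import re


def sha256_of(path):
    """Hex SHA-256 of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def render_http_files(wheel_dir, url_prefix):
    """Return Starlark text with one http_file per wheel in wheel_dir."""
    rules = []
    for filename in sorted(os.listdir(wheel_dir)):
        if not filename.endswith(".whl"):
            continue
        # Hypothetical naming scheme: keep only characters that are safe
        # in a repository name.
        stem = filename[:-len(".whl")]
        repo = "pypi_" + re.sub(r"[^A-Za-z0-9_]", "_", stem)
        rules.append(
            'http_file(\n    name = "%s",\n    urls = ["%s"],\n'
            '    sha256 = "%s",\n)\n'
            % (repo,
               url_prefix.rstrip("/") + "/" + filename,
               sha256_of(os.path.join(wheel_dir, filename))))
    return "\n".join(rules)
```

Recording the hash at generation time is what lets the checked-in .bzl stay reproducible: a later build fails loudly if the uploaded wheel ever changes.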


I'll experiment with this a bit.


However, I think the pyz_* rules themselves should be fairly robust, and I'm happy to help with bug reports or suggestions there :)

Evan





Evan Jones

Jan 31, 2018, 11:34:31 AM
to Oleg Tsarev, Alexey Dudko, bazel-discuss
On Wed, Jan 31, 2018 at 9:51 AM, Oleg Tsarev <ol...@oleg.sh> wrote:
In general, I guess you can avoid the zip stage; it would be enough to just wrap native.py_{binary,library,test}.
We checked: you can copy the wrapper and runfiles somewhere, or into a Docker image, and everything works fine.

The problem I had without zipping a directory tree is namespace packages. The way that the native rules leave everything in the "workspace" paths and try to fix PYTHONPATH to find all of them does not work: https://github.com/bazelbuild/rules_python/issues/55

Putting everything into a single directory tree makes the layout look the same as when packages are pip installed. I wish this was not necessary, and maybe it isn't, but it works for us at the moment!
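
A simplified illustration of why the layout matters (it is not the exact failure from the linked issue): when two distributions share a top-level package and are unpacked into separate sys.path entries, the first directory that contains the package wins and hides the other. Merging them into one tree, as pip install does, makes both importable. The package name `demo_ns` and all helper names are made up.

```python
import importlib
import os
import shutil
import sys
import tempfile


def write(path, text=""):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)


def merge_trees(sources, dest):
    """Copy every file from each source tree into one shared tree."""
    for src in sources:
        for dirpath, _dirs, files in os.walk(src):
            target = os.path.join(dest, os.path.relpath(dirpath, src))
            os.makedirs(target, exist_ok=True)
            for name in files:
                shutil.copy(os.path.join(dirpath, name), target)


def can_import(module_name, search_path):
    """Try importing module_name with search_path prepended to sys.path."""
    saved_path = list(sys.path)
    saved_modules = set(sys.modules)
    sys.path[:0] = search_path
    importlib.invalidate_caches()
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False
    finally:
        sys.path[:] = saved_path
        # Forget anything imported during the probe so probes stay independent.
        for name in set(sys.modules) - saved_modules:
            del sys.modules[name]


def demo():
    root = tempfile.mkdtemp()
    dists = [os.path.join(root, "dist_a"), os.path.join(root, "dist_b")]
    # Two distributions that both ship files under the package "demo_ns".
    write(os.path.join(dists[0], "demo_ns", "__init__.py"))
    write(os.path.join(dists[0], "demo_ns", "a.py"))
    write(os.path.join(dists[1], "demo_ns", "__init__.py"))
    write(os.path.join(dists[1], "demo_ns", "b.py"))
    merged = os.path.join(root, "merged")
    merge_trees(dists, merged)
    modules = ("demo_ns.a", "demo_ns.b")
    return {
        "split": [can_import(m, dists) for m in modules],
        "merged": [can_import(m, [merged]) for m in modules],
    }


if __name__ == "__main__":
    print(demo())
```

With the split layout, `demo_ns.b` cannot be imported because `demo_ns` resolves to the copy in the first directory; with the merged layout both submodules import.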

 
So, your solution with requirements.bzl is brilliant. I think this is the most "Bazel" way to work with external dependencies.
The "wtool" way in rules_go is an incomplete move in this direction :)

Thanks! I totally just copied the design from the Go rules and from bazel-deps!

Evan


Oleg Tsarev

Feb 1, 2018, 7:24:48 AM
to Evan Jones, Alexey Dudko, bazel-discuss
Got it. We have not run into a problem like yours. But in general the Python provider, and Python runfiles in particular, are a big pain.


 
Interesting. We do not use Java, which is why I have not looked at how the Maven support works.
 




Doug Greiman

Feb 1, 2018, 1:53:27 PM
to bazel-discuss
Regarding the pain of generating .bzl from requirements.txt, have you seen the WORKSPACE.locked proposal?  https://docs.google.com/document/d/1HfRGRW4MwnVUG24rw3HJIBkdEYfXlESnlKOboO97A3A/edit?ts=59dcd33c#heading=h.t4l4z8rauhsq

Lukács T. Berki

Feb 2, 2018, 4:44:03 AM
to Evan Jones, bazel-discuss
Wow. Very promising set of rules! I have a few questions about their design and the design of the platonic ideal of Python rules, which I'll break out into separate threads on the brand-new bazel-si...@googlegroups.com mailing list so that we don't end up with a centithread.


--
Lukács T. Berki | Software Engineer | lbe...@google.com | 

Google Germany GmbH | Erika-Mann-Str. 33  | 80636 München | Germany | Geschäftsführer: Paul Manicle, Halimah DeLaine Prado | Registergericht und -nummer: Hamburg, HRB 86891