--
You received this message because you are subscribed to the Google Groups "Bazel/Python Special Interest Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-sig-pyth...@googlegroups.com.
To post to this group, send email to bazel-si...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-sig-python/CAOu%2B0LVdEMC-e92yG00xZ0SpL86x_E0k1oi2pVU%2BPOBs3FgFzA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
On Wed, Feb 28, 2018 at 7:27 AM 'Lukács T. Berki' via Bazel/Python Special Interest Group <bazel-si...@googlegroups.com> wrote:

Hey there,

We've been pondering how best to express dependencies on PyPI (the Python Package Index) packages, and I'd like to make sure that we don't contradict your plans too much.

The brief description of the landscape: pip processes a file called requirements.txt, which contains a set of packages with version constraints (which version is installed possibly depends on the Python version and the OS) and some metadata (e.g. URLs where the packages can be found). pip then does transitive dependency resolution, fetches the required packages, possibly compiles native code, and installs the Python code + native code somewhere.

The question is how best to integrate this into the brave new world of WORKSPACE files. In particular, your plans seem to hinge on separating dependency resolution (non-hermetic) from actually fetching and building the dependencies (hermetic), which is difficult, because pip does both of these things. pip does have provisions for repeatability, but not for separating out the dependency resolution part.

My best plan is that we would have a WORKSPACE file like this:

    pip_dependency_set(name = "mydeps", requirements = "my_requirements.txt")

The "dependency resolution" of this repository would entail running pip and creating an "installation bundle". "Fetching and building" would then be just unpacking that bundle and adding a convenient BUILD file, e.g. with a target per installed library (i.e. @mydeps//:ladle would be the library called "ladle", fetched according to the instructions in my_requirements.txt).
Of course, this would:

- preclude cross-compilation,
- abuse the concept of "dependency resolution",
- depend on the version of pip installed on the host system, and
- make it possible to have multiple versions of the same package in the same workspace, or even the same version of the same package multiple times (from different requirements.txt files).

On the flip side, we wouldn't have to re-implement anything from pip (e.g. version resolution or compiling native code), which is a very welcome development.
WDYT?

--
Lukács T. Berki | Software Engineer | lbe...@google.com
Google Germany GmbH | Erika-Mann-Str. 33 | 80636 München | Germany
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
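To make the proposal concrete, a BUILD file in the main repository might consume such a target roughly like this (a sketch only — `pip_dependency_set` and the generated `@mydeps` targets are hypothetical):

```python
# Hypothetical consumer of the proposed @mydeps repository.
py_library(
    name = "soup",
    srcs = ["soup.py"],
    # "ladle" as resolved by pip from my_requirements.txt.
    deps = ["@mydeps//:ladle"],
)
```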
On Wed, Feb 28, 2018 at 3:31 PM John Field <jfi...@google.com> wrote:

> [… Lukács's proposal, quoted in full above …]

Looks like the "WORKSPACE.resolved" version of this rule should use "repeatable pip" (e.g. https://pip.pypa.io/en/stable/user_guide/#hash-checking-mode).

Let's say you have some requirements.txt. Is there a mode in pip that runs dependency resolution and then reports the specific version numbers, so they can be pinned? (Maybe you can parse the installation bundle to get them?)
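For reference, pip's hash-checking mode mentioned above consumes a fully pinned requirements file of roughly this shape (the checksum values below are placeholders, not real hashes):

```text
Django==1.11.10 \
    --hash=sha256:<hash of the wheel> \
    --hash=sha256:<hash of the sdist>
pytz==2017.4 \
    --hash=sha256:<hash>
```

Installing with `pip install --require-hashes -r requirements.txt` then fails if any fetched artifact doesn't match, which makes the fetch repeatable — but it still leaves the resolution (pinning) step to be done somewhere else.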
> Of course, this would preclude cross-compilation,

I don't have a great idea on how to solve this :(

> abuse the concept of "dependency resolution",

Hmm, I don't really think so: if the result of "sync" on a WORKSPACE is a WORKSPACE.resolved with pinned hashes/versions, that seems good enough.

> depend on a version of pip installed on the host system

Yes. I do not think this matters too much.

> and would make it possible to have multiple versions of the same package in the same workspace, or even the same version of the same package multiple times (from different requirements.txt files)

Is that problematic? (Especially given we will be able to split diamond dependencies soon: https://bazel-review.googlesource.com/c/bazel/+/42172)

> On the flip side, we wouldn't have to re-implement anything (e.g. version resolution or compiling native code) from pip, which is a very welcome development.

\o/

> WDYT?
Question 1: Dependency resolution, hermetic or not?

Sample requirements.txt:

    django

Sample pinned requirements.txt for Python 2, os == 'linux' (e.g. the output of "pip install -r requirements.txt; pip freeze > requirements.txt.lock"):

    Django==1.11.10
    futures==3.1.1
    pytz==2017.4

Sample pinned requirements.txt for Python 3, os == 'mac':

    Django==2.0.2
    pytz==2018.3

Doing this pinning requires figuring out the right version of django, then the right wheel or sdist for that version, fetching it, installing it, reading its metadata for dependencies, and then recursing on those dependencies. You do all of this in a virtualenv (probably one of several). You could certainly throw away the packages you fetch and fetch them again in the "hermetic" part if you wanted.
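The pinned lists above are just `name==version` lines, one pin set per (Python version, OS) pair, so downstream tooling can recover the pins trivially. A minimal sketch (the function name is invented here):

```python
def parse_pins(lock_text):
    """Parse `name==version` lines from a pip-freeze-style lock file."""
    pins = {}
    for line in lock_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        name, _, version = line.partition("==")
        pins[name] = version
    return pins

# The Python 2 / linux pin set from above:
py2_linux = parse_pins("Django==1.11.10\nfutures==3.1.1\npytz==2017.4")
```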
Question 2: One "pip" or many?

Scenario 2.1: You have Python 2 and 3 targets in a Bazel repository. You use Django. Django 2.0 only works under Python 3; your Python 2 code still uses Django 1.x. You have a requirements.txt like this:

    django < 2.0; python_version == "2.7"
    django >= 2.0; python_version >= "3.4.3"

Django 1.x and 2.x have different sub-dependencies, and the only way to find these sub-dependencies is to actually fetch the Django package metadata and recursively evaluate it.

Unfortunately, when you run "pip", you can't tell it to evaluate requirements for Python 2 or Python 3: "pip" uses whatever interpreter version it's running under. So Bazel needs to have host versions of pip like "pip2.7", "pip3.4", "pip3.5" etc. for every target version of interest. You can also use "python3.4 -m pip", except on platforms like Debian where they break it :(
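The reason the interpreter matters is the environment marker after the `;`. A minimal sketch of how such a marker is evaluated, handling only `python_version` with a single comparison (real pip implements the full PEP 508 marker grammar, including os_name, sys_platform, etc.):

```python
import operator
import sys

_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
        ">=": operator.ge, "==": operator.eq, "!=": operator.ne}

def evaluate_python_version_marker(marker, version=None):
    """Evaluate e.g. 'python_version >= "3.4.3"' against a version string.

    Defaults to the running interpreter's version -- which is exactly why
    plain pip resolves requirements for whatever Python it runs under.
    """
    if version is None:
        version = "%d.%d.%d" % sys.version_info[:3]
    name, op, literal = marker.split(None, 2)
    assert name == "python_version"
    to_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return _OPS[op](to_tuple(version), to_tuple(literal.strip("\"'")))
```

Run under Python 2.7, the first requirements line above matches and the second doesn't; under Python 3 it's the reverse, so a single pip invocation can never produce both pin sets.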
After a bit more thinking, I have the following arguments for making the dependency resolution AND building the PIP packages part of "bazel fetch", and eventually part of the non-hermetic dependency resolution instead of the hermetic fetching:
On Thu, Mar 1, 2018 at 11:17 AM Lukács T. Berki <lbe...@google.com> wrote:

> After a bit more thinking, I have the following arguments for making the dependency resolution AND building the PIP packages part of "bazel fetch", and eventually part of the non-hermetic dependency resolution instead of the hermetic fetching:

The problem with that is: we do not have any place to put the results of a non-hermetic build of PIP packages if it happens during "non-hermetic dependency resolution" (I think you mean non-deterministic, really). The current thinking is:

* "non-deterministic dependency resolution", aka `bazel sync`, produces WORKSPACE.resolved
* "deterministic fetching", aka `bazel fetch`, fetches predictable artifacts based on what is in WORKSPACE.resolved

There are no other artifacts planned besides WORKSPACE.resolved that pass from `bazel sync` to `bazel fetch`. So what to do? I see several ways out:

a) output enough information into WORKSPACE.resolved to make the pip run deterministic
b) have support for additional artifacts that accompany WORKSPACE.resolved and that bazel sync would generate; users would need to check them in and update them on bazel sync
c) accept that depending on pip packages is inherently non-deterministic, and that users have to make their builds reproducible in other ways (either by checking in prebuilt bundles, or by dockerizing)
d) something else?

(c) would be similar to how we approach the C++ toolchain today: we came to accept that it is inherently dependent on the execution environment (although there are ways to hermeticize it).
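For illustration, option (a) might produce a WORKSPACE.resolved entry roughly like this — a sketch only; the attribute names and the idea of per-package checksums are invented here, and the checksum values are placeholders:

```python
# Hypothetical WORKSPACE.resolved entry pinning the pip run (option a).
pip_dependency_set(
    name = "mydeps",
    requirements = "my_requirements.txt",
    # Pinned by `bazel sync`; `bazel fetch` would verify these.
    resolved = {
        "Django": ("1.11.10", "sha256:<checksum of the fetched artifact>"),
        "futures": ("3.1.1", "sha256:<checksum>"),
        "pytz": ("2017.4", "sha256:<checksum>"),
    },
)
```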
On Thu, 1 Mar 2018 at 14:00, Dmitry Lomov <dsl...@google.com> wrote:

> a) output enough information into WORKSPACE.resolved to make the pip run deterministic

We can't do that, because the output of the pip run (including compiling native code) depends on the Python version / OS / C++ compiler that is installed.

> b) have support for additional artifacts that accompany WORKSPACE.resolved and that bazel sync would generate; users would need to check them in and update them on bazel sync

This would work, I guess, but I don't think anyone would be thrilled at the prospect of Bazel essentially forcing them to check in binary blobs.

> c) accept that depending on pip packages is inherently non-deterministic, and that users have to make their builds reproducible in other ways (either by checking in prebuilt bundles, or by dockerizing)

...but then why monkey around with "fetch" and "sync"? The whole point of having two things is that one is deterministic and the other is not. If you put the boundary between "fetch" and "sync" at "accessing the network", then putting pip package checksums into WORKSPACE.resolved makes sense, but not if the boundary is that one is deterministic and the other isn't. I think the least bad approach is putting package checksums into WORKSPACE.resolved. Then "bazel fetch" would not be deterministic, but at least the result wouldn't be radically different, and some of the behavior that's dependent on the system (package choice based on Python version + OS) would be in "bazel sync".
> d) something else?... (c) would be similar to how we approach the C++ toolchain today: we came to accept that it is inherently dependent on the execution environment (although there are ways to hermeticize it)

Except that those are not WORKSPACE rules.
On Thu, 1 Mar 2018 at 14:12, Lukács T. Berki <lbe...@google.com> wrote:

> I think the least bad approach is putting package checksums into WORKSPACE.resolved. Then "bazel fetch" would not be deterministic, but at least the result wouldn't be radically different, and some of the behavior that's dependent on the system (package choice based on Python version + OS) would be in "bazel sync".

On a related note: do you already have plans for what should happen if the set of things fetched over the network depends on the architecture you want to build for? Currently the plan seems to be to just ignore that problem and go with whatever the host system needs, which is fine for the time being as long as it's compatible with whatever you have in mind for the future. The official location for the target platform is currently the BuildConfiguration, but that's not available during "bazel fetch" or "bazel sync". So we either make that available (somehow), require people to hard-code choices in their WORKSPACE files, or?
I like the direction of this. I am not quite sure how the "configuration" step would work, though. What are the inputs to that step? Is the toolchain resolution mechanism easily extensible to allow this?