TF SIG Build follow-up: Thoth & thamos

Austin Anderson

Apr 18, 2022, 7:11:37 PM
to Douglas Yarrington, SIG Build, Fridolín Pokorný, mcos...@redhat.com, Christoph Goern, tensorflow-devinfra-team
Hi all,

At the last TF SIG Build meeting, we discussed Thoth, Red Hat's cloud resolver for Python packages, and the options for collaboration. While we at Google wouldn't directly benefit from this -- so I don't think Google should put our resources to work on this effort directly -- I would be happy to see community support for it. Below, I will explain what I understand about Thoth, my experience trying its CLI thamos, and then my conclusions. Thoth seems cool, although thamos appears to target different users than I expected.

Thoth's CLI tool "thamos" accepts a manually-created YAML environment description and a Python requirements.txt file, then cross-references a remote "prescriptions" database to avoid installing environment-incompatible packages. This database is on GitHub as YAML files. Users running old versions of TF, or TF with older OS versions, can use Thoth to reduce the risk of installing incompatible packages, such as a TF version that doesn't support their old graphics card. The tool sends environment data to the Thoth server to get a resolution.

Prescriptions can consider Python package versions and any environment descriptor and give users specific errors. The useful descriptors I noticed include "cuda_version", "cudnn_version", and a couple others (full yaml example). Many prescriptions target TensorFlow (see GitHub) and can do things like warn a user installing TensorFlow that their CUDA and cuDNN versions are not compatible (example). This database seems potentially useful.

Unfortunately, I spent 30 minutes trying the thamos CLI and, if I were a user, would have given up. Thamos did not give me advice until I changed its generated "overlays_dir" and "requirements_format". Thamos gives a lot of information, sometimes conflicting: Thamos recognized that I configured "cuda_version" and "cudnn_version" but its advice still asserted I lacked CUDA; it then excluded some of the useful TF prescriptions until I faked a switch to Fedora. It seems like Thamos partly supports most Linuxes, but fully supports only Fedora, RHEL and UBI. Thamos seems like a useful tool for enterprise users who know what they're doing.
  • Can TF recommend Thoth for users as part of the installation debugging process?
    • I don't think so. My brief experience with Thoth's CLI, Thamos, leads me to believe this would cause our average users more confusion, and it is limited to operating systems that we do not officially support (Fedora, RHEL, UBI). Thoth seems useful for a certain subset of users, though, and I'd like to make it discoverable for them -- if a straightforward tutorial for TF users existed somewhere, it could be linked from SIG Build, but I don't think the TF team would write or maintain it. 
  • Can TF make use of Thoth's database (prescriptions)?
    • Unclear. Douglas wondered if Thoth could generate an OS-independent compatibility matrix for our docs (see my example below). Bluntly, it may be tough for TF to incentivize internal work on any projects that target users on unsupported platforms. A greedy collaboration might be: Red Hat maintains the prescriptions and maintains compatibility tables (maybe on SIG Build) for TF versions; TF will endorse them. Slightly less greedy: TF could agree to add new prescriptions when well-known changes occur (for example, when a new TF release changes the supported CUDA version). I assume that we can't rely on leads/legal agreeing to ongoing maintenance of that database or any partial ownership of the prescriptions, however, because there is not an explicit benefit for Google.
A compatibility matrix may look like this, for example:

TF Version | Python 3.5 | ... | Python 3.10
    2.8    |     No     | ... |    Yes
    ...    |     ..     | ... |    ...

TF Version | CUDA 11.0 | ... | CUDA 11.2
    2.8    |    Yes    | ... |    Yes

To summarize, I like the concept of Thoth but it looks like it would be hard to qualify a direct collaboration.

Thanks,
Austin

Fridolín Pokorný

Apr 19, 2022, 12:07:03 PM
to Austin Anderson, Douglas Yarrington, SIG Build, Maya Costantini, Christoph Goern, tensorflow-devinfra-team
Hi all,

Thanks, Austin, for your time and follow-up.

On Tue, Apr 19, 2022 at 1:11 AM Austin Anderson <ange...@google.com> wrote:
Hi all,

At the last TF SIG Build meeting, we discussed Thoth, Red Hat's cloud resolver for Python packages, and the options for collaboration. While we at Google wouldn't directly benefit from this -- so I don't think Google should put our resources to work on this effort directly -- I would be happy to see community support for it. Below, I will explain what I understand about Thoth, my experience trying its CLI thamos, and then my conclusions. Thoth seems cool, although thamos appears to target different users than I expected.

Thoth's CLI tool "thamos" accepts a manually-created YAML environment description and a Python requirements.txt file, then cross-references a remote "prescriptions" database to avoid installing environment-incompatible packages. This database is on GitHub as YAML files. Users running old versions of TF, or TF with older OS versions, can use Thoth to reduce the risk of installing incompatible packages, such as a TF version that doesn't support their old graphics card. The tool sends environment data to the Thoth server to get a resolution.

Thoth synthesizes hardware and software requirements as described. The project also focuses on security: we use PyPA's advisory-db (not visible in prescriptions) to flag known vulnerabilities at the Python package level. If a containerized environment is used, the resolver also considers the software present in that environment, based on CVEs reported and detected by Quay Clair.

Prescriptions can consider Python package versions and any environment descriptor and give users specific errors. The useful descriptors I noticed include "cuda_version", "cudnn_version", and a couple others (full yaml example). Many prescriptions target TensorFlow (see GitHub) and can do things like warn a user installing TensorFlow that their CUDA and cuDNN versions are not compatible (example). This database seems potentially useful.

Another use case the resolver addresses is library incompatibilities. One example is the already reported issue with h5py==3.0.0 used together with tensorflow==2.1.0 (see the relevant resolver prescription). We would also be interested in these types of issues: if there is a way to label them for us in the TF issue tracker, that would be great. We could then create prescriptions for them and make them available in the matrix discussed below.
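
To make the idea concrete, here is a minimal Python sketch of how such an incompatibility rule could be checked against pinned requirements. It is only an illustration: the rule format, INCOMPATIBLE_PAIRS, parse_pins and find_conflicts are hypothetical names made up for this example and are not Thoth's prescription schema or resolver logic. The single rule is the h5py/TensorFlow pairing mentioned above.

```python
from typing import Dict, List, Tuple

# Hypothetical rule format, for illustration only. This is NOT Thoth's
# prescription schema. The one rule is the pairing discussed in this thread.
Pin = Tuple[str, str]  # (package name, exact version)
INCOMPATIBLE_PAIRS: List[Tuple[Pin, Pin]] = [
    (("h5py", "3.0.0"), ("tensorflow", "2.1.0")),
]


def parse_pins(requirements_text: str) -> Dict[str, str]:
    """Collect exact 'name==version' pins; other lines are ignored."""
    pins: Dict[str, str] = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def find_conflicts(pins: Dict[str, str]) -> List[str]:
    """Return one message per rule whose two pins are both present."""
    conflicts: List[str] = []
    for (name_a, ver_a), (name_b, ver_b) in INCOMPATIBLE_PAIRS:
        if pins.get(name_a) == ver_a and pins.get(name_b) == ver_b:
            conflicts.append(
                f"{name_a}=={ver_a} is reported as incompatible "
                f"with {name_b}=={ver_b}"
            )
    return conflicts


if __name__ == "__main__":
    example = "tensorflow==2.1.0\nh5py==3.0.0\n"
    for message in find_conflicts(parse_pins(example)):
        print(message)
```

The sketch only handles exact pins; a real resolver would also need to evaluate version ranges and environment descriptors.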
 
Unfortunately, I spent 30 minutes trying the thamos CLI and, if I were a user, would have given up. Thamos did not give me advice until I changed its generated "overlays_dir" and "requirements_format". Thamos gives a lot of information, sometimes conflicting: Thamos recognized that I configured "cuda_version" and "cudnn_version" but its advice still asserted I lacked CUDA; it then excluded some of the useful TF prescriptions until I faked a switch to Fedora. It seems like Thamos partly supports most Linuxes, but fully supports only Fedora, RHEL and UBI. Thamos seems like a useful tool for enterprise users who know what they're doing.

Thanks for your feedback - we should definitely improve the first-touch experience and the learning curve.
  • Can TF recommend Thoth for users as part of the installation debugging process?
    • I don't think so. My brief experience with Thoth's CLI, Thamos, leads me to believe this would cause our average users more confusion, and it is limited to operating systems that we do not officially support (Fedora, RHEL, UBI). Thoth seems useful for a certain subset of users, though, and I'd like to make it discoverable for them -- if a straightforward tutorial for TF users existed somewhere, it could be linked from SIG Build, but I don't think the TF team would write or maintain it. 
Currently, we aggregate dependency data for Fedora, RHEL, and UBI. If supporting other operating systems would increase adoption, we could evaluate them and allocate resources to support them.
  • Can TF make use of Thoth's database (prescriptions)?
    • Unclear. Douglas wondered if Thoth could generate an OS-independent compatibility matrix for our docs (see my example below). Bluntly, it may be tough for TF to incentivize internal work on any projects that target users on unsupported platforms. A greedy collaboration might be: Red Hat maintains the prescriptions and maintains compatibility tables (maybe on SIG Build) for TF versions; TF will endorse them. Slightly less greedy: TF could agree to add new prescriptions when well-known changes occur (for example, when a new TF release changes the supported CUDA version). I assume that we can't rely on leads/legal agreeing to ongoing maintenance of that database or any partial ownership of the prescriptions, however, because there is not an explicit benefit for Google.
A compatibility matrix may look like this, for example:

TF Version | Python 3.5 | ... | Python 3.10
    2.8    |     No     | ... |    Yes
    ...    |     ..     | ... |    ...

TF Version | CUDA 11.0 | ... | CUDA 11.2
    2.8    |    Yes    | ... |    Yes


We have planned support for this based on the TF build discussion. I think we could provide such a matrix, generated automatically from the TensorFlow prescriptions.
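
As a rough sketch of what that generation step might look like, the Python below turns a flat list of support records into the plain-text table layout used in the example above. Everything here is an assumption for illustration: the Support record, RECORDS and render_matrix are hypothetical names, the input shape is not how prescriptions are actually stored, and the two records are copied from the example matrix in this thread rather than from Thoth's database.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Support:
    """Hypothetical record: does a TF version support one component?"""
    tf_version: str
    component: str  # e.g. "Python 3.10"
    supported: bool


# Values taken from the example matrix in this thread, not from real data.
RECORDS: List[Support] = [
    Support("2.8", "Python 3.5", False),
    Support("2.8", "Python 3.10", True),
]


def render_matrix(records: List[Support]) -> str:
    """Render records as a ' | '-separated table, one row per TF version."""
    columns: List[str] = []
    rows: Dict[str, Dict[str, bool]] = {}
    for record in records:
        if record.component not in columns:
            columns.append(record.component)  # keep first-seen column order
        rows.setdefault(record.tf_version, {})[record.component] = record.supported

    first_header = "TF Version"
    lines = [" | ".join([first_header] + columns)]
    for tf_version, cells in rows.items():
        row = [tf_version.center(len(first_header))]
        for column in columns:
            value = cells.get(column)
            # "?" marks combinations with no record at all.
            text = "?" if value is None else ("Yes" if value else "No")
            row.append(text.center(len(column)))
        lines.append(" | ".join(row))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_matrix(RECORDS))
```

Marking missing combinations with "?" instead of "No" keeps "not yet recorded" distinct from "known unsupported", which seems useful if the table is generated from an incomplete database.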
 
To summarize, I like the concept of Thoth but it looks like it would be hard to qualify a direct collaboration.

Thanks again for your valuable feedback.

Have a great day,
Fridolin