Hi all,
At the last TF SIG Build meeting, we discussed Thoth, Red Hat's cloud resolver for Python packages, and the options for collaboration. We at Google wouldn't directly benefit from this, so I don't think Google should put its own resources into the effort, but I would be happy to see community support for it. Below, I explain what I understand about Thoth, describe my experience trying its CLI, thamos, and give my conclusions. Thoth seems cool, although thamos appears to target different users than I expected.
Thoth's CLI tool "thamos" accepts a manually-created YAML environment description and a Python requirements.txt file, then cross-references a remote "prescriptions" database to avoid installing environment-incompatible packages. This database is on GitHub as YAML files. Users running old versions of TF, or TF with older OS versions, can use Thoth to reduce the risk of installing incompatible packages, such as a TF version that doesn't support their old graphics card. The tool sends environment data to the Thoth server to get a resolution.
Prescriptions can consider Python package versions and any environment descriptor and give users specific errors. The useful descriptors I noticed include "cuda_version", "cudnn_version", and a couple others (full yaml example). Many prescriptions target TensorFlow (see GitHub) and can do things like warn a user installing TensorFlow that their CUDA and cuDNN versions are not compatible (example). This database seems potentially useful.
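To make the idea concrete, here is a minimal Python sketch of the kind of check a prescription expresses. This is not Thoth's real prescription schema or thamos's code: the `Environment` and `Rule` classes, the `check` function, the package name "example-package", and all version numbers are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Environment:
    """Hypothetical stand-in for the user's environment description."""
    os_name: str
    cuda_version: Optional[str] = None


@dataclass(frozen=True)
class Rule:
    """Hypothetical, simplified rule; NOT Thoth's real prescription format."""
    package: str
    package_version: str
    supported_cuda: FrozenSet[str]
    message: str


def check(env: Environment, requirements: Dict[str, str], rules: List[Rule]) -> List[str]:
    """Return the message of each rule whose pinned package is requested
    but whose CUDA requirement the environment does not satisfy."""
    warnings = []
    for rule in rules:
        # Skip rules about packages/versions the user is not installing.
        if requirements.get(rule.package) != rule.package_version:
            continue
        if env.cuda_version not in rule.supported_cuda:
            warnings.append(rule.message)
    return warnings


# Placeholder rule with invented values, for illustration only.
RULES = [
    Rule(
        package="example-package",
        package_version="1.0",
        supported_cuda=frozenset({"11.2"}),
        message="example-package 1.0 needs CUDA 11.2",
    )
]

if __name__ == "__main__":
    env = Environment(os_name="fedora", cuda_version="10.0")
    for warning in check(env, {"example-package": "1.0"}, RULES):
        print(warning)
```

The real database covers many more descriptors than this sketch does, and the resolution itself happens on the Thoth server rather than locally.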
Unfortunately, I spent 30 minutes trying the thamos CLI and, if I were a regular user, I would have given up. Thamos did not give me any advice until I changed its generated "overlays_dir" and "requirements_format". It gives a lot of information, some of it conflicting: it recognized that I had configured "cuda_version" and "cudnn_version", but its advice still asserted that I lacked CUDA, and it excluded some of the useful TF prescriptions until I faked a switch to Fedora. It seems that thamos partly supports most Linux distributions but fully supports only Fedora, RHEL, and UBI. Thamos seems like a useful tool for enterprise users who know what they're doing.
- Can TF recommend Thoth for users as part of the installation debugging process?
- I don't think so. My brief experience with Thoth's CLI, Thamos, leads me to believe this would cause our average users more confusion, and it is limited to operating systems that we do not officially support (Fedora, RHEL, UBI). Thoth seems useful for a certain subset of users, though, and I'd like to make it discoverable for them -- if a straightforward tutorial for TF users existed somewhere, it could be linked from SIG Build, but I don't think the TF team would write or maintain it.
- Can TF make use of Thoth's database (prescriptions)?
- Unclear. Douglas wondered if Thoth could generate an OS-independent compatibility matrix for our docs (see my example below). Bluntly, it may be tough for TF to incentivize internal work on any project that targets users on unsupported platforms. A greedy collaboration might be: Red Hat maintains the prescriptions and the compatibility tables (maybe on SIG Build) for TF versions, and TF endorses them. Slightly less greedy: TF could agree to add new prescriptions when well-known changes occur (for example, when a new TF release supports a different CUDA version). However, I assume we can't rely on leads/legal agreeing to ongoing maintenance of that database, or to any partial ownership of the prescriptions, because there is no explicit benefit for Google.
A compatibility matrix may look like this, for example:
TF Version | Python 3.5 | ... | Python 3.10
2.8        | No         | ... | Yes
...        | ...        | ... | ...

TF Version | CUDA 11.0 | ... | CUDA 11.2
2.8        | Yes       | ... | Yes
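If compatibility data were available in machine-readable form, a table like the one above could be generated rather than hand-written. Here is a rough Python sketch, assuming the data has already been reduced to a mapping from TF version to the set of supported column labels. The `render_matrix` function and that input shape are my own assumptions, not anything Thoth provides, and the sample values just repeat my illustrative example above.

```python
from typing import Dict, List, Set


def render_matrix(row_label: str, columns: List[str], rows: Dict[str, Set[str]]) -> str:
    """Render a pipe-separated Yes/No table.

    `rows` maps a version string to the set of column labels it supports.
    """
    lines = [" | ".join([row_label] + columns)]
    for version, supported in rows.items():
        cells = ["Yes" if column in supported else "No" for column in columns]
        lines.append(" | ".join([version] + cells))
    return "\n".join(lines)


if __name__ == "__main__":
    # Values copied from the illustrative table in this email; not authoritative.
    print(render_matrix(
        "TF Version",
        ["Python 3.5", "Python 3.10"],
        {"2.8": {"Python 3.10"}},
    ))
```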
To summarize, I like the concept of Thoth but it looks like it would be hard to qualify a direct collaboration.
Thanks,
Austin