Hi Brent,
To answer 1): yes, we do experience that behaviour every now and then, and we have a case like that this week.
For context, our environments also involve ~300 packages, and we have more than 75,500 versions across our 2,200 individual rez packages.
So, as you guessed and Allan confirmed, the work the solver does is not trivial. Adding just one package, or a different version of a package (with different dependencies or a different version number), can throw the solver into that seemingly eternal resolve path.
In our experience, the causes are most often:
- packages with really wide version ranges
- non-mutually-exclusive variants (e.g. Qt/MayaQT/Qt_vfx, or OpenImageIO alongside our namespaced ALOpenImageIO)
- packages with asymmetric variants
- packages with conditional requirements
To answer 2): the troubleshooting approaches Allan mentioned are the ones we use most. Another technique we use is to set REZ_MAX_FAILS to a relatively small value, e.g. 20, so the solver at least stops and shows you the failure of one of the paths. That can give you a clue about where to start looking, but it can also mislead you a bit.
But as Allan mentioned, the first thing we do is turn up the verbosity.
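A minimal Python sketch of a quick, verbose resolve with a low fail limit. It assumes REZ_MAX_FAILS is honoured by your rez setup as described above, and that rez-env accepts a repeatable -v flag for solver verbosity and -c to run a command instead of opening an interactive shell (check `rez-env --help` for your version). The helper names are hypothetical.

```python
import os
import subprocess
from typing import Dict, List, Sequence, Tuple


def build_debug_invocation(
    packages: Sequence[str], max_fails: int = 20, verbosity: int = 2
) -> Tuple[List[str], Dict[str, str]]:
    """Build a rez-env command line and environment for a verbose, short-lived resolve.

    Assumption: REZ_MAX_FAILS is read by your rez setup, and rez-env accepts
    -v (repeatable) and -c <command>.
    """
    env = dict(os.environ)
    # Make the solver give up after a few failed branches so it reports one of them.
    env["REZ_MAX_FAILS"] = str(max_fails)
    args = ["rez-env"]
    if verbosity > 0:
        # More v's means more solver output.
        args.append("-" + "v" * verbosity)
    # Run a trivial command rather than dropping into a shell if the resolve succeeds.
    args += ["-c", "exit 0"]
    args += list(packages)
    return args, env


def run_debug_resolve(packages: Sequence[str], max_fails: int = 20) -> int:
    """Run the resolve and return rez-env's exit code (non-zero means it failed)."""
    args, env = build_debug_invocation(packages, max_fails)
    return subprocess.run(args, env=env).returncode
```

For example, `run_debug_resolve(["maya", "Qt", "OpenImageIO"])` would print the verbose solve for that request and stop after 20 failed branches.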
If that does not help, try to narrow down the minimum set of packages that sends rez into this never-resolving state.
Unfortunately, this is a trial-and-error exercise. If you know what has changed recently, try that first; or, if you know the dependencies of the packages, try playing with the packages that have non-mutually-exclusive or asymmetric variants.
Either start removing requested packages until you find the culprit, or do the inverse and start growing the number of packages.
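Both directions of this manual search can be sketched in Python. This is a minimal sketch, assuming a hypothetical `resolves(request)` callback that returns True when rez can resolve the given request (in practice you might wrap a rez-env call with a low fail limit). `first_breaking_package` grows the request one package at a time; `minimal_failing_subset` starts from a failing request and drops each package in turn, keeping the removal only if the request still fails.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical callback: True if the request resolves, False otherwise.
Resolver = Callable[[Sequence[str]], bool]


def first_breaking_package(request: Sequence[str], resolves: Resolver) -> Optional[str]:
    """Grow the request one package at a time; return the first package that breaks it."""
    for i in range(1, len(request) + 1):
        if not resolves(request[:i]):
            return request[i - 1]
    return None  # the whole request resolves


def minimal_failing_subset(request: Sequence[str], resolves: Resolver) -> List[str]:
    """Shrink a failing request: drop a package only if the rest still fails to resolve."""
    current = list(request)
    for pkg in list(current):
        trial = [p for p in current if p != pkg]
        if not resolves(trial):
            current = trial
    return current
```

With a fake resolver in which Qt and MayaQt conflict, growing `["python", "Qt", "numpy", "MayaQt", "usd"]` reports `MayaQt`, while shrinking leaves `["Qt", "MayaQt"]`, which points at the pair rather than just one side of it.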
Once you have found the culprit, start by playing with the version ranges to see if you can narrow them further.
Many times the problem is solved by:
- restricting the version range of a package (usually the lower bound)
- adding an anti-package (!package) for one of the variants
- removing unnecessary dependencies
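As an illustration of the first two fixes, here is a sketch of a package.py for a hypothetical package called mytool; the package names and versions are made up. It assumes your rez version accepts a conflict requirement ("!Qt") inside a variant, so check that before relying on it.

```python
# package.py for a hypothetical "mytool" package; names and versions are illustrative only.
name = "mytool"
version = "1.4.0"

requires = [
    # Raise the lower bound instead of accepting any OpenImageIO version,
    # which prunes many old versions the solver would otherwise try.
    "OpenImageIO-2.4+<3",
]

variants = [
    ["Qt-5.15"],
    # Anti-package: this variant can only be chosen when plain Qt is NOT in the resolve,
    # which makes the two variants mutually exclusive.
    ["MayaQt", "!Qt"],
]
```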
Hope this helps
Fede
PS: Since this troubleshooting is so time-consuming, I have lately been thinking about writing an external tool that takes a rez env line that does not resolve (our default max_fails is 1000, but I guess we would use something more like 100 here) and then does a kind of binary search to find which packages are preventing the full request from resolving.
I.e. if we have 100 packages in the request and it resolves with the first 50, we then try with 75, and so on. Let's say adding package 51 causes it not to resolve: we remove that package from the list, try again with 75 excluding it, and if that resolves, move on to 99... I guess it could end up with a list of the packages that prevent the resolve.
Honestly, I think in some cases it might mislead you, but the goal is not to find the exact root cause; it is to give you an idea of where to look without doing the whole process manually. What do you think?
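A rough sketch of that idea in Python, assuming a hypothetical `resolves(request)` callback (e.g. a wrapper around rez-env with a low max_fails) that returns True when the request resolves, that an empty request always resolves, and that adding packages never turns a failing prefix into a resolving one. It binary-searches for the package whose addition first breaks the resolve, removes it, and repeats until the remainder resolves.

```python
from typing import Callable, List, Sequence

# Hypothetical callback: True if the request resolves, False otherwise.
Resolver = Callable[[Sequence[str]], bool]


def breaking_index(request: Sequence[str], resolves: Resolver) -> int:
    """Index of the package whose addition first makes the prefix fail.

    Invariant: request[:lo] resolves and request[:hi] fails.
    """
    lo, hi = 0, len(request)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if resolves(request[:mid]):
            lo = mid
        else:
            hi = mid
    return hi - 1


def find_blockers(request: Sequence[str], resolves: Resolver) -> List[str]:
    """Repeatedly remove the first breaking package until the remaining request resolves."""
    remaining = list(request)
    blockers: List[str] = []
    while remaining and not resolves(remaining):
        blockers.append(remaining.pop(breaking_index(remaining, resolves)))
    return blockers
```

Each pass needs only about log2(n) resolves instead of n. Note that it reports the later package of each conflicting pair (e.g. MayaQt rather than Qt), which is exactly the kind of partial answer that points you where to look without naming the root cause.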