It appears that if I launch Nextflow like this:
unset PYTHONHOME; unset PYTHONPATH; export PATH=/gpfs/home/kellys04/conda-NGS580/bin:$PATH; \
./nextflow run main.nf -resume
with a process that looks like this:
process test_conda {
    echo true
    executor "slurm"

    input:
    val(x) from Channel.from('')

    script:
    """
    which conda
    """
}
the process, running on a compute node, still reports the full path to 'conda'.
This is a bit confusing, since I did not expect 'conda' to be in my $PATH when the process executes. It appears that the $PATH of the parent Nextflow process is being propagated to the environment in which the cluster jobs run. I don't recall being aware of this behavior; I am guessing it is intended? Is it documented somewhere? If I grep the task's work directory:
$ grep 'conda' work/1c/d8ab76f86e82332dafb3375cdda92c/.command.*
work/1c/d8ab76f86e82332dafb3375cdda92c/.command.log:/gpfs/home/kellys04/conda-NGS580/bin/conda
work/1c/d8ab76f86e82332dafb3375cdda92c/.command.out:/gpfs/home/kellys04/conda-NGS580/bin/conda
work/1c/d8ab76f86e82332dafb3375cdda92c/.command.run:#SBATCH -J nf-test_conda_(1)
work/1c/d8ab76f86e82332dafb3375cdda92c/.command.run:# NEXTFLOW TASK: test_conda (1)
work/1c/d8ab76f86e82332dafb3375cdda92c/.command.sh:which conda
nothing in these files explicitly sets the $PATH, so it is not clear how 'conda' is being propagated to the process's $PATH.
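For what it's worth, this looks consistent with ordinary environment inheritance plus SLURM's default behavior: sbatch behaves like `--export=ALL`, carrying the submitting shell's environment into the job. A minimal local sketch of the inheritance part (the `/tmp/fake-conda` path is a made-up placeholder):

```shell
#!/bin/bash
# Exported variables are inherited by child processes; SLURM's sbatch extends
# this across nodes by default (equivalent to --export=ALL).
export PATH="/tmp/fake-conda/bin:$PATH"

# A child shell sees the modified PATH, just as the SLURM job script does.
bash -c 'echo "$PATH"' | grep -q 'fake-conda' && echo "propagated"
```

Running this prints `propagated`, since the child `bash` inherits the parent's exported `PATH`. Submitting with `sbatch --export=NONE` would presumably suppress this, though that may interfere with how Nextflow expects its task scripts to run.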
This also raises a practical problem: I may need to use multiple conda installations in a single pipeline, for example some tools only run under Miniconda2/Anaconda2, while others require Miniconda3/Anaconda3. It's not clear how I would implement this using the built-in conda support. Any suggestions?
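One workaround I have been considering is to prepend the appropriate conda's bin directory per process with the `beforeScript` directive. A sketch in `nextflow.config` (the install paths and process names here are hypothetical placeholders, not from my actual pipeline):

```groovy
// nextflow.config -- select a conda installation per process by
// prepending its bin directory to PATH before the task script runs
process {
    withName: 'align_py2' {
        beforeScript = 'export PATH=/opt/miniconda2/bin:$PATH'
    }
    withName: 'report_py3' {
        beforeScript = 'export PATH=/opt/miniconda3/bin:$PATH'
    }
}
```

This sidesteps the built-in conda support entirely, so I'm not sure it is the intended approach, but it would let Python 2 and Python 3 tools coexist in one run.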