Hi,
I am evaluating Anduril for use in our next-gen sequencing lab. I am pleased to have been able to run basic "local" workflows and implement custom nodes. However, I am having trouble finding a good way to execute components on our PBS cluster.
- In prefix mode an arbitrary prefix is added to each component _commandline. Since different components have different computational requirements (CPU, memory, etc.), it makes sense to be able to set these resources on a per-component basis; a single global prefix will not cut it. Is this possible for prefix jobs?
- PBS has two ways to execute jobs, "qsub" and "qsub -I", which seem equivalent to "sbatch" and "srun" in Slurm ("qrun" does something different, so its mention in the Anduril docs is probably a mistake). Unfortunately "qsub -I" cannot be used instead of "srun" because it ignores the script (all input is interactive) and never returns. Plain "qsub", on the other hand, returns immediately. Am I right that Anduril expects prefix jobs to "wait", i.e. not return until done? Something along the lines of:
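qwait
#!/usr/bin/env bash
# Submit the job and keep only the numeric part of the job id.
JOBID=$(qsub "$@" | cut -f 1 -d '.')
STATUS="Q"
# Poll until the job is no longer listed by qstat (empty status)
# or is reported as completed ("C", as Torque does for finished jobs).
while [ -n "$STATUS" ] && [ "$STATUS" != "C" ]; do
sleep 1
STATUS=$(qstat "$JOBID" 2>/dev/null | tail -n +3 | sed 's/\s\+/ /g' | cut -f 5 -d " ")
done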
- How difficult would it be to add a native "pbs" mode? Could you point me to the implementation of Slurm in Anduril?
- In prefix mode it appears that the paths passed to "qsub" are "local" and real (i.e. symlinks are always resolved). So if the path is "/home/user/anduril/components" on the local machine and "/user_homes/user/anduril/components" remotely, the submitted job will fail. Since it is impossible to fake the directory structure using symlinks, it appears that the prefix command will only work if the Anduril workflow is started from one of the nodes with a shared filesystem. Is this correct?
- Let's assume I have a directory with hundreds of files and I write a pipeline which processes them one by one. I would want to be able to run the pipeline easily for all files, but I would also like to be able to re-run it when new files appear in the directory, and then only for those new files. How should I accomplish this?
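To make the symlink point above concrete, here is a minimal sketch of what I believe is happening. It does not use Anduril at all, and the directory names are hypothetical and live in a throwaway temp directory. It only shows that once a path is resolved to its physical form, a symlink that mimics the remote layout no longer has any effect:

```shell
#!/usr/bin/env bash
# Illustration only: hypothetical layout inside a temp directory.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# The real location of the components on the submit host.
mkdir -p "$tmp/home/user/anduril/components"

# A symlink that fakes the layout the cluster nodes would see.
mkdir -p "$tmp/user_homes"
ln -s "$tmp/home/user" "$tmp/user_homes/user"

# The path I would like to be passed on to qsub.
logical="$tmp/user_homes/user/anduril/components"

# The path after symlink resolution (pwd -P prints the physical directory).
resolved=$(cd "$logical" && pwd -P)

echo "logical:  $logical"
echo "resolved: $resolved"
```

The resolved path points back into the "home" tree, which is the form that seems to end up in the submitted job, so the faked "user_homes" layout is lost.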
Thanks,
Marcin