Hi,
I played around a bit with saga, and I like it a lot. I quickly managed to start remote jobs via `ssh`.
But I still cannot figure out how to submit jobs to an SGE cluster with `sge+ssh`.
Every time I submit a job (I am simply using a slightly modified version of the SGE touch example,
http://saga-python.readthedocs.org/en/latest/adaptors/saga.adaptor.sgejob.html) with a given service
``js = saga.job.Service("sge+ssh://mycluster.myuni.de", session=session)``
the job ends up in the FAILED state on the SAGA side, and `qstat` on the cluster shows it as `Eqw`.
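A minimal sketch of what such a submission looks like. It is reconstructed from the `#$` directives in the generated script quoted further down, not copied from my actual code. The helper name `submit_touch_job`, the `JOB_ATTRS` dictionary and the `user_id` argument are assumptions made for illustration; the saga calls follow the touch example from the adaptor documentation linked above.

```python
# Job attributes read off the "#$" directives of the generated script.
# The "-pe mp 1" directive is deliberately left out of this sketch.
JOB_ATTRS = {
    "environment": {"FILENAME": "testfile"},
    "executable": "/bin/touch",
    "arguments": ["$FILENAME"],
    "working_directory": "/net/homes2/informatik/augustin/robm/working_dir",
    "output": "examplejob.out",
    "error": "examplejob.err",
    "queue": "short",
    "project": "TG-MCB090174",
    "wall_time_limit": 1,  # minutes, matches "-l h_rt=0:1:00"
}


def submit_touch_job(url, user_id, attrs=JOB_ATTRS):
    """Submit the touch job to `url` and return (state, exit_code).

    Hypothetical helper: `user_id` is the SSH login name on the cluster.
    """
    import saga  # imported lazily so the attribute table is usable without saga

    ctx = saga.Context("ssh")
    ctx.user_id = user_id
    session = saga.Session()
    session.add_context(ctx)

    js = saga.job.Service(url, session=session)
    try:
        jd = saga.job.Description()
        for name, value in attrs.items():
            setattr(jd, name, value)  # same as jd.queue = "short", etc.
        job = js.create_job(jd)
        job.run()
        job.wait()
        return job.state, job.exit_code
    finally:
        js.close()
```

It would be called as `submit_touch_job("sge+ssh://mycluster.myuni.de", "robm")`, where the user name is a placeholder.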
What is odd, though, is that I was able to copy the shell script generated by SAGA from the temporary folder on our cluster,
and it runs perfectly with the `qsub` command.
The script that was produced by SAGA is:
#!/bin/bash
#$ -S /bin/bash
#$ -V
#$ -v FILENAME=testfile
#$ -wd /net/homes2/informatik/augustin/robm/working_dir
#$ -o examplejob.out
#$ -e examplejob.err
#$ -l h_rt=0:1:00
#$ -q short
#$ -A TG-MCB090174
#$ -pe mp 1
function aborted() {
echo Aborted with signal $1.
echo "signal: $1" >>$HOME/.saga/adaptors/sge_job/$JOB_ID
echo "end_time: $(LC_ALL=en_US.utf8 date '+%a %b %d %H:%M:%S %Y')" >>$HOME/.saga/adaptors/sge_job/$JOB_ID
exit -1
}
mkdir -p $HOME/.saga/adaptors/sge_job
for sig in SIGHUP SIGINT SIGQUIT SIGTERM SIGUSR1 SIGUSR2; do trap "aborted $sig" $sig; done
echo "hostname: $HOSTNAME" >$HOME/.saga/adaptors/sge_job/$JOB_ID
echo "qsub_time: Thu Jul 16 15:37:34 2015" >>$HOME/.saga/adaptors/sge_job/$JOB_ID
echo "start_time: $(LC_ALL=en_US.utf8 date '+%a %b %d %H:%M:%S %Y')" >>$HOME/.saga/adaptors/sge_job/$JOB_ID
/bin/touch $FILENAME
echo "exit_status: $?" >>$HOME/.saga/adaptors/sge_job/$JOB_ID
echo "end_time: $(LC_ALL=en_US.utf8 date '+%a %b %d %H:%M:%S %Y')" >>$HOME/.saga/adaptors/sge_job/$JOB_ID
As I said, manually typing `qsub this_script.sh` works fine. Why can I start it manually, while submitting it
from a different computer via saga fails? The error reason reported by `qstat -j <jobnumber>` is the following:
error reason 1: 07/16/2015 16:23:19 [35257:14045]: execvlp(/var/spool/sge/node070/job_scripts/5228798, "/var/spool/s
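In case someone wants to reproduce the check, here is a minimal sketch for pulling the error reasons out of `qstat -j` from Python 3. The function names are made up for illustration, and the parser assumes that each reason sits on a line starting with `error reason`, an index and a colon, as in the line quoted above; other SGE versions may format this differently.

```python
import re
import subprocess

# Assumed line format: "error reason    1:    <message>"
_REASON = re.compile(r"^error reason\s+\d+:\s*(.*)$")


def extract_error_reasons(qstat_output):
    """Return the message part of every 'error reason' line in the text."""
    reasons = []
    for line in qstat_output.splitlines():
        match = _REASON.match(line.strip())
        if match:
            reasons.append(match.group(1))
    return reasons


def job_error_reasons(job_id):
    """Run `qstat -j <job_id>` on the cluster and parse its output."""
    result = subprocess.run(
        ["qstat", "-j", str(job_id)],
        capture_output=True,
        text=True,
        check=True,
    )
    return extract_error_reasons(result.stdout)
```

On the submit host, `job_error_reasons(<jobnumber>)` should return a list of the reason strings for that job; `extract_error_reasons` can also be fed saved `qstat` output directly.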
Any ideas what's wrong?
Thanks and cheers,
Robert