Dear Anna,
alas, I am unable to reproduce the problem. I am running the
following script against a PBS server:
---------------------------------------------
import saga
import sys
import os

try:
    js = saga.job.Service('ssh://localhost/')
    jd = saga.job.Description()

    for i in range (100) :
        jd.executable = "echo 'sleep 5' | qsub"
        myjob = js.create_job(jd)

        # Check our job's id and state
        # print "Job ID : %s" % (myjob.id)
        # print "Job State : %s" % (myjob.state)
        # print "\n...starting job...\n"
        myjob.run()

        # print "Job full ID : %s" % (myjob.id)
        # print "Job State : %s" % (myjob.state)
        # print "\n...waiting for job...\n"
        myjob.wait()

        # print "Job short ID : %s" % (myjob.id).split("-")[1]
        # print "Job State : %s" % (myjob.state)
        # print "Exitcode : %s" % (myjob.exit_code)

        # Print the iteration number and the number of wrapper processes alive
        os.system ('echo -n "%5d : " ; ps -ef | grep -v grep | grep -i wrapper | wc -l ' % i)

    js.close ()

    # Count the wrapper processes again after closing the job service
    os.system ('echo -n "%5d : " ; ps -ef | grep -v grep | grep -i wrapper | wc -l ' % 0)
    sys.exit (0)

except saga.SagaException, ex:
    # Catch all saga exceptions
    print "An exception occurred: (%s) %s " % (ex.type, (str(ex)))
    # Trace back the exception. That can be helpful for debugging.
    print " \n*** Backtrace:\n %s" % ex.traceback
    sys.exit (1)
------------------------------------------------------------
and see as output:
------------------------------------------------------------
(ve)merzky@tutorial:~ $ python test.py
0 : 2
1 : 2
2 : 2
3 : 2
4 : 2
5 : 2
6 : 2
[...]
98 : 2
99 : 2
0 : 0
------------------------------------------------------------
The process list seems to confirm that exactly two wrapper.sh
instances are alive during the submission phase:
------------------------------------------------------------
merzky@tutorial:~ $ ps -ef --forest | grep -v grep | grep -i -e saga -e wrapper -e ssh -e sleep -C 5
mongodb 1632 1 0 10:15 ? 00:00:02 /usr/bin/mongod --unixSocketPrefix=/var/run/mongodb --config /etc/mongodb.conf run
root 1652 1 0 10:15 ? 00:00:00 /usr/sbin/rsyslogd -c5
root 1655 1 1 10:15 ? 00:00:09 /usr/sbin/pbs_mom
merzky 16212 1655 0 10:30 ? 00:00:00 \_ -bash
merzky 16218 16212 0 10:30 ? 00:00:00 | \_ -bash
merzky 16219 16218 0 10:30 ? 00:00:00 | \_ sleep 5
merzky 16271 1655 0 10:30 ? 00:00:00 \_ -bash
merzky 16278 16271 0 10:30 ? 00:00:00 | \_ -bash
merzky 16279 16278 0 10:30 ? 00:00:00 | \_ sleep 5
merzky 16328 1655 0 10:30 ? 00:00:00 \_ -bash
merzky 16334 16328 0 10:30 ? 00:00:00 | \_ -bash
merzky 16335 16334 0 10:30 ? 00:00:00 | \_ sleep 5
merzky 16388 1655 0 10:30 ? 00:00:00 \_ -bash
merzky 16394 16388 0 10:30 ? 00:00:00 | \_ -bash
merzky 16395 16394 0 10:30 ? 00:00:00 | \_ sleep 5
merzky 16444 1655 0 10:30 ? 00:00:00 \_ -bash
merzky 16450 16444 0 10:30 ? 00:00:00 | \_ -bash
merzky 16451 16450 0 10:30 ? 00:00:00 | \_ sleep 5
merzky 16504 1655 0 10:30 ? 00:00:00 \_ -bash
merzky 16510 16504 0 10:30 ? 00:00:00 | \_ -bash
merzky 16511 16510 0 10:30 ? 00:00:00 | \_ sleep 5
merzky 16560 1655 0 10:30 ? 00:00:00 \_ -bash
merzky 16566 16560 0 10:30 ? 00:00:00 | \_ -bash
merzky 16567 16566 0 10:30 ? 00:00:00 | \_ sleep 5
merzky 16620 1655 0 10:30 ? 00:00:00 \_ -bash
merzky 16626 16620 0 10:30 ? 00:00:00 | \_ -bash
merzky 16627 16626 0 10:30 ? 00:00:00 | \_ sleep 5
merzky 16676 1655 0 10:30 ? 00:00:00 \_ -bash
merzky 16682 16676 0 10:30 ? 00:00:00 \_ -bash
merzky 16683 16682 0 10:30 ? 00:00:00 \_ sleep 5
root 1676 1 0 10:15 ? 00:00:01 /usr/sbin/pbs_server
root 1760 1 0 10:15 ? 00:00:00 /usr/sbin/cron
102 1785 1 0 10:15 ? 00:00:00 /usr/bin/dbus-daemon --system
root 1803 1 0 10:15 ? 00:00:00 /usr/sbin/sshd
root 1890 1803 0 10:17 ? 00:00:00 \_ sshd: merzky [priv]
merzky 1966 1890 0 10:17 ? 00:00:00 | \_ sshd: merzky@pts/0,pts/1
merzky 1967 1966 0 10:17 pts/0 00:00:00 | \_ -bash
merzky 2951 1967 0 10:26 pts/0 00:00:00 | | \_ vim test.py
merzky 11631 1967 3 10:29 pts/0 00:00:01 | | \_ python test.py
merzky 11642 11631 0 10:29 pts/2 00:00:00 | | \_ /usr/bin/ssh -t -o IdentityFile=/home/merzky/.ssh/id_rsa -o ControlMaster=yes -o ControlPath=/tmp/saga_ssh_merzky_%h_%p.ctrl localhost
merzky 11667 11631 0 10:29 pts/4 00:00:00 | | \_ /usr/bin/ssh -t -o IdentityFile=/home/merzky/.ssh/id_rsa -o ControlMaster=no -o ControlPath=/tmp/saga_ssh_merzky_%h_%p.ctrl localhost
merzky 2152 1966 0 10:24 pts/1 00:00:00 | \_ -bash
merzky 16684 2152 0 10:30 pts/1 00:00:00 | \_ ps -ef --forest
root 11643 1803 0 10:29 ? 00:00:00 \_ sshd: merzky [priv]
merzky 11648 11643 0 10:29 ? 00:00:00 \_ sshd: merzky@pts/3,pts/5
merzky 11649 11648 0 10:29 pts/3 00:00:00 \_ -bash
merzky 11668 11648 0 10:29 pts/5 00:00:00 \_ /bin/sh -i
merzky 11688 11668 1 10:29 pts/5 00:00:00 \_ /bin/sh /home/merzky/.saga/adaptors/shell_job/wrapper.sh 11668
merzky 12208 11688 0 10:29 pts/5 00:00:00 \_ /bin/sh /home/merzky/.saga/adaptors/shell_job/wrapper.sh 11668
merzky 15183 12208 0 10:30 pts/5 00:00:00 \_ sleep 30
root 1867 1 0 10:15 ? 00:00:00 /usr/sbin/pbs_sched
root 1893 1 0 10:17 ? 00:00:00 /usr/sbin/console-kit-daemon --no-daemon
root 1960 1 0 10:17 ? 00:00:00 /usr/lib/policykit-1/polkitd --no-debug
------------------------------------------------------------
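In case it helps when repeating the test on your side: the wrapper
count printed by the script can also be taken from within Python,
without the shell pipeline. This is only a minimal sketch, assuming
Python 3 and a `ps -ef` that behaves as on the machine above; the
helper names count_matching and count_wrappers are made up for
illustration and are not part of saga-python.

```python
import subprocess


def count_matching(ps_output, needle="wrapper"):
    """Count lines of `ps` output that mention `needle`.

    Mirrors `grep -v grep | grep -i wrapper | wc -l`: the match on
    `needle` is case-insensitive, and lines containing "grep" are
    skipped.
    """
    needle = needle.lower()
    count = 0
    for line in ps_output.splitlines():
        if "grep" in line:
            continue
        if needle in line.lower():
            count += 1
    return count


def count_wrappers():
    """Run `ps -ef` and count the processes that mention 'wrapper'."""
    raw = subprocess.check_output(["ps", "-ef"])
    return count_matching(raw.decode("utf-8", "replace"))
```

Calling count_wrappers() once per loop iteration, and once more after
js.close(), should give the same numbers as the os.system() calls in
the script, as long as nothing else on the host has "wrapper" in its
command line.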
This run used the saga-python release from PyPI, so I am not sure
what to make of this. Would it by any chance be possible to get
access to your SLURM server in order to debug this?
Best, Andre.
PS.: mail formatting will likely screw up the pasted texts above --
you may want to check this out on the ticket at
https://github.com/radical-cybertools/saga-python/issues/379