Hi,
I am having an issue with Slurm not creating the output (xxx.out) and error (xxx.err) files in the directory where the job was run. Also, sview reports that the job is running, but it is actually not running.
Yinka Adeosun
Unix Administrator
Vistronix, Inc
Contractor to US EPA Chesapeake Bay Program Office
-- Andy Riebs Hewlett-Packard Company High Performance Computing +1-786-263-9743 My opinions are not necessarily those of HP
I started here less than a week ago, but I’ll answer to the best of my knowledge.
1. What version of SLURM? (If you configured and built it, what options did you use?) -- 2.1.6-1
2. What OS? What hardware? -- RHEL5, Dell clusters (HPC)
3. Are user directories shared across the cluster? -- Yes
4. Do you use automount for the user directories? -- Yes
5. Can you include a copy of your slurm.conf? -- Yes, attached
6. Do you know where the user output *is* going? -- Created where the job is run
7. Do the slurmctld.log and slurmd.log log files report any errors? -- Yes, for example from the nodes: “…unable to register: unable to contact slurm controller (connect failure)”
Thanks,
Yinka
From: Andy Riebs [mailto:andy....@hp.com]
Sent: Tuesday, August 28, 2012 10:50 AM
To: slurm-dev
Subject: [slurm-dev] Re: Output and Error Files
Yinka,
The following information would help considerably in identifying the nature of your problem:
1. What version of SLURM? (If you configured and built it, what options did you use?)
2. What OS? What hardware?
3. Are user directories shared across the cluster?
4. Do you use automount for the user directories?
5. Can you include a copy of your slurm.conf?
6. Do you know where the user output *is* going?
7. Do the slurmctld.log and slurmd.log log files report any errors?
Andy
On 08/28/2012 09:55 AM, Yinka Adeosun wrote:
Hi,
I am having an issue with Slurm not creating the output (xxx.out) and error (xxx.err) files in the directory where the job was run. Also, sview reports that the job is running, but it is actually not running.
Yinka Adeosun
Unix Administrator
Vistronix, Inc
Contractor to US EPA Chesapeake Bay Program Office
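Questions 3, 4 and 6 above largely come down to whether the directory a job was submitted from exists and is writable on the node where the job runs. As a rough sketch of how that could be checked by hand, the script below tests each directory given on the command line by creating and removing a scratch file. The function name `check_output_dir` and its messages are made up for illustration and are not part of Slurm, and the sketch assumes the job's output files are opened on the compute node rather than on the head node.

```shell
#!/bin/sh
# check_output_dir DIR: report whether DIR exists and is writable here.
# The function name and messages are illustrative, not part of Slurm.
check_output_dir() {
  dir=$1
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  fi
  # Create and remove a scratch file to prove that writes really succeed
  # (permission bits alone can mislead on NFS or automounted paths).
  probe=$(mktemp "$dir/.write_probe.XXXXXX" 2>/dev/null) || {
    echo "not writable: $dir"
    return 2
  }
  rm -f "$probe"
  echo "ok: $dir"
}

# Check every directory given on the command line.
for d in "$@"; do
  check_output_dir "$d"
done
```

Running it as the submitting user on the head node and again on a compute node (over ssh, for instance) and comparing the two results would show whether the shared or automounted home directory is actually available where the job executes.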
Andre,
Thanks for getting back to me. Yes, the output and error files are not being created where the jobs are run. In my response I meant that the error/out files are supposed to be created in the location where the job was run.
Andy,
By the way, the “…unable to register: unable to contact slurm controller (connect failure)” issue is resolved, but the error/out files are still not being created.
Yinka.
A few seconds
From: Andy Riebs [mailto:andy....@hp.com]
Sent: Wednesday, August 29, 2012 9:26 AM
To: slurm-dev
Subject: [slurm-dev] RE: sview problem, was Re: Output and Error Files
On the sview problem, how long does sview report an incorrect state: a few seconds, a few minutes, until slurm is restarted?
On 08/28/2012 01:24 PM, Yinka Adeosun wrote:
I started here less than a week ago, but I’ll answer to the best of my knowledge.
1. What version of SLURM? (If you configured and built it, what options did you use?) -- 2.1.6-1
2. What OS? What hardware? -- RHEL5, Dell clusters (HPC)
3. Are user directories shared across the cluster? -- Yes
4. Do you use automount for the user directories? -- Yes
5. Can you include a copy of your slurm.conf? -- Yes, attached
6. Do you know where the user output *is* going? -- Created where the job is run
7. Do the slurmctld.log and slurmd.log log files report any errors? -- Yes, for example from the nodes: “…unable to register: unable to contact slurm controller (connect failure)”
Thanks,
Yinka
Aaron,
Glad to hear from you. I’ve heard high praise of you from everyone.
sinfo output:
[bart@prometheus ]# sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
active*   up    infinite      1 idle* prometheus1
active*   up    infinite      3 idle  prometheus[2-4]
debug     up    infinite      1 idle* prometheus6
debug     up    infinite      1 idle  prometheus5
Running dig from the head node against the node names does not resolve them. Interestingly, the nodes are also configured for DHCP.
Thanks,
Yinka.
Aaron,
Thanks. I feel much better knowing I am talking to the master. I can ping and ssh to the nodes from the head node, but the names do not resolve with dig. Also, FYI, prometheus = bluefish :)
Regards,
Yinka
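One possible reading of "ping and ssh work, but dig does not resolve" is that the node names are known to the system resolver (for example through /etc/hosts) but not to DNS: ping and ssh look names up through the name service switch, while dig queries DNS servers directly and does not consult /etc/hosts. This explanation is an assumption and is not confirmed anywhere in the thread. A minimal sketch for comparing the two lookup paths for each host name given on the command line follows; the function name `compare_resolution` is made up for illustration.

```shell
#!/bin/sh
# compare_resolution HOST: print how HOST resolves through the name service
# switch (getent, the path ping and ssh take) and through DNS only (dig).
# The function name is illustrative; "none" means no address was returned.
compare_resolution() {
  host=$1
  nss=$(getent hosts "$host" | awk '{print $1; exit}')
  if command -v dig >/dev/null 2>&1; then
    # +short prints only the answers; drop dig's ";;" diagnostic lines.
    dns=$(dig +short "$host" A +time=1 +tries=1 2>/dev/null | grep -v '^;' | head -n 1)
  else
    dns="(dig not installed)"
  fi
  printf '%s nss=%s dns=%s\n' "$host" "${nss:-none}" "${dns:-none}"
}

# Compare both lookup paths for every host name given on the command line.
for h in "$@"; do
  compare_resolution "$h"
done
```

Passing the node names from the sinfo output (prometheus1, prometheus2, and so on) would show at a glance which names resolve only through the local resolver and which are also present in DNS.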
--
Aaron Knister
Systems Administrator
Division of Information Technology
University of Maryland, Baltimore County
aar...@umbc.edu