Thanks,
Whelton
[root@indium init.d]# ./pbs_server start
Starting pbs_server: PBS_Server: process_host_name_part, host compute-0-0:2 not found
PBS_Server: pbsd_init(setup_nodes), could not create node "compute-0-0:2", error = 15062
PBS_Server: PBS_Server, pbsd_init failed
[FAILED]
[root@indium init.d]# ./pbs start
Starting PBS
PBS_Server: process_host_name_part, host compute-0-0:2 not found
PBS_Server: pbsd_init(setup_nodes), could not create node "compute-0-0:2", error = 15062
PBS_Server: PBS_Server, pbsd_init failed
PBS server
Warning: can not open holidays file, assuming 24hr primetime: No such file or directory
Error opening file dedicated_time: No such file or directory
Warning: resource group file error, fair share will not work: No such file or directory
In token_acct_open filed to open file /opt/torque/sched_priv/accounting/20090820
acct_open: No such file or directory
PBS sched
It looks like your /opt/torque/server_priv/nodes file has some oddities.
What does it contain? Mine looks like this:
# cat /opt/torque/server_priv/nodes
compute-0-0 np=4
compute-0-1 np=4
compute-0-2 np=4
compute-0-3 np=4
etc...
Bart
[root@indium ~]# cat /opt/torque/server_priv/nodes
compute-0-0:2
compute-0-1:2
compute-0-2:2
compute-0-3:2
compute-0-4:2
compute-0-5:2
compute-0-6:2
compute-0-7:2
compute-0-8:2
compute-0-9:2
compute-0-10:2
compute-0-11:2
compute-0-12:2
> Date: Thu, 20 Aug 2009 16:04:54 -0700
> From: bbra...@Environcorp.com
> To: npaci-rocks...@sdsc.edu
> Subject: Re: [Rocks-Discuss] Torque/PBS crash
Note that my computes are quad-core, thus the np=4 (np means number of
processors). Obviously, if you have more cores, use np=8 or whatever is
right.
This file should get re-generated when you do a "rocks sync config".
Have you tried that already? If you did, and that's what it wrote, then
somehow your database has gotten messed up. What's the output of "rocks
list host compute"? It should look like this:
# rocks list host compute
HOST         MEMBERSHIP CPUS RACK RANK COMMENT
compute-0-0: Compute    4    0    0    -------
compute-0-1: Compute    4    0    1    -------
compute-0-2: Compute    4    0    2    -------
compute-0-3: Compute    4    0    3    -------
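
If the sync does not fix the file, the nodes entries can be rebuilt by hand from that listing. The following is only a rough Python sketch of the idea, not what "rocks sync config" actually runs: the function name nodes_from_rocks_listing is made up for illustration, and it assumes the column layout shown above (host name with a trailing colon first, CPUS third, one-word membership).

```python
def nodes_from_rocks_listing(listing: str) -> list[str]:
    """Turn 'rocks list host compute' output into Torque nodes-file lines.

    Assumes the column layout shown in the listing above:
    HOST (with trailing colon), MEMBERSHIP (one word), CPUS, ...
    """
    lines = []
    for row in listing.splitlines():
        fields = row.split()
        # Skip blank lines and the header row; host rows start with "name:".
        if len(fields) < 3 or not fields[0].endswith(":"):
            continue
        host = fields[0].rstrip(":")
        cpus = int(fields[2])  # CPUS is the third column
        lines.append(f"{host} np={cpus}")
    return lines


if __name__ == "__main__":
    sample = """HOST MEMBERSHIP CPUS RACK RANK COMMENT
compute-0-0: Compute 4 0 0 -------
compute-0-1: Compute 4 0 1 -------
"""
    print("\n".join(nodes_from_rocks_listing(sample)))
```

The printed lines are in the "name np=N" form that the nodes file expects. Review them before copying anything into /opt/torque/server_priv/nodes.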
Bart
Just follow Bart's suggestions and his Torque nodes file syntax,
and restart pbs_server.
That's the syntax Torque expects.

The syntax you used, with colons, suggests that you copied an MPICH2
"machines" file over the Torque nodes file.
The two files use different syntaxes, although they look similar.

The MPICH2 machines file uses a colon to separate the node name from
the number of processors/cores, optionally followed by a space and
ifhn=IP-address of the interface to use for MPI.

OTOH, the Torque nodes file uses the node name followed by spaces,
then np=number_of_processors/cores on your nodes, optionally followed
by any character strings you may want to use to set the nodes'
"properties/features".
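
To make the difference between the two syntaxes concrete, here is a small Python sketch that rewrites one MPICH2-style entry into the Torque form described above. The helper name machines_to_torque is hypothetical, and treating an entry with no ":count" as one processor is my assumption. Any ifhn= field is simply dropped, since it has no meaning in the Torque file.

```python
def machines_to_torque(line: str) -> str:
    """Rewrite one MPICH2 machines-file entry as a Torque nodes-file entry.

    "compute-0-0:2 ifhn=10.1.1.1"  becomes  "compute-0-0 np=2"
    Blank lines and comment lines are returned unchanged (stripped).
    """
    entry = line.strip()
    if not entry or entry.startswith("#"):
        return entry
    host_part = entry.split()[0]  # ignore a trailing "ifhn=..." field
    host, sep, count = host_part.partition(":")
    # Assumption: an entry without an explicit count means one processor.
    np = int(count) if sep and count else 1
    return f"{host} np={np}"


if __name__ == "__main__":
    for entry in ["compute-0-0:2", "compute-0-1:2 ifhn=10.1.1.1"]:
        print(machines_to_torque(entry))
```

Note that the count in a machines file is not necessarily the core count of the node, so check the resulting np values against the real hardware before restarting pbs_server.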
I hope this helps,
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------
Thanks a lot Bart and Gus!
That was exactly the problem. I know someone was trying to configure MPICH2 and may have accidentally changed the nodes file.
Thanks again,
Whelton
> Date: Thu, 20 Aug 2009 22:30:23 -0400
> From: g...@ldeo.columbia.edu