[Rocks-Discuss] One frontend, two "clusters"?


Jim Kusznir

Dec 16, 2008, 12:22:48 PM
to Discussion of Rocks Clusters
Hi all:

I currently have a "nice" 24-node 8-core/node cluster, all 64 bit,
running ROCKS.

I've also inherited an old, 16-node, 2-proc, 32-bit cluster with no
cluster management system, queuing system, or anything, and running
RedHat 9. The users are working on migrating their code to the new
cluster. Once that's complete, I get to reformat the old cluster and
try and make it useful again. This has gotten me thinking as to how
best to set this up.

My "ideal world" has already been shot full of many holes (just add
the nodes to the existing cluster and let them all be shared
transparently by users) for many good reasons.

My next-best-plan would be to have the same headnode "run" both
clusters, and just have two different queues that users submit to for
the two clusters (i.e., the nodes only belong to one queue or the
other). This has several advantages from my vantage point:

1) I only pay for one PGI license (as opposed to the two I currently pay for)
2) I get 16 compute nodes out of the old cluster (instead of the 15 +
1 headnode I'd get -- the new cluster has a dedicated head node)
3) I get a single login system and home directories (without having to
do extra work to try and combine them)
4) Users have the same interface/tools, they need only change one
option to submit to the right queue (rather than remembering there's
another cluster and logging into it)
5) it makes it easy to share the existing 30TB PVFS volume with the
old cluster (currently there's no access there)
6) Users can see the status of both clusters through a single ganglia
view (although some extra work will probably be required)

Potential disadvantages:

1) The whole compiling for 32-bit vs 64-bit problem
2) How does rocks deal with this situation, anyway?
3) have to set up a cross-kickstart environment on the head node

Does anyone have any other things to take into account? How does one
compile code for a 32bit target on a 64-bit headnode? Is this "a good
idea"? (A good part of me has visions of a basically-always-idle
cluster if the two are set up separately, so I'm very interested in
trying to combine them if it could work out decently...)

--Jim

Tim Carlson

Dec 16, 2008, 1:24:40 PM
to Discussion of Rocks Clusters
On Tue, 16 Dec 2008, Jim Kusznir wrote:

So how old are the "old" nodes? Can they run a 64-bit OS? That would solve
the 32/64 problem nicely. If your users are migrating code to the new
cluster, I assume they are recompiling everything for 64-bit?

Tim

Jonas Baltrusaitis

Dec 16, 2008, 1:39:59 PM
to Discussion of Rocks Clusters
When I try to run ORCA without torque, I get the following error:

mr9540.local: Connection refused
p0_11729: p4_error: Child process exited while making connection to remote process on mr9540: 0
p0_11729: (33.007812) net_send: could not write to fd=4, errno = 32

using the command:

export PATH=/share/apps/orca_amd64_exe/:/opt/mpich/gnu/bin/:$PATH && /share/apps/orca_amd64_exe/orca testORCA.inp

The first part of the command adds ORCA and the correct MPI to the front of the path, while the second part (after the &&) runs the job. From the error I got, it seems there is a communication problem when using MPI; unfortunately, I really don't know how to fix this...


Kaufman, Ian

Dec 16, 2008, 1:41:20 PM
to Discussion of Rocks Clusters
Hi Jim,

>
> My "ideal world" has already been shot full of many holes (just add
> the nodes to the existing cluster and let them all be shared
> transparently by users) for many good reasons.
>
> My next-best-plan would be to have the same headnode "run" both
> clusters, and just have two different queues that users submit to
> for
> the two clusters (i.e., the nodes only belong to one queue or the
> other). This has several advantages from my vantage point:

How are these two situations different? To do the latter, you would
basically integrate them into the cluster as 32 bit nodes, and then
define two queues, one for 64 bit and one for 32 bit. I do this now.
Then, instead of the default queue being used, you make sure the users
specify which queue they want/need. Not quite transparent, but it
is pretty close.

> 1) I only pay for one PGI license (as opposed to the two I currently
> pay for)
> 2) I get 16 compute nodes out of the old cluster (instead of the 15
> +
> 1 headnode I'd get -- the new cluster has a dedicated head node)
> 3) I get a single login system and home directories (without having
> to
> do extra work to try and combine them)
> 4) Users have the same interface/tools, they need only change one
> option to submit to the right queue (rather than remembering there's
> another cluster and logging into it)
> 5) it makes it easy to share the existing 30TB PVFS volume with the
> old cluster (currently there's no access there)
> 6) Users can see the status of both clusters through a single
> ganglia
> view (although some extra work will probably be required)

All sound like great advantages.


>
> Potential disadvantages:
>
> 1) The whole compiling for 32-bit vs 64-bit problem
> 2) How does rocks deal with this situation, anyway?
> 3) have to set up a cross-kickstart environment on the head node

With some compilers it is as simple as supplying a flag when building.
Or, just have users build code on one of the 32-bit nodes. You could
even have SGE submit the compile. Cross-kickstarting is not that
bad.

I believe gcc compiles 32-bit code with "-m32" (note that "-march=i386"
only selects the instruction set, not the 32-bit ABI).
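For what it's worth, a quick way to check that a cross-compile worked
(file names here are hypothetical; assumes a multilib gcc on the 64-bit
frontend):

```
$ gcc -m32 -o hello32 hello.c
$ file hello32
hello32: ELF 32-bit LSB executable, Intel 80386 ...
```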

Ian Kaufman
Research Systems Administrator
Jacobs School of Engineering

Bart Brashers

Dec 16, 2008, 1:56:54 PM
to Discussion of Rocks Clusters
I don't know ORCA, but often MPI jobs need to be started using "mpiexec"
or "mpirun".

Also, is /share/apps/orca_amd64_exe/orca a script, or the compiler
output? If it's a script, somewhere in it is a line that specifies the
"nodes" file, which machines to contact to start the MPI job. (mpiexec
handles this transparently when launched via torque -- it gets the list
of nodes assigned to the job via an environment variable.) It looks
like it's trying to connect to a machine named mr9540. Yours are likely
named compute-0-0 and compute-0-1.

If /share/apps/orca_amd64_exe/orca is a script, share it with us. If it's
really an executable (binary) file, then read `man mpiexec` or `man mpirun`.

Bart



Kirk Peterson

Dec 16, 2008, 1:57:33 PM
to Discussion of Rocks Clusters
Jonas,

check to make sure ORCA is set up to use ssh rather than rsh when it
talks to the nodes. Otherwise it could be a number of other things that
would require more details (and perhaps someone familiar with the
parallel implementation of ORCA - is there a user list? That seems like
a much better venue for this question).

-Kirk

Jonas Baltrusaitis

Dec 16, 2008, 2:12:56 PM
to Discussion of Rocks Clusters
ORCA is an executable, not a script, that calls mpirun. I can also try to run the job on the compute nodes, and it again fails with the same error, just substituting compute-0-0 for mr9540 (mr9540 is my frontend, which I asked yesterday how to set up for Torque so jobs can run on it). With Torque it is true that you want to use mpiexec, but that is not really important right now, as it will not run even outside of the queue.

--- On Tue, 12/16/08, Bart Brashers <bbra...@environcorp.com> wrote:

Jonas Baltrusaitis

Dec 16, 2008, 2:54:20 PM
to Discussion of Rocks Clusters
Kirk, I don't think it has anything to do with ORCA. It doesn't run outside of Torque, and I am sure none of my other software packages will run now either, so I am trying to figure out where this error comes from. What test can be done to pinpoint that?


--- On Tue, 12/16/08, Kirk Peterson <kipe...@wsu.edu> wrote:

> From: Kirk Peterson <kipe...@wsu.edu>
> Subject: Re: [Rocks-Discuss] mpi communcation problem
> To: "Discussion of Rocks Clusters" <npaci-rocks...@sdsc.edu>

Jonas Baltrusaitis

Dec 16, 2008, 3:29:43 PM
to Discussion of Rocks Clusters
This is what is mailed to the root account from torque. This is not ORCA anymore but PCGAMESS, so a different software package. It looks like torque cannot copy files properly (or should I post this in the torque forum?).

PBS Job Id: 15
Job Name: testPCGAMESS
Exec host: compute-0-1/3+compute-0-1/2+compute-0-1/1+compute-0-1/0+compute-0-0/3+compute-0-0/2+compute-0-0/1+compute-0-0/0
An error has occurred processing your job, see below.
Post job file processing error; job 15 on host compute-0-1/3+compute-0-1/2+compute-0-1/1+compute-0-1/0+compute-0-0/3+compute-0-0/2+compute-0-0/1+compute-0-0/0

Unable to copy file /opt/torque/spool/15.OU to root@mr9540:/root/testPCGAMESS/testPCGAMESS.o15
>>> error from copy
Permission denied (publickey,gssapi-with-mic,password).
lost connection
>>> end error output
Output retained on that host in: /opt/torque/undelivered/15.mr9540.OU

Unable to copy file /opt/torque/spool/15.mr9540.ER to root@mr9540:/root/testPCGAMESS/testPCGAMESS.e15
>>> error from copy
Permission denied (publickey,gssapi-with-mic,password).
lost connection
>>> end error output
Output retained on that host in: /opt/torque/undelivered/15.mr9540.ER

--- On Tue, 12/16/08, Kirk Peterson <kipe...@wsu.edu> wrote:


Kirk Peterson

Dec 16, 2008, 3:50:23 PM
to Discussion of Rocks Clusters
Jonas,

looks like it's the same problem as before. I think Bart pointed out
that the hostname of mr9540 is the smoking gun. You're evidently not
passing a valid hostfile to mpirun. If you run this in PBS/torque you
should pass the file that's referenced by the environment variable
$PBS_NODEFILE. You'll have to check what ORCA or GAMESS require.
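For example, a minimal Torque script along those lines (paths, core
counts, and the application name are assumptions for illustration;
MPICH-1's mpirun takes the host list via -machinefile):

```
#!/bin/sh
#PBS -l nodes=2:ppn=4
# Torque writes the hosts assigned to this job into $PBS_NODEFILE
/opt/mpich/gnu/bin/mpirun -machinefile $PBS_NODEFILE -np 8 /share/apps/yourapp
```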

-Kirk

Bart Brashers

Dec 16, 2008, 3:57:18 PM
to Discussion of Rocks Clusters
It looks like you're running this as root, rather than as a normal user.
Try creating a user, run "rocks sync users", then submit it as that
user.

Bart


Brandon Davidson

Dec 16, 2008, 3:59:12 PM
to Discussion of Rocks Clusters
Hi Jonas,

It sounds like Torque is trying to use SSH to copy files between nodes, which
fails when it hits the password prompt. Usually you don't want Torque to do
this; you want it to just copy things into place with the understanding that
they will be available in the same place on the backend due to a NFS mount or
other shared filesystem. See this mailing list thread for more info:

https://lists.sdsc.edu/pipermail/npaci-rocks-discussion/2008-June/031406.html
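For reference, a $usecp setup looks roughly like this in
/opt/torque/mom_priv/config on each compute node (the server name is a
placeholder; the pattern tells pbs_mom it can cp locally instead of
scp'ing output back to the server):

```
$pbsserver frontend.local
$usecp *:/home /home
```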

-Brandon

Jonas Baltrusaitis wrote:
> This is what is mailed to the root account from torque. This is not ORCA anymore but PCGAMESS, so different software package. It looks like torque cannot copy files properly (or should I post it in torque forum?..).
>
> PBS Job Id: 15
> Job Name: testPCGAMESS
> Exec host: compute-0-1/3+compute-0-1/2+compute-0-1/1+compute-0-1/0+compute-0-0/3+compute-0-0/2+compute-0-0/1+compute-0-0/0
> An error has occurred processing your job, see below.
> Post job file processing error; job 15 on host compute-0-1/3+compute-0-1/2+compute-0-1/1+compute-0-1/0+compute-0-0/3+compute-0-0/2+compute-0-0/1+compute-0-0/0
>
> Unable to copy file /opt/torque/spool/15.OU to root@mr9540:/root/testPCGAMESS/testPCGAMESS.o15
>>>> error from copy
> Permission denied (publickey,gssapi-with-mic,password).
> lost connection
>>>> end error output
> Output retained on that host in: /opt/torque/undelivered/15.mr9540.OU
>
> Unable to copy file /opt/torque/spool/15.mr9540.ER to root@mr9540:/root/testPCGAMESS/testPCGAMESS.e15
>>>> error from copy
> Permission denied (publickey,gssapi-with-mic,password).
> lost connection
>>>> end error output
> Output retained on that host in: /opt/torque/undelivered/15.mr9540.ER


--
Brandon Davidson
Systems Administrator
University of Oregon Neuroinformatics Center
(541) 346-2417 bran...@uoregon.edu
Key Fingerprint 1F08 A331 78DF 1EFE F645 8AE5 8FBE 4147 E351 E139

Gus Correa

Dec 16, 2008, 4:10:04 PM
to Discussion of Rocks Clusters
Are you trying to run as root or regular user?

Jonas Baltrusaitis

Dec 16, 2008, 4:12:16 PM
to Discussion of Rocks Clusters
But I am passing a valid hostfile... Below is my submission script

#!/bin/sh
#PBS -l nodes=2:ppn=4
mpirun -m $PBS_NODEFILE -np 8 /share/apps/pcg71c -r -f -p -i /root/testPCGAMESS/testPCGAMESS.inp -o /root/testPCGAMESS/testPCGAMESS.out -ex /share/apps/pcg71c/fastdiag.ex&pcgp2p.ex -t /share/apps/scratch/

Bart Brashers

Dec 16, 2008, 4:12:45 PM
to Discussion of Rocks Clusters
Although you should have something like this in place by default:

[root torque/mom_priv]# ssh c0-0 cat /opt/torque/mom_priv/config
$pbsserver [frontend].local
$usecp [frontend.domain.com]:/home /home

The more important aspect is that you're trying to run this in /root,
and/or as root.

Bart

Jonas Baltrusaitis

Dec 16, 2008, 4:12:39 PM
to Discussion of Rocks Clusters
root


--- On Tue, 12/16/08, Gus Correa <g...@ldeo.columbia.edu> wrote:

Brandon Davidson

Dec 16, 2008, 4:30:17 PM
to Discussion of Rocks Clusters
It's been said before, but...

*** Do Not Submit Jobs As Root ***

Create a new user, make sure that its home directory is available on the backend
nodes by running 'rocks sync users', and then log in as that user and submit your
jobs.

Besides all the myriad other reasons not to do things as root, root's home
directory is not shared between the nodes. This will cause Torque to attempt
to use SSH to stage files in and out, leading to the problem addressed in my
previous email.
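The whole sequence is something like this (the username is made up):

```
# useradd jdoe
# passwd jdoe
# rocks sync users     # propagate the account and home dir to the nodes
# su - jdoe
$ qsub myjob.sh
```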

-Brandon

Jonas Baltrusaitis wrote:
> But I am passing a valid hostfile... Below is my submission script
>
> #!/bin/sh
> #PBS -l nodes=2:ppn=4
> mpirun -m $PBS_NODEFILE -np 8 /share/apps/pcg71c -r -f -p -i /root/testPCGAMESS/testPCGAMESS.inp -o /root/testPCGAMESS/testPCGAMESS.out -ex /share/apps/pcg71c/fastdiag.ex&pcgp2p.ex -t /share/apps/scratch/

Gus Correa

Dec 16, 2008, 4:39:13 PM
to Discussion of Rocks Clusters
Hi Jonas

As Bart pointed out, you should run as a regular user.
Create one, do "rocks sync users",
login as this user, submit the job, as Bart recommended.

You may still have

/opt/torque/spool/15.OU

on one of the nodes.

Check with
cluster-fork 'ls /opt/torque/spool/'
and
cluster-fork 'ls /opt/torque/undelivered/'

It would help if you show your Torque/PBS scripts,
mpirun/mpiexec commands, etc.

We're not all computational chemists.
Are PCGAMESS and ORCA precompiled executables, or did you
compile them from the source code?

Gus Correa

Jim Kusznir

Dec 16, 2008, 4:40:45 PM
to Discussion of Rocks Clusters
The nodes in question are "Intel(R) Xeon(TM) CPU 3.06GHz" (dual-cpu
with hyperthreading). My understanding is that these are 32-bit CPUs
(but I'd love to be corrected!).

Brandon Davidson

Dec 16, 2008, 5:19:29 PM
to Discussion of Rocks Clusters
Hi Jim,

Jim Kusznir wrote:
> The nodes in question are "Intel(R) Xeon(TM) CPU 3.06GHz" (dual-cpu
> with hyperthreading). My understanding is that these are 32-bit CPUs
> (but I'd love to be corrected!).

I run a system somewhat like this - I have a 64bit frontend, 16 8-way 64-bit
Xeons, a 32-bit login node, and 16 dual-core-HT 32-bit Xeons. I have one queue
for the 64-bit nodes, one for the 32-bit nodes, and one for all of them.

Rocks doesn't support this configuration out of the box, as it will only boot
nodes of the same architecture as the frontend - you'd have to pop in the Kernel
DVD whenever you wanted to rebuild the 32-bit backends. I have a patch for 5.0
that fixes this, which I think will be pretty easy to port to 5.1, although I
haven't tried yet. It would probably also allow you to build a 32-bit Xen
virtual cluster on 64-bit nodes, which could be interesting.

-Brandon

Jonas Baltrusaitis

Dec 16, 2008, 6:03:06 PM
to Discussion of Rocks Clusters
Unfortunately, I am not a comp chemist either, but rather a wannabe.

OK, so now it's better. I created a user. I submit a PCGAMESS job with the following script and get the following error. At least I am getting something!

#!/bin/sh
#PBS -l nodes=2:ppn=4

mpirun -H $PBS_NODEFILE -n 8 /share/apps/pcg71c -r -f -p -i /home/jbaltrus/testPCGAMESS/testPCGAMESS.inp -o /home/jbaltrus/testPCGAMESS/testPCGAMESS.out -ex /share/apps/pcg71c/fastdiag.ex&/share/apps/pcg71c/pcgp2p.ex -t /share/apps/scratch/

Error:

ssh: /opt/torque/aux//36.mr9540.healthcare.uiowa.edu: Name or service not known

/opt/torque/mom_priv/jobs/36.mr9540.healthcare.uiowa.edu.SC: line 3: /share/apps/pcg71c/pcgp2p.ex: cannot execute binary file
mpirun: killing job...

Bart Brashers

Dec 16, 2008, 6:28:51 PM
to Discussion of Rocks Clusters
What is the output of

# ls -lF /share/apps/pcg71c/pcgp2p.ex
# ssh c0-0 /share/apps/pcg71c/pcgp2p.ex

It says it can't execute (run) the file...

I'm not sure what the "ssh: " line is all about.

Bart


Jonas Baltrusaitis

Dec 16, 2008, 6:38:03 PM
to Discussion of Rocks Clusters
Exactly, what is the ssh about?

[root@mr9540 ~]# ls -lF /share/apps/pcg71c/pcgp2p.ex
-rwxr-xr-x 1 root root 35328 Dec 15 19:15 /share/apps/pcg71c/pcgp2p.ex*
[root@mr9540 ~]# ssh c0-0 /share/apps/pcg71c/pcgp2p.ex
ssh: c0-0: Name or service not known
[root@mr9540 ~]# ssh compute-0-0 /share/apps/pcg71c/pcgp2p.ex
bash: /share/apps/pcg71c/pcgp2p.ex: cannot execute binary file
[root@mr9540 ~]#


--- On Tue, 12/16/08, Bart Brashers <bbra...@environcorp.com> wrote:

Gus Correa

Dec 16, 2008, 6:46:07 PM
to Discussion of Rocks Clusters
Hi Jonas

Some wild guesses.

I saw on the PCGAMESS site that they have over 64 pre-compiled versions!
Which one did you install?
I would try the version with MPICH static libraries and ssh first.
Perhaps optimized for your architecture (but maybe just the generic
"Pentium").
You certainly need an ssh version (Rocks uses ssh not rsh).

Also, I haven't seen any 64-bit version of PCGAMESS (but I didn't search
much).
Not sure if your system is 64-bit.

In any case, I presume PCGAMESS is linked to MPICH-1, right?
In this case, you need to launch it with the mpirun that comes with MPICH-1.
Most likely this is not the first mpirun on your path.
What is the output of "which mpirun"?
On my Rocks 4.3 this is the OpenMPI mpirun, not the MPICH-1 mpirun.

However, I have the MPICH-1 mpirun on:
/opt/mpich/gnu/bin/mpirun
You need to find the correct one on your system
(the "locate" command is your friend), and use the full path name
on your PBS script.
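For example, on a typical Rocks frontend (your paths and output may
differ):

```
$ which mpirun
/opt/openmpi/bin/mpirun          # OpenMPI -- not what an MPICH-1 binary expects
$ locate bin/mpirun | grep mpich
/opt/mpich/gnu/bin/mpirun        # use this full path in the PBS script
```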


I hope this helps,

Bart Brashers

Dec 16, 2008, 6:46:31 PM
to Discussion of Rocks Clusters
I take it back, perhaps the "ssh: " line has to do with the FQDN of the
frontend, and whether that can be resolved by the compute nodes.
Torque/PBS is picky about this sort of thing. You got conflicting
advice yesterday about the entry for the frontend in
/opt/torque/server_priv/nodes. Which did you end up using? What's the
output of

# cat /opt/torque/server_priv/nodes

Bart

Bart Brashers

Dec 16, 2008, 6:47:42 PM
to Discussion of Rocks Clusters
Sorry, my typo. The 2nd line should have been

# ssh c0-0 ls -lF /share/apps/pcg71c/pcgp2p.ex

Brandon Davidson

Dec 16, 2008, 6:56:19 PM
to Discussion of Rocks Clusters
Hi Jonas,

Which MPI are you using? Unless you are using a Torque-aware MPI, it will still
want to use SSH to launch your jobs. Both OpenMPI and LAM can be built to use
Torque, but neither Rocks nor the Torque roll provides one... so you'll have to
make sure that you have SSH keys set up in advance.

Additionally, it sounds like your mpirun is expecting the -H argument to be a
list of hosts, not a file containing the list.

You might try running a simple MPI hello-world app before moving up to an actual
application, just to make sure that the basics are all in place.
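As a sketch, run as the job user (hostnames are assumptions, and Rocks
may have set up the keys for you already at first login):

```
$ ssh-keygen -t rsa -N ''
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys
$ ssh compute-0-0 hostname       # should return with no password prompt
$ printf 'compute-0-0\ncompute-0-1\n' > hosts
$ /opt/mpich/gnu/bin/mpirun -np 2 -machinefile hosts hostname
```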

-Brandon

Bart Brashers

Dec 16, 2008, 7:06:50 PM
to Discussion of Rocks Clusters
I would also suggest you remove the frontend as an execution host (take
it out of /opt/torque/server_priv/nodes and re-start pbs_server) until
you get stuff working on the compute nodes. Then add the complication
of the frontend participating in Torque jobs.
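That is, something like:

```
# vi /opt/torque/server_priv/nodes    # delete the frontend's line
# service pbs_server stop
# service pbs_server start
# pbsnodes -a                         # confirm only compute-0-* remain
```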

Bart

Jonas Baltrusaitis

Dec 16, 2008, 7:09:11 PM
to Discussion of Rocks Clusters
[root@mr9540 ~]# cat /opt/torque/server_priv/nodes
mr9540.chem.uiowa.edu np=8
compute-0-0 np=4
compute-0-1 np=4
[root@mr9540 ~]#

looks alright to me


--- On Tue, 12/16/08, Bart Brashers <bbra...@environcorp.com> wrote:

Jonas Baltrusaitis

Dec 16, 2008, 7:12:15 PM
to Discussion of Rocks Clusters
This is what I got; they told me it will run fine on 64-bit:

Linux MPICH (using ssh as remote shell by default), fully statically linked Serial/parallel Linux binaries linked with MPICH, optimized for Pentium 4, Pentium D, Xeon, Intel Core 2 (Conroe/Merom/Woodcrest/Clovertown etc..., Penryn/Harpertown etc...), Intel Core i7 (Nehalem etc..) processors, as well as for AMD Phenom (tri- and four-core)/AMD Barcelona (four-core Opterons) processors.

Jonas Baltrusaitis

Dec 16, 2008, 7:10:07 PM
to Discussion of Rocks Clusters
[root@mr9540 ~]# ssh c0-0 ls -lF /share/apps/pcg71c/pcgp2p.ex

ssh: c0-0: Name or service not known
[root@mr9540 ~]#

--- On Tue, 12/16/08, Bart Brashers <bbra...@environcorp.com> wrote:

Bart Brashers

Dec 16, 2008, 7:41:24 PM
to Discussion of Rocks Clusters
I think they took out the "short names" of compute nodes in Rocks 5.1.
I'm using 5.0, and forgot you're using 5.1. Sorry about that. Try
this:

# ssh compute-0-0 ls -lF /share/apps/pcg71c/pcgp2p.ex

I just want to make sure the auto-mounter is working correctly.

Jonas Baltrusaitis

Dec 16, 2008, 7:54:00 PM
to Discussion of Rocks Clusters
[root@mr9540 ~]# ssh compute-0-0 ls -lF /share/apps/pcg71c/pcgp2p.ex

-rwxr-xr-x 1 root root 35328 Dec 15 19:15 /share/apps/pcg71c/pcgp2p.ex*
[root@mr9540 ~]#

Bart Brashers

Dec 16, 2008, 8:09:04 PM
to Discussion of Rocks Clusters
> [root@mr9540 ~]# cat /opt/torque/server_priv/nodes
> mr9540.chem.uiowa.edu np=8
> compute-0-0 np=4
> compute-0-1 np=4
> [root@mr9540 ~]#
>
> looks alright to me

I don't think so. I'm pretty sure you want just the name of the
frontend, without the FQDN, like so:

mr9540 np=8
compute-0-0 np=4
compute-0-1 np=4

You can certainly test it easily, using a simple script:

# cat testme.csh
#!/bin/csh -f
echo $hostname
sleep 60

# qsub -l nodes=mr9540 testme.csh

And see if it works.

Bart

Jonas Baltrusaitis

Dec 16, 2008, 8:24:22 PM
to Discussion of Rocks Clusters
[jbaltrus@mr9540 ~]$ qsub -l nodes=mr9540 testme.csh
qsub: Job exceeds queue resource limits MSG=cannot locate feasible nodes
[jbaltrus@mr9540 ~]$

Do you want me to revert to mr9540? Let me see if you taught me right:

change it in /opt/torque/server_priv/nodes and restart pbs_server?


--- On Tue, 12/16/08, Bart Brashers <bbra...@environcorp.com> wrote:


Jonas Baltrusaitis

Dec 16, 2008, 9:16:51 PM
to Discussion of Rocks Clusters
If I do # qsub -l nodes=mr9540 testme.csh it gives me the error "qsub: Job exceeds queue resource limits MSG=cannot locate feasible nodes"
even after removing the FQDN from the node list. If I do # qsub -l nodes=compute-0-0 testme.csh it successfully runs in the queue and finishes with two empty files. I assume that's a success.

I will try compute-0-0 instead of $PBS_NODEFILE and see if I can get any of the programs going. BTW, where is $PBS_NODEFILE and how can I change it so it omits the frontend for now?


--- On Tue, 12/16/08, Bart Brashers <bbra...@environcorp.com> wrote:


Gianluca Cecchi

Dec 17, 2008, 5:19:04 AM
to Discussion of Rocks Clusters
On Tue, Dec 16, 2008 at 10:40 PM, Jim Kusznir <jkus...@gmail.com> wrote:
> The nodes in question are "Intel(R) Xeon(TM) CPU 3.06GHz" (dual-cpu
> with hyperthreading). My understanding is that these are 32-bit CPUs
> (but I'd love to be corrected!).
>

Do a cat /proc/cpuinfo and see if it has the "lm" flag, which stands
for long mode.
Or boot a 64-bit live CD over Christmas, when you can stop the node... ;-)
But in that case you also have to make sure the motherboard supports
64-bit too.
I remember a post on another mailing list where an Intel(R)
Xeon(TM) CPU 3.00GHz had both the lm and ht (hyperthreading) flags set,
but the motherboard used was not 64-bit capable...
The actual 64-bit performance of these CPUs is also worth verifying.

In general the definitions of the flags are in
include/asm/cpufeature.h of the kernel tree.
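A small script version of that check (it reads the first CPU's flags
line; if /proc/cpuinfo is missing or has no lm flag, it reports 32-bit
only):

```shell
#!/bin/sh
# "lm" (long mode) in the flags line means the CPU can run a 64-bit OS.
flags=$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null)
if printf '%s\n' "$flags" | grep -qw lm; then
    echo "lm flag present: CPU is 64-bit capable"
else
    echo "no lm flag: CPU reports 32-bit only"
fi
```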

HIH,
Gianluca

Gus Correa

Dec 17, 2008, 12:15:55 PM
to Discussion of Rocks Clusters
Hi Jonas

$PBS_NODEFILE is generated on the fly by Torque/PBS,
one file for each job, based on the /opt/torque/server_priv/nodes,
and on the availability of resources.

If you want to modify something and remove the frontend for now,
the right file to edit is /opt/torque/server_priv/nodes.
You must restart the pbs_server (on the frontend) after you edit the file,
for the change to take effect (service pbs_server stop; service
pbs_server start).
Bart and Brandon recommended that, and I second their suggestion,
at least until you get the basic functionality to work.

Also, try Brandon's suggestion to run a basic MPI program,
to check if the nodes can talk to each other.

If you want to stick to MPICH-1, which seems to be what PCGAMESS uses,
there are very simple example programs in /opt/mpich/gnu/examples/
(at least on my Rocks 4.3). Try cpi.c, maybe the hello++.cc also.
To avoid confusion with your PATH,
use the full path names for the compilers (/opt/mpich/gnu/bin/mpicc, etc.)
and for the launcher (/opt/mpich/gnu/bin/mpirun)
in your PBS script too.
This will help to check if you have the basic functionality of
Torque and MPICH working.
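Concretely, that smoke test might look like this (paths as on Gus's
Rocks 4.3; adjust for your install):

```
$ cp /opt/mpich/gnu/examples/cpi.c .
$ /opt/mpich/gnu/bin/mpicc -o cpi cpi.c
$ cat cpi.qsub
#!/bin/sh
#PBS -l nodes=2:ppn=1
/opt/mpich/gnu/bin/mpirun -machinefile $PBS_NODEFILE -np 2 $HOME/cpi
$ qsub cpi.qsub
```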

Gus Correa

---------------------------------------------------------------------
Gustavo Correa, PhD - Email: g...@ldeo.columbia.edu
Lamont-Doherty Earth Observatory - Columbia University
P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

Gus Correa

Dec 17, 2008, 12:31:13 PM
to Discussion of Rocks Clusters
Hi Jonas

Yes, the 32-bit (i686) executable is likely to work on a 64-bit (x86_64)
processor,
AMD or Intel. It will run in 32-bit mode.
Since you've got the version optimized for Pentium 4 and above,
you need to make sure it matches your processors.
Most likely yes, since they are 64-bit, which came after Pentium 4.
Which processors do you have?

Try my suggestions below, for when you go back to PCGAMESS.

Gus Correa

Dec 17, 2008, 1:00:32 PM
to Discussion of Rocks Clusters
Hi Jonas

Jonas Baltrusaitis wrote:
> If I do # qsub -l nodes=mr9540 testme.csh it gives me error qsub: Job exceeds queue resource limits MSG=cannot locate feasible nodes
>

Still running as root, as the # prompt above suggests?
The rootly super power doesn't help here.


> even after removing the FQDN from nodelist.

The FQDN associates to the frontend's eth1,
your outside-looking Internet interface.
The cluster-local frontend name (mr9540, I presume) associates to the
frontend's eth0,
which is the inside-looking interface,
part of the 10.0.0.0 (unless you changed it) private subnet
that all compute nodes belong to.
The latter should be used by Torque and by MPI,
as the traffic of both flows across the private subnet.

Did you restart the pbs_server, after editing the nodes file?


> If I do # qsub -l nodes=compute-0-0 testme.csh it successfully rings in the que and finishes with two empty files. I assume that's a success
>

There seems to be a minor glitch.
On "testme.csh" try
echo $HOSTNAME
instead of
echo $hostname.
The environment variable (uppercase) is defined naturally by your login
shell.
Bart's script is meant to write the hostname(s) to the PBS script
"stdout" file.
Your "finishes with two empty files", if you are referring to stdout and
stderr,
is not necessarily success.

Again, if you are running as root, you are asking for trouble.
Submit jobs as a regular user.


> I will try to run compute-0-0 instead of $PBS_NODEFILE and see if I can get any of the prorgams going. BTW, where is the $PBS_NODEFILE and how can I cahnge it so it omits frontend for now?
>
>

Make it easy on yourself.
Remove the frontend from the node file for now, restart the pbs_server,
as Bart and Brandon suggested.
You can add it later, after you sort this out.

Good luck!

Gus Correa

publications

Dec 17, 2008, 6:48:23 PM
to Discussion of Rocks Clusters
Did you specify:

# set so no ssh problems with pcgamess
export P4_RSHCOMMAND=ssh

As suggested on the PCGamess/ Firefly web page?

I had your problem when I 1st started. I placed:

# set so no ssh problems with pcgamess
export P4_RSHCOMMAND=ssh

In my /etc/profile file and now the problem is resolved.

Jim


Jonas Baltrusaitis

Dec 22, 2008, 2:59:50 PM
to Discussion of Rocks Clusters
After a fresh frontend install and first boot, as root I start a terminal to run insert-ethers. This message immediately pops up:


It doesn't appear that you have set up your ssh key.
This process will make the files:
/root/.ssh/id_rsa.pub
/root/.ssh/id_rsa
/root/.ssh/authorized_keys

Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):


What do I enter for the file name and, later on, for the passwords?



Kaufman, Ian

Dec 22, 2008, 3:06:25 PM
to Discussion of Rocks Clusters
You can just accept the defaults if you wish and, depending
on how security-minded you are, leave the passphrase empty.
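Non-interactively, that would be something like this (an assumption on
my part, and only if an empty passphrase is acceptable to you):

```
# ssh-keygen -t rsa -N '' -f /root/.ssh/id_rsa
```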

Ian Kaufman
Research Systems Administrator
Jacobs School of Engineering
ikau...@soe.ucsd.edu x49716



Jonas Baltrusaitis

Dec 24, 2008, 2:57:15 PM
to Discussion of Rocks Clusters
When I try a simple SGE script, I can't execute the following (as per the SGE manual):

$ cp /opt/mpi-tests/src/*.c .
$ cp /opt/mpi-tests/src/Makefile .
$ make

[jbaltrus@mr9540 test]$ cp /opt/mpi-tests/src/*.c
cp: cannot create regular file `/opt/mpi-tests/src/mpi-verify.c': Permission denied
[jbaltrus@mr9540 test]$

Following that, I create the file /test/mpi-ring.qsub with:

#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -S /bin/bash
#
/opt/openmpi/bin/mpirun -np $NSLOTS $HOME/test/mpi-ring

and when I submit it with [jbaltrus@mr9540 test]$ qsub -pe orte 4 mpi-ring.qsub

I get

Warning: Permanently added 'compute-0-0.local' (RSA) to the list of known hosts.

[compute-0-0.local:16126] [0,0,1] ORTE_ERROR_LOG: Not found in file odls_default_module.c at line 1191
--------------------------------------------------------------------------
Failed to find or execute the following executable:

Host: compute-0-0.local
Executable: /home/jbaltrus/test/mpi-ring

Cannot continue.
--------------------------------------------------------------------------
[compute-0-0.local:16126] [0,0,1] ORTE_ERROR_LOG: Not found in file orted.c at line 626


Could anybody tell me what's wrong with SGE/permissions?

thanks

Jonas



Jonas Baltrusaitis

unread,
Dec 27, 2008, 1:12:43 PM12/27/08
to Discussion of Rocks Clusters
I set up a rudimentary SGE script to run ORCA, but when I run it, ORCA starts and I get the error "Cannot open input file: -p4pg". I contacted the ORCA developers and they said this is not their file; it is associated with PCGAMESS (which I also use), and they recommend using it in its submit line. I had a similar script to submit PCGAMESS.

Jonas

#!/bin/bash
#$ -N xxxr
#$ -S /bin/bash
#MPI is also available. Simply substitute "mpi" for "mpich"
#$ -pe mpich 4
#$ -cwd
#$ -o xxxr.out
#$ -e xxxr.err
#$ -notify

/opt/mpich/gnu/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines /share/apps/orca_amd64_exe/orca



Jonas Baltrusaitis

unread,
Dec 27, 2008, 6:03:15 PM12/27/08
to Discussion of Rocks Clusters
For no apparent reason, SGE stopped running my jobs. I haven't changed anything. I stopped and started SGE, but jobs just sit in the qw state forever; no queue name is given.

Incidentally, how do I add the frontend so I can run jobs on it using SGE?

job-ID prior name user state submit/start at queue slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
53 0.55500 RhC20As4H3 jbaltrus qw 12/27/2008 16:59:10 4
[jbaltrus@mr9540 RhC20As4H32Cl_CO2]$


Jonas Baltrusaitis

unread,
Dec 27, 2008, 7:08:01 PM12/27/08
to Discussion of Rocks Clusters
Restarted the computes and now it's working. Weird...

The other question remains: how do I set up SGE so it submits jobs not only to the computes but also to the frontend (which has as many processors as the computes do)?


--- On Sat, 12/27/08, Jonas Baltrusaitis <jasi...@yahoo.com> wrote:

Dr. Hung-Sheng Tsao (LaoTsao)

unread,
Dec 28, 2008, 7:21:48 AM12/28/08
to Discussion of Rocks Clusters
please check the mail archives

--
Hung-Sheng Tsao, Ph.D. (LaoTsao) Sr. System Engineer
US, GEH East TS Ambassador
400 Atrium Dr, 1ST FLOOR P/F:1877 319 0460 (x67079)
Somerset, NJ 08873 C: 973 495 0840
http://blogs.sun.com/hstsao/ E:Hung-Sh...@sun.com
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE: This email message is for the sole use of the intended
recipient(s) and may contain confidential and privileged
information. Any unauthorized review, use, disclosure or
distribution is prohibited. If you are not the intended
recipient, please contact the sender by reply email and destroy
all copies of the original message.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Jonas Baltrusaitis

unread,
Dec 28, 2008, 2:13:23 PM12/28/08
to Discussion of Rocks Clusters
Checked, and found "qmon" and install_execd. Neither of them seems to work:

[root@mr9540 ~]# /opt/gridengine/install_execd
/opt/gridengine/install_execd: line 34: /root/inst_sge: No such file or directory
/opt/gridengine/install_execd: line 34: exec: /root/inst_sge: cannot execute: No such file or directory
[root@mr9540 ~]# qmon
Error: Can't open display:
[root@mr9540 ~]# qconf

--- On Sun, 12/28/08, Dr. Hung-Sheng Tsao (LaoTsao) <Hung-Sh...@sun.com> wrote:

Dr. Hung-Sheng Tsao (LaoTsao)

unread,
Dec 28, 2008, 5:18:02 PM12/28/08
to Discussion of Rocks Clusters

it seems that for some unknown reason Ur SGE_ROOT is not set
source /opt/gridengine/default/common/settings.sh
env |grep SGE
should see
SGE_CELL=default
SGE_ARCH=lx26-amd64
SGE_EXECD_PORT=537
SGE_QMASTER_PORT=536
SGE_ROOT=/opt/gridengine

cd /opt/gridengine
./install_execd
answer all default

May want to reduce the slots for the frontend so U have CPU resources
left for other ROCKS usage.
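For anyone following along, the environment check above can be wrapped in a small script (a sketch only; the settings.sh path is the Rocks default quoted above, and the script degrades gracefully on machines without SGE):

```shell
#!/bin/bash
# Source the SGE settings file if present, then report what we got.
settings=/opt/gridengine/default/common/settings.sh
if [ -r "$settings" ]; then . "$settings"; fi
sge_root="${SGE_ROOT:-unset}"
echo "SGE_ROOT=$sge_root"
# Show every SGE_* variable, or say so if there are none.
env | grep '^SGE_' || echo "no SGE_* variables in the environment"
```

If SGE_ROOT comes back as "unset", sourcing settings.sh by hand (or fixing /etc/profile.d, as mentioned below in the thread) is the next step before running install_execd.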

Dr. Hung-Sheng Tsao (LaoTsao)

unread,
Dec 28, 2008, 5:39:30 PM12/28/08
to Discussion of Rocks Clusters

under /etc/profile.d
U should see
sge-binaries.[c]sh
that sets SGE_ROOT etc.

Jonas Baltrusaitis

unread,
Dec 28, 2008, 5:52:31 PM12/28/08
to Discussion of Rocks Clusters
Done, and working, except I missed the part where it assigns the number of processors on the frontend, so by default it assigned only one (out of 8). How do I change that?

Dr. Hung-Sheng Tsao (LaoTsao)

unread,
Dec 28, 2008, 6:10:48 PM12/28/08
to Discussion of Rocks Clusters
qconf -sq all.q |grep slots
what is the output?
for 8 core U may want to assign 4 slots for execd

qconf -mattr queue slots '[<frontend>=4]' all.q

<frontend> is the frontend name shown in the output of
qconf -sq all.q |grep slots
hth

Jonas Baltrusaitis

unread,
Dec 28, 2008, 6:21:02 PM12/28/08
to Discussion of Rocks Clusters
something weird, I don't see my frontend name

[jbaltrus@mr9540 RhC20As4H32Cl_CO2]$ qconf -sq all.q |grep slots
slots 1,[compute-0-0.local=4],[compute-0-1.local=4], \
[jbaltrus@mr9540 RhC20As4H32Cl_CO2]$

Dr. Hung-Sheng Tsao (LaoTsao)

unread,
Dec 28, 2008, 6:41:30 PM12/28/08
to Discussion of Rocks Clusters

maybe U added the frontend later
do
cd /var/tmp
qconf -sq all.q >all.q.out
grep <frontend> all.q.out

Jonas Baltrusaitis

unread,
Dec 28, 2008, 6:55:20 PM12/28/08
to Discussion of Rocks Clusters
I did add the frontend, but it still doesn't show up and it still has only one processor:

[root@mr9540 ~]# cd /var/tmp
[root@mr9540 tmp]# qconf -sq all.q >all.q.out
[root@mr9540 tmp]# grep mr9540 all.q.out
[mr9540.local=1]
[root@mr9540 tmp]# qconf -sq all.q |grep slots


slots 1,[compute-0-0.local=4],[compute-0-1.local=4], \

Dr. Hung-Sheng Tsao (LaoTsao)

unread,
Dec 28, 2008, 7:07:34 PM12/28/08
to Discussion of Rocks Clusters
the frontend, mr9540.local, is there in all.q;
there are many lines for slots, so grep slots does not show the frontend

just do
qconf -mattr queue slots '[mr9540.local=4]' all.q
to assign 4 slots for mr9540.local
verify it
qconf -sq all.q |grep mr9540

Jonas Baltrusaitis

unread,
Dec 29, 2008, 4:59:11 PM12/29/08
to Discussion of Rocks Clusters
The frontend suddenly became incredibly slow (all I did was add it to the SGE host list so I can execute jobs on it). I added all 8 processors, then switched to 4 processors: still the same. I can 'gauge' this by login time: it takes 15-20 seconds to actually log in via ssh on the local network, whereas originally folders and the command line would pop up instantly. What is a more sophisticated way to see whether it slowed down, and why that happened?

thanks

Jonas



Elvedin Trnjanin

unread,
Dec 29, 2008, 5:06:51 PM12/29/08
to Discussion of Rocks Clusters
Run 'top' to see processor and memory usage per process and 'iostat' for
statistics on your IO devices such as disk drives.

Dr. Hung-Sheng Tsao (LaoTsao)

unread,
Dec 29, 2008, 5:14:15 PM12/29/08
to Discussion of Rocks Clusters
how much memory for the frontend?
one should never allocate 8/8 for execd;
the other issue is the memory used by Ur programs

--

Jonas Baltrusaitis

unread,
Dec 29, 2008, 6:13:41 PM12/29/08
to Discussion of Rocks Clusters
I allocated 4 out of 8. This is the output of top. The pcgamess jobs are the ones I am running on both of my compute nodes; the frontend sits empty. Interestingly, I see 3 pcgamess processes (I am running only two jobs, though...). Since I am not familiar with what info top provides, I'll rely on your response.

top - 17:11:26 up 6 days, 3:29, 3 users, load average: 6.42, 6.19, 6.06
Tasks: 159 total, 6 running, 153 sleeping, 0 stopped, 0 zombie
Cpu(s): 26.7%us, 71.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 2.0%si, 0.0%st
Mem: 8124416k total, 5889708k used, 2234708k free, 352236k buffers
Swap: 4923868k total, 0k used, 4923868k free, 4331840k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7535 jbaltrus 25 0 894m 209m 18m S 33.3 2.6 48:50.51 pcgamess
7536 jbaltrus 25 0 894m 198m 18m S 33.3 2.5 48:50.47 pcgamess
7537 jbaltrus 25 0 893m 198m 18m R 33.3 2.5 48:50.44 pcgamess
1 root 15 0 10324 692 580 S 0.0 0.0 0:00.20 init
2 root RT -5 0 0 0 S 0.0 0.0 0:00.01 migration/0
3 root 34 19 0 0 0 S 0.0 0.0 0:00.01 ksoftirqd/0
4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
26 root 10 -5 0 0 0 S 0.0 0.0 0:00.15 events/0
34 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 khelper
35 root 13 -5 0 0 0 S 0.0 0.0 0:00.00 kthread
37 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 xenwatch
38 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 xenbus
47 root 10 -5 0 0 0 S 0.0 0.0 0:00.02 kblockd/0
55 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 kacpid
196 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 cqueue/0
207 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 khubd
209 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 kseriod
310 root 25 0 0 0 0 S 0.0 0.0 0:00.00 pdflush
311 root 15 0 0 0 0 S 0.0 0.0 0:01.06 pdflush
312 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 kswapd0
313 root 20 -5 0 0 0 S 0.0 0.0 0:00.00 aio/0
463 root 11 -5 0 0 0 S 0.0 0.0 0:00.00 kpsmoused
542 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 ata/0
550 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 ata_aux
560 root 13 -5 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_0
561 root 14 -5 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_1
562 root 11 -5 0 0 0 S 0.0 0.0 0:00.01 scsi_eh_2
563 root 11 -5 0 0 0 S 0.0 0.0 0:00.01 scsi_eh_3
564 root 10 -5 0 0 0 S 0.0 0.0 0:00.24 kjournald
591 root 10 -5 0 0 0 S 0.0 0.0 0:00.00 kauditd
623 root 21 -4 12636 820 372 S 0.0 0.0 0:00.33 udevd
1970 root 17 0 3760 380 316 S 0.0 0.0 0:00.00 change_console
2131 root 16 -5 0 0 0 S 0.0 0.0 0:00.00 kmpathd/0
2175 root 10 -5 0 0 0 S 0.0 0.0 0:01.50 kjournald
2177 root 10 -5 0 0 0 S 0.0 0.0 0:00.36 kjournald
2825 root 12 -3 18356 712 512 S 0.0 0.0 0:00.01 auditd
2827 root 12 -3 16228 740 596 S 0.0 0.0 0:00.01 audispd
3009 root 18 0 170m 8976 2380 S 0.0 0.1 4:22.22 greceptor
3020 root 15 0 10088 772 596 S 0.0 0.0 0:00.30 syslogd
3023 root 15 0 3784 424 340 S 0.0 0.0 0:00.00 klogd
3036 root 18 0 10708 392 248 S 0.0 0.0 0:00.41 irqbalance
3085 rpc 15 0 8028 652 516 S 0.0 0.0 0:00.01 portmap
3100 sge 25 0 231m 4140 2684 S 0.0 0.1 0:08.44 sge_qmaster
3106 root 18 0 10120 768 636 S 0.0 0.0 0:00.00 rpc.statd
3120 sge 15 0 171m 2372 1760 S 0.0 0.0 0:00.90 sge_schedd
3149 root 15 0 48648 700 272 S 0.0 0.0 0:00.02 rpc.idmapd
3175 nobody 18 0 105m 1412 852 S 0.0 0.0 6:48.01 gmetad
3195 dbus 15 0 21376 876 556 S 0.0 0.0 0:00.84 dbus-daemon
3248 root 21 0 39892 1616 1204 S 0.0 0.0 0:00.05 automount
3267 root 18 0 3780 564 460 S 0.0 0.0 0:00.00 acpid
3367 root 15 0 142m 6416 3004 S 0.0 0.1 0:03.86 snmpd
3382 root 15 0 60528 1220 672 S 0.0 0.0 0:00.06 sshd



Gus Correa

unread,
Dec 29, 2008, 6:45:49 PM12/29/08
to Discussion of Rocks Clusters
Hi Jonas, list

I am not sure I understood what you said.
How did you launch PCGAMESS?
What is the mpirun command?
How many processes did you launch (2, 3, 4 or 8?),
and what is in your machines file (or equivalent from Torque or SGE)?
Where does this "top" output come from? (frontend, or which compute node?)

Somehow the program seems to run on a single CPU, with the three
processes sharing it at 33% each, as shown in the %CPU column of top.
While on top, type "1" (number one), to show the activity on each CPU/core.

Gus Correa

Jonas Baltrusaitis

unread,
Dec 29, 2008, 6:55:51 PM12/29/08
to Discussion of Rocks Clusters
I am confused, too. Let me lay it out: I have a frontend (8 cores) and two computes (4 cores each). Currently I am running two pcgamess jobs, 4 cores each, so each compute is running a single job. The top output comes from the frontend, which is supposed to be idle.

This is my submit line
/opt/mpich/gnu/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines /share/apps/pcg71c/pcgamess -i /home/jbaltrus/RhC20As4H32Cl_CO2/RhC20As4H32Cl_CO2.inp -ex /share/apps/pcg71c -t /share/apps/scratch/tmp

Since I am confused I will appreciate any help

Jonas

PS: Why would the frontend CPUs be involved in pcgamess jobs anyway? Something is not right. I am running SGE. How do I check the machines file?


--- On Mon, 12/29/08, Gus Correa <g...@ldeo.columbia.edu> wrote:

> From: Gus Correa <g...@ldeo.columbia.edu>
> Subject: Re: [Rocks-Discuss] frontend slow
> To: "Discussion of Rocks Clusters" <npaci-rocks...@sdsc.edu>

Dr. Hung-Sheng Tsao (LaoTsao)

unread,
Dec 29, 2008, 6:09:36 PM12/29/08
to Discussion of Rocks Clusters
top on the frontend indicates that U have 3 pcgamess processes on the
frontend, each using 800MB of virtual memory and 33% of a CPU.

qstat -f should show U what's running on execd-host

the load average is 6.5 for an 8-CPU system, with some 2.2GB of memory free

try to reduce the frontend slots to 1,
qdel <JID> to delete the job,
and submit again through qsub

all jobs need to be submitted by qsub

--

Jonas Baltrusaitis

unread,
Dec 29, 2008, 7:15:07 PM12/29/08
to Discussion of Rocks Clusters
That's the trick: those jobs are running on the computes. I have 4 out of 8 slots on the frontend enabled. I also always use qsub; the script includes mpirun, though.

queuename qtype used/tot. load_avg arch states
----------------------------------------------------------------------------
al...@compute-0-0.local BIP 4/4 4.07 lx26-amd64
77 0.55500 RhC20As4H3 jbaltrus r 12/29/2008 16:21:11 4
----------------------------------------------------------------------------
al...@compute-0-1.local BIP 4/4 4.05 lx26-amd64
74 0.55500 Co2crypt2_ jbaltrus r 12/29/2008 15:34:26 4
----------------------------------------------------------------------------
al...@mr9540.local BIP 0/4 6.18 lx26-amd64 a
[jbaltrus@mr9540 ~]$

--- On Mon, 12/29/08, Dr. Hung-Sheng Tsao (LaoTsao) <Hung-Sh...@sun.com> wrote:

> From: Dr. Hung-Sheng Tsao (LaoTsao) <Hung-Sh...@sun.com>

> Subject: Re: [Rocks-Discuss] frontend slow
> To: "Discussion of Rocks Clusters" <npaci-rocks...@sdsc.edu>

Gus Correa

unread,
Dec 29, 2008, 7:23:45 PM12/29/08
to Discussion of Rocks Clusters
Hi Jonas, list

An SGE user or expert may help you better than me.
I use Torque/PBS instead.

However, from what you said,
it is clear that SGE is directing your jobs, or part of them,
to the frontend.

You may want to show your full SGE scripts for each of the two jobs.
With the info you sent,
I don't know what the value of $NSLOTS is, for instance.

On Torque/PBS the "machines" file where the job will run is generated
on the fly, depending on available resources, but can be accessed
through the environment variable $PBS_NODEFILE
from within the Torque/PBS script.
Most likely SGE has a similar mechanism.
Moreover, there is a configuration file that tells Torque/PBS how many
nodes you have in the cluster, and how many CPUs you have on each.
SGE should have a similar feature.
This is the master node file that you need to look for and get right.

BTW, did you type "1" (number one) while on top?
What is the output?
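SGE does indeed have the mechanism Gus describes for Torque: with a tightly integrated parallel environment, the generated host list is exported through $PE_HOSTFILE (and the Rocks mpich PE also writes a copy to $TMPDIR/machines, the file used in the scripts in this thread). A minimal job-script sketch, with defaults so it also runs outside SGE:

```shell
#!/bin/bash
#$ -cwd
#$ -j y
#$ -S /bin/bash
# $PE_HOSTFILE is SGE's analogue of Torque's $PBS_NODEFILE; the defaults
# below let this sketch run on a machine without SGE, for illustration.
nslots="${NSLOTS:-1}"
hostfile="${PE_HOSTFILE:-${TMPDIR:-/tmp}/machines}"
echo "NSLOTS=$nslots"
echo "host file: $hostfile"
# Print the granted hosts if the file exists (it only does inside a job).
if [ -r "$hostfile" ]; then cat "$hostfile"; fi
```

Submitting this with, e.g., `qsub -pe mpich 4 show-hosts.qsub` and reading the output shows exactly which hosts and slot counts SGE granted, which is the quickest way to confirm whether the frontend is being handed slots.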

Jonas Baltrusaitis

unread,
Dec 29, 2008, 7:43:05 PM12/29/08
to Discussion of Rocks Clusters
Here is the full script. The node file is configured properly; we just went through it yesterday with Hung-Sheng.


#!/bin/bash
#$ -N RhC20As4H32Cl_CO2


#$ -S /bin/bash
#MPI is also available. Simply substitute "mpi" for "mpich"
#$ -pe mpich 4
#$ -cwd

#$ -o RhC20As4H32Cl_CO2.out
#$ -e RhC20As4H32Cl_CO2.err
#$ -notify

/opt/mpich/gnu/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines /share/apps/pcg71c/pcgamess -i /home/jbaltrus/RhC20As4H32Cl_CO2/RhC20As4H32Cl_CO2.inp -ex /share/apps/pcg71c -t /share/apps/scratch/tmp

--- On Mon, 12/29/08, Gus Correa <g...@ldeo.columbia.edu> wrote:

Dr. Hung-Sheng Tsao (LaoTsao)

unread,
Dec 29, 2008, 6:50:07 PM12/29/08
to Discussion of Rocks Clusters
it seems the pcgamess jobs on the frontend are not controlled by sge

maybe do
pkill pcgamess
on the frontend to kill them.
What is the whole script that U used to submit the jobs?

Gus Correa

unread,
Dec 29, 2008, 7:53:39 PM12/29/08
to Discussion of Rocks Clusters
Hi Jonas, list

Any chances that you may have inadvertently launched a
PCGAMESS run on the frontend using mpirun directly, but not SGE?
If this is the case, I guess SGE won't know about that run,
and won't report it as a job in qstat (as you show below).

What is the output of
ps -u jbaltrus
on the frontend?

What is the output of
cluster-fork ' ps -u jbaltrus' ?

Gus Correa

Jonas Baltrusaitis

unread,
Dec 29, 2008, 7:57:01 PM12/29/08
to Discussion of Rocks Clusters
when I type 1:

top - 18:56:44 up 6 days, 5:14, 3 users, load average: 6.55, 6.22, 6.08
Tasks: 159 total, 7 running, 152 sleeping, 0 stopped, 0 zombie
Cpu0 : 27.3%us, 70.4%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.1%hi, 2.2%si, 0.0%st
Mem: 8124416k total, 5935216k used, 2189200k free, 355820k buffers
Swap: 4923868k total, 0k used, 4923868k free, 4372760k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

7537 jbaltrus 25 0 893m 198m 18m S 33.3 2.5 83:45.53 pcgamess
7535 jbaltrus 25 0 894m 209m 18m R 33.0 2.6 83:45.91 pcgamess
7536 jbaltrus 25 0 894m 198m 18m R 33.0 2.5 83:45.89 pcgamess

2175 root 10 -5 0 0 0 S 0.0 0.0 0:01.69 kjournald

2177 root 10 -5 0 0 0 S 0.0 0.0 0:00.36 kjournald
2825 root 12 -3 18356 712 512 S 0.0 0.0 0:00.01 auditd
2827 root 12 -3 16228 740 596 S 0.0 0.0 0:00.01 audispd

3009 root 18 0 170m 8976 2380 S 0.0 0.1 4:25.54 greceptor

3020 root 15 0 10088 772 596 S 0.0 0.0 0:00.30 syslogd
3023 root 15 0 3784 424 340 S 0.0 0.0 0:00.00 klogd
3036 root 18 0 10708 392 248 S 0.0 0.0 0:00.41 irqbalance
3085 rpc 15 0 8028 652 516 S 0.0 0.0 0:00.01 portmap

3100 sge 25 0 231m 4140 2684 S 0.0 0.1 0:08.65 sge_qmaster

3106 root 18 0 10120 768 636 S 0.0 0.0 0:00.00 rpc.statd
3120 sge 15 0 171m 2372 1760 S 0.0 0.0 0:00.90 sge_schedd
3149 root 15 0 48648 700 272 S 0.0 0.0 0:00.02 rpc.idmapd

3175 nobody 18 0 105m 1412 852 S 0.0 0.0 6:52.47 gmetad

3195 dbus 15 0 21376 876 556 S 0.0 0.0 0:00.84 dbus-daemon
3248 root 21 0 39892 1616 1204 S 0.0 0.0 0:00.05 automount
3267 root 18 0 3780 564 460 S 0.0 0.0 0:00.00 acpid
3367 root 15 0 142m 6416 3004 S 0.0 0.1 0:03.86 snmpd
3382 root 15 0 60528 1220 672 S 0.0 0.0 0:00.06 sshd

--- On Mon, 12/29/08, Gus Correa <g...@ldeo.columbia.edu> wrote:

> From: Gus Correa <g...@ldeo.columbia.edu>
> Subject: Re: [Rocks-Discuss] frontend slow
> To: "Discussion of Rocks Clusters" <npaci-rocks...@sdsc.edu>

Jonas Baltrusaitis

unread,
Dec 29, 2008, 7:59:41 PM12/29/08
to Discussion of Rocks Clusters
Actually, these things started happening when I submitted a pcgamess job to the headnode and then killed it. It could be that SGE didn't really kill it. Below are the outputs.

ps -u
[jbaltrus@mr9540 ~]$ ps -u jbaltrus
PID TTY TIME CMD
7018 ? 00:11:26 pcgamess
7180 ? 00:00:00 pcgamess
7184 ? 00:09:35 pcgamess
7353 ? 00:00:00 pcgamess
7355 ? 00:23:22 pcgamess
7517 ? 00:00:00 pcgamess
7518 ? 00:00:00 pcgamess
7520 ? 00:00:00 pcgamess
7521 ? 00:00:00 pcgamess
7535 ? 01:24:06 pcgamess
7536 ? 01:24:06 pcgamess
7537 ? 01:24:05 pcgamess
11323 ? 00:00:00 sshd
11324 ? 00:00:00 sftp-server
11377 pts/3 00:00:00 bash
11659 pts/3 00:00:00 ps
[jbaltrus@mr9540 ~]$

cluster-fork

[jbaltrus@mr9540 ~]$ cluster-fork ' ps -u jbaltrus'
compute-0-0:
PID TTY TIME CMD
3784 ? 00:00:00 bash
3785 ? 00:00:00 mpirun
3925 ? 02:35:16 pcgamess
3926 ? 00:00:00 pcgamess
3927 ? 00:00:00 ssh
3930 ? 00:00:00 sshd
3931 ? 02:36:53 pcgamess
4084 ? 00:00:00 ssh
4085 ? 00:00:00 pcgamess
4088 ? 00:00:00 sshd
4089 ? 02:36:50 pcgamess
4242 ? 00:00:00 ssh
4243 ? 00:00:00 pcgamess
4246 ? 00:00:00 sshd
4247 ? 02:36:08 pcgamess
4400 ? 00:00:00 pcgamess
4401 ? 00:00:00 pcgamess
4402 ? 00:00:00 pcgamess
4403 ? 00:00:00 pcgamess
4404 ? 00:00:00 pcgamess
4429 ? 00:00:09 pcgamess
4430 ? 00:00:47 pcgamess
4431 ? 00:00:09 pcgamess
4432 ? 00:00:09 pcgamess
4533 ? 00:00:00 sshd
4534 ? 00:00:00 ps
compute-0-1:
PID TTY TIME CMD
3169 ? 00:00:00 bash
3170 ? 00:00:00 mpirun
3313 ? 03:21:32 pcgamess
3314 ? 00:00:00 pcgamess
3315 ? 00:00:00 ssh
3318 ? 00:00:00 sshd
3319 ? 03:23:39 pcgamess
3472 ? 00:00:00 pcgamess
3473 ? 00:00:00 ssh
3476 ? 00:00:00 sshd
3477 ? 03:23:41 pcgamess
3630 ? 00:00:00 ssh
3631 ? 00:00:00 pcgamess
3634 ? 00:00:00 sshd
3635 ? 03:22:23 pcgamess
3788 ? 00:00:00 pcgamess
3789 ? 00:00:00 pcgamess
3790 ? 00:00:00 pcgamess
3791 ? 00:00:00 pcgamess
3792 ? 00:00:00 pcgamess
3814 ? 00:00:51 pcgamess
3815 ? 00:00:10 pcgamess
3816 ? 00:00:11 pcgamess
3817 ? 00:00:10 pcgamess
3944 ? 00:00:00 sshd
3945 ? 00:00:00 ps

[jbaltrus@mr9540 ~]$


--- On Mon, 12/29/08, Gus Correa <g...@ldeo.columbia.edu> wrote:

Dr. Hung-Sheng Tsao (LaoTsao)

unread,
Dec 29, 2008, 6:59:40 PM12/29/08
to Discussion of Rocks Clusters
looks fine
please delete the -o and -e so the output will use the name (-N) and
<JOBID> for the error and output files;
otherwise the output and error will be overwritten by different runs

--

Gus Correa

unread,
Dec 29, 2008, 8:15:44 PM12/29/08
to Discussion of Rocks Clusters
Hi Jonas, list

You clearly have a large number of pcgamess processes on the frontend
and on the compute nodes.
Most likely a number of them are just leftovers of previous runs,
which were not killed properly.
12 on the frontend, 16 on compute-0-0, another 16 on compute-0-1.
Note how different the PID numbers and the process TIMEs are!

You may be able to kill some using SGE (qdel or equivalent).
However, after you do this, you need to check for what was
left, and kill them by hand, using kill -9 $PID (whatever the PID is),
both on the frontend and the nodes.

An alternative is to reboot all machines.
Then, start fresh! :)

The clean way to submit jobs is to use only your resource manager
(SGE qsub in your case), not mpirun directly.
The clean way to kill jobs is to use only your resource manager
(SGE qdel in your case), not kill -9 or things the like.

I hope this helps,
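As a practical aid to the cleanup advice above, here is a hedged sketch for finding the leftovers by hand before killing them (the user name is the one from this thread; cluster-fork is the Rocks helper shown earlier — the kill commands are deliberately left commented out):

```shell
#!/bin/bash
# List stray pcgamess processes for a given user so they can be
# inspected and then killed by hand. Substitute the user as needed.
user=jbaltrus   # from the thread; change for your own account
strays=$(ps -u "$user" -o pid= -o comm= 2>/dev/null \
         | awk '$2 == "pcgamess" {print $1}')
echo "stray pcgamess PIDs for $user: ${strays:-none}"
# After double-checking the list, kill them:
# for pid in $strays; do kill -9 "$pid"; done
# And on every compute node:
# cluster-fork "pkill -9 -u $user pcgamess"
```

Running this on the frontend and via cluster-fork on the nodes, then rebooting as Gus suggests, gives a clean slate before resubmitting through qsub only.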

Dr. Hung-Sheng Tsao (LaoTsao)

unread,
Dec 29, 2008, 7:20:45 PM12/29/08
to Discussion of Rocks Clusters
U have many, many pcgamess jobs not controlled by sge.
The frontend has 12 pcgamess processes that each need 800MB vmem,
and compute nodes 0-0 and 0-1 have 16 each.
qstat only shows 4/4, 4/4 and 0/4.

Jonas Baltrusaitis

unread,
Dec 29, 2008, 8:27:44 PM12/29/08
to Discussion of Rocks Clusters
Thank you both so much. However, I only use SGE to submit and/or kill jobs, with the below-mentioned script. Where did the leftovers come from?

Jonas Baltrusaitis

unread,
Dec 29, 2008, 8:31:50 PM12/29/08
to Discussion of Rocks Clusters
I'll reiterate: I only use SGE to submit and kill the jobs. Any idea where those others came from?

Gus Correa

unread,
Dec 29, 2008, 8:39:09 PM12/29/08
to Discussion of Rocks Clusters
Hi Jonas, list

Jonas Baltrusaitis wrote:
> Thank you both so much. However, I only use SGE to submit and/or kill jobs, with the below-mentioned script. Where did the leftovers come from?
>
>

Hard to tell.
Did you restart the SGE daemon along the way, by any means?
I don't know anything about SGE, I don't use it.
Dr. Tsao may have a better guess.

However, the fix is to start fresh.
It may be easier to
just reboot all machines (clean, with shutdown -r), and give it another try.

Gus Correa

Gus Correa

unread,
Dec 29, 2008, 8:45:22 PM12/29/08
to Discussion of Rocks Clusters
Hi Jonas, list

Here is another line of thought.

Jonas Baltrusaitis wrote:
> whne I type 1
>
> top - 18:56:44 up 6 days, 5:14, 3 users, load average: 6.55, 6.22, 6.08
> Tasks: 159 total, 7 running, 152 sleeping, 0 stopped, 0 zombie
> Cpu0 : 27.3%us, 70.4%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.1%hi, 2.2%si, 0.0%st
>

That is weird.
If you have 8 cores on the frontend, after you type "1",
top should have shown 8 lines, one for each core/CPU:
Cpu0 ...
Cpu1 ...
...
Cpu7 ...

However, it only shows one, for Cpu0.

What is the output of "uname -a" on the frontend?

What is the output of "cat /proc/cpuinfo" on the frontend?

Gus Correa
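A quick way to gather both pieces of evidence at once (a minimal sketch; on a Xen dom0 kernel the number of visible CPUs can be lower than the physical core count, which is the suspicion here):

```shell
#!/bin/bash
# Count the CPUs the running kernel exposes and report the kernel name,
# so frontend and compute nodes can be compared side by side.
visible=$(grep -c '^processor' /proc/cpuinfo)
echo "kernel: $(uname -r), visible CPUs: $visible"
```

Run on the frontend and on compute-0-0; a frontend reporting 1 CPU under an "el5xen" kernel while a compute node reports 4 under plain "el5" points at the xen kernel, not the hardware.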

Dr. Hung-Sheng Tsao (LaoTsao)

unread,
Dec 29, 2008, 7:41:35 PM12/29/08
to Discussion of Rocks Clusters
the <PE> orte and mpich have tight integration with sge;
<PE> mpi has loose integration with sge.

My thinking is that U may have used -pe mpi before;
qdel would not delete all the mpi jobs because of the loose integration.

Gus Correa

unread,
Dec 29, 2008, 8:50:02 PM12/29/08
to Discussion of Rocks Clusters
Hi Jonas, list

Jonas Baltrusaitis wrote:
> I'll reiterate: I only use SGE to submit and kill the jobs. Any idea where those others came from?
>
>

I'll reiterate: we don't know. :)
Bear with us, let's try to find out what is going wrong.

Gus Correa

Jonas Baltrusaitis

unread,
Dec 29, 2008, 8:50:32 PM12/29/08
to Discussion of Rocks Clusters
[jbaltrus@mr9540 ~]$ uname -a
Linux mr9540.healthcare.uiowa.edu 2.6.18-92.1.13.el5xen #1 SMP Wed Sep 24 20:01:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
[jbaltrus@mr9540 ~]$

[jbaltrus@mr9540 ~]$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU E5405 @ 2.00GHz
stepping : 10
cpu MHz : 1995.025
cache size : 6144 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni tm2 lahf_lm
bogomips : 4989.34
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:

[jbaltrus@mr9540 ~]$

Jonas Baltrusaitis

unread,
Dec 29, 2008, 8:51:40 PM12/29/08
to Discussion of Rocks Clusters
wth, cpu cores: 1? Should be 8... It is 8 (2x4)...

Jonas Baltrusaitis

unread,
Dec 29, 2008, 8:52:38 PM12/29/08
to Discussion of Rocks Clusters
I might have tried mpi before I tried mpich...

Jonas Baltrusaitis

unread,
Dec 29, 2008, 8:53:19 PM12/29/08
to Discussion of Rocks Clusters
No, bear with me. I love getting help

Gus Correa

unread,
Dec 29, 2008, 9:16:52 PM12/29/08
to Discussion of Rocks Clusters
Hi Jonas, list.

Hmmm ....
Are you sure the frontend is a dual-socket quad-core (total 8-cores)
machine?
It seems to have an SMP kernel,
but cpuinfo only reports one processor with a single core.
I expected to see 8 sections on this report, one for each of
processor0, ...., processor7 (i.e. the 8 cores).

Also, I am not familiar with the latest kernel naming convention,
but yours is called 2.6.18-92.1.13.el5xen.
Did you install the xen roll?
I never used it, but I wonder if it would hide the actual cores,
to play with virtualization.

What is the output of "uname -a" and of "cat /proc/cpuinfo" on compute-0-0?
(Just to compare with the frontend results.)

Dr. Hung-Sheng Tsao (LaoTsao)

unread,
Dec 29, 2008, 9:07:48 PM12/29/08
to Discussion of Rocks Clusters
we are into details of pcgamess that I know little about

from what I am reading on the web, one needs to check the inp file U used

there are some parms that tell pcgamess to use more than one CPU to run.

please share Ur inp file

Jonas Baltrusaitis

unread,
Dec 29, 2008, 9:49:00 PM12/29/08
to Discussion of Rocks Clusters
With pleasure. I guess the main options to pay attention to are "$p2p p2p=.t. dlb=.t. $end" and kdiag=0; those are described here: http://classic.chem.msu.su/gran/gamess/index.html

$contrl scftyp=UHF runtyp=optimize icharg=4 mult=11 maxit=300 ECP=read $end
$contrl dfttyp=b3lyp nzvar=0 $end
$system timlim=999999999 MWORDS=100 kdiag=0 $end
!$basis gbasis=N31 ngauss=6 ndfunc=1 $end
$SCF DIRSCF=.T. FDIFF=.t. NPUNCH=0 $END
$ZMAT DLC=.T. AUTO=.T. $END
$STATPT NSTEP=200 OPTTOL=0.0005 NPRT=-2 NPUN=-2 HSSEND=.t. $END
$p2p p2p=.t. dlb=.t. $end
$GUESS guess=huckel kdiag=0 $END
$CONTRL COORD=UNIQUE $END
$DATA
Co2cryptate2 + CO2 b3lyp/ lanl2dz on Co and 6-31G* on others
C1
COBALT 27.0 0.0440040000 0.0268240000 -0.0453410000

Jonas Baltrusaitis

unread,
Dec 29, 2008, 9:52:19 PM12/29/08
to Discussion of Rocks Clusters
Yep, I am absolutely, positively sure that it's dual-processor, 4 cores each; I assembled it myself, and on several occasions I've seen them all identified (I guess Torque did that).

The compute nodes show 4-core processors as normal:

[jbaltrus@compute-0-0 ~]$ uname -a
Linux compute-0-0.local 2.6.18-92.1.13.el5 #1 SMP Wed Sep 24 19:32:05 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
[jbaltrus@compute-0-0 ~]$ cat /proc/cpuinfo


processor : 0
vendor_id : GenuineIntel
cpu family : 6

model : 15
model name : Intel(R) Core(TM)2 Quad CPU Q6700 @ 2.66GHz
stepping : 11
cpu MHz : 2667.000
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips : 5323.69


clflush size : 64
cache_alignment : 64

address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 1


vendor_id : GenuineIntel
cpu family : 6

model : 15
model name : Intel(R) Core(TM)2 Quad CPU Q6700 @ 2.66GHz
stepping : 11
cpu MHz : 2667.000
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 4
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips : 5319.88


clflush size : 64
cache_alignment : 64

address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 2


vendor_id : GenuineIntel
cpu family : 6

model : 15
model name : Intel(R) Core(TM)2 Quad CPU Q6700 @ 2.66GHz
stepping : 11
cpu MHz : 2667.000
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 2
cpu cores : 4
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips : 5320.00


clflush size : 64
cache_alignment : 64

address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 3


vendor_id : GenuineIntel
cpu family : 6

model : 15
model name : Intel(R) Core(TM)2 Quad CPU Q6700 @ 2.66GHz
stepping : 11
cpu MHz : 2667.000
cache size : 4096 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips : 5319.85


clflush size : 64
cache_alignment : 64

address sizes : 36 bits physical, 48 bits virtual
power management:


Jonas Baltrusaitis

unread,
Dec 29, 2008, 9:54:09 PM12/29/08
to Discussion of Rocks Clusters
How should I modify the submission script so that after the job finishes it copies some files from scratch to /home/$USER/job and deletes the scratch files/folders? (pcgamess creates one folder for each processor in a run, so I always have to manually delete 4 folders in scratch, which is very tedious.)

thanks

Jonas

Dr. Hung-Sheng Tsao (LaoTsao)

unread,
Dec 29, 2008, 10:30:04 PM12/29/08
to Discussion of Rocks Clusters
-t /share/apps/scratch/tmp
change to -t /share/apps/scratch/$USER/$JOB_ID
(may need to create it before the mpirun line)
after the mpirun line
cp some files to $HOME/$JOB_NAME
rm -rf /share/apps/scratch/$USER/$JOB_ID/

do man qsub to see the environment variables available to U.
U can put some statements e.g.
echo JOB_ID=$JOB_ID
echo JOB_HOME=$JOB_HOME
etc to find out the real value for each run
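Putting Dr. Tsao's steps together, a minimal sketch of the scratch-handling part of the script (hypothetical /tmp paths so it runs anywhere; on the cluster the base would be /share/apps/scratch and the destination $HOME/$JOB_NAME, as he suggests, and the commented mpirun line is where the real run goes):

```shell
#!/bin/bash
#$ -N demo-job
#$ -S /bin/bash
#$ -cwd
# Per-job scratch directory keyed on $USER and $JOB_ID (both set by SGE
# inside a real job; local fallbacks let the sketch run anywhere).
base=/tmp/scratch-demo            # cluster: /share/apps/scratch
dest=/tmp/scratch-demo-results    # cluster: $HOME/$JOB_NAME
scr="$base/${USER:-nobody}/${JOB_ID:-local}"
mkdir -p "$scr" "$dest"
: > "$scr/PUNCH"                  # stand-in for a pcgamess output file
# /opt/mpich/gnu/bin/mpirun -np $NSLOTS -machinefile $TMPDIR/machines \
#     /share/apps/pcg71c/pcgamess ... -t "$scr"
cp "$scr/PUNCH" "$dest/"          # keep the useful outputs
rm -rf "$scr"                     # the per-CPU folders go with it
echo "kept: $(ls "$dest")"
```

Because the whole job tree is removed in one `rm -rf`, the four per-processor folders disappear automatically, and each run's files land under their own $JOB_ID so concurrent jobs cannot collide.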

Dr. Hung-Sheng Tsao (LaoTsao)

unread,
Dec 29, 2008, 10:41:14 PM12/29/08
to Discussion of Rocks Clusters
please see this link
http://licht.ims.ac.jp/lab/gmssub
that may help U create a script for pcgamess

Dr. Hung-Sheng Tsao (LaoTsao)

Dec 29, 2008, 11:30:59 PM12/29/08
to Discussion of Rocks Clusters
For MPICH, try the mpirun -nolocal option,
so that it does not start MPI rank 0 on the local (master) node.

Gus Correa

Dec 30, 2008, 1:31:23 PM12/30/08
to Discussion of Rocks Clusters
Hi Jonas, list

1) First and foremost, by all means follow Dr. Tsao's advice on how to
set up PCGAMESS to run under SGE.
In particular, avoid using the master/frontend for now, as he suggested.

2) It is clear that you have different kernels
on the frontend (2.6.18-92.1.13.el5xen #1 SMP)
and on the compute nodes (2.6.18-92.1.13.el5 #1 SMP).
The former has xen, the latter doesn't.

3) Since the compute nodes report the correct number of CPUs/cores (4),
but the frontend reports only one CPU/core instead of the right number (8),
I suspect the xen kernel is hiding the actual number of cores
for virtualization purposes.
This is a guess, admittedly, but founded on the evidence you sent us.

4) You can get another piece of evidence if you submit a 4-processor
PCGAMESS job to compute-0-0 via SGE, login to compute-0-0, do "top" there,
and type "1" (number one) within top.
I expect this will report all 4 CPUs, and there won't be any sharing
(not 25% for each), but around 99-100% CPU activity on each pcgamess
process.
This is in contrast to what you saw on the frontend.
Please, do the experiment, and send the result.

5) Why and how you got to this state of affairs,
with different kernels on the frontend and on the nodes,
is a mystery that only you, if anybody, can tell.
You may have installed the xen roll on the frontend after you
installed the compute nodes, for instance.

6) Do you need xen?
If your main goal is to run parallel jobs in computational Chemistry
using MPI,
I would say your business is not virtualization, and you don't need xen.
Actually, you may really want to stay away from it,
if standard parallel computing is all you want.
(Search the list archives for different opinions about this.)

7) There may be a way to fix the frontend, but I don't know how.
I don't use xen and have never installed it.
You may want to start a new thread with this question, i.e.,
how to remove xen from your frontend, or at least how to make it report
and use the correct number of physical CPUs/cores you have even if xen
is kept.

8) I am not sure this is feasible,
but I am reluctant to tell you to reinstall the cluster from scratch,
without xen.
(Reinstalling would give you a homogeneous cluster, though.)
The Rocks developers and other list subscribers experienced with xen may
advise you better.

I hope this helps.

Gus Correa

---------------------------------------------------------------------
Gustavo Correa, PhD - Email: g...@ldeo.columbia.edu
Lamont-Doherty Earth Observatory - Columbia University
---------------------------------------------------------------------

Dr. Hung-Sheng Tsao (LaoTsao)

Dec 30, 2008, 2:29:32 PM12/30/08
to Discussion of Rocks Clusters

Maybe I have found a solution for you.
Issue:
vm vcpu-set 0 8
then
cat /proc/cpuinfo
You should see 8 CPUs now.

Maybe this is a bug or a feature in this Rocks 5.1 xen build.
One can understand that if you are using xen to build domUs, you do not
want dom0 to use all the CPUs, so it is assigned only one vcpu.
But if you do not want to use xen and VMs for any other domU, it is
better to have dom0 present all the vcpus to you :-)
IMHO, even if one chooses the xen roll, grub/menu.lst should give you
the option not to boot into dom0.

I changed the title a bit.
Happy holidays


Jonas Baltrusaitis

Dec 30, 2008, 2:35:06 PM12/30/08
to Discussion of Rocks Clusters
With immense help from Gus and Dr. Tsao, I will start a new thread: namely, how to remove xen from your frontend. I don't know what it is or why it masks the number of processors. All I did was download all the rolls and install them, no more, no less. I assume xen was an install option and I picked it. Since it messes up the total processor count on my frontend, how do I remove it and get a homogeneous cluster (see below)? My main goal is to run parallel jobs in computational
Chemistry using MPI, as Gus said... The fewer problems the better.

Dr. Hung-Sheng Tsao (LaoTsao)

Dec 30, 2008, 3:00:32 PM12/30/08
to Discussion of Rocks Clusters
See my recent email; the command is
xm vcpu-set 0 8


Jonas Baltrusaitis

Dec 30, 2008, 4:31:22 PM12/30/08
to Discussion of Rocks Clusters
[root@mr9540 ~]# vm vcpu-set 0 8
-bash: vm: command not found


Jonas Baltrusaitis

Dec 30, 2008, 4:34:55 PM12/30/08
to Discussion of Rocks Clusters
Still, something is not quite right. It sees them as separate processors but not separate cores: processors run from 0 to 7, yet each entry reports cpu cores : 1.

[root@mr9540 ~]# xm vcpu-set 0 8
[root@mr9540 ~]# cat /proc/cpuinfo


processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU E5405 @ 2.00GHz
stepping : 10
cpu MHz : 1995.025
cache size : 6144 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni tm2 lahf_lm
bogomips : 4989.34
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:

[processors 1 through 7 are identical except that "processor" and "physical id" run from 1 to 7; every entry reports siblings : 1, core id : 0, and cpu cores : 1]

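The symptom above (eight one-core packages rather than two quad-core ones) can be checked mechanically. The helper below is a hypothetical illustration, not code from the thread: it tallies a /proc/cpuinfo dump into logical CPUs, distinct physical packages, and distinct (package, core) pairs.

```python
def cpuinfo_summary(text):
    """Return (logical_cpus, physical_packages, distinct_cores) for a
    /proc/cpuinfo dump. Entries are separated by blank lines; each
    "key : value" line becomes a field of the current entry."""
    entries = [e for e in text.strip().split("\n\n") if e.strip()]
    logical = 0
    packages = set()
    cores = set()
    for entry in entries:
        fields = {}
        for line in entry.splitlines():
            if ":" in line:
                key, _, val = line.partition(":")
                fields[key.strip()] = val.strip()
        if "processor" in fields:
            logical += 1
            packages.add(fields.get("physical id", "?"))
            # A (physical id, core id) pair identifies a distinct core.
            cores.add((fields.get("physical id", "?"),
                       fields.get("core id", "?")))
    return logical, len(packages), len(cores)
```

On the dom0 output above this would report 8 logical CPUs spread over 8 single-core packages, while the compute nodes' native kernel reports 4 cores sharing one package — which is exactly the discrepancy Jonas noticed.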

Jonas Baltrusaitis

Dec 30, 2008, 4:33:18 PM12/30/08
to Discussion of Rocks Clusters
Disregard my previous email. The xm command seems to work; I see 8 processors with cat. I'll try to run some jobs later today to see how they run.



Dr. Hung-Sheng Tsao (LaoTsao)

Dec 30, 2008, 4:46:38 PM12/30/08
to Discussion of Rocks Clusters