[Rocks-Discuss] More troubles with Linpack

Daniel Davidson

Jun 10, 2008, 5:55:14 PM
to Discussion of Rocks Clusters
I finally have Linpack compiled and working, but I am still having
problems with it.

As I understand it, the Ps and Qs values in HPL.dat should multiply to
give the number of processors used. When I set either to anything other
than 1, I get errors like:

[danield@xxxxxxxx ~]$ /usr/bin/mpirun -np 2 -machinefile machines /home/danield/xhpl
HPL ERROR from process # 0, on line 419 of function HPL_pdinfo:
>>> Need at least 2 processes for these tests <<<

HPL ERROR from process # 0, on line 621 of function HPL_pdinfo:
>>> Illegal input in file HPL.dat. Exiting ... <<<


The really odd thing is that if I set both to 1, it runs and still uses
additional processors on different nodes. Can anyone clue me in on this?

thanks,

Dan

Tim Carlson

Jun 10, 2008, 6:45:51 PM
to Discussion of Rocks Clusters
On Tue, 10 Jun 2008, Daniel Davidson wrote:

The HPL.dat file is extremely sensitive to errors. What are the contents
of your "machines" file? Also, go ahead and paste in your entire HPL.dat
file so we can take a closer look.

Tim

Daniel Davidson

Jun 11, 2008, 9:18:01 AM
to Discussion of Rocks Clusters
Here you go. The HPL.dat is the same one I used years ago on my old cluster:
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
1000         Ns
1            # of NBs
64           NBs
1            # of process grids (P x Q)
2            Ps
2            Qs
16.0         threshold
3            # of panel fact
0 1 2        PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
8            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
2            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
80           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)

Machines contains names of the first two two 4 core systems.
compute-0-0
compute-0-0
compute-0-0
compute-0-0
compute-0-0
compute-0-0
compute-0-0
compute-0-0
compute-0-1
compute-0-1
compute-0-1
compute-0-1
compute-0-1
compute-0-1
compute-0-1
compute-0-1
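
A quick way to sanity-check a machinefile like the one above is to tally how many slots it offers per host and see which hosts the first few ranks would land on. The sketch below is only an illustration: it assumes one hostname per line and that the launcher hands out entries from top to bottom, which may not hold for every MPI implementation. The names parse_hosts, slots_per_host, and first_ranks are made up for this example and are not part of any MPI tool.

```python
from collections import Counter

def parse_hosts(machinefile_text):
    """Return the hostnames in file order, skipping blank and '#' comment lines."""
    hosts = []
    for line in machinefile_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            hosts.append(line)
    return hosts

def slots_per_host(machinefile_text):
    """Count how many times each host is listed."""
    return Counter(parse_hosts(machinefile_text))

def first_ranks(machinefile_text, np):
    """Hosts for ranks 0..np-1, ASSUMING entries are consumed top to bottom."""
    return parse_hosts(machinefile_text)[:np]

# Same layout as the file posted above: each host listed 8 times.
machines = "compute-0-0\n" * 8 + "compute-0-1\n" * 8

slots = slots_per_host(machines)
placement = first_ranks(machines, 4)
print(dict(slots))
print(placement)
```

Under that top-to-bottom assumption, a 4-process run would stay entirely on compute-0-0, so a run has to ask for more than 8 processes before the second node is touched.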

thanks,

Dan

Tim Carlson

Jun 11, 2008, 11:33:29 AM
to Discussion of Rocks Clusters
On Wed, 11 Jun 2008, Daniel Davidson wrote:

Your HPL.dat file asks for 4 processors: P=2 and Q=2 means your grid is
2x2, for a total of 4. Your mpirun command only starts 2. Have you tried
this with

mpirun -np 4

Have I also mentioned that the errors out of xhpl can be misleading? ;)
Your error says you need 2 cpus when in fact you need 4.
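
To make the P x Q arithmetic concrete, here is a minimal Python sketch that reads the Ps and Qs lines out of an HPL.dat and reports the process count to pass to -np. It finds the lines by their trailing "Ps" and "Qs" labels, which is an assumption made for convenience; as far as I know xhpl itself reads the file by line position and ignores the labels. The function names are illustrative only.

```python
def leading_ints(line):
    """Return the integers at the start of a line, stopping at the first other token."""
    values = []
    for token in line.split():
        try:
            values.append(int(token))
        except ValueError:
            break
    return values

def required_processes(hpl_dat_text):
    """Largest P*Q over all process grids listed on the Ps and Qs lines."""
    ps = qs = None
    for line in hpl_dat_text.splitlines():
        tokens = line.split()
        if "Ps" in tokens:
            ps = leading_ints(line)
        elif "Qs" in tokens:
            qs = leading_ints(line)
    if not ps or not qs or len(ps) != len(qs):
        raise ValueError("could not find matching Ps and Qs lines")
    return max(p * q for p, q in zip(ps, qs))

# The three grid lines from the HPL.dat posted in this thread.
sample = """1 # of process grids (P x Q)
2 Ps
2 Qs
"""
needed = required_processes(sample)
print(needed)  # 2 x 2 grid -> 4
```

With several grids on one line (for example "1 2 Ps" and "4 2 Qs"), the run needs enough processes for the largest grid.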

Tim

Daniel Davidson

Jun 11, 2008, 12:53:31 PM
to Discussion of Rocks Clusters
The -np value and the P x Q product have been the same all along; I just
changed the value of Q (from 1 to 2) between the messages. Changing -np
as suggested does not resolve the problem.

Any other ideas?

Dan

Jeff Pummill

Jun 11, 2008, 1:32:01 PM
to Discussion of Rocks Clusters
Dan,

Just a hunch here...

Check your ages-old HPL.dat against the one that Rocks installed. I
think at some point the ordering of the lines in the file changed. Do a
diff between the two and see if this might be the problem in your case.
If so, you'll just have to set the parameters in the new HPL.dat to the
appropriate values.
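
If a plain diff is too noisy because the numeric values differ, one option is to compare only the descriptive labels, so that just the layout differences show up. The Python sketch below does that with the standard difflib module. It is only an illustration: the two inputs are short stand-ins, and the "EXTRA" line is a made-up placeholder, not a claim about what the Rocks-installed file actually contains.

```python
import difflib

def is_number(token):
    """True if the token parses as a number (int or float)."""
    try:
        float(token)
        return True
    except ValueError:
        return False

def labels(hpl_dat_text):
    """Drop the leading numeric values from each line, keeping only the label."""
    out = []
    for line in hpl_dat_text.splitlines():
        tokens = line.split()
        i = 0
        while i < len(tokens) and is_number(tokens[i]):
            i += 1
        out.append(" ".join(tokens[i:]))
    return out

def layout_diff(old_text, new_text):
    """Unified diff of the label columns of two HPL.dat files."""
    return list(difflib.unified_diff(
        labels(old_text), labels(new_text), "old", "new", lineterm=""))

# Stand-in fragments; in practice, read both real files from disk.
old_text = "64 NBs\n1 # of process grids (P x Q)\n2 Ps\n2 Qs\n"
new_text = ("64 NBs\n0 EXTRA (made-up placeholder line)\n"
            "1 # of process grids (P x Q)\n2 Ps\n2 Qs\n")

diff = layout_diff(old_text, new_text)
print("\n".join(diff))
```

An empty result would suggest the two files share the same layout, in which case the ordering is probably not the problem.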


Jeff F. Pummill
Senior Linux Cluster Administrator
University of Arkansas

Tim Carlson

Jun 11, 2008, 2:51:47 PM
to Discussion of Rocks Clusters
On Wed, 11 Jun 2008, Daniel Davidson wrote:

Can you run the "cpi" example with mpirun? I cut and pasted your HPL.dat
file and ran it on one of my clusters with no problems, so it seems to me
that something is odd with your mpirun. The "cpi" example would nicely
display the list of machines it uses.

Tim
