As I understand it, the Ps and Qs values in HPL.dat should multiply to the
number of processors used. When I use anything other than 1 for each, I get
errors like:
[danield@xxxxxxxx ~]$ /usr/bin/mpirun -np 2 -machinefile machines /home/danield/xhpl
HPL ERROR from process # 0, on line 419 of function HPL_pdinfo:
>>> Need at least 2 processes for these tests <<<
HPL ERROR from process # 0, on line 621 of function HPL_pdinfo:
>>> Illegal input in file HPL.dat. Exiting ... <<<
The really odd thing is that if I set both to 1, it will run and use
additional processors on different nodes. Can anyone clue me in on this?
thanks,
Dan
The HPL.dat file is extremely sensitive to errors. What are the contents
of your "machines" file? Also, go ahead and paste in your entire HPL.dat
file so we can take a closer look.
Tim
Machines contains the names of the first two 4-core systems.
compute-0-0
compute-0-0
compute-0-0
compute-0-0
compute-0-0
compute-0-0
compute-0-0
compute-0-0
compute-0-1
compute-0-1
compute-0-1
compute-0-1
compute-0-1
compute-0-1
compute-0-1
compute-0-1
thanks,
Dan
Your HPL.dat file shows you wanting to use 4 processors. P=2 and Q=2 means
your grid is 2x2, giving a total of 4 processors. Your mpirun command is
only using 2. Have you tried this with "mpirun -np 4"?
Have I also mentioned that the errors out of xhpl can be misleading? ;)
Your error says you need 2 CPUs when in fact you need 4.
Tim
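The arithmetic Tim describes can be sketched in Python. This is only a minimal sketch and not part of HPL: the function names parse_grids and required_processes are hypothetical, the sample input is an illustrative fragment in the style of a stock HPL.dat (not Dan's actual file), and it assumes the trailing "Ps" and "Qs" labels are intact, even though xhpl itself reads the file by position.

```python
# Illustrative fragment only; not the HPL.dat from this thread.
SAMPLE_HPL_DAT = """\
1            # of process grids (P x Q)
2            Ps
2            Qs
"""


def parse_grids(text):
    """Return the (P, Q) pairs found on the lines labelled Ps and Qs."""
    found = {}
    for line in text.splitlines():
        values = []
        for token in line.split():
            if token.isdigit():
                # Leading integers are the values for this line.
                values.append(int(token))
            else:
                # Assumption: the first non-numeric token is the label.
                if token in ("Ps", "Qs"):
                    found[token] = values
                break
    return list(zip(found["Ps"], found["Qs"]))


def required_processes(text):
    """Largest P * Q over all grids, i.e. the minimum value for -np."""
    return max(p * q for p, q in parse_grids(text))


np_requested = 2  # the value passed to "mpirun -np" in the failing run
needed = required_processes(SAMPLE_HPL_DAT)
if np_requested < needed:
    print(f"-np {np_requested} is too small; the grid needs {needed}")
```

Under these assumptions, a 2x2 grid needs at least "-np 4", which matches Tim's reading of the error.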
Any other ideas?
Dan
Just a hunch here...
Check your ages-old HPL.dat against the one that Rocks installed. I
think at some point in time, the ordering of the file changed. Do a diff
between the two and see if this might be the problem in your case. If
so, you'll just have to set the parameters in the new HPL.dat to be the
appropriate values.
Jeff F. Pummill
Senior Linux Cluster Administrator
University of Arkansas
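The comparison Jeff suggests could be done with the ordinary diff command; the same idea is sketched here in Python with the standard difflib module. The function name diff_hpl_dat and the "old/HPL.dat" and "new/HPL.dat" labels are hypothetical, and the two inputs are made-up fragments rather than real files from this thread. Runs of whitespace are collapsed first so that only changes in values, labels, or line order show up.

```python
import difflib


def diff_hpl_dat(old_text, new_text):
    """Return unified-diff lines between two HPL.dat texts."""

    def normalize(text):
        # Collapse column padding so spacing alone is not reported.
        return [" ".join(line.split()) for line in text.splitlines()]

    return list(
        difflib.unified_diff(
            normalize(old_text),
            normalize(new_text),
            fromfile="old/HPL.dat",  # hypothetical label
            tofile="new/HPL.dat",    # hypothetical label
            lineterm="",
        )
    )


# Made-up fragments for illustration.
OLD = "2            Ps\n2            Qs\n"
NEW = "1 Ps\n2 Qs\n"

for line in diff_hpl_dat(OLD, NEW):
    print(line)
```

An empty result means the two files differ only in spacing; any "-" or "+" line marks a value or ordering difference worth checking by hand.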
Can you run the "cpi" example with mpirun? I cut and pasted your HPL.dat
file and ran it on one of my clusters with no problems. So it seems to me
that something is odd with your mpirun. The "cpi" example would nicely
display the list of machines it uses.
Tim