[Rocks-Discuss] Problem about HPL


Nkjt

Jun 4, 2009, 4:24:26 AM
to npaci-rocks...@sdsc.edu

Hello, all!
I have configured a cluster with 1 master and 2 compute nodes, each with an Intel Q8200 CPU. I want to benchmark it with HPL + GotoBLAS, but something strange is happening.
I successfully compiled GotoBLAS and HPL, and they work. When I run mpirun -np 1 -machinefile ./machines ./xhpl, it performs well (29.78 GFLOPS). But when I change the Ps and/or Qs arguments in HPL.dat (e.g. to 2 or 3) and run mpirun -np 2 -machinefile ./machines ./xhpl, I get terrible performance (about 10e-2 GFLOPS). I don't know how to fix this.
Attached are the machinefile and HPL.dat. I actually don't understand the arguments after line 13, so I have kept them at the defaults. Could anyone help me?

-machines
compute-0-0
compute-0-1
cluster

-HPL.dat

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
6 device out (6=stdout,7=stderr,file)
1 # of problems sizes (N)
15000 Ns
1 # of NBs
256 NBs
0 PMAP process mapping (0=Row-,1=Column-major)
1 # of process grids (P x Q)
2 Ps
1 Qs
16.0 threshold
3 # of panel fact
0 1 2 PFACTs (0=left, 1=Crout, 2=Right)
2 # of recursive stopping criterium
2 4 NBMINs (>= 1)
1 # of panels in recursion
2 NDIVs
3 # of recursive panel fact.
0 1 2 RFACTs (0=left, 1=Crout, 2=Right)
1 # of broadcast
0 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1 # of lookahead depth
0 DEPTHs (>=0)
2 SWAP (0=bin-exch,1=long,2=mix)
64 swapping threshold
0 L1 in (0=transposed,1=no-transposed) form
0 U in (0=transposed,1=no-transposed) form
1 Equilibration (0=no,1=yes)
8 memory alignment in double (> 0)

Gowtham

Jun 4, 2009, 6:47:22 AM
to Discussion of Rocks Clusters

Hi,

The parameters used in HPL.dat are explained here
in detail:

http://www.netlib.org/benchmark/hpl/tuning.html

If, for some reason, you need to recompile your
HPL, you may use the instructions in

http://sgowtham.net/blog/2007/07/02/hpl-benchmark-for-single-processor-machines/

Hope this helps,

Best,
gowtham

--
Gowtham
Department of Physics
Michigan Technological University
Houghton, MI

http://sgowtham.net/

Peter Kjellstrom

Jun 4, 2009, 10:16:43 AM
to npaci-rocks...@sdsc.edu
On Thursday 04 June 2009, Nkjt wrote:
> Hello, all!
> I have configured a cluster with 1 master and 2 compute nodes, each with
> an Intel Q8200 CPU. I want to benchmark it with HPL + GotoBLAS, but
> something strange is happening. I successfully compiled GotoBLAS and HPL,
> and they work. When I run mpirun -np 1 -machinefile ./machines ./xhpl, it
> performs well (29.78 GFLOPS).

This number hints at what's going on here. You have a threaded gotoblas.

> But when I change the Ps and/or Qs
> arguments in HPL.dat (e.g. to 2 or 3) and run mpirun -np 2
> -machinefile ./machines ./xhpl, I get terrible performance

...which then oversubscribes your cores.

Try setting the number of threads per MPI-rank to 1 before mpirun:
$ export OMP_NUM_THREADS=1
$ mpirun -np X (X > 1...) ...

Also monitor what's actually running on your two nodes with top.
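Putting the above together, a minimal launch sketch (the -np 8 and the matching Ps=2, Qs=4 in HPL.dat are assumptions based on two quad-core Q8200 nodes; adjust both to your rank count):

```shell
# One single-threaded BLAS instance per MPI rank, so ranks don't
# oversubscribe cores with GotoBLAS's own threads.
export OMP_NUM_THREADS=1
export GOTO_NUM_THREADS=1   # GotoBLAS also honours its own variable

# 8 ranks across the two quad-core nodes; Ps x Qs in HPL.dat must equal -np.
mpirun -np 8 -machinefile ./machines ./xhpl
```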

/Peter

> (about 10e-2
> GFLOPS). I don't know how to fix this. Attached are the machinefile and
> HPL.dat. I actually don't understand the arguments after line 13, so I
> have kept them at the defaults. Could anyone help me?


Jeff Haferman

Jun 4, 2009, 10:58:35 AM
to npaci-rocks...@sdsc.edu
I dug up my notes on this... I sent an email to Goto about the threaded
GotoBLAS a while back and he told me to set HAS_SMP=0 in the "detect"
script in order to run un-threaded...

Seems to me that I also set OMP_NUM_THREADS and GOTO_NUM_THREADS to 1
and pass via "mpirun -x ..."

Getting the HPL parameters set correctly to maximize your teraflops
rating is a bit of an art... here are a couple of links that I found
handy, though I continued to tweak beyond this... you want to ratchet
some of your parameters up and monitor memory usage to the point where
you're using all available memory without paging...

http://www.advancedclustering.com/faq/how-do-i-tune-my-hpldat-file.html
http://www.intel.com/support/performancetools/sb/CS-025964.htm
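As a sketch of that memory-sizing rule of thumb: the N x N matrix of doubles should fill most of RAM without paging. The 80% fraction and the 8 GB total are assumptions; plug in your cluster's real totals.

```shell
# Pick an HPL problem size N from total cluster memory.
# Assumptions: MEM_GB is total RAM across compute nodes, NB matches HPL.dat.
MEM_GB=8
NB=256
# N = sqrt(0.80 * mem_bytes / 8), rounded down to a multiple of NB
# (doubles are 8 bytes; leave ~20% headroom for the OS and MPI).
N=$(awk -v m="$MEM_GB" -v nb="$NB" 'BEGIN {
  n = sqrt(0.80 * m * 1024^3 / 8)
  print int(n / nb) * nb
}')
echo "Suggested Ns: $N"
```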

Gus Correa

Jun 4, 2009, 11:52:41 AM
to Discussion of Rocks Clusters
Hello Nkjt
(A name with no vowels! :) ... Oh, the times of email anonymity ...)

For the meaning of the HPL.dat parameters,
see the "TUNING" file that comes with HPL,
or the online documentation on the netlib HPL web pages:
http://www.netlib.org/benchmark/hpl/tuning.html

The FAQs may also help:
http://www.netlib.org/benchmark/hpl/faqs.html

Performance is highly dependent on N, NB, and (P,Q).
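For the (P,Q) part, a quick way to list the candidate grids for a given rank count (NP=8 is an assumption for two quad-core nodes; the "prefer P <= Q, close to square" heuristic is conventional HPL tuning advice, not a guarantee):

```shell
# Enumerate the (P,Q) process grids with P*Q = NP and P <= Q,
# the usual starting candidates for the Ps/Qs lines in HPL.dat.
NP=8
for P in $(seq 1 "$NP"); do
  if [ $((NP % P)) -eq 0 ]; then
    Q=$((NP / P))
    [ "$P" -le "$Q" ] && echo "P=$P Q=$Q"
  fi
done
```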

I hope this helps.

Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------
