[Rocks-Discuss] HPCC Benchmark customization


Mario Benitez

Mar 20, 2012, 7:29:49 PM
to Linux Rocks list

Hi guys,

I'm trying to figure out how hpccinf.txt should be set up. I have 8 nodes and 1 front end, each with 16 GB of RAM and one i7 processor. /proc/cpuinfo shows the following for each processor:

processor : 0-7
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
stepping : 7
cpu MHz : 3701.000
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx rdtscp lm constant_tsc ida nonstop_tsc arat pni monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr sse4_1 sse4_2 popcnt lahf_lm
bogomips : 6820.08
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management: [8]

I tried N, P and Q with the following values:

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
8 device out (6=stdout,7=stderr,file)
1 # of problems sizes (N)
2828 Ns
1 # of NBs
80 NBs
0 PMAP process mapping (0=Row-,1=Column-major)
1 # of process grids (P x Q)
8 Ps
8 Qs
16.0 threshold
1 # of panel fact
2 PFACTs (0=left, 1=Crout, 2=Right)
1 # of recursive stopping criterium
4 NBMINs (>= 1)
1 # of panels in recursion
2 NDIVs
1 # of recursive panel fact.
1 RFACTs (0=left, 1=Crout, 2=Right)
1 # of broadcast
1 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1 # of lookahead depth
1 DEPTHs (>=0)
2 SWAP (0=bin-exch,1=long,2=mix)
64 swapping threshold
0 L1 in (0=transposed,1=no-transposed) form
0 U in (0=transposed,1=no-transposed) form
1 Equilibration (0=no,1=yes)
8 memory alignment in double (> 0)
##### This line (no. 32) is ignored (it serves as a separator). ######
0 Number of additional problem sizes for PTRANS
1200 10000 30000 values of N
0 number of additional blocking sizes for PTRANS
40 9 8 13 13 20 16 32 64 values of NB

I don't know whether these values are correct, or whether any other value should be changed. Any hints?

Thanks in advance.

Marinho.-



Gowtham

Mar 20, 2012, 7:50:56 PM
to Discussion of Rocks Clusters

The front end is ignored in my estimates below:


PART #01:

TOTAL_MEMORY_BYTES = 8 * 16 * (1024 * 1024 * 1024)
N = 0.80 * SQRT(TOTAL_MEMORY_BYTES/8)

N represents the matrix size
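
As a quick check of the arithmetic in PART #01, here is a minimal Python sketch. The helper name `hpl_problem_size` and its parameters are made up for illustration and are not part of HPL or HPCC. It assumes 8-byte double-precision matrix elements and, like the estimate above, ignores the front end.

```python
import math

def hpl_problem_size(nodes, gib_per_node, fraction=0.80, bytes_per_element=8):
    """Hypothetical helper mirroring N = fraction * SQRT(TOTAL_MEMORY_BYTES / 8)."""
    total_memory_bytes = nodes * gib_per_node * 1024 ** 3
    elements = total_memory_bytes // bytes_per_element  # doubles that fit in RAM
    return int(fraction * math.sqrt(elements))          # truncate to a whole order

# 8 compute nodes with 16 GB each, as described in the original question
print(hpl_problem_size(8, 16))  # 104857
```

The truncation to an integer is a choice made here for convenience; any nearby value of N works equally well as a starting point.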


PART #02:

You have 8 nodes with 4 cores each, for a total
of 32 cores. Suggested value for P & Q would be

P 4
Q 8

If I remember correctly, HPL documentation
suggests keeping P & Q about the same, with Q
slightly higher than P
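
One way to pick such a grid for an arbitrary process count is sketched below. The function `process_grid` is a hypothetical helper, not something shipped with HPL. It simply returns the factor pair closest to square with P no larger than Q, which is one reading of the guideline above.

```python
import math

def process_grid(nprocs):
    """Hypothetical helper: return (P, Q) with P * Q == nprocs and P <= Q,
    choosing the largest P that does not exceed sqrt(nprocs)."""
    p = math.isqrt(nprocs)       # start at the integer square root
    while nprocs % p != 0:       # walk down until p divides nprocs
        p -= 1
    return p, nprocs // p

print(process_grid(32))  # (4, 8)
```

For 32 processes this reproduces the 4 x 8 grid suggested above. For a prime process count it degenerates to 1 x nprocs, in which case a different process count may be the better choice.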


PART #03:

For NB, try 256 (or 512)


PART #04:

RPEAK (theoretical) = 8 nodes *
4 cores/node *
3.40 G cycles/second *
4 operations/cycle
= 435.2 GFLOPS

Compare this number with the output of
HPCC benchmark
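
The comparison in PART #04 can be sketched in Python as follows. Both function names are invented for this example. The value of 4 operations per cycle is taken from the estimate above and is an assumption about the hardware, and the measured figure has to come from your own benchmark output.

```python
def theoretical_peak_gflops(nodes, cores_per_node, ghz, flops_per_cycle):
    """Hypothetical helper: nodes * cores * clock (GHz) * operations per cycle."""
    return nodes * cores_per_node * ghz * flops_per_cycle

def efficiency(measured_gflops, peak_gflops):
    """Fraction of the theoretical peak that the benchmark actually reached."""
    return measured_gflops / peak_gflops

peak = theoretical_peak_gflops(nodes=8, cores_per_node=4, ghz=3.40, flops_per_cycle=4)
print(f"{peak:.1f} GFLOPS")  # 435.2 GFLOPS
```

Passing the HPL result reported by the benchmark (in GFLOPS) to `efficiency` together with `peak` gives a single number to track while tuning N, NB, P and Q.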


Hope this helps,
g

--
Gowtham
Information Technology Services
Michigan Technological University

(906) 487/3593
http://www.it.mtu.edu/

Mario Benitez

Mar 22, 2012, 3:54:28 PM
to Linux Rocks list

Hi Gowtham, thanks a lot once more.

My hpccinf.txt file is (partially) as follows:

----


HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
8 device out (6=stdout,7=stderr,file)

3 # of problems sizes (N)
65536 13107 104857 Ns
3 # of NBs
128 256 512 NBs


0 PMAP process mapping (0=Row-,1=Column-major)
1 # of process grids (P x Q)

4 Ps
8 Qs
--

As specified in the links you sent some days ago:

N = m * sqrt((8 * 16 * 1024 * 1024 * 1024) / 8)
where m: .50 (.10) .80
This is why Ns is 65536 13107 104857.

I used 128, 256 and 512 for NB.

The last step is to run the benchmark:

# mpirun -np32 ./hpcc


Are my calculations correct? Thanks in advance.

Marinho.-

> Date: Tue, 20 Mar 2012 19:50:56 -0400
> From: g...@mtu.edu
> To: npaci-rocks...@sdsc.edu
> Subject: Re: [Rocks-Discuss] HPCC Benchmark customization


sgow...@mtu.edu

Mar 22, 2012, 4:47:30 PM
to Discussion of Rocks Clusters
I'm assuming you calculated the values correctly for m = 0.5 and m = 0.8; just a note, though:

m = 0.50 (0.10) 0.80

is a short mathematical notation that means

m takes values from 0.50 through 0.80, in steps of 0.10

So, you will have 4 problems instead of 3 and you will need to appropriately change the matrix sizes in your hpccinf.txt
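
To make the notation concrete, here is a minimal Python sketch that expands m = 0.50 (0.10) 0.80 into the four matrix sizes. The function `problem_sizes` is a hypothetical helper. It assumes the same 8 nodes with 16 GB each and 8-byte elements used earlier in the thread, and it steps m in whole percent to avoid floating-point drift.

```python
import math

def problem_sizes(total_memory_bytes, percents=range(50, 90, 10)):
    """Hypothetical helper: N = m * sqrt(total_memory_bytes / 8) for each m,
    where m is given in percent (50, 60, 70, 80 by default)."""
    base = math.sqrt(total_memory_bytes / 8)
    return [int(pct / 100 * base) for pct in percents]

sizes = problem_sizes(8 * 16 * 1024 ** 3)
print(len(sizes), "# of problems sizes (N)")
print(" ".join(str(n) for n in sizes), "Ns")
```

The two printed lines follow the layout of the corresponding hpccinf.txt entries, so they can be compared directly against the file.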

I'm assuming you are using the SGE queuing system. If so, your mpirun command will look like

mpirun -n 32 -machine $TMP/machines ./hpcc

Here, $TMP/machines is the list of hosts where your job will run and this is generated automatically by SGE.

Best regards,
g

--
Gowtham
Information Technology Services
Michigan Technological University

(906) 487/3593

Mario Benitez

Mar 22, 2012, 8:18:52 PM
to Linux Rocks list

Hi Gowtham,

Fixed (at least enough to run). hpccinf.txt now has the following values:

...
4 # of problems sizes (N)
65536 78643 91750 104857 Ns
1 # of NBs
256 NBs


0 PMAP process mapping (0=Row-,1=Column-major)
1 # of process grids (P x Q)
4 Ps
8 Qs

16.0 threshold
...

When I run the test using '# mpirun -n32 -f cluster_nodes ./hpcc', the following message appears after a few seconds:

...
The following parameter values will be used:

N : 14560
NB : 80
PMAP : Column-major process mapping
P : 4
Q : 8
PFACT : Right
NBMIN : 4
NDIV : 2
RFACT : Crout
BCAST : 1ringM
DEPTH : 1
SWAP : Mix (threshold = 64)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words
...

Then lots of test info is displayed. Should the output file already exist (even if it is empty)? I ask because when I deleted it just before the second test, the result was not stored in the file.

Finally, the cluster nodes do seem to be processing something. Is the next step to look at what Ganglia displays in order to interpret the benchmark run, or is there another way to find the benchmark results?

Thanks a lot.

Marinho.-

