pbdDMAT runs too slowly

Dale Wang

Sep 22, 2014, 4:49:02 AM
to rbigdatap...@googlegroups.com
Hello everyone,
      I have recently been trying the r-pbd project, and after a long search it looks like exactly what I want. However, something seems to be wrong, because the performance of distributed matrix multiplication is very poor. Could you help me find the problem? If the performance really is this poor, the package is not practical.

      The program is a simple distributed matrix multiplication; see matprod.r in the attachment.

### SHELL> mpiexec -np 2 Rscript --vanilla [...].r

# Initialize process grid
library(pbdDMAT, quiet=FALSE)

args <- commandArgs()

#print(args)
comm.print(args[6])
comm.print(args[7])
# global matrix size
size <- as.numeric(args[6])
# block factor of that matrix
blockfactor <- as.numeric(args[7])

init.grid()

# Generate a random matrix common to all processes and distribute it.
# This approach should only be used while learning the pbdDMAT package.
comm.set.seed(123, diff=TRUE)
# The matrix is a square matrix
dx <- ddmatrix("rnorm", mean=0, sd=10, ncol=size, nrow=size, bldim=blockfactor)
# Get the summary info
print(dx)
barrier()
t1 <- proc.time()[3]
# Do a self multiplication
dx %*% dx
barrier()
t2 <- proc.time()[3]
# Print the time of multiplication
comm.print(t2 - t1, all.rank=TRUE)

# Finish
finalize()



Now let us run the code on a single node with a single process, to compare its performance with a serial matrix multiplication program.

I run the program with the following command:

./runtest.sh singlehost 1 matprod.r "10000 10000"
 
which is equivalent to the following command:

/usr/lib64/openmpi/bin/mpirun -mca btl_tcp_if_include eth0,lo -mca btl_tcp_disable_family 6 -hostfile singlehost -np 1 /home/hadoop/bin/R-3.1.1/lib64/R/bin/Rscript ./matprod.r 10000 10000

The singlehost hostfile contains only one machine.

Under this configuration, the matrix dx is 10,000 x 10,000 and each block is also 10,000 x 10,000, so the distributed matrix contains only one block. Calling dx %*% dx should therefore fall back to a one-block serial matrix multiplication.
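
The block count described above can be checked with a little arithmetic. The sketch below is only illustrative: blocks_per_dim is a hypothetical helper, not part of pbdDMAT, and it assumes square b x b blocks as implied by a single bldim value.

```r
# Hypothetical helper (not a pbdDMAT function):
# number of blocks along one dimension of an n x n matrix cut into b x b blocks.
blocks_per_dim <- function(n, b) ceiling(n / b)

n <- 10000   # global matrix size used in the post
b <- 10000   # block factor used in the post

blocks_per_dim(n, b)^2      # 1 block in total

# A smaller block factor, chosen here only for comparison
blocks_per_dim(n, 1000)^2   # 100 blocks in total
```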

Here is the problem: the distributed and serial matrix multiplications differ greatly in performance.
On our machines, multiplying two 10,000 x 10,000 matrices serially takes about 230 seconds, while the distributed version provided by r-pbd takes about 23 minutes, roughly six times longer.
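
For reference, a serial baseline of this kind can be sketched in base R as follows. This is an assumption about how such a comparison might be timed, not the exact serial program behind the 230-second figure; the size n is deliberately reduced so the sketch runs quickly, and the variable names are illustrative.

```r
# Serial baseline sketch: time a square matrix self-multiplication in base R.
# n is reduced for illustration; the post uses 10000.
n <- 500
set.seed(123)
x <- matrix(rnorm(n * n, mean = 0, sd = 10), nrow = n, ncol = n)

t1 <- proc.time()[3]
y <- x %*% x               # serial multiplication via the BLAS that R is linked against
t2 <- proc.time()[3]

elapsed <- unname(t2 - t1) # elapsed wall-clock seconds
cat("n =", n, "elapsed seconds:", elapsed, "\n")
```

The time reported depends heavily on which BLAS library R is linked against, so the serial and distributed runs are only comparable when that is known for both.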

There must be something wrong. Do you have a clue about it? Thank you very much!

Best wishes,

                                        Dale Wang


matprod.r
runtest.sh