pbdDMAT runs too slowly


Dale Wang

Sep 22, 2014, 4:49:02 AM

to rbigdatap...@googlegroups.com

Hello everyone~
I have recently been trying the r-pbd project, and after a long search it looks like exactly what I need. However, something may be wrong, because distributed matrix multiplication performs very poorly. Could you help me find the problem? With performance this poor, the package is not practical for us.

The program is a simple distributed matrix multiplication; see matprod.r in the attachment.

    ### SHELL> mpiexec -np 2 Rscript --vanilla [...].r

    # Initialize process grid
    library(pbdDMAT, quiet=F)

    args <- commandArgs()
    #print(args)
    comm.print(args[6])
    comm.print(args[7])

    # global matrix size
    size <- as.numeric(args[6])
    # block factor of that matrix
    blockfactor <- as.numeric(args[7])

    init.grid()

    # Generate a random matrix common to all processes and distribute it.
    # This approach should only be used while learning the pbdDMAT package.
    comm.set.seed(123,diff=TRUE)

    # The matrix is a square matrix
    dx <- ddmatrix("rnorm", mean=0,sd=10,ncol=size,nrow=size,bldim=blockfactor)

    # Get the summary info
    print(dx)

    barrier()
    t1 <- proc.time()[3]

    # Do a self multiplication
    dx %*% dx

    barrier()
    t2 <- proc.time()[3]

    # Print the time of multiplication
    comm.print(t2 - t1, all.rank=T)

    # Finish
    finalize()
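A side note on the argument handling: the positions args[6] and args[7] depend on which flags Rscript passes to R, so they can shift if the command line changes. A minimal sketch of a position-independent alternative, using base R's commandArgs(trailingOnly = TRUE), might look like the following. The helper name parse_sizes is hypothetical and is not part of pbdDMAT or of the attached script.

```r
# parse_sizes is a hypothetical helper, not part of pbdDMAT.
# It expects the user-supplied arguments only: <size> <blockfactor>.
parse_sizes <- function(args) {
  if (length(args) < 2) {
    stop("usage: Rscript matprod.r <size> <blockfactor>")
  }
  size <- as.numeric(args[1])
  blockfactor <- as.numeric(args[2])
  if (is.na(size) || is.na(blockfactor)) {
    stop("both arguments must be numeric")
  }
  list(size = size, blockfactor = blockfactor)
}

# Inside the script this would be called as:
#   p <- parse_sizes(commandArgs(trailingOnly = TRUE))
# Here the arguments are passed directly for illustration.
p <- parse_sizes(c("10000", "10000"))
print(p$size)
print(p$blockfactor)
```

With trailingOnly = TRUE, only the arguments after the script name are returned, so the indices no longer depend on options such as --vanilla.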

Next, let us run the code on a single node with a single process, to compare its performance with a serial matrix multiplication program.

I run the program with the following command:

./runtest.sh singlehost 1 matprod.r "10000 10000"

which is equivalent to the following command:

/usr/lib64/openmpi/bin/mpirun -mca btl_tcp_if_include eth0,lo -mca btl_tcp_disable_family 6 -hostfile singlehost -np 1 /home/hadoop/bin/R-3.1.1/lib64/R/bin/Rscript ./matprod.r 10000 10000

The singlehost hostfile contains only one machine.

Under this configuration, the matrix dx is 10,000 x 10,000 and each block is also 10,000 x 10,000, so the distributed matrix consists of a single block. When I call dx %*% dx, the distributed matrix multiplication should therefore reduce to a one-block serial matrix multiplication.
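To make the block arithmetic explicit, here is a small sketch in plain R. It assumes a square matrix split into square blocks of side blockfactor, with smaller blocks at the edges when the size is not a multiple of the block factor. The helper block_count is hypothetical and only illustrates the counting; it does not call pbdDMAT.

```r
# block_count is a hypothetical helper, not part of pbdDMAT.
# Number of blocks in a size x size matrix cut into
# blockfactor x blockfactor blocks (edge blocks may be smaller).
block_count <- function(size, blockfactor) {
  per_dim <- ceiling(size / blockfactor)
  per_dim^2
}

one_block <- block_count(10000, 10000)   # the configuration used in this post
many_blocks <- block_count(10000, 1000)  # a smaller block factor, for contrast
print(one_block)    # 1
print(many_blocks)  # 100
```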

Here is the problem: the distributed and serial matrix multiplications differ greatly in performance.
On our machines, multiplying two 10,000 x 10,000 matrices serially takes about 230 seconds, while the distributed version provided by r-pbd takes about 23 minutes.
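For reference, a serial baseline of this kind can be sketched in base R as below. This is an assumption about how such a measurement might be taken (plain %*% on an ordinary R matrix, timed with proc.time()), not the exact script behind the 230-second figure. The size n is kept small so the sketch runs quickly; the runs described in this post use 10000.

```r
# Serial baseline in base R (no MPI, no pbdDMAT).
# n is deliberately small here; the post uses n = 10000.
n <- 500
set.seed(123)
x <- matrix(rnorm(n * n, mean = 0, sd = 10), nrow = n, ncol = n)

t1 <- proc.time()[3]
y <- x %*% x                      # serial self multiplication
t2 <- proc.time()[3]
serial_seconds <- unname(t2 - t1)
print(serial_seconds)

# Slowdown implied by the timings reported in this post
reported_serial <- 230            # seconds
reported_distributed <- 23 * 60   # 23 minutes, in seconds
slowdown <- reported_distributed / reported_serial
print(slowdown)                   # 6
```

Timing both versions the same way (elapsed time around the multiplication only) keeps the comparison fair, since matrix generation and MPI startup are excluded.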

There must be something wrong. Do you have any clue what it might be? Thank you very much!

Best wishes,

Dale Wang

matprod.r

runtest.sh
