pbdMPI and pbdDMAT


Alba Martinez-Ruiz

Aug 25, 2015, 5:22:03 PM
to RBigDataProgramming
Dear pbdR team,

In pbdR, is there a function or command similar to mapply?

Best regards,
Alba

Ostrouchov, George

Aug 25, 2015, 10:09:27 PM
to rbigdatap...@googlegroups.com
Dear Alba,

Are you running this in batch as an SPMD program? In that case, the usual mapply function can be used when all arguments are the same length. This assumes you read or generate your data (the … arguments of mapply) in parallel as distributed pieces.

We need to know more about your application (like how do you generate the arguments to mapply) to give more specific advice.

Regards,
George

--
Programming with Big Data in R
Simplifying Scalability
http://r-pbd.org/
---
You received this message because you are subscribed to the Google Groups "RBigDataProgramming" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rbigdataprogram...@googlegroups.com.
To post to this group, send email to rbigdatap...@googlegroups.com.
Visit this group at http://groups.google.com/group/rbigdataprogramming.
To view this discussion on the web visit https://groups.google.com/d/msgid/rbigdataprogramming/ac719556-2959-42cd-b331-7ef9b43a618d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Alba Martinez-Ruiz

Aug 26, 2015, 1:55:27 PM
to RBigDataProgramming
Dear George,

Many thanks for your answer.

Yes, we are running the code in batch as an SPMD program. We generate the data on rank 0 and then distribute them with as.ddmatrix:

library(pbdDMAT)
init.grid()

.BLDIM <- c(2,2)
.ICTXT <- 0

…. etc.

finalize()

We have two distributed objects (a vector and a matrix), and we apply a function to them with mapply. It’s working OK now.

Yesterday I was thinking about pbdLapply. I have, for instance,

pbdLapply(x, sum, pbd.mode="spmd")

I thought there could be a similar function for two or more arguments, as in the case of mapply. But something like this would apply only if I were working with just the pbdMPI library. Is that right, or am I wrong?
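A rough sketch of how a two-argument apply might be put together with pbdMPI alone: each rank takes its own block of indices from get.jid(), runs the ordinary mapply on that block, and allgather() collects the pieces on every rank. The vectors x and y and the function f are hypothetical placeholders, and the sketch assumes there are at least as many elements as ranks. It is not an existing pbdR function, only one way the pieces could be combined.

```r
library(pbdMPI, quietly = TRUE)
init()

# Hypothetical inputs, identical on every rank (cheap to replicate here).
x <- 1:12
y <- seq(10, 120, by = 10)

# Hypothetical two-argument function standing in for the real one.
f <- function(a, b) a * b + 1

# Indices owned by this rank (default method gives contiguous blocks).
id <- get.jid(length(x))

# Ordinary mapply on the local block only.
local <- mapply(f, x[id], y[id])

# Collect the per-rank pieces in rank order and flatten them.
result <- unlist(allgather(local))

comm.print(result)   # printed by rank 0 only
finalize()
```

In batch this would be launched with something like "mpirun -np 4 Rscript script.R", where the script name is a placeholder.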

 

Thanks a lot in advance.

Best regards,

Alba




Wei-Chen Chen

Aug 26, 2015, 10:27:00 PM
to RBigDataProgramming
Dear Alba,

May I have an example of what you want mapply for? That will make it easier for me to think about how it could be useful.
DMAT can be a different story.

Sincerely,
Wei-Chen Chen



Alba Martinez-Ruiz

Aug 27, 2015, 4:58:19 PM
to RBigDataProgramming
Dear Wei Chen,

Thanks for your answer, and sorry for the delayed response. I understand. Yes, a simple example would be:

1. I have two distributed matrices, for instance,

if (comm.rank() == 0) {
  X <- matrix(rnorm(n*p,0,1),n,p)
  Y <- matrix(rnorm(n*p,0,1),p,n)
} else {
  X <- NULL
  Y <- NULL
}
dX <- as.ddmatrix(X)
dY <- as.ddmatrix(Y)

2. I have a function
f <- function(x,y) x%*%y

3. and this is the result
result <- mapply(f,dX,dY)

This is a simple example, but I have a more complex function that depends on two arguments. We are trying to improve our code to save memory and reduce running times. From a parallel programming perspective, which approach would be more efficient? When should I use one or the other? Does it depend only on the benchmark results, or on the problem I am implementing?

Many thanks in advance.
Best regards,
Alba

Wei-Chen Chen

Aug 28, 2015, 11:33:55 PM
to RBigDataProgramming
Dear Alba,

Great.

I added them to a wish list at
https://github.com/wrathematics/pbdDMAT/issues/18

Sincerely,
Wei-Chen Chen

Alba Martinez-Ruiz

Sep 15, 2015, 2:45:58 PM
to RBigDataProgramming

Dear all, 


I hope you are well. I have an example that produces an error:


library(pbdDMAT, quiet=TRUE)
init.grid()

.BLDIM <- c(2,2)
.ICTXT <- 0

comm.set.seed(123)
n <- 100
p <- 16

if (comm.rank() == 0) {
  X <- matrix(rnorm(n*p,0,1),n,p)
  W <- rep(0,p)
} else {
  X <- NULL
  W <- NULL
}

dX <- as.ddmatrix(X)
dW <- as.ddmatrix(W)

f <- function(x,y) norm(x%*%y,"F")

test <- mapply(f,dX[,1:3],dW[,1:3])

comm.print("Printing test.................................................", rank.print=0)
comm.print(test, rank.print=0)
comm.print(test, rank.print=1)

finalize()


The error is:

Error in x@Data[1L:ldim[1L], 1L:ldim[2L], drop = FALSE] :
  subscript out of bounds
Calls: mapply -> [ -> [ -> .local
Execution halted


This is a problem in pbdDMAT, isn’t it?


Many thanks in advance.

Best regards,

Alba

Ostrouchov, George

Sep 16, 2015, 12:05:25 AM
to rbigdatap...@googlegroups.com
Yes, it fails because mapply() does not yet handle a ddmatrix. 

There is probably a way around using mapply() with functions that do handle a ddmatrix.

Can you describe in words what is intended? I see that the function f()  multiplies a matrix and a vector and then takes the Frobenius norm of the result. But it is not clear how mapply() is intended to take apart its matrix arguments before handing them to f(). Is it by rows?
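For reference, a small base R sketch of how mapply treats ordinary (non-distributed) matrix arguments: it walks over them element by element in column-major order, not by rows or columns. The matrices X and Y here are small hypothetical stand-ins, and nothing in this sketch touches a ddmatrix.

```r
# Two small ordinary matrices, stored column by column.
X <- matrix(1:6, nrow = 2)          # columns: (1,2), (3,4), (5,6)
Y <- matrix(10 * (1:6), nrow = 2)   # columns: (10,20), (30,40), (50,60)

f <- function(x, y) x %*% y

# mapply sees each matrix as a vector of 6 elements, so f is called
# six times with scalar pairs: f(1, 10), f(2, 20), ..., f(6, 60).
res <- mapply(f, X, Y)

print(length(res))   # one result per element pair
print(res)
```

So with plain matrices the function never receives a row or a column block; any grouping by columns has to be done explicitly, for example through indices.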

Thanks,
George


Alba Martinez Ruiz

Sep 16, 2015, 11:47:58 AM
to rbigdatap...@googlegroups.com

Dear George,


Thanks for your answer. I have a distributed matrix with a blocking factor of c(2,2) and .ICTXT <- 0. I am trying to apply the function by sets of columns (or different tables), but I am not sure how mapply is intended to handle its arguments.

In any case, I am updating the pbdDMAT package on my virtual machines to the version available on GitHub, in case that is the problem.

Best regards,
Alba


Wei-Chen Chen

Sep 21, 2015, 10:17:15 PM
to RBigDataProgramming
Dear Alba,

Next, you may try using indices instead of giving mapply the entire distributed matrices.

f <- function(id.x, id.y){
  norm(dX[, 1:id.x] %*% dY[, 1:id.y], "F")    # As long as dimensions are correct.
}
test <- mapply(f, 1:3, 1:3)
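The same index-based pattern can be checked serially with ordinary matrices before moving to ddmatrix objects. In this sketch X and W mirror the sizes of the earlier example, but W is filled with random values (an assumption, so that the norms are not all zero), and id.x and id.y are kept equal so that the product conforms. Whether identical subsetting behaves the same way on a distributed vector is not verified here.

```r
set.seed(123)
n <- 100
p <- 16

X <- matrix(rnorm(n * p), n, p)   # n x p data matrix
W <- rnorm(p)                     # hypothetical nonzero weights

# f receives only indices; the matrices are looked up inside the function.
# drop = FALSE keeps X[, 1:1] as an n x 1 matrix instead of a plain vector.
f <- function(id.x, id.y) {
  norm(X[, 1:id.x, drop = FALSE] %*% W[1:id.y], "F")
}

# mapply now iterates over plain integer vectors, which it handles natively.
test <- mapply(f, 1:3, 1:3)
print(test)
```

Because mapply only ever sees the integer vectors 1:3, it does not need to know how to take a matrix apart.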

Sincerely,
Wei-Chen Chen



Alba Martinez-Ruiz

Sep 23, 2015, 9:38:56 AM
to RBigDataProgramming
Thanks Wei-Chen.

Best regards,
Alba

