Re: Return value of a Function from only one processor

Ostrouchov, George

Oct 13, 2016, 10:21:18 PM
To: RBigDataProgramming, Eva Liliane Ujeneza
Dear Eva,

I would recommend an entirely different approach: the batch SPMD style of parallel programming (Wei-Chen mentioned it as his item 3).

I assume that you have a data set on which the evaluation of funct0 takes a long time. You would like to split up the data set, evaluate the function on each piece, and then sum the local results for a final result. The following assumes that you can replicate your data in every process. The get.jid() function then subsets the data list to a different local piece on every process and allreduce() sums the local results:

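library( pbdMPI )
init()
# each process keeps a different piece of the (replicated) data list
my.data <- data[ get.jid( length( data ) ) ]
funct1 <- function( parameters ) {
   # local RSS on this process's piece, one value per patient
   res <- lapply( my.data, funct0, parameters )
   # sum over all processes; every process gets the same total
   allreduce( sum( unlist( res ) ) )
}
result <- optim( parameters, funct1 )
comm.print( result )
finalize()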

You put the above (and the rest of your code defining funct0 and reading the data) in a file and run it with mpirun as an Rscript batch process. I do not address how you get your data; there are several approaches, depending on the file system available and how big the data is.
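
The split-and-sum idea can be checked without MPI. The following is only a serial sketch under stated assumptions: toy_funct0, toy_data, and block_indices are hypothetical names made up for illustration, the straight-line model is invented, and block_indices is a simple contiguous split that stands in for get.jid() (the real function may assign indices differently). Each simulated rank sums the RSS over its own piece, and adding the local sums plays the role of allreduce():

```r
# Serial illustration only: each "rank" is simulated by an index, no MPI needed.
set.seed(1)

# Hypothetical stand-in for the per-patient RSS function (funct0 in the thread).
# Each patient is a list with x and y; the assumed model is y = a + b * x.
toy_funct0 <- function(patient, parameters) {
  fitted <- parameters[1] + parameters[2] * patient$x
  sum((patient$y - fitted)^2)
}

# Toy data: 10 patients with 5 observations each.
toy_data <- lapply(1:10, function(i) {
  x <- seq_len(5)
  list(x = x, y = 2 + 0.5 * x + rnorm(5, sd = 0.1))
})

# Hypothetical helper: contiguous block of indices owned by `rank` (0-based)
# out of `size` ranks. It stands in for get.jid(); the real split may differ.
block_indices <- function(n, rank, size) {
  bounds <- floor(seq(0, n, length.out = size + 1))
  if (bounds[rank + 1] == bounds[rank + 2]) return(integer(0))
  (bounds[rank + 1] + 1):bounds[rank + 2]
}

parameters <- c(2, 0.5)
size <- 4

# What each simulated rank would compute on its local piece.
local_rss <- sapply(0:(size - 1), function(rank) {
  my_data <- toy_data[block_indices(length(toy_data), rank, size)]
  sum(unlist(lapply(my_data, toy_funct0, parameters)))
})

# What a sum-allreduce would hand back to every rank: the total over ranks.
global_rss <- sum(local_rss)

# Reference value computed on the full data set in one pass.
serial_rss <- sum(unlist(lapply(toy_data, toy_funct0, parameters)))

# The two totals should agree up to floating-point rounding.
print(all.equal(global_rss, serial_rss))
```

Because every process ends up with the same total, every copy of optim follows the same sequence of parameter values, so the processes stay in step.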

I copy your message to RBigDataProgramming for a wider audience.

George


From: Eva Liliane Ujeneza <uje...@gmail.com>
Date: Thursday, October 13, 2016 at 4:38 PM
To: <RBig...@gmail.com>
Subject: Return value of a Function from only one processor

Dear all,

I think I have a concept problem with the use of multiple processors.

I am trying to estimate the parameters of a function funct0 using optim. I am using a big longitudinal dataset of patient observations (about 200 thousand individuals). I am using pbdLapply to evaluate the residual sum of squares (RSS) of each patient using multiple processors.

I am wrapping pbdLapply within a function, let's say funct1.

funct1 <- function(data, parameters){
test <- pbdLapply(data, funct0, parameters)   # calculate RSS
res <- sum(unlist(test))
return(res)
}

I need the output of this function, funct1, to be a single value and to be passed by a single processor to the R built-in function optim. The code runs smoothly on a single processor (running the script with Rscript), but then runs indefinitely when I use multiple processors (mpiexec ... Rscript).

I am guessing the problem is that in the case of multiple processors, optim is receiving information from all processors, and somehow gets caught up in a loop (I am not really sure what is going on).

This leads to my question: is there a way I can return the value of funct1 from only one processor and not all of them?

Regards,

Eva


=============================================================

Eva Liliane Ujeneza, 
PhD student
DST/NRF Centre of Excellence in Epidemiological 
Modelling and Analysis (SACEMA)
Department of Mathematical Sciences
University of Stellenbosch
South Africa
Twitter: @EvaUje

“Spectacular achievement is always preceded by unspectacular preparation.” 
― Robert H. Schuller

 “Don't say you don't have enough time. You have exactly the same number of hours per day that were given to Helen Keller, Pasteur, Michael Angelo, Mother Teresa, Leonardo da Vinci, Thomas Jefferson, and Albert Einstein.”
― H. Jackson Brown Jr. 