Randomness with mrgsolve

82 views
Skip to first unread message

Kyle Baron

unread,
Feb 25, 2016, 6:06:13 PM2/25/16
to mrgsolve
I posted a vignette here that shows how mrgsolve respects the random number generator in R.  


This is pretty tedious to follow, but it carefully demonstrates different points.  Even when you use mclapply your results can be random if you properly set a seed. (Note: parallelization via mclapply is not possible on Windows machines).

There was some off-list discussion about wether this is true or not.  In general it is.  Certain environments may not respect your decision about RNG or the seed when using mclapply the way you think it should. Inside shiny apps may be one of these (only when using mclapply).  But it is complicated to tease apart, especially when it is inside of a shiny app.  The solution requires attention to the environment and properly setting the seed.  Once that is done, we think the mrgsolve results will be random.




Kyle Baron

unread,
Feb 26, 2016, 12:38:24 AM2/26/16
to mrgsolve

Here's a self-contained shiny app:

Demonstrating:
  1. Using mclapply with mrgsolve works fine in normal R usage
  2. Using mclapply with mrgsolve in a shiny app has problems with the seed ... guessing that the same seed gets passed to each worker
I've got no idea why it works fine in R but breaks down in shiny.  I coded a workaround where a seed is passed to the worker.  This does fix things and a little more hassle ... but no reason not to use mclapply in your shiny app.


Here's another app:

It simulates ETAs as well, but with mvrnorm.  It does seem to be playing properly with mclapply.

This is the first time I was able to capture this behavior.  



Message has been deleted

Kyle Baron

unread,
Feb 26, 2016, 10:14:27 AM2/26/16
to mrgsolve

One last test, them I'm going to quit:


This uses Rcpp-coded function to generate random variates with no mrgsolve involvement.  Like the demo with mrgsolve, when you use mclapply with this function, there are issues with properly setting the seed when you use it within a Shiny app, but not in regular R. 

This is by far the simplest example I could come up with to illustrate this.  I might send it off to the Shiny folks (or ask for insight on the Rcpp list).  

But for now, if using mclapply in a Shiny app with mrgsolve (or another Rcpp-based simulator) I'd pass the seed to the worker to be safe.

Thanks for enduring all of my posts on this :).  But it's really important ... when doing big simulation in a shiny app, parallelizing makes a world of difference.








Devin Pastoor

unread,
Feb 26, 2016, 11:41:52 AM2/26/16
to Kyle Baron, mrgsolve
Hi Kyle,

I would think the best way to handle this is to pass the seed into the simulation function and use it as a closure itself rather than trying to use a reactive expression as the calling environment, especially given the calling environment can get a bit funky when forking a new process, which mclapply does under the hood.

Relevant changes below:

sim <- function(i,N,om, seed) {
  set.seed(seed)
  x <- Rcppsim(i,N,om) 
  return(data_frame(ETA1 = x, i = i, seed = seed))
}

# best to determine the seed value some other way but for demonstration purposes hard coding
 mclapply(n,
             mc.cores=mc.cores,
             sim,N,OM1, 101) %>% bind_rows %>% smry


To make the seed different for each core, this could be easily handled by modifying the sim function to change in some programmatic way based on the replicate number, or rather than iterating over the replicate directly, instead use the n param to create list of length n, with the value being a vector of the iteration number and the seed 

sim <- function(seed_iteration,N,om, seed) {
  set.seed(seed_iteration$seed)
  x <- Rcppsim(seed_iteration$i,N,om) 
  return(data_frame(ETA1 = x, i = seed_iteration$i, seed = seed_iteration$seed))
}


Cheers,

Devin




--
MetrumRG Website: http://www.metrumrg.com/opensourcetools.html
---
You received this message because you are subscribed to the Google Groups "mrgsolve" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mrgsolve+u...@metrumrg.com.
To post to this group, send email to mrgs...@metrumrg.com.
Visit this group at https://groups.google.com/a/metrumrg.com/group/mrgsolve/.

Kyle Baron

unread,
Feb 26, 2016, 11:58:11 AM2/26/16
to Devin Pastoor, mrgsolve
Hi Devin - 

Thanks for the feedback.  Passing in the seed is where we ended up in the last post (I'm suggesting to always run with fix==TRUE):

Agree that there is something funky with doing this in reactive environment.  But there also seems to be some issue with the way Rcpp picks up the seed: no problem if outside shiny or not using mclapply, but using mclapply + shiny + Rcpp all seem to be required for the funkiness .

For example, I was surprised to find that this seems to work fine (in shiny with mclapply):
Calling mvnorm on the worker didn't seem to have issues picking a different seed on each worker.

But I totally agree now that best advice to pass the seed to worker.

Kyle





--
Kyle Baron

Devin Pastoor

unread,
Feb 26, 2016, 12:10:38 PM2/26/16
to Kyle Baron, mrgsolve
The other benefit of passing the seed is as you scale up the sim dimensionality so the time to run sims is non trivial, you can much more easily cache/output files at various checkpoints for either later reference, or if the sim crashes half way through to more easily pick up where you left off.

As to your surprise on the example, It that seems the expected behavior, as the forked process will not have a seed set since it is a fresh environment (setting the seed directly in shiny will only be for that process), so we can break down the scopes in which mvrnorm will have access to

1) its scope 
2) the enclosing scope --> sim, which has nothing
3) the global scope --> no seed set since new forked process

(I am about 98% sure this is the closure scope chain, it is definitely how it works in JS, and from what I remember about hadley's notes on what scopes functions have access to R is similar)

I think the only way to have it actually maintain the seed from the shiny process would be to capture sim's enclosing scope and inject it (yuck).


Kyle Baron

unread,
Feb 26, 2016, 12:49:16 PM2/26/16
to Devin Pastoor, mrgsolve
I wasn't expecting a fresh environment on the worker since we're using
mcparallel with RNGkind("L'Ecuyer-CMRG") and mc.set.seed=TRUE.

I guess the surprise is in the other direction? ...  that all of the workers get the same seed so there are clusters of results that are exactly the same (if expecting 100 unique results, you get only 50 when parallelizing on 2 cores).







Devin Pastoor

unread,
Feb 26, 2016, 2:03:48 PM2/26/16
to Kyle Baron, mrgsolve
Ah, that 2% worry was actually confirmed - the entire enclosing environment chain is retained, not just access to the enclosing scope and the global env http://adv-r.had.co.nz/Environments.html

# in global scope
RNGkind("L'Ecuyer-CMRG")
set.seed(101)
mc.reset.stream()
outside_var <- "outside"

# fake sim function
sim <- function(i,N,om1,om2) {
  inside_var <- "inside_sim"
  inside_func <- function(){
    completely_inside <- "completely_inside"
    data.frame(i, N, om1, om2, outside_var, inside_var, completely_inside)
  }
  return(inside_func())
}


Inline image 1


As to the seed setting, I did some more digging after thinking more about design you have - and now I realized - the global set.seed call is being referenced I think, rather than the local set.seed inside the reactive expression 

See this gist for an example:



myseed <- "global_seed"

# enclosing environment on function creation is globally, so the chain goes up to the global seed
sim <- function(i,N,om1,om2) {
  inside_var <- "inside_sim"
  inside_func <- function(){
    completely_inside <- "completely_inside"
    data.frame(i, N, om1, om2, outside_var, inside_var, completely_inside, seed)
  }
  return(inside_func())
}

server<-function(input, output) {
  output$table <- renderTable({
    
  seed <- "server_seed"
    mclapply(1:input$n,
             mc.cores=input$mccores,
             sim,input$N,input$OM1,input$OM2) %>% bind_rows 
})
}

Inline image 2


Further, the seed <- "server_seed" is actually not at all in the chain of environments for the sim function, therefore has no impact on it (to the point of actually removing the global seed reference will cause the function to crashing saying can't find seed anywhere)


takeaway - if you are declaring a function in the global scope, but using it in an apply call/ any other place to invoke it, any part of that closure will not be accessed by the invoked function, rather it will reference its own scope chain where it was declared

hence, the only way to get the local seed is to also declare the sim function in that scope

server<-function(input, output) {
  output$table <- renderTable({
    
  seed <- "server_seed"
  sim <- function(i,N,om1,om2) {
    inside_var <- "inside_sim"
    inside_func <- function(){
      completely_inside <- "completely_inside"
      data.frame(i, N, om1, om2, outside_var, inside_var, completely_inside, seed)
    }
    return(inside_func())
  }
    mclapply(1:input$n,
             mc.cores=input$mccores,
             sim,input$N,input$OM1,input$OM2) %>% bind_rows 
})
}

Inline image 3



Devin

Kyle Baron

unread,
Feb 26, 2016, 2:12:34 PM2/26/16
to mrgsolve, devin....@umaryland.edu

Devin - 

Just going through your latest ... but I was depending on mcparallel to get the seed set in the worker.  That's part of what mcparallel is supposed to do when you use "L'Ecuyer-CMGR".  Now we see that the reactive environment isn't even aware that we changed the RNG.  

Kyle


RNGkind("L'Ecuyer-CMRG")
set.seed(101)
mc
.reset.stream()


##' UI ###########################################
ui
<- fluidPage(titlePanel("Shiny RNG"),
               sidebarLayout
(sidebarPanel(),
                             mainPanel
(textOutput("text"))
               
))


##' SERVER ###########################################
server
<-function(input, output) {
  output$text
<- renderText({RNGkind()})
}


##' Run the shiny app
shinyApp
(ui = ui, server = server)







Kyle Baron

unread,
Mar 4, 2016, 2:23:39 PM3/4/16
to mrgsolve
Here is the proposed solution to using mclapply with mrgsolve in a shiny app:


They keys to running mrgsolve + shiny + mclapply are:
  1. Use "L'Ecuyer-CMRG" random number generator (as required by mcparallel)
  2. Call RNGkind to set the RNG to "L'Ecuyer-CMRG" inside the server (I set it inside the reactive expression as well just to be sure)
  3. In the function you call to run on the worker, call RNGkind and set to "L'Ecuyer-CMRG" again so that RNG streams passed will be utilized properly
I have tested this with the app linked above.  Will update again once we can confirm it with some real-world use or just more experience to see if this holds up.

Reply all
Reply to author
Forward
0 new messages