Problems with reproducibility/setting seed for random number generator

589 views
Skip to first unread message

londo...@googlemail.com

unread,
Aug 3, 2016, 3:02:37 PM8/3/16
to Stan users mailing list

Hi,

I'm having problems generating reproducible results from stan's random number generator.

Here's a minimal example.

------ seedProblemMinimal.stan --------
functions {
      
    vector dgp_rng(real beta0, real tau, int S) {
      vector[S] beta; // define the output vector beta to be as long as S, the number of studies
      
      // Now fill it in (not vectorized for clarity in example)
      for (n in 1:S) {
          beta[n] = normal_rng(beta0, tau);
      }
      return beta;
    }
}
... etc. <includes necessary blocks>

------- seedProblemMinimal.R -------------
rm(list = ls())
library("rstan")

rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

# Compile stan functions --------------------------------------------------------

set.seed(42)
compiled_functions <- stan_model(file = "seedProblemMinimal.stan")
# This should generate one useable functions: dgp_rng(beta0, tau, S)
expose_stan_functions(compiled_functions)

But when I run this in the console, the results aren't reproducible when setting the seed.
--- Example 1 (from Console)
> dgp_rng(beta0 = 0, tau = 1, S = 5, seed = 42)
[1] -1.4489476  1.1870000 -0.1039393 -0.5189862 -1.6004025
> dgp_rng(beta0 = 0, tau = 1, S = 5, seed = 42)
[1]  1.1512011 -0.9238153  0.2806266 -1.4997973 -0.6603649

-- Example 2 (from Console)
> set.seed(42)
> dgp_rng(beta0 = 0, tau = 1, S = 5, seed = 42)
[1]  0.9063687  0.5709815 -1.6421407 -0.3739520  1.2693452
> set.seed(42)
> dgp_rng(beta0 = 0, tau = 1, S = 5, seed = 42)
[1] -0.9521367 -1.3151199  1.0361923 -0.3929310  0.8783626

In case it's helpful, session info is as follows:
> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rstan_2.11.1       StanHeaders_2.11.0 ggplot2_2.1.0     

loaded via a namespace (and not attached):
 [1] colorspace_1.2-6 scales_0.4.0     plyr_1.8.4       rsconnect_0.4.3  parallel_3.3.1   tools_3.3.1      inline_0.3.14    gtable_0.2.0     gridExtra_2.2.1  Rcpp_0.12.6      grid_3.3.1      
[12] munsell_0.4.3    stats4_3.3.1 

Any guidance on how to generate reproducible results from the _rng would be most appreciated.

Thanks,

Greg

Ben Goodrich

unread,
Aug 3, 2016, 3:16:09 PM8/3/16
to Stan users mailing list
On Wednesday, August 3, 2016 at 3:02:37 PM UTC-4, londo...@googlemail.com wrote:
I'm having problems generating reproducible results from stan's random number generator.

The expose_stan_functions() function only sets the seed the first time a user-defined function ending in _rng is called. You can't change its seed afterwards within the same R session because it is declared static. This behavior is documented in the Details section of ?expose_stan_functions.

Ben

Bob Carpenter

unread,
Aug 3, 2016, 3:25:53 PM8/3/16
to stan-...@googlegroups.com
Should we use the R random number generator to generate a new seed
each time it's called? The constructors for the Boost RNGs are
pretty cheap.
> --
> You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
> To post to this group, send email to stan-...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Ben Goodrich

unread,
Aug 3, 2016, 3:41:29 PM8/3/16
to Stan users mailing list
On Wednesday, August 3, 2016 at 3:25:53 PM UTC-4, Bob Carpenter wrote:
Should we use the R random number generator to generate a new seed
each time it's called?  The constructors for the Boost RNGs are
pretty cheap.

IIRC, the reason it is currently this way is so that we didn't construct an unlimited number of Boost PRNGs that can only be destroyed by quitting the R session. That and you would have to update the state of the PRNG in R in order to get different numbers between consecutive calls to the user-defined _rng function. I am not seeing why someone would want to change the seed after setting it once since setting it once ensures reproducibility. 

Ben

Bob Carpenter

unread,
Aug 3, 2016, 5:58:07 PM8/3/16
to stan-...@googlegroups.com
The Boost PRNGs will only be recovered

-- when they go out of scope if they're local variables, or

-- when they're deleted if they're allocated with operator new.

I have no idea how that interacts with R or what users
want for reproduciblity within an R session.

- Bob

Ben Goodrich

unread,
Aug 3, 2016, 6:44:59 PM8/3/16
to Stan users mailing list
On Wednesday, August 3, 2016 at 5:58:07 PM UTC-4, Bob Carpenter wrote:
I have no idea how that interacts with R

Not at all, except for memory consumed
 
or what users want for reproduciblity within an R session

I can't imagine anything other than it be reproducible, which it is currently.

The OP believes it is a bug that

foo_rng(seed = 42L) != foo_rng(seed = 42L)

which is because the first call to foo_rng constructs the Boost PRNG with the static keyword and seed 42 and then updates the state of the PRNG after generating some random number. Thus, the second call to foo_rng skips constructing the Boost PRNG and does not reset the seed to 42.

If the Boost PRNG were not declared with the static keyword, then

foo_rng() == foo_rng() # default seed in both cases is 0L

because it would be constructed with seed 0 by the first call to foo_rng and destroyed and then reconstructed with seed 0 by the second call to foo_rng and redestroyed. This behavior would be atypical for R users since

rnorm(n = 1) != rnorm(n = 1)

and, if not recognized, would lead to disastrous results because all the realizations of foo_rng() would be the same. To avoid this, the user would have to pass a seed explicitly each time foo_rng was called, so that

foo_rng(seed = 0L) != foo_rng(seed = 1L)

which is also atypical for R.

An alternative would be to introduce some additional functions like

construct_PRNG(seed = 0L)
destroy_PRNG
()

that the user would have to call explicitly before / after calling foo_rng(). That seemed difficult for me (to pass the reference to the PRNG to all _rng functions) and for users (who do not have to do that before / after calling functions like rnorm).

The way it works now is the way it has always worked, it is reproducible, and no one has mentioned it until today. And I don't know why anyone would want to call a _rng function multiple times with the same PRNG state within the same R session.

Ben

Bob Carpenter

unread,
Aug 11, 2016, 10:17:07 AM8/11/16
to stan-...@googlegroups.com

> On Aug 4, 2016, at 12:44 AM, Ben Goodrich <goodri...@gmail.com> wrote:
>
> On Wednesday, August 3, 2016 at 5:58:07 PM UTC-4, Bob Carpenter wrote:
> I have no idea how that interacts with R
>
> Not at all, except for memory consumed
>
> or what users want for reproduciblity within an R session
>
> I can't imagine anything other than it be reproducible, which it is currently.

It's reproducible, but you can't reset that reproducibility within
a session. Not sure exactly why you'd want to compute the same
thing twice, but presumably that's what they're asking. Same reason
R lets you call set.seed() as many times as you want.

> The OP believes it is a bug that
>
> foo_rng(seed = 42L) != foo_rng(seed = 42L)
>
> which is because the first call to foo_rng constructs the Boost PRNG with the static keyword and seed 42 and then updates the state of the PRNG after generating some random number. Thus, the second call to foo_rng skips constructing the Boost PRNG and does not reset the seed to 42.

I agree with the OP here, because it sure looks like those should
return the same thing. If the second call doesn't do anything
with the seed argument, it should alert the user by throwing an
exception or at the very least printing some kind of warning.

>
> If the Boost PRNG were not declared with the static keyword, then
>
> foo_rng() == foo_rng() # default seed in both cases is 0L
>
> because it would be constructed with seed 0 by the first call to foo_rng and destroyed and then reconstructed with seed 0 by the second call to foo_rng and redestroyed. This behavior would be atypical for R users since
>
> rnorm(n = 1) != rnorm(n = 1)
>
> and, if not recognized, would lead to disastrous results because all the realizations of foo_rng() would be the same.

I agree that would be bad news.

But presumably an R user would expect a and b to get the same value in:

> set.seed(42)
> a = rnorm(n = 1);
> set.seed(42)
> b = rnorm(n = 1);

> To avoid this, the user would have to pass a seed explicitly each time foo_rng was called, so that
>
> foo_rng(seed = 0L) != foo_rng(seed = 1L)
>
> which is also atypical for R.
>
> An alternative would be to introduce some additional functions like
>
> construct_PRNG(seed = 0L)
> destroy_PRNG()
>
> that the user would have to call explicitly before / after calling foo_rng(). That seemed difficult for me (to pass the reference to the PRNG to all _rng functions) and for users (who do not have to do that before / after calling functions like rnorm).

That would be most natural for the R users, I'd
think, with a default construction on entry.

> The way it works now is the way it has always worked, it is reproducible, and no one has mentioned it until today. And I don't know why anyone would want to call a _rng function multiple times with the same PRNG state within the same R session.

Agreed. And no point doing work on spec without a good motivation.

- Bob

Kevin Van Horn

unread,
Jun 5, 2017, 7:36:06 PM6/5/17
to Stan users mailing list
I agree with the OP -- I also want to be able to reset the PRNG. The reason is for testing. I cannot get reproducibility with the current behavior, because the state of the PRNG at the beginning of a test depends on what tests ran before it, if they also use the PRNG. I need to be able to reset the PRNG at the beginning of every test.

Furthermore, even within a single test I may want to reset the PRNG. Sometimes I want to verify that two different ways of computing the same result actually yield the same result. That requires starting with the same PRNG state if there is any random-number generation involved.

Bob Carpenter

unread,
Jun 6, 2017, 3:15:10 PM6/6/17
to stan-...@googlegroups.com
I think this is strictly an RStan issue and as such should probably
be raised on their issue tracker. The core interfaces on the C++
side are RNG neutral---they just take one as an argument and use
it throughout. Mitzi just fixed the init of transformed data so it's
always the same across chains (specifically, it uses the seed for
the transformed data and then skips ahead trillions of draws for each chain).

- Bob
Reply all
Reply to author
Forward
0 new messages