Reproducibility for INLA fitted value


Hanh Nguyen

May 12, 2021, 10:48:10 AM
to R-inla discussion group
Hi,

I've realized that set.seed() doesn't work as expected when I fit a model with INLA, for example:

set.seed(1)
fit0 <- inla(n ~ 1 + offset(log(popsize)) + f(ID.NIS, model = "besag", graph = W2),
             data = data.spa,
             family = "poisson",
             control.predictor = list(compute = TRUE),
             control.compute = list(dic = TRUE, waic = TRUE))

Then I need to take fit0$summary.fitted.values$mean for further analysis with random forest. But each time I re-run the model, the fitted values change slightly, and even though the difference is extremely small, it leads to a slightly different variable importance plot from the random forest (I have set a seed for the random forest).

Is there any way we can deal with seed in INLA to make results reproducible?

Many thanks in advance.

Best regards,
Hanh.

INLA help

May 12, 2021, 11:09:39 AM
to R-inla discussion group, Hanh Nguyen
If the random forest results are affected by these, as you say, 'extremely small' differences,
it's really an issue with random forest ;-)

Haavard Rue
--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/r-inla-discussion-group/c9b7b5ee-aed2-4114-a197-eda78e202274n%40googlegroups.com.

Hanh Nguyen

May 12, 2021, 11:51:07 AM
to R-inla discussion group
Hi,

Thank you very much for your quick response and suggestion.

Best regards,
Hanh

Finn Lindgren

May 12, 2021, 12:03:46 PM
to R-inla discussion group
The R random seed is essentially irrelevant for regular inla() runs, as they don't do MCMC. The differences you see are most likely numerical fluctuations caused by the order of operations in the parallel computations.
Normally these fluctuations are negligible, but, as nearly always when floating point computations are involved, and doubly so when parallel computations are involved, exact equality should not be expected.
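[Editor's note: a minimal base-R illustration of the point above, that summing the same numbers with a different grouping, as a parallel reduction might, changes the result at the last decimal places, and that a tolerance-based comparison handles this:]

```r
# Floating point addition is not associative, so a different grouping
# of the same operands can give a (very slightly) different sum:
sum1 <- (0.1 + 0.2) + 0.3   # left-to-right grouping
sum2 <- 0.1 + (0.2 + 0.3)   # different grouping

sum1 == sum2                   # FALSE: bitwise equality fails
isTRUE(all.equal(sum1, sum2))  # TRUE: equal up to a numerical tolerance
```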

For debugging purposes, you can use num.threads="1:1" to force single-threaded calculations, which normally does produce exactly the same values each time. For "production use", though, it's better to handle the floating point values appropriately instead (i.e., not to expect them to be exactly equal).
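[Editor's note: a sketch of the debugging suggestion, reusing the model from the original post; `data.spa`, `W2`, etc. are the poster's objects, `fit_other` is a hypothetical second fit, and INLA must be installed, so this is illustrative rather than runnable in isolation:]

```r
# Force single-threaded computation for exact repeatability while debugging
# (assumes the data and graph from the original post are in scope):
fit_debug <- inla(n ~ 1 + offset(log(popsize)) + f(ID.NIS, model = "besag", graph = W2),
                  data = data.spa,
                  family = "poisson",
                  control.predictor = list(compute = TRUE),
                  num.threads = "1:1")

# In production, compare fitted values up to a tolerance instead of
# expecting bitwise equality (fit_other is a hypothetical re-fit):
isTRUE(all.equal(fit_debug$summary.fitted.values$mean,
                 fit_other$summary.fitted.values$mean,
                 tolerance = 1e-6))
```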

Finn



--
Finn Lindgren
email: finn.l...@gmail.com

Hanh Nguyen

May 12, 2021, 12:23:39 PM
to R-inla discussion group
Hi,

Thank you very much for your detailed explanation.

Best regards,
Hanh

bach...@gmail.com

Jan 26, 2023, 3:22:05 PM
to R-inla discussion group
My experience, at least at the moment, is more positive.

With a binomial model, which perhaps does involve (?) some random simulation in the conversion from the linear predictor on the link scale back to the actual fitted values on the original scale (if that prediction option is selected), I get slightly different values each time if I set no seed before running the same model. But if I set the same seed right before the model is run and then run it multiple times, I get the same values each time to a great many decimal places, even when running in parallel. Perhaps this particular model (with a few iid effects and fixed effects) is just simpler, but I would consider this excellent reproducibility for a fitted value, whatever its underlying cause.
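[Editor's note: a hypothetical sketch of the procedure described above; `form` and `d` stand in for the poster's unspecified formula and data, and INLA must be installed, so this is illustrative only:]

```r
# Set the same seed immediately before each run, then compare fitted values
# (form and d are hypothetical placeholders for the poster's model and data):
run_fit <- function() {
  set.seed(123)
  inla(form, data = d, family = "binomial",
       control.predictor = list(compute = TRUE, link = 1))
}

f1 <- run_fit()
f2 <- run_fit()
all.equal(f1$summary.fitted.values$mean, f2$summary.fitted.values$mean)
```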
