INLA vs (g)lm

54 views
Skip to first unread message

Thierry Onkelinx

unread,
Apr 2, 2021, 3:28:14 AM4/2/21
to R-inla discussion group
Dear all,

I'm finding a few issues when I compare INLA with lm() and glm(). Could I have feedback on the issues listed below?

Thanks a lot,

Thierry

library(INLA)
ds <- data.frame(
  y = c(36831.6112941944, 35948.9994876759, 35919.1918049138, 36030.3590884849,
    35672.7335285873, 35219.7770663821, 34125.928752802),
  x = 1:7
)
inla(y ~ x, family = "gaussian", data = ds) # inla crashes
# scaling y makes inla work and the results match lm output
ds$y_scaled <- ds$y / 1000
summary(inla(y_scaled ~ x, family = "gaussian", data = ds))
summary(lm(y_scaled ~ x, data = ds))

# glm standard errors are five times smaller
# point estimates match
summary(inla(y ~ x, family = "gamma", data = ds))
summary(glm(y ~ x, family = Gamma(link = "log"), data = ds))

# the discrepancies remain after scaling
summary(inla(y_scaled ~ x, family = "gamma", data = ds))
summary(glm(y_scaled ~ x, family = Gamma(link = "log"), data = ds))

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry....@inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////


Finn Lindgren

unread,
Apr 2, 2021, 4:39:08 AM4/2/21
to Thierry Onkelinx, R-inla discussion group
Hi,

the scaling of the linear problem is 1) incompatible with the specified priors (the default fixed effect prior precision is 0.001, i.e. std.dev. 30, and the lm()-estimated -350 is more than 10 times larger) and 2) causes numerical issues.
1) is a model misspecification problem, which can be fixed by changing the fixed effect precision and likely also the prior for the observation precision, but that still leaves 2) causing issues, and
2) is best solved by the rescaling (what also brings the model within reasonable range of the default priors)

For the Gamma case, I'm inclined to guess it's mostly a difference between frequentist standard error and Bayesian posterior standard deviation. For large n I'd expect them to be more similar, but this sample is quite small.

Finn

Finn

--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/r-inla-discussion-group/CAJuCY5zocYNGo3dztMVLkQxx1nAGiB-0NVOJ2BbmiNLbnYHz-A%40mail.gmail.com.


--
Finn Lindgren
email: finn.l...@gmail.com

Helpdesk

unread,
Apr 2, 2021, 9:35:56 AM4/2/21
to Finn Lindgren, Thierry Onkelinx, R-inla discussion group
to follow up this:

for #2, its a glm() issue, see attached comparison with INLA and JAGS

in this case data are few and glm() does not produce decent uncertainty
results. this is not 'an error' with glm() but a 'feature', as it only
guaranteed to produce good results in the asymptotic limit. The only
issue is that the definition is circular: one is within the 'asymptotic
limit' when the ``results are good''.

about #1, yes there is an issue about scaling, and INLA assumes a
'reasonable input'.

although it would be possible to internally scale everything, this only
will work for straight forward models. joint models, etc, will be hard
to scale automatically.

Best
H
> > --
> > You received this message because you are subscribed to the Google
> > Groups "R-inla discussion group" group.
> > To unsubscribe from this group and stop receiving emails from it,
> > send
> > an email to r-inla-discussion...@googlegroups.com.
> > To view this discussion on the web, visit
> > https://groups.google.com/d/msgid/r-inla-discussion-group/CAJuCY5zocYNGo3dztMVLkQxx1nAGiB-0NVOJ2BbmiNLbnYHz-A%40mail.gmail.com
> > .
>
>

--
Håvard Rue
he...@r-inla.org
Screenshot from 2021-04-02 16-29-43.png

Thierry Onkelinx

unread,
Apr 2, 2021, 11:02:32 AM4/2/21
to Helpdesk, Finn Lindgren, R-inla discussion group
Dear Håvard and Finn,

I ran a little simulation study comparing glm and inla for the gam family. For large sample sizes, both methods yield comparable results. For smaller samples, INLA provides more robust estimates for the Gamma precision.

Best regards,

Thierry

library(INLA)
run_simulation <- function(
  n_data = 1e2, trend = -0.1, start_value = 35000, precision = 700, n_sim = 1e3
) {
  replicate(n_sim, {
    ds <- data.frame(
      x = seq(0, 10, length = n_data)
    )
    ds$mu <- log(start_value) + log(1 + trend) * ds$x
    ds$b <- precision * ds$mu
    ds$a  <- ds$mu * ds$b
    ds$y <- exp(rgamma(n_data, shape = ds$a, rate = ds$b))
    frequentist <- glm(y ~ x, family = Gamma(link = "log"), data = ds)
    bayesian <- inla(y ~ x, family = "gamma", data = ds)
    cf <- coefficients(summary(frequentist))[, c("Estimate", "Std. Error")]
    cf <- data.frame(cf)
    colnames(cf) <- c("mean", "sd")
    c(
      unlist(bayesian$summary.fixed[, c("mean", "sd")] / cf),
      freq_prec = 1 / (summary(frequentist)$dispersion * precision),
      bayes_prec = bayesian$summary.hyperpar$mean / precision
    )
  })
}
summary(t(run_simulation(n_data = 10)))
# mean1       mean2             sd1             sd2          freq_prec        bayes_prec
# Min.   :1   Min.   :0.9999   Min.   :1.259   Min.   :1.252   Min.   :0.3525   Min.   :0.2702
# 1st Qu.:1   1st Qu.:1.0000   1st Qu.:1.563   1st Qu.:1.561   1st Qu.:0.8143   1st Qu.:0.4195
# Median :1   Median :1.0000   Median :1.720   Median :1.719   Median :1.0959   Median :0.4701
# Mean   :1   Mean   :1.0000   Mean   :1.814   Mean   :1.813   Mean   :1.3614   Mean   :0.4724
# 3rd Qu.:1   3rd Qu.:1.0000   3rd Qu.:1.994   3rd Qu.:1.994   3rd Qu.:1.6728   3rd Qu.:0.5321
# Max.   :1   Max.   :1.0001   Max.   :4.138   Max.   :4.139   Max.   :9.0343   Max.   :0.6707
summary(t(run_simulation(n_data = 100)))
# mean1       mean2        sd1             sd2          freq_prec       bayes_prec
# Min.   :1   Min.   :1   Min.   :1.041   Min.   :1.040   Min.   :0.680   Min.   :0.6372
# 1st Qu.:1   1st Qu.:1   1st Qu.:1.063   1st Qu.:1.062   1st Qu.:0.915   1st Qu.:0.8272
# Median :1   Median :1   Median :1.069   Median :1.069   Median :1.000   Median :0.8972
# Mean   :1   Mean   :1   Mean   :1.070   Mean   :1.070   Mean   :1.023   Mean   :0.9117
# 3rd Qu.:1   3rd Qu.:1   3rd Qu.:1.077   3rd Qu.:1.077   3rd Qu.:1.119   3rd Qu.:0.9915
# Max.   :1   Max.   :1   Max.   :1.115   Max.   :1.115   Max.   :1.661   Max.   :1.3680
summary(t(run_simulation(n_data = 1000)))
# mean1       mean2        sd1             sd2          freq_prec        bayes_prec
# Min.   :1   Min.   :1   Min.   :1.003   Min.   :1.002   Min.   :0.8815   Min.   :0.8764
# 1st Qu.:1   1st Qu.:1   1st Qu.:1.006   1st Qu.:1.006   1st Qu.:0.9710   1st Qu.:0.9647
# Median :1   Median :1   Median :1.007   Median :1.006   Median :1.0017   Median :0.9950
# Mean   :1   Mean   :1   Mean   :1.007   Mean   :1.006   Mean   :1.0037   Mean   :0.9971
# 3rd Qu.:1   3rd Qu.:1   3rd Qu.:1.007   3rd Qu.:1.007   3rd Qu.:1.0359   3rd Qu.:1.0283
# Max.   :1   Max.   :1   Max.   :1.010   Max.   :1.010   Max.   :1.1695   Max.   :1.1579

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry....@inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////




Op vr 2 apr. 2021 om 15:35 schreef Helpdesk <he...@r-inla.org>:
Reply all
Reply to author
Forward
0 new messages