Factor vs Numeric / Integer for binomial model

536 views
Skip to first unread message

Paul Lantos

unread,
Aug 3, 2017, 11:08:38 AM8/3/17
to R-inla discussion group
For 2-level variables like male/female that are coded as 0 and 1, does it matter if I treat them as numeric vs factor?

In models that use brms/STAN I don't see a difference at all whether I use integer vs factor.

In INLA, however, the factor variables are reported in the model summary at each level, and I find that a bit less intuitive to report and graph.

INLA help

unread,
Aug 3, 2017, 11:19:53 AM8/3/17
to Paul Lantos, R-inla discussion group


On Thu, 2017-08-03 at 08:08 -0700, Paul Lantos wrote:
> For 2-level variables like male/female that are coded as 0 and 1,
> does it matter if I treat them as numeric vs factor?

shouldn't do that. if factor, then one level will be dropped, unless
you use another strategy, see FAQ. (control.fixed =
list(expand.factor.strategy = "inla")' )





--
Håvard Rue
he...@r-inla.org

Paul Lantos

unread,
Aug 3, 2017, 12:32:25 PM8/3/17
to R-inla discussion group, paul....@gmail.com, he...@r-inla.org
Ok, will doublecheck.

Also, INLA does not seem to accept a 2-level factor as my response variable, I get the following if I code it as a factor:

The response for family[1] is not of type 'numeric|list|matrix'; don't know what to do.

INLA help

unread,
Aug 4, 2017, 2:49:09 AM8/4/17
to Paul Lantos, R-inla discussion group
On Thu, 2017-08-03 at 09:32 -0700, Paul Lantos wrote:
> Ok, will doublecheck.
>
> Also, INLA does not seem to accept a 2-level factor as my response
> variable, I get the following if I code it as a factor:
>
> The response for family[1] is not of type 'numeric|list|matrix';
> don't know what to do.

that is true, this is not implemented.


--
Håvard Rue
he...@r-inla.org

Paul Lantos

unread,
Aug 4, 2017, 12:51:59 PM8/4/17
to R-inla discussion group, paul....@gmail.com, he...@r-inla.org
It is not dropping one level of every variable:

> str(data)
'data.frame':	19348 obs. of  24 variables:
 $ race      : Factor w/ 2 levels "0","1": 2 2 1 2 2 2 1 2 1 2 ...
 $ gender    : Factor w/ 2 levels "0","1": 1 2 1 1 1 2 1 1 2 1 ...
 $ ga        : int  37 37 38 38 40 37 34 36 36 36 ...


Fixed effects:
           mean      sd 0.025quant 0.5quant 0.975quant    mode kld
b0      -3.6931 18.4548   -39.9177  -3.7010    32.5429 -3.7111   0
bw      -0.0965  0.0880    -0.2708  -0.0960     0.0749 -0.0950   0
ga      -0.2031  0.0812    -0.3610  -0.2037    -0.0423 -0.2047   0
gender0 -1.8084 18.3103   -37.7563  -1.8101    34.1160 -1.8118   0
gender1 -1.8113 18.3103   -37.7592  -1.8130    34.1131 -1.8146   0
nicu1    0.2780  0.5175    -0.8355   0.3141     1.1975  0.3902   0
sti1     0.1152  0.1598    -0.2036   0.1169     0.4244  0.1203   0
parity   0.1663  0.0641     0.0375   0.1673     0.2894  0.1693   0
race1    0.9413  0.2340     0.4978   0.9355     1.4182  0.9236   0

Paul Lantos

unread,
Aug 5, 2017, 12:31:09 PM8/5/17
to R-inla discussion group, paul....@gmail.com, he...@r-inla.org
It's now doing this unpredictably with other binary variables. I'm now getting this without having changed the model at all other than trying it without the "gender variable"
nicu0  -1.4892 18.2637   -37.3471  -1.4898    34.3387 -1.4892   0
nicu1  -1.1172 18.2644   -36.9763  -1.1177    34.7120 -1.1172   0

It is also doing this now - it does not seem to be fully completing some of the models. This dataset has around 19,000 points and I'm predicting the model onto 4000 points in a grid:

The first example is how it should look, but many of the models look like the second example.

Auto Generated Inline Image 1
Auto Generated Inline Image 2

Elias T. Krainski

unread,
Aug 5, 2017, 12:44:42 PM8/5/17
to r-inla-disc...@googlegroups.com
On 08/05/2017 01:31 PM, Paul Lantos wrote:
It's now doing this unpredictably with other binary variables. I'm now getting this without having changed the model at all other than trying it without the "gender variable"
nicu0  -1.4892 18.2637   -37.3471  -1.4898    34.3387 -1.4892   0
nicu1  -1.1172 18.2644   -36.9763  -1.1177    34.7120 -1.1172   0
Here you have a singular model. It is all about the contrast parametrization, not an INLA issue. Consider the following example:

 n = 10;
 dat = data.frame(ga=runif(n),
  race=factor(rbinom(n,1,.5)),
  gender=as.factor(rbinom(n,1,0.5)),
  b0=rep(1,n))
model.matrix(~ga+gender+race, dat)[1:2,]
model.matrix(~0+b0+ga+gender+race, dat)[1:2,]

In the second case, b0=gender0+gender1, so a linear combination leading to a singularity.

Elias

Paul Lantos

unread,
Aug 5, 2017, 4:44:51 PM8/5/17
to R-inla discussion group
This isn't a singular model - I've only pasted in this particular variable, but there are around 8 or 9 variables in the model.

As you can see from my earlier post this large SE for nicu was not the case when the model had dropped one of the levels.

Paul Lantos

unread,
Aug 6, 2017, 4:01:10 AM8/6/17
to R-inla discussion group
This problem seems broader: INLA is behaving in a very buggy way for me I think:

In the first model one of the binary variables (yellow) reports both levels, but the other binary variables (green) do not.

b0     -3.6553 18.2851   -39.5554  -3.6558    32.2145 -3.6552   0
bw     -0.0967  0.0870    -0.2690  -0.0962     0.0727 -0.0952   0
ga     -0.2030  0.0810    -0.3604  -0.2036    -0.0427 -0.2046   0
nicu0  -1.9052 18.2684   -37.7722  -1.9058    33.9318 -1.9052   0
nicu1  -1.7444 18.2707   -37.6161  -1.7449    34.0972 -1.7444   0
sti1    0.1151  0.1598    -0.2037   0.1168     0.4243  0.1202   0
parity  0.1661  0.0641     0.0374   0.1671     0.2892  0.1691   0
race1   0.9402  0.2343     0.4958   0.9344     1.4175  0.9227   0

This model is EXACTLY the same as the last one, but it's reporting bw, a continuous numeric variable, out like this.

Any idea what's happening here?

Fixed effects: mean sd 0.025quant 0.5quant 0.975quant mode kld b0 -50.9563 1.8694 -54.6782 -50.9388 -47.3338 -50.9036 0 bw-0.009863134 46.4362 1.9305 42.6835 46.4231 50.2591 46.3968 0 bw-0.030264237 46.7306 1.9178 43.0068 46.7161 50.5330 46.6871 0 bw-0.050665339 44.5188 2.1140 40.2870 44.5466 48.5945 44.6026 0 bw-0.071066442 46.4187 1.9305 42.6659 46.4056 50.2416 46.3793 0 bw-0.091467545 46.1417 1.9520 42.3390 46.1313 49.9985 46.1105 0 bw-0.111868647 46.7916 1.9088 43.0878 46.7764 50.5790 46.7458 0 bw-0.13226975 47.8266 1.8857 44.1728 47.8098 51.5740 47.7760 0 bw-0.152670853 46.1585 1.9520 42.3559 46.1481 50.0153 46.1273 0 bw-0.173071955 45.7362 1.9941 41.8313 45.7327 49.6574 45.7257 0 bw-0.193473058 46.2262 1.9301 42.4742 46.2130 50.0483 46.1868 0 bw-0.213874161 46.8210 1.9181 43.0967 46.8066 50.6240 46.7776 0 bw-0.234275263 46.7750 1.9179 43.0509 46.7605 50.5777 46.7316 0 bw-0.254676366 46.4789 1.9173 42.7562 46.4644 50.2803 46.4354 0 bw-0.275077469 46.6101 1.9175 42.8869 46.5957 50.4120 46.5667 0

Håvard Rue

unread,
Aug 6, 2017, 4:05:20 AM8/6/17
to Paul Lantos, R-inla discussion group
On Sun, 2017-08-06 at 01:01 -0700, Paul Lantos wrote:
> This problem seems broader: INLA is behaving in a very buggy way for
> me I think:
>
> In the first model one of the binary variables (yellow) reports both
> levels, but the other binary variables (green) do not.
>
> b0 -3.6553 18.2851 -39.5554 -3.6558 32.2145 -3.6552 0
> bw -0.0967 0.0870 -0.2690 -0.0962 0.0727 -0.0952 0
> ga -0.2030 0.0810 -0.3604 -0.2036 -0.0427 -0.2046 0
> nicu0 -1.9052 18.2684 -37.7722 -1.9058 33.9318 -1.9052 0
> nicu1 -1.7444 18.2707 -37.6161 -1.7449 34.0972 -1.7444 0
> sti1 0.1151 0.1598 -0.2037 0.1168 0.4243 0.1202 0
> parity 0.1661 0.0641 0.0374 0.1671 0.2892 0.1691 0
> race1 0.9402 0.2343 0.4958 0.9344 1.4175 0.9227 0
>
> This model is EXACTLY the same as the last one, but it's reporting
> bw, a continuous numeric variable, out like this.


maybe the input is numeric and not a factor? the parsing of this part
of the model is done using 'model.matrix()' and should be the same as
you'll get using 'lm()' unless you request a different behavour.




-
Håvard Rue <hr...@r-inla.org>
r-inla.org

Paul Lantos

unread,
Aug 6, 2017, 4:09:47 AM8/6/17
to Havard Rue, R-inla discussion group
Nope, the nicu variable is a factor just like the other ones I highlighted. And bw is numeric, so not sure why in one run of the model it gave all those estimates.

> str(data)
'data.frame':	19348 obs. of  24 variables:
 $ id        : int  2248 13062 551 561 322 86 1164 12850 1549 1638 ...
 $ long      : num  616695 616149 618039 619041 619044 ...
 $ lat       : num  249922 249673 240838 242066 241644 ...
 $ race      : Factor w/ 2 levels "0","1": 2 2 1 2 2 2 1 2 1 2 ...
 $ gender    : Factor w/ 2 levels "0","1": 1 2 1 1 1 2 1 1 2 1 ...
 $ ga        : int  37 37 38 38 40 37 34 36 36 36 ...
 $ bw        : num  2.31 2.34 2.53 3.26 3.76 3.21 1.68 2.27 2.33 2.56 ...
 $ nicu      : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ bw        : num  -2.07 -2.009 -1.622 -0.132 0.888 ...
 $ parity : num 0.347 1.079 -0.385 0.347 -0.385 ... $ ga : num -1.646 -1.646 -0.925 -0.925 0.517 ...

INLA help

unread,
Aug 6, 2017, 4:10:57 AM8/6/17
to Paul Lantos, R-inla discussion group
> x=as.factor(runif(10))
> y=runif(10)
> summary(inla(y~x,data = data.frame(x,y)))

Call:
"inla(formula = y ~ x, data = data.frame(x, y))"

Time used:
Pre-processing Running inla Post-processing Total
0.1038 0.0461 0.0354 0.1852

Fixed effects:
mean sd 0.025quant 0.5quant
0.975quant mode
(Intercept) 0.7406
0.0161 0.7091 0.7406 0.7721 0.7406
x0.0784379874821752 -0.0006 0.0227 -0.0451 -0.0006 0.0440
-0.0006
x0.170674027875066 0.1360
0.0227 0.0915 0.1360 0.1807 0.1360
x0.270349313504994 0.2374
0.0227 0.1928 0.2373 0.2820 0.2374
x0.506421730387956 -0.3819 0.0227 -0.4264 -0.3819 -0.3373
-0.3819
x0.562381178839132 -0.7399 0.0227 -0.7844 -0.7399 -0.6953
-0.7399
x0.707229912979528 -0.5333 0.0227 -0.5778 -0.5333 -0.4887
-0.5333
x0.719403633149341 -0.2313 0.0227 -0.2758 -0.2313 -0.1867
-0.2313
x0.789780526421964 -0.6446 0.0227 -0.6891 -0.6446 -0.6000
-0.6446
x0.816360480384901 -0.2965 0.0227 -0.3410 -0.2965 -0.2519
-0.2965
--
Håvard Rue
he...@r-inla.org

INLA help

unread,
Aug 6, 2017, 4:12:33 AM8/6/17
to Paul Lantos, R-inla discussion group
data has two columns with name 'bw' ???



On Sat, 2017-08-05 at 22:09 -1000, Paul Lantos wrote:
--
Håvard Rue
he...@r-inla.org

Paul Lantos

unread,
Aug 6, 2017, 4:15:43 AM8/6/17
to he...@r-inla.org, R-inla discussion group
No, actually, one is named "bw_std" for standardized birth weight, that's the one used in the model, but in the INLA stack I call it bw. Just tried to clean up the str(data) for simplicity. This is what the whole thing looks like.

> str(data)
'data.frame':	19348 obs. of  24 variables:
 $ id        : int  2248 13062 551 561 322 86 1164 12850 1549 1638 ...
 $ long      : num  616695 616149 618039 619041 619044 ...
 $ lat       : num  249922 249673 240838 242066 241644 ...
 $ binrace   : Factor w/ 2 levels "0","1": 2 2 1 2 2 2 1 2 1 2 ...
 $ MF        : Factor w/ 2 levels "Female","Male": 1 2 1 1 1 2 1 1 2 1 ...
 $ gender    : Factor w/ 2 levels "0","1": 1 2 1 1 1 2 1 1 2 1 ...
 $ ga        : int  37 37 38 38 40 37 34 36 36 36 ...
 $ bw        : num  2.31 2.34 2.53 3.26 3.76 3.21 1.68 2.27 2.33 2.56 ...
 $ nicu      : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ parity    : int  2 3 1 2 1 5 2 2 1 1 ...
 $ sti       : Factor w/ 2 levels "0","1": 1 1 1 1 2 1 1 1 2 1 ...
 $ As_Fail   : int  0 0 1 1 1 1 0 0 0 0 ...
 $ Random    : int  0 0 1 1 1 1 0 0 0 0 ...
 $ As_Pass   : int  0 0 0 0 0 1 0 0 0 0 ...
 $ PF_0      : int  0 0 2 2 2 1 0 0 0 0 ...
 $ CMV_TESTED: Factor w/ 2 levels "N","Y": 1 1 1 1 1 1 1 1 1 1 ...
 $ CMV_TEST_R: Factor w/ 3 levels "","Negative",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ ADI       : num  0.987 0.987 0.302 0.302 0.302 ...
 $ bw_std    : num  -2.07 -2.009 -1.622 -0.132 0.888 ...
 $ adi_std   : num  1.729 1.729 -0.475 -0.475 -0.475 ...
 $ parity_std: num  0.347 1.079 -0.385 0.347 -0.385 ...
 $ ga_std    : num  -1.646 -1.646 -0.925 -0.925 0.517 ...
 $ bgfips    : num  3.71e+11 3.71e+11 3.71e+11 3.71e+11 3.71e+11 ...
 $ popsqmi   : int  3397 3397 2125 2125 2125 2125 2125 2125 2125 2125 ...

INLA help

unread,
Aug 6, 2017, 4:17:05 AM8/6/17
to Paul Lantos, R-inla discussion group
On Sat, 2017-08-05 at 22:15 -1000, Paul Lantos wrote:
> No, actually, one is named "bw_std" for standardized birth weight,
> that's the one used in the model, but in the INLA stack I call it bw.
> Just tried to clean up the str(data) for simplicity. This is what the
> whole thing looks like.

can you check what lm() gives?
--
Håvard Rue
he...@r-inla.org

Paul Lantos

unread,
Aug 6, 2017, 4:20:43 AM8/6/17
to he...@r-inla.org, R-inla discussion group
lm(formula = Result ~ nicu, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.05792 -0.03457 -0.03457 -0.03457  0.96543 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.034575   0.001328  26.035   <2e-16 ***
nicu1       0.023340   0.011478   2.033    0.042 *  

INLA help

unread,
Aug 6, 2017, 4:26:41 AM8/6/17
to Paul Lantos, R-inla discussion group
On Sun, 2017-08-06 at 01:01 -0700, Paul Lantos wrote:
>
> Any idea what's happening here?

its interpreting the variable as a factor. if this is an error, can you
produce a simple reproducing example and we'll fix it.

H
--
Håvard Rue
he...@r-inla.org

Paul Lantos

unread,
Aug 6, 2017, 4:30:18 AM8/6/17
to he...@r-inla.org, R-inla discussion group
The problem is that when I run the models in INLA it does not drop a level, so I get estimates for nicu0 and nicu1 -- something that does not happen for my other binary variables. It did drop the level in lm() and it has in the past when I've run the model in INLA (for instance just running an estimation model with no prediction stack).

INLA help

unread,
Aug 6, 2017, 4:31:51 AM8/6/17
to Paul Lantos, R-inla discussion group
On Sat, 2017-08-05 at 22:30 -1000, Paul Lantos wrote:
> The problem is that when I run the models in INLA it does not drop a
> level, so I get estimates for nicu0 and nicu1 -- something that does
> not happen for my other binary variables. It did drop the level in
> lm() and it has in the past when I've run the model in INLA (for
> instance just running an estimation model with no prediction stack).

are you using

control.fixed = list(expand.factor.strategy = "inla")

?

--
Håvard Rue
he...@r-inla.org

INLA help

unread,
Aug 6, 2017, 4:33:34 AM8/6/17
to Paul Lantos, R-inla discussion group, Finn Lindgren
if you think this is an issue using 'stack', then try to make an
example to pass to Finn.

H

On Sat, 2017-08-05 at 22:30 -1000, Paul Lantos wrote:
--
Håvard Rue
he...@r-inla.org

Finn Lindgren

unread,
Aug 6, 2017, 4:38:56 AM8/6/17
to Paul Lantos, he...@r-inla.org, R-inla discussion group
When running with inla.stack, you also use expand.factor.strategy="inla", right?
Also see Elias post, which showed what happens when the intercept is a regular covariates (as it must be for inla.stack to operate properly).

Håvard, I think we need an expand.factor.strategy="inla0", meaning "drop the first level of every factor" (I think the current "inla" strategy does something different to that, that only works sensibly when there is a single factor in the model...).

Paul, you can have full control over the factor level handling if you convert the factors yourself into model matrices instead when supplying them to inla.stack. (Only works for factors not involved in an f() model)
Something like
  ..., effects=list(list(myfactor=model.matrix(~0+thefactor, data)[,-1],  ...)), ..


Finn
--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.
To post to this group, send email to r-inla-disc...@googlegroups.com.
Visit this group at https://groups.google.com/group/r-inla-discussion-group.
For more options, visit https://groups.google.com/d/optout.

Paul Lantos

unread,
Aug 6, 2017, 4:40:48 AM8/6/17
to he...@r-inla.org, R-inla discussion group, Finn Lindgren
 I have not used the expand.factor.strategy="inla" - should I?

Here are the estimation and prediction stacks. Even if there's a problem in the stack why would it treat different 2-level factor variables differently?

stk.est <- inla.stack(tag='estimation',
                      data=list(y=data$Result),
                      A=list(A.est, 1),
                      effects=list(s=1:spde2$n.spde,
                                   list(b0=1,
                                        ga=data$ga_std,
                                        race=data$binrace,
                                        gender=data$gender,
                                        bw=data$bw_std,
                                        nicu=data$nicu,
                                        parity=data$parity_std,
                                        sti=data$sti)))

stk.pred <- inla.stack(tag='prediction',
                       data=list(y=NA),
                       A=list(A.pred, 1),
                       effects=list(s=1:spde2$n.spde,
                                    list(b0=1,
                                         id=grid$id)))

stk.full <- inla.stack(stk.est, stk.pred)

INLA help

unread,
Aug 6, 2017, 4:42:42 AM8/6/17
to Finn Lindgren, Paul Lantos, R-inla discussion group
On Sun, 2017-08-06 at 09:38 +0100, Finn Lindgren wrote:
> When running with inla.stack, you also use
> expand.factor.strategy="inla", right?

this strategy (which is not the default), will not drop any levels of factors.
the default strategy (as in model.matrix()), will do that. the problem is that
the priors in that case is not invariant to the levels...

--
Håvard Rue
he...@r-inla.org

INLA help

unread,
Aug 6, 2017, 4:44:06 AM8/6/17
to Finn Lindgren, Paul Lantos, R-inla discussion group
one need proper priors everywhere with this strategy, and only
'contrasts' are likelihood indentifiable

On Sun, 2017-08-06 at 09:38 +0100, Finn Lindgren wrote:
Håvard Rue
he...@r-inla.org

Paul Lantos

unread,
Aug 6, 2017, 4:45:36 AM8/6/17
to Finn Lindgren, he...@r-inla.org, R-inla discussion group
Ok, thanks Finn.

Now, I had run 4 slight variations of this model in a row and got different problems in different ones (the only thing that differed was slight variations in the response variable) and I got different problems, including this continuous variable bw showing up like this in one model. I've suspected INLA is running out of memory, but not certain, and I don't understand why this would be the error produced.

Fixed effects:
                       mean      sd 0.025quant 0.5quant 0.975quant     mode kld
b0                 -50.9563  1.8694   -54.6782 -50.9388   -47.3338 -50.9036   0
bw-0.009863134      46.4362  1.9305    42.6835  46.4231    50.2591  46.3968   0
bw-0.030264237      46.7306  1.9178    43.0068  46.7161    50.5330  46.6871   0
bw-0.050665339      44.5188  2.1140    40.2870  44.5466    48.5945  44.6026   0
bw-0.071066442      46.4187  1.9305    42.6659  46.4056    50.2416  46.3793   0
bw-0.091467545      46.1417  1.9520    42.3390  46.1313    49.9985  46.1105   0
bw-0.111868647      46.7916  1.9088    43.0878  46.7764    50.5790  46.7458   0
bw-0.13226975       47.8266  1.8857    44.1728  47.8098    51.5740  47.7760   0
 [ reached getOption("max.print") -- omitted 200 rows ]



On Sat, Aug 5, 2017 at 10:38 PM, Finn Lindgren <finn.l...@gmail.com> wrote:
When running with inla.stack, you also use expand.factor.strategy="inla", right?
Also see Elias post, which showed what happens when the intercept is a regular covariates (as it must be for inla.stack to operate properly).

Håvard, I think we need an expand.factor.strategy="inla0", meaning "drop the first level of every factor" (I think the current "inla" strategy does something different to that, that only works sensibly when there is a single factor in the model...).

Paul, you can have full control over the factor level handling if you convert the factors yourself into model matrices instead when supplying them to inla.stack. (Only works for factors not involved in an f() model)
Something like
  ..., effects=list(list(myfactor=model.matrix(~0+thefactor, data)[,-1],  ...)), ..


Finn

On 6 Aug 2017, at 09:30, Paul Lantos <paul....@gmail.com> wrote:

The problem is that when I run the models in INLA it does not drop a level, so I get estimates for nicu0 and nicu1 -- something that does not happen for my other binary variables. It did drop the level in lm() and it has in the past when I've run the model in INLA (for instance just running an estimation model with no prediction stack).

On Sat, Aug 5, 2017 at 10:26 PM, INLA help <he...@r-inla.org> wrote:
On Sun, 2017-08-06 at 01:01 -0700, Paul Lantos wrote:
>
> Any idea what's happening here?

its interpreting the variable as a factor. if this is an error, can you
produce a simple reproducing example and we'll fix it.

H
--
Håvard Rue
he...@r-inla.org

--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion-group+unsub...@googlegroups.com.
To post to this group, send email to r-inla-discussion-group@googlegroups.com.

Paul Lantos

unread,
Aug 6, 2017, 4:48:55 AM8/6/17
to he...@r-inla.org, Finn Lindgren, R-inla discussion group
The default strategy then is what I'd want I believe, a single estimate for this particular variable. I have prior range / sigma specified in inla.spde2.pcmatern, but no other priors specified.

Paul Lantos

unread,
Aug 6, 2017, 4:56:30 AM8/6/17
to he...@r-inla.org, Finn Lindgren, R-inla discussion group
I just ran a simple test using the same stack, and a level was appropriately dropped in nicu:


> form1 <- y ~ b0 + nicu
> test <- inla(form1,
+                     data=inla.stack.data(stk.est), 
+                     family="binomial",
+                     control.predictor=list(A =inla.stack.A(stk.est), compute=TRUE),
+                     control.compute = list(dic = TRUE))


Fixed effects:
               mean     sd 0.025quant 0.5quant 0.975quant    mode kld
(Intercept) -0.4491 0.0113    -0.4712  -0.4491    -0.4270 -0.4491   0
b0           0.2073 0.0196     0.1689   0.2073     0.2457  0.2073   0
nicu1        0.0149 0.0656    -0.1140   0.0150     0.1434  0.0151   0

The model has no random effects

The model has no hyperparameters

Finn Lindgren

unread,
Aug 6, 2017, 4:59:26 AM8/6/17
to Paul Lantos, he...@r-inla.org, R-inla discussion group
Make sure not to store "bw" as a factor unless you want it to be a factor.

But I don't see any error? The max.print thing is just R not wanting to print everything. I think it's a recent R feature.

Finn

Paul Lantos

unread,
Aug 6, 2017, 5:02:38 AM8/6/17
to Finn Lindgren, he...@r-inla.org, R-inla discussion group
bw is stored as numeric, and in fact is estimated normally in many other renditions of the models. It's just that in this one, where nothing changed other than the response variable but bw was estimated like this.



To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion-group+unsubscr...@googlegroups.com.

Paul Lantos

unread,
Aug 6, 2017, 5:03:51 AM8/6/17
to Finn Lindgren, he...@r-inla.org, R-inla discussion group
And yeah, there's no error per se, but clearly the model is behaving differently despite the variables and model syntax held constant.

Finn Lindgren

unread,
Aug 6, 2017, 5:05:14 AM8/6/17
to Paul Lantos, he...@r-inla.org, R-inla discussion group
The issue with factors and inla.stack is the handling of NAs. Even if there are no NAs in the input data, inla.stack necessarily by construction introduces NAs, and that can lead to issues later on inside inla(). From what I remember, inla() will abort or warn in at least some situations, telling the user to use expand.factor.strategy="inla". But as I explained in an earlier email, that doesn't quite do what you want when you have more than one factor.

So a safer method is to convert the factors yourself, as then you have complete control (even though the prior distribution is then not symmetric in the original levels). Or to use f(myfactor, model="iid", constr=TRUE), which keeps the symmetry, and makes the model non-singular by imposing a sum-to-zero constraint on the coefficients.

Finn

Finn Lindgren

unread,
Aug 6, 2017, 5:08:30 AM8/6/17
to Paul Lantos, he...@r-inla.org, R-inla discussion group
You said bw_std was sometimes called bw in the stack?
Inla is clearly receiving a factor. First check str() on you inla.stack input for bw.
Then check str(inla.stack.data(...)$bw)

Finn

Paul Lantos

unread,
Aug 6, 2017, 5:08:50 AM8/6/17
to Finn Lindgren, he...@r-inla.org, R-inla discussion group
So if I understand correctly in the latter case I'd be essentially giving a random effect to each level of the factor? Should I do that with every factor variable?

This still doesn't explain what happened with the bw variable, where in 3 versions of the model it was handled appropriately as a numeric variable but in 1 it was given all these levels.


Paul Lantos

unread,
Aug 6, 2017, 5:15:38 AM8/6/17
to Finn Lindgren, he...@r-inla.org, R-inla discussion group
Forgot to reply all here:

Yes, in my primary data I have bw and bw_std (for a centered/standardized version), but in the stack I specify that bw refers to bw_std. Both are numeric at any rate.

Just as context the dataset has 19,348 observations and no NAs in any variable. Which makes this a bit perplexing...


> dim(data) [1] 19348 24

> summary(data$nicu) 0 1 19089 259

> summary(data$bw_std) Min. 1st Qu. Median Mean 3rd Qu. Max. -4.57971 -0.64230 0.01054 0.00000 0.64297 4.07036

>
str(inla.stack.data(stk.est)$bw) num [1:12737] NA NA NA NA NA NA NA NA NA NA ...

> str(inla.stack.data(stk.est)$nicu) Factor w/ 2 levels "0","1": NA NA NA NA NA NA NA NA NA NA ...

Finn Lindgren

unread,
Aug 6, 2017, 5:29:22 AM8/6/17
to Paul Lantos, he...@r-inla.org, R-inla discussion group
NAs in the inla.stack output are not perplexing. As explained in the "read this first" spde tutorial, the very purpose of inla.stack is to remove the need for the user to do tedious constructions adding NAs where needed to construct the desired A*eta predictors. (Which we've further encapsulated in inlabru.org, so that the user doesn't even need to call inla.stwck themselves...)

But it is perplexing that inla seems to make a factor out of a numeric. The inla.stack.data() output looks fine.  Is stack.est the stack used in the inla call, or do you join it with the prediction stack first? Make sure you check the one that's actually used in the inla() call.

Finn

Paul Lantos

unread,
Aug 6, 2017, 5:35:24 AM8/6/17
to Finn Lindgren, he...@r-inla.org, R-inla discussion group
Yes, I combine the estimation and prediction stacks first, for example:


model <- inla(formula,
                         data=inla.stack.data(stk.full),
                         family="binomial",
                         control.predictor=list(A =inla.stack.A(stk.full), compute=TRUE),

                         control.compute = list(dic = TRUE))

Finn Lindgren

unread,
Aug 6, 2017, 6:16:28 AM8/6/17
to Paul Lantos, he...@r-inla.org, R-inla discussion group
...and what is the str() result on the full stack?

Finn

Paul Lantos

unread,
Aug 6, 2017, 12:53:54 PM8/6/17
to Finn Lindgren, R-inla discussion group, he...@r-inla.org
For bw it is "chr", not "num". Which doesn't make sense, but also doesn't explain why in most but not all models it's being treated as numeric.

This is killing me, I just ran three models overnight, in which I put in f(term...iid, constr=TRUE) for all dichotomous variables. One model with just the estimation stack treated nicu correctly but this time treated gender incorrectly (not dropping the level). The other two models with the full stack treated bw incorrectly reporting hundreds of estimates out.

BTW I've restarted R and reloaded all the data many times in this process, starting fresh from a .csv with my data.

Finn Lindgren

unread,
Aug 6, 2017, 1:16:01 PM8/6/17
to Paul Lantos, r-inla-disc...@googlegroups.com, he...@r-inla.org
Aaaaah, the chr does ring a bell, and explains why it's treated as a factor. I recall some issues with setting up NA vectors when writing inla.stack, so I'll try to go through the code and see if there's a lingering bug I can identify.

Finn

Paul Lantos

unread,
Aug 6, 2017, 1:20:32 PM8/6/17
to Finn Lindgren, R-inla discussion group, he...@r-inla.org
Ok, thanks. I'm now trying an experiment, populating my prediction grid with columns of factor / 0 values for every factor variable and numeric / mean values for every numeric variable. I need to supply the grid with these anyway to do this in mgcv or in brms. At least I'll be making sure the grid and the data have the same variables.

Paul Lantos

unread,
Aug 6, 2017, 9:05:50 PM8/6/17
to Finn Lindgren, R-inla discussion group, he...@r-inla.org
Well, ran some new models, but still have one binary variable misbehaving, in this case it's gender not dropping the zero level. Why do you think this is happening?

Here's a look:

gender, NICU, sti, and race are all 2-level factor variables

Fixed effects:
           mean      sd 0.025quant 0.5quant 0.975quant    mode kld
b0      -2.9030 18.2577   -38.7492  -2.9035    32.9131 -2.9030   0
bw       0.0432  0.0510    -0.0575   0.0433     0.1428  0.0437   0
ga      -0.2025  0.0489    -0.2980  -0.2026    -0.1061 -0.2029   0
gender0 -1.4738 18.2576   -37.3196  -1.4743    34.3422 -1.4738   0
gender1 -1.3875 18.2576   -37.2333  -1.3880    34.4284 -1.3875   0
nicu1    0.1614  0.3295    -0.5314   0.1775     0.7648  0.2108   0
sti1     0.2942  0.0932     0.1099   0.2946     0.4758  0.2955   0
parity   0.0256  0.0417    -0.0574   0.0259     0.1065  0.0266   0
race1    0.8381  0.1276     0.5921   0.8365     1.0931  0.8332   0

Model hyperparameters: mean sd 0.025quant 0.5quant 0.975quant mode Range for s 8352.813 1.193e+04 1106.9747 4.850e+03 3.707e+04 2.350e+03 Stdev for s 0.217 5.620e-02 0.1217 2.124e-01 3.409e-01 2.047e-01 Precision for nicu 18089.470 1.997e+04 1263.7991 1.207e+04 7.093e+04 3.448e+03 Precision for sti 37896.276 2.177e+04 9865.7847 3.341e+04 9.234e+04 2.427e+04 Precision for race 20398.145 2.017e+04 922.8571 1.420e+04 7.338e+04 2.122e+03 Precision for gender 15693.031 2.028e+04 953.2671 9.495e+03 6.806e+04 2.515e+03

The INLA call:
Model <- inla(formula,

                         data=inla.stack.data(stk.full),
                         family="binomial",
                         control.predictor=list(A =inla.stack.A(stk.full), compute=TRUE),
                         control.compute = list(dic = TRUE))


The model input:
stk.est <- inla.stack(tag='estimation',
                      data=list(y=data$Random),

                      A=list(A.est, 1),
                      effects=list(s=1:spde2$n.spde,
                                   list(b0=1,
                                        ga=data$ga_std,
                                        race=data$binrace,
                                        gender=data$gender,
                                        bw=data$bw_std,
                                        nicu=data$nicu,
                                        parity=data$parity_std,
                                        sti=data$sti)))

stk.pred <- inla.stack(tag='prediction',
                       data=list(y=NA),
                       A=list(A.pred, 1),
                       effects=list(s=1:spde2$n.spde,
                                    list(b0=1,
                                         ga=grid$ga_std,
                                         race=grid$race,
                                         gender=grid$gender,
                                         bw=grid$bw_std,
                                         nicu=grid$nicu,
                                         parity=grid$parity_std,
                                         sti=grid$sti)))

stk.full <- inla.stack(stk.est, stk.pred)

formula  <- y ~ 0 + b0 + bw + ga + gender + nicu + sti + parity + race +
  f(s, model=spde2)+
  f(nicu, model="iid", constr=TRUE)+
  f(sti, model="iid", constr=TRUE)+
  f(race, model="iid", constr=TRUE)+
  f(gender, model="iid", constr=TRUE)

Finn Lindgren

unread,
Aug 7, 2017, 12:00:44 AM8/7/17
to Paul Lantos, R-inla discussion group, he...@r-inla.org
This was explained by Elias post; since model.matrix doesn't know that b0 is an intercept, it keeps the first level of the first covariate.

Håvard, which metod should be used to either drop the level, or to otherwise make he model identifiable?

Finn

Paul Lantos

unread,
Aug 7, 2017, 12:08:18 AM8/7/17
to Finn Lindgren, R-inla discussion group, he...@r-inla.org
But why just one of the factor variables?

Finn Lindgren

unread,
Aug 7, 2017, 12:20:18 AM8/7/17
to Paul Lantos, R-inla discussion group, he...@r-inla.org
The key is that it is the _first_ of the factor variables;
Identifiability is an agelong issue with factor models in general, since if each level for each factor has its own, independent and unconstrained, coefficient, any model with two factors would be nonidentifiable. In addition, any model with an intercept and a factor would also be nonidentifiable.

Several solutions to this are used in practice.

One is to impose constraints (see my earlier email for how to use that approach in inla). 

Another is to remove enough  coefficients to make the model identifiable, typically by removing the first level of all the factor covariates, in the case of a model with an intercept, or all but the first factor, in the case of a model with no intercept.
Since inla relies on a basic R model.matrix() that has no way of detecting that b0 is really an intercept, the first level of the first factor is kept in your model, even though it should more reasonably have been removed.
Removing b0 from the model should make it identifiable.

Finn

Paul Lantos

unread,
Aug 7, 2017, 12:58:58 AM8/7/17
to Finn Lindgren, R-inla discussion group, he...@r-inla.org
Ah, ok, thanks for the explanation.

Well, I can take out b0 and I suppose see how the model compares.

Another thought would be to put in some meaningless first dummy factor variable (ie a variable in which all data points are 0). Could that work?

Paul Lantos

unread,
Aug 7, 2017, 1:22:10 AM8/7/17
to Finn Lindgren, R-inla discussion group, he...@r-inla.org
Guess I can't do it with all 0 (it requires more than one level), but I did rbinom(n, 1, 0.5) to generate random 1s and 0s, which for a sample with 19,384 observations should really not affect the model. I put it first in the data stack and in the formula. Will see how that works.

INLA help

unread,
Aug 7, 2017, 4:42:12 AM8/7/17
to Finn Lindgren, Paul Lantos, R-inla discussion group
On Mon, 2017-08-07 at 05:00 +0100, Finn Lindgren wrote:
> This was explained by Elias post; since model.matrix doesn't know
> that b0 is an intercept, it keeps the first level of the first
> covariate.
>
> Håvard, which metod should be used to either drop the level, or to
> otherwise make he model identifiable?

its possible to code this manually, converting a factor into a numerical covariate, or a set of those.

the 'fixed effect' part in inla(), is treated in the same way as 'lm()' using the similar code. its possible that the argument
'contrasts' can be helpful here, but I have really not understood its use in lm() [its just passed in to model.matrix()]

Finn Lindgren

unread,
Aug 7, 2017, 7:26:24 AM8/7/17
to Paul Lantos, r-inla-disc...@googlegroups.com, he...@r-inla.org
Hi again Paul,

I've been unable to replicate the "chr" issue you had, and realised
that you'd given be the str() info for the stack.est and stack.full
output, but not for stack.pred. To save further back-and-forths, can
you please provide
1) the complete str() result for the whole "effects" input list for
each inla.stack() call (for stack.est and stack.pred)
2) the complete str() on each inla.stack.data() for stack.est,
stack.pred, and stack.full

Finn
--
Finn Lindgren
email: finn.l...@gmail.com

Elias T. Krainski

unread,
Aug 7, 2017, 9:28:05 AM8/7/17
to r-inla-disc...@googlegroups.com

On 08/07/2017 05:42 AM, INLA help wrote:
> its possible to code this manually, converting a factor into a numerical covariate, or a set of those.
>
> the 'fixed effect' part in inla(), is treated in the same way as 'lm()' using the similar code. its possible that the argument
> 'contrasts' can be helpful here, but I have really not understood its use in lm() [its just passed in to model.matrix()]

In consideration to that, I've created a small tutorial (attached) that
can be included in r-inla if you find it useful.
Elias
stack-with-factors.pdf

Paul Lantos

unread,
Aug 7, 2017, 7:34:59 PM8/7/17
to Elias T. Krainski, R-inla discussion group
Thank you, Elias, VERY helpful. Will let you know how it goes.



--
You received this message because you are subscribed to a topic in the Google Groups "R-inla discussion group" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/r-inla-discussion-group/iimQ1t1onE0/unsubscribe.
To unsubscribe from this group and all its topics, send an email to r-inla-discussion-group+unsubscr...@googlegroups.com.
To post to this group, send an email to r-inla-discussion-group@googlegroups.com.

Paul Lantos

unread,
Aug 7, 2017, 7:49:01 PM8/7/17
to Finn Lindgren, r-inla-disc...@googlegroups.com, he...@r-inla.org
Sure, here you go. I don't see it as "chr" anymore except in the names.

> str(stk.est$effects)
List of 5
 $ data :'data.frame':	12737 obs. of  9 variables:
  ..$ s     : int [1:12737] 1 2 6 7 14 15 20 35 36 37 ...
  ..$ b0    : num [1:12737] NA NA NA NA NA NA NA NA NA NA ...
  ..$ ga    : num [1:12737] NA NA NA NA NA NA NA NA NA NA ...
  ..$ bw    : num [1:12737] NA NA NA NA NA NA NA NA NA NA ...
  ..$ parity: num [1:12737] NA NA NA NA NA NA NA NA NA NA ...
  ..$ race  : Factor w/ 2 levels "0","1": NA NA NA NA NA NA NA NA NA NA ...
  ..$ gender: Factor w/ 2 levels "0","1": NA NA NA NA NA NA NA NA NA NA ...
  ..$ nicu  : Factor w/ 2 levels "0","1": NA NA NA NA NA NA NA NA NA NA ...
  ..$ sti   : Factor w/ 2 levels "0","1": NA NA NA NA NA NA NA NA NA NA ...
 $ nrow : int 12737
 $ ncol : Named int [1:9] 1 1 1 1 1 1 1 1 1
  ..- attr(*, "names")= chr [1:9] "s" "b0" "ga" "race" ...
 $ names:List of 9
  ..$ s     : chr "s"
  ..$ b0    : chr "b0"
  ..$ ga    : chr "ga"
  ..$ race  : chr "race"
  ..$ gender: chr "gender"
  ..$ bw    : chr "bw"
  ..$ nicu  : chr "nicu"
  ..$ parity: chr "parity"
  ..$ sti   : chr "sti"
 $ index:List of 1
  ..$ estimation: int [1:19979] 1 2 NA NA NA 3 4 NA NA NA ...
 - attr(*, "class")= chr "inla.data.stack.info"

> str(inla.stack.data(stk.est)) List of 10 $ y : int [1:19348] 0 0 1 1 1 1 0 0 0 0 ... $ s : int [1:12737] 1 2 6 7 14 15 20 35 36 37 ... $ b0 : num [1:12737] NA NA NA NA NA NA NA NA NA NA ... $ ga : num [1:12737] NA NA NA NA NA NA NA NA NA NA ... $ race : Factor w/ 2 levels "0","1": NA NA NA NA NA NA NA NA NA NA ... $ gender: Factor w/ 2 levels "0","1": NA NA NA NA NA NA NA NA NA NA ... $ bw : num [1:12737] NA NA NA NA NA NA NA NA NA NA ... $ nicu : Factor w/ 2 levels "0","1": NA NA NA NA NA NA NA NA NA NA ... $ parity: num [1:12737] NA NA NA NA NA NA NA NA NA NA ... $ sti : Factor w/ 2 levels "0","1": NA NA NA NA NA NA NA NA NA NA ...

> str(inla.stack.data(stk.pred)) List of 10 $ y : logi [1:4100] NA NA NA NA NA NA ... $ s : int [1:447] 2 3 4 5 6 7 8 9 10 11 ... $ b0 : num [1:447] NA NA NA NA NA NA NA NA NA NA ... $ ga : num [1:447] NA NA NA NA NA NA NA NA NA NA ... $ race : Factor w/ 1 level "0": NA NA NA NA NA NA NA NA NA NA ... $ gender: Factor w/ 1 level "0": NA NA NA NA NA NA NA NA NA NA ... $ bw : num [1:447] NA NA NA NA NA NA NA NA NA NA ... $ nicu : Factor w/ 1 level "0": NA NA NA NA NA NA NA NA NA NA ... $ parity: num [1:447] NA NA NA NA NA NA NA NA NA NA ... $ sti : Factor w/ 1 level "0": NA NA NA NA NA NA NA NA NA NA ...

> str(inla.stack.data(stk.full)) List of 10 $ y : int [1:23448] 0 0 1 1 1 1 0 0 0 0 ... $ s : int [1:12846] 1 2 6 7 14 15 20 35 36 37 ... $ b0 : num [1:12846] NA NA NA NA NA NA NA NA NA NA ... $ ga : num [1:12846] NA NA NA NA NA NA NA NA NA NA ... $ race : Factor w/ 2 levels "0","1": NA NA NA NA NA NA NA NA NA NA ... $ gender: Factor w/ 2 levels "0","1": NA NA NA NA NA NA NA NA NA NA ... $ bw : num [1:12846] NA NA NA NA NA NA NA NA NA NA ... $ nicu : Factor w/ 2 levels "0","1": NA NA NA NA NA NA NA NA NA NA ... $ parity: num [1:12846] NA NA NA NA NA NA NA NA NA NA ... $ sti : Factor w/ 2 levels "0","1": NA NA NA NA NA NA NA NA NA NA ...

Paul Lantos

unread,
Sep 9, 2017, 6:35:22 PM9/9/17
to R-inla discussion group
Finn and Elias,
 I've restructured my data using the model.matrix, as suggested in Elias' tutorial. That part seems easy enough, but my model crashes INLA now. Thanks for any help.

*** ERROR *** 	INLA.Data1: Binomial data[2] (nb,y) = (1,2) is void


Error in inla.inlaprogram.has.crashed() : 
  The inla-program exited with an error. Unless you interupted it yourself, please rerun with verbose=TRUE and check the output carefully.
  If this does not help, please contact the developers at <he...@r-inla.org>.
In addition: Warning message:
running command '"C:/Users/pl39/Documents/R/win-library/3.4/INLA/bin/windows/64bit/inla.exe"  -b -s -v  "C:/Users/pl39/AppData/Local/Temp/Rtmpkd59hS/file1cb06c3b3b56/Model.ini"' had status 1













On Sunday, August 6, 2017 at 4:38:56 AM UTC-4, Finn Lindgren wrote:
When running with inla.stack, you also use expand.factor.strategy="inla", right?
Also see Elias post, which showed what happens when the intercept is a regular covariates (as it must be for inla.stack to operate properly).

Håvard, I think we need an expand.factor.strategy="inla0", meaning "drop the first level of every factor" (I think the current "inla" strategy does something different to that, that only works sensibly when there is a single factor in the model...).

Paul, you can have full control over the factor level handling if you convert the factors yourself into model matrices instead when supplying them to inla.stack. (Only works for factors not involved in an f() model)
Something like
  ..., effects=list(list(myfactor=model.matrix(~0+thefactor, data)[,-1],  ...)), ..


Finn

On 6 Aug 2017, at 09:30, Paul Lantos <paul....@gmail.com> wrote:

The problem is that when I run the models in INLA it does not drop a level, so I get estimates for nicu0 and nicu1 -- something that does not happen for my other binary variables. It did drop the level in lm() and it has in the past when I've run the model in INLA (for instance just running an estimation model with no prediction stack).
On Sat, Aug 5, 2017 at 10:26 PM, INLA help <he...@r-inla.org> wrote:
On Sun, 2017-08-06 at 01:01 -0700, Paul Lantos wrote:
>
> Any idea what's happening here?

its interpreting the variable as a factor. if this is an error, can you
produce a simple reproducing example and we'll fix it.
H
--
Håvard Rue
he...@r-inla.org

--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion-group+unsub...@googlegroups.com.
To post to this group, send email to r-inla-disc...@googlegroups.com.

Elias T. Krainski

unread,
Sep 10, 2017, 12:40:22 AM9/10/17
to r-inla-disc...@googlegroups.com

Paul,

It looks like a problem in the response, not in the covariates.

Elias

To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.

Helpdesk

unread,
Sep 10, 2017, 8:58:11 AM9/10/17
to Paul Lantos, R-inla discussion group
As Elias says, its an error in the data input. you say that you have
y=2 sucesses but only nb=1 trails (or size as in R)






On Sat, 2017-09-09 at 15:35 -0700, Paul Lantos wrote:
> Finn and Elias,
> I've restructured my data using the model.matrix, as suggested in
> Elias' tutorial. That part seems easy enough, but my model crashes
> INLA now. Thanks for any help.
>
> *** ERROR *** INLA.Data1: Binomial data[2] (nb,y) = (1,2) is
> void
>
>
> Error in inla.inlaprogram.has.crashed() :
> The inla-program exited with an error. Unless you interupted it
> yourself, please rerun with verbose=TRUE and check the output
> carefully.
> If this does not help, please contact the developers at <help@r-inl
Håvard Rue
Helpdesk
he...@r-inla.org

Paul Lantos

unread,
Sep 10, 2017, 11:41:12 AM9/10/17
to he...@r-inla.org, R-inla discussion group
Thanks, I reloaded the data and it runs fine now! I think the problem was I'd coded the y variable as a factor rather than as numeric.
Paul
Reply all
Reply to author
Forward
0 new messages