Error when adding factor variables into R-INLA stack

117 views
Skip to first unread message

Sylvan Benaksas

unread,
Oct 19, 2021, 4:38:02 AM10/19/21
to R-inla discussion group

Hello,

I am running a joint hurdle model with R-INLA, the model has two liklihoods, a binary part and a count part modelled with a truncated negative binomial distribution. There are 8 covariates, 2 factors and 6 numeric. The same covariates are used for both model components but with different column names, and so I need to join two INLA stacks, one for each likelihood. Each individual stack seems fine but the error occurs when I join them into one full stack, some numerical covariates become factors and some factor covariates are converted to characters


``` 

stk.z2 <- inla.stack(tag='est.z',

                     data=list(z=z, 

                               response=cbind(z, NA)),

                     A=list(A2, 1),

                     effects=list(ind2.occ,

                                  list(data=covars.z)))


  str(stk.z2)

 $ effects:List of 5

  ..$ data :'data.frame': 10287 obs. of  11 variables:

  .. ..$ s.z      : int [1:10287] 1 2 3 4 5 9 10 11 12 13 ...

  .. ..$ s.z.group: int [1:10287] 1 1 1 1 1 1 1 1 1 1 ...

  .. ..$ s.z.repl : int [1:10287] 1 1 1 1 1 1 1 1 1 1 ...

  .. ..$ sbt      : num [1:10287] NA NA NA NA NA NA NA NA NA NA ...

  .. ..$ sal      : num [1:10287] NA NA NA NA NA NA NA NA NA NA ...

  .. ..$ npp      : num [1:10287] NA NA NA NA NA NA NA NA NA NA ...

  .. ..$ sand_per : num [1:10287] NA NA NA NA NA NA NA NA NA NA ...

  .. ..$ silt_per : num [1:10287] NA NA NA NA NA NA NA NA NA NA ...

  .. ..$ depth    : num [1:10287] NA NA NA NA NA NA NA NA NA NA ...

  .. ..$ month    : Factor w/ 4 levels "6","7","8","9": NA NA NA NA NA NA NA NA NA NA ...

  .. ..$ source   : Factor w/ 2 levels "bts","ibts3": NA NA NA NA NA NA NA NA NA NA ...


  

 stk.y2 <- inla.stack(tag='est.y',

                     data=list(r=y,

                               response=cbind(NA, y)), 

                     A=list(A2, 1),

                     effects=list(ind2.abun,

                                  list(data=covars.y)))


str(stk.y2)

$ effects:List of 5

  ..$ data :'data.frame': 10287 obs. of  11 variables:

  .. ..$ s.y       : int [1:10287] 1 2 3 4 5 9 10 11 12 13 ...

  .. ..$ s.y.group : int [1:10287] 1 1 1 1 1 1 1 1 1 1 ...

  .. ..$ s.y.repl  : int [1:10287] 1 1 1 1 1 1 1 1 1 1 ...

  .. ..$ depth11   : num [1:10287] NA NA NA NA NA NA NA NA NA NA ...

  .. ..$ sbt11     : num [1:10287] NA NA NA NA NA NA NA NA NA NA ...

  .. ..$ npp11     : num [1:10287] NA NA NA NA NA NA NA NA NA NA ...

  .. ..$ sand_per11: num [1:10287] NA NA NA NA NA NA NA NA NA NA ...

  .. ..$ silt_per11: num [1:10287] NA NA NA NA NA NA NA NA NA NA ...

  .. ..$ sal11     : num [1:10287] NA NA NA NA NA NA NA NA NA NA ...

  .. ..$ source11  : Factor w/ 2 levels "bts","ibts3": NA NA NA NA NA NA NA NA NA NA ...

  .. ..$ month11   : Factor w/ 4 levels "6","7","8","9": NA NA NA NA NA NA NA NA NA NA ... 

``` 


both stacks seem fine but the error now occurs when joining them into one full stack


``` 

stk.zy2 <- inla.stack(stk.z2, stk.y2)


>stk.zy2 <- inla.stack(stk.z2, stk.y2)

Warning messages:

1: In `[<-.factor`(`*tmp*`, ri, value = c(NA, NA, NA, NA, NA, NA, NA,  :

  invalid factor level, NA generated

2: In `[<-.factor`(`*tmp*`, ri, value = c(NA, NA, NA, NA, NA, NA, NA,  :

  invalid factor level, NA generated

``` 

and the class of the ```silt_perc```  and ```depth``` have changed from num to chr, ```source11``` and ```month11``` have changed from factor to chr and ```silt_per11``` and ```sal11``` have changed from num to factor with 0 levls


``` 

str(stk.zy2)


 $ effects:List of 5

  ..$ data :'data.frame': 20574 obs. of  22 variables:

  .. ..$ s.z       : int [1:20574] 1 2 3 4 5 9 10 11 12 13 ...

  .. ..$ s.z.group : int [1:20574] 1 1 1 1 1 1 1 1 1 1 ...

  .. ..$ s.z.repl  : int [1:20574] 1 1 1 1 1 1 1 1 1 1 ...

  .. ..$ sbt       : num [1:20574] NA NA NA NA NA NA NA NA NA NA ...

  .. ..$ sal       : num [1:20574] NA NA NA NA NA NA NA NA NA NA ...

  .. ..$ npp       : num [1:20574] NA NA NA NA NA NA NA NA NA NA ...

  .. ..$ sand_per  : num [1:20574] NA NA NA NA NA NA NA NA NA NA ...

  .. ..$ silt_per  : chr [1:20574] NA NA NA NA ...

  .. ..$ depth     : chr [1:20574] NA NA NA NA ...

  .. ..$ month     : Factor w/ 4 levels "6","7","8","9": NA NA NA NA NA NA NA NA NA NA ...

  .. ..$ source    : Factor w/ 2 levels "bts","ibts3": NA NA NA NA NA NA NA NA NA NA ...

  .. ..$ s.y       : int [1:20574] NA NA NA NA NA NA NA NA NA NA ...

  .. ..$ s.y.group : int [1:20574] NA NA NA NA NA NA NA NA NA NA ...

  .. ..$ s.y.repl  : int [1:20574] NA NA NA NA NA NA NA NA NA NA ...

  .. ..$ source11  : chr [1:20574] NA NA NA NA ...

  .. ..$ month11   : chr [1:20574] NA NA NA NA ...

  .. ..$ depth11   : num [1:20574] NA NA NA NA NA NA NA NA NA NA ...

  .. ..$ sbt11     : num [1:20574] NA NA NA NA NA NA NA NA NA NA ...

  .. ..$ npp11     : num [1:20574] NA NA NA NA NA NA NA NA NA NA ...

  .. ..$ sand_per11: num [1:20574] NA NA NA NA NA NA NA NA NA NA ...

  .. ..$ silt_per11: Factor w/ 0 levels: NA NA NA NA NA NA NA NA NA NA ...

  .. ..$ sal11     : Factor w/ 0 levels: NA NA NA NA NA NA NA NA NA NA ...


``` 


can anyone offer any advice how to fix this?


Cheers,

Sylvan

Finn Lindgren

unread,
Oct 19, 2021, 5:24:55 AM10/19/21
to R-inla discussion group
Hi,

what R and package versions?  sessionInfo() or sessioninfo::session_info() needed.

I tried to replicate the problem, but couldn't trigger it.  Can you provide a small test case that has this issue? (e.g. a subset of the data using just a few rows and not all the columns)
(if you can't construct a small test case but are able to share the data with me, I may be able to debug it using the actual data, if you write a test code file that I can run here)

Finn

--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/r-inla-discussion-group/3e7b1f47-cba8-4d51-a453-889f59995534n%40googlegroups.com.


--
Finn Lindgren
email: finn.l...@gmail.com

Sylvan Benaksas

unread,
Oct 19, 2021, 7:17:42 AM10/19/21
to R-inla discussion group
Hi Finn,

Thanks a lot for getting back to me, here is the session info and I have attached the data and example code, I hope that is what you need, thanks again for the help,

best wishes,
Sylvan

``` 

> sessioninfo::session_info()

- Session info ------------------------------------------------------------------------------------------------------

 setting  value                      

 version  R version 4.0.4 (2021-02-15)

 os       Windows 10 x64             

 system   x86_64, mingw32            

 ui       RStudio                     

 language (EN)                       

 collate  English_United Kingdom.1252

 ctype    English_United Kingdom.1252

 tz       Europe/London              

 date     2021-10-19                 

 

- Packages ----------------------------------------------------------------------------------------------------------

 package      * version  date       lib source       

 assertthat     0.2.1    2019-03-21 [1] CRAN (R 4.0.3)

 backports      1.2.1    2020-12-09 [1] CRAN (R 4.0.3)

 broom          0.7.5    2021-02-19 [1] CRAN (R 4.0.4)

 cellranger     1.1.0    2016-07-27 [1] CRAN (R 4.0.3)

 cli            3.0.1    2021-07-17 [1] CRAN (R 4.0.5)

 codetools      0.2-18   2020-11-04 [1] CRAN (R 4.0.3)

 colorspace     2.0-0    2020-11-11 [1] CRAN (R 4.0.3)

 crayon         1.4.1    2021-02-08 [1] CRAN (R 4.0.3)

 DBI            1.1.1    2021-01-15 [1] CRAN (R 4.0.3)

 dbplyr         2.1.0    2021-02-03 [1] CRAN (R 4.0.3)

 dplyr        * 1.0.4    2021-02-02 [1] CRAN (R 4.0.3)

 ellipsis       0.3.1    2020-05-15 [1] CRAN (R 4.0.3)

 fansi          0.4.2    2021-01-15 [1] CRAN (R 4.0.3)

 forcats      * 0.5.1    2021-01-27 [1] CRAN (R 4.0.3)

 foreach      * 1.5.1    2020-10-15 [1] CRAN (R 4.0.3)

 fs             1.5.0    2020-07-31 [1] CRAN (R 4.0.3)

 gdata          2.18.0   2017-06-06 [1] CRAN (R 4.0.3)

 generics       0.1.0    2020-10-31 [1] CRAN (R 4.0.3)

 ggplot2      * 3.3.3    2020-12-30 [1] CRAN (R 4.0.3)

 glue           1.4.2    2020-08-27 [1] CRAN (R 4.0.3)

 gtable         0.3.0    2019-03-25 [1] CRAN (R 4.0.3)

 gtools         3.8.2    2020-03-31 [1] CRAN (R 4.0.3)

 haven          2.3.1    2020-06-01 [1] CRAN (R 4.0.3)

 here           1.0.1    2020-12-13 [1] CRAN (R 4.0.4)

 hms            1.0.0    2021-01-13 [1] CRAN (R 4.0.3)

 httr           1.4.2    2020-07-20 [1] CRAN (R 4.0.3)

 INLA         * 21.02.23 2021-02-22 [1] local        

 iterators      1.0.13   2020-10-15 [1] CRAN (R 4.0.3)

 jpeg           0.1-8.1  2019-10-24 [1] CRAN (R 4.0.3)

 jsonlite       1.7.2    2020-12-09 [1] CRAN (R 4.0.3)

 lattice        0.20-41  2020-04-02 [1] CRAN (R 4.0.3)

 latticeExtra   0.6-29   2019-12-19 [1] CRAN (R 4.0.4)

 lifecycle      1.0.0    2021-02-15 [1] CRAN (R 4.0.4)

 lubridate      1.7.9.2  2020-11-13 [1] CRAN (R 4.0.3)

 magrittr       2.0.1    2020-11-17 [1] CRAN (R 4.0.3)

 Matrix       * 1.2-18   2019-11-27 [1] CRAN (R 4.0.3)

 MatrixModels   0.5-0    2021-03-02 [1] CRAN (R 4.0.4)

 mnormt         2.0.2    2020-09-01 [1] CRAN (R 4.0.3)

 modelr         0.1.8    2020-05-19 [1] CRAN (R 4.0.3)

 munsell        0.5.0    2018-06-12 [1] CRAN (R 4.0.3)

 nlme           3.1-152  2021-02-04 [1] CRAN (R 4.0.3)

 pillar         1.5.1    2021-03-05 [1] CRAN (R 4.0.4)

 pkgconfig      2.0.3    2019-09-22 [1] CRAN (R 4.0.3)

 png            0.1-7    2013-12-03 [1] CRAN (R 4.0.3)

 psych        * 2.1.6    2021-06-18 [1] CRAN (R 4.0.5)

 purrr        * 0.3.4    2020-04-17 [1] CRAN (R 4.0.3)

 R6             2.5.0    2020-10-28 [1] CRAN (R 4.0.3)

 raster         3.4-5    2020-11-14 [1] CRAN (R 4.0.4)

 RColorBrewer   1.1-2    2014-12-07 [1] CRAN (R 4.0.3)

 Rcpp           1.0.6    2021-01-15 [1] CRAN (R 4.0.3)

 readr        * 1.4.0    2020-10-05 [1] CRAN (R 4.0.3)

 readxl         1.3.1    2019-03-13 [1] CRAN (R 4.0.3)

 reprex         1.0.0    2021-01-27 [1] CRAN (R 4.0.3)

 rgdal          1.5-23   2021-02-03 [1] CRAN (R 4.0.4)

 rlang          0.4.10   2020-12-30 [1] CRAN (R 4.0.3)

 rprojroot      2.0.2    2020-11-15 [1] CRAN (R 4.0.3)

 rstudioapi     0.13     2020-11-12 [1] CRAN (R 4.0.3)

 rvest          0.3.6    2020-07-25 [1] CRAN (R 4.0.3)

 scales         1.1.1    2020-05-11 [1] CRAN (R 4.0.5)

 sessioninfo    1.1.1    2018-11-05 [1] CRAN (R 4.0.3)

 sp           * 1.4-5    2021-01-10 [1] CRAN (R 4.0.4)

 splancs        2.01-40  2017-04-16 [1] CRAN (R 4.0.4)

 stringi        1.5.3    2020-09-09 [1] CRAN (R 4.0.3)

 stringr      * 1.4.0    2019-02-10 [1] CRAN (R 4.0.3)

 tibble       * 3.1.0    2021-02-25 [1] CRAN (R 4.0.4)

 tidyr        * 1.1.3    2021-03-03 [1] CRAN (R 4.0.4)

 tidyselect     1.1.0    2020-05-11 [1] CRAN (R 4.0.3)

 tidyverse    * 1.3.0    2019-11-21 [1] CRAN (R 4.0.3)

 tinytex        0.29     2021-01-21 [1] CRAN (R 4.0.3)

 tmvnsim        1.0-2    2016-12-15 [1] CRAN (R 4.0.3)

 utf8           1.2.1    2021-03-12 [1] CRAN (R 4.0.4)

 vctrs          0.3.7    2021-03-29 [1] CRAN (R 4.0.4)

 viridisLite    0.4.0    2021-04-13 [1] CRAN (R 4.0.5)

 withr          2.4.1    2021-01-26 [1] CRAN (R 4.0.3)

 xfun           0.21     2021-02-10 [1] CRAN (R 4.0.3)

 xml2           1.3.2    2020-04-23 [1] CRAN (R 4.0.3)

 yaml           2.2.1    2020-02-01 [1] CRAN (R 4.0.3)

 

[1] C:/Users/xxxxxxxx/Documents/R/win-library/4.0

[2] C:/Program Files/R/R-4.0.4/library

example stack factor error code.R
ns sandeels env and sed clipped inla.csv

Finn Lindgren

unread,
Oct 19, 2021, 7:41:37 AM10/19/21
to R-inla discussion group
Great!

I can confirm that the same error happens on R4.1.1 on Linux, with the latest INLA version, so at least it's not a system-specific issue; makes it easier to debug.
I'll see if I can trace the problem.

Finn

--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.

Finn Lindgren

unread,
Oct 19, 2021, 8:03:57 AM10/19/21
to R-inla discussion group
Hi Sylvan,

I think I've identified the issue in the code; the internals of inla.stack tries to keep track of which data columns are factors, but it seems part of the code expects name-based indexing but does logical indexing, making it tag the wrong columns as being factors!
Strange that noone seems to have run into this issue before, but it definitely looks like a bug in the INLA code. I'll figure out a fix.

As a temporary workaround, you can pre-convert the factors to character, and then post-convert back to factors (in your case that happens to keep the level orders etc intact; the difficulty with factors is otherwise to keep track of the levels when converting and merging):

stk.z2$effects$data$source <- as.character(stk.z2$effects$data$source)
stk.z2$effects$data$month <- as.character(stk.z2$effects$data$month)
stk.y2$effects$data$source11 <- as.character(stk.y2$effects$data$source11)
stk.y2$effects$data$month11 <- as.character(stk.y2$effects$data$month11)

stk.zy2 <- inla.stack(stk.z2, stk.y2)

stk.zy2$effects$data$source <- as.factor(stk.zy2$effects$data$source)
stk.zy2$effects$data$month <- as.factor(stk.zy2$effects$data$month)
stk.zy2$effects$data$source11 <- as.factor(stk.zy2$effects$data$source11)
stk.zy2$effects$data$month11 <- as.factor(stk.zy2$effects$data$month11)



Finn

Finn Lindgren

unread,
Oct 19, 2021, 10:18:23 AM10/19/21
to R-inla discussion group
Bugfix now implemented, and should be available from the next binary package update:

Sylvan Benaksas

unread,
Oct 22, 2021, 6:33:23 AM10/22/21
to R-inla discussion group
Hi Finn,

That work around has worked for me, thank you very much for all of the help,

Best wishes,
Sylvan
Reply all
Reply to author
Forward
0 new messages