How to define covariates with temporal variations in inlabru?

432 views
Skip to first unread message

Michel Valette

unread,
Sep 10, 2021, 6:25:50 AM9/10/21
to R-inla discussion group
Hello,

I am trying to model the distribution of points data, according to years and a couple of covariates. I started to use inlabru as it generally looked way easier to run lgcp. My model for one year and all covariate, or basic SPDE model for multiples years are working well, but I don’t know how to include covariate which are changing among years.

First, I’ve selected data for one year and include the different covariables using personalized function, as in https://inlabru-org.github.io/inlabru/articles/web/2d_lgcp_covars.html. Per example, this function is looking at the percentage of deforestation around each observation points (I’m creating 1km buffer around the points because it corresponds to the precisions of my satellite precision).

f.defo<-function(x,y){
     # Turn the coordinates into sptial point object with appropriate crs
    
spp <- SpatialPoints(data.frame(x = x, y = y), proj4string = fm_sp_get_crs(defo2))
    
proj4string(spp) <- fm_sp_get_crs(defo2)
    
sf_spp<-st_as_sf(spp)

    # Extract the proportion of pa around one point (I do it using sf pakage)
    a<-st_buffer(sf_spp,1000)
    b<-st_intersection(a,defo)
   
if(nrow(b)==0){
      return(0) }
   else{b<-st_union(b)
     
return<-as.numeric(st_area(b))/as.numeric(st_area(a))}}

Then I’m fitting the model and the results look fine!

cmp<-coordinates~mySPDE(main=coordinates,model=matern)+
               settlement(f.settlement(x,y),model="linear")

The issue come when I run the model for several years.  I haven’t succeed to identified a way to rewrite the f.defo function so for every year, it will subset my data and only return the value of deforestation events happening the same year than the point was recorde. I think the closer solution that I though was this  

f.defo<-function(x,y,year){
      # Subset defo dataset based on year argument in the lgcp formula
      defo_temp<-subset(defo,year==year)

     # Turn the coordinates into sptial point object with appropriate crs
    
spp <- SpatialPoints(data.frame(x = x, y = y), proj4string = fm_sp_get_crs(defo2))
    
proj4string(spp) <- fm_sp_get_crs(defo2)
    
sf_spp<-st_as_sf(spp)

    # Extract the proportion of pa around one point (I do it using sf pakage)
    a<-st_buffer(sf_spp,1000)
    b<-st_intersection(a,defo)
   
if(nrow(b)==0){
      return(0) }
   else{b<-st_union(b)
     
return<-as.numeric(st_area(b))/as.numeric(st_area(a))}}

Then I am creating the formula of the model, including a year argument compare to before.

cmp <- coordinates+year ~ defo(f.defo(x,y,year),model="linear")+   mySPDE(main=coordinates, model = matern, group = year, ngroup = nyear, control.group=list(model="ar1"))

 Finally I'm fitting the model (with ips havong a year argument)

fit <- lgcp(cmp, df,ips,domain=list(coordinates=mesh,year=df$year))

However, when I return the summary.fixed of the model, value of intercept and defo are the same (and similar issue with other types of variables), indicating that my model is not working as expected. I think it’s an issue with the way I’m calling year into the function but I couldn’t find an example online on how to do this.

If you could give me a hint on how to use temporal index in the covariate definition function to have a spatio-temporal model with covariate, that would be great! I seem to don't wrap my head aroudn the way group argument are used in the inlabru argument.

Thanks in advance!

 

Finn Lindgren

unread,
Sep 12, 2021, 8:11:57 AM9/12/21
to Michel Valette, R-inla discussion group
Hi,

I think you almost got it right, but that your function as written only handles a single year in each call.

For inlabru to use the function properly, it needs to be vectorised, so that
  fun(x,y,year)
allows x, y, and year to be vectors, and returns a vector v of elements v_i, where
  v_i = fun(x[i], y[i], year[i])
that is, one value for each row of data.frame(x, y, year)

I can't quite tell if your code is vectorised with respect to x and y, but it's definitely not vectorised with respect to year, since it seems to assume that "year" is a scalar instead of a vector.
How to write the vectorised version depends on how your data is stored. inlabru has built-in methods for the case of gridded spaces, with values for each year stored as a separate column (using the main_layer and main_selector arguments to the component), in  a SpatialPointsDataFrame or SpatialGridDataFrame, but for other cases you need to write the vectorised method yourself.

Which inlabru version do you have? If your code would give an error when given a year vector it's meant to complain in recent versions, but it doesn't always detect problems (in particular not when they run but produce unintended output).

Finn

--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/r-inla-discussion-group/42e4ff10-2611-407a-9f0d-30ecb6a45332n%40googlegroups.com.


--
Finn Lindgren
email: finn.l...@gmail.com

Michel Valette

unread,
Sep 14, 2021, 7:49:01 AM9/14/21
to R-inla discussion group

Hello Finn and thank you for the response,


Now I see my mistake, I wasn’t sure how the function was working internally, but now I understand that lgcp ‘feed” the function with a vector with all x, y and years, make sense why my code wasn’t working…

I succeeded to rewrite my function so it returns a vector of the value, and it seems to be working (I’ve got different summary for the fixed effect, now I’ll include other covariance and see further results).  Sorry for the syntax, this function is quite horribly written, and I have relied on the use of the sf package as I’m most familiar with it.

f.settlement<-function(x,y,year){
    spp <- SpatialPoints(data.frame(x = x, y = y), proj4string = fm_sp_get_crs(settlement2))
    spp$year<-year2
    proj4string(spp) <- fm_sp_get_crs(settlement2)
    sf_spp<-st_as_sf(spp)

  # create the buffer and index them

  a<-st_buffer(sf_spp,1000)
total_area<-as.numeric(st_area(a[1,]))
 a$index<-1:nrow(a)

  # Intersection of different objects with the year
    b<-st_intersection(a,settlement_total[,'year'==a$year])
    b$area<-as.numeric(st_area(b))
    b<-st_drop_geometry(b)
    a<-st_drop_geometry(a)
    b<-b%>%
          group_by(index,year)%>%
          summarise(total_area=sum(area))

  # Create a loop to associate settlement of each buffer zone to variable
    ibis=1
    for (i in 1:nrow(a)){
            if(any(b[,'index']==a[i,"index"])){
                 a[i,"settlement"]<-b[ibis,"total_area"]/total_area
                 ibis=ibis+1

           }else{
 a[i,"settlement"]<-0}}
return(as.vector(a$settlement))}

 


By any chance, do you have an example of a generic function to catch the covariance value in a spatio-temporal model such as the one we can find in the  covariance tutorial (https://inlabru-org.github.io/inlabru/articles/web/2d_lgcp_covars.html)? I think it might be helpful for people starting with inlabru.

Concerning the output I haven’t got any error message, even if my function was not working. However, I could see that all fixed effects have almost similar impact, while when specified properly we have different coefficient. I’m using the version 2.3.1 of inlabru.

Also I wanted to thank you for the amazing work on the inlabru package, it does really make points process model way easier.

Michel

Finn Lindgren

unread,
Sep 15, 2021, 5:03:07 PM9/15/21
to Michel Valette, R-inla discussion group
Hi Michel,

I think the development version may be a bit better at detecting potential input problems (version 2.3.1.9000, the devel branch on github).

When the inlabru examples were written, I don't think we had any spatio-temporally varying covariates, and there isn't necessarily a standard format for how such covariates are stored; how to write the function will depend quite a bit on the storage details.

In your new code, I'm not sure how to interpret what it tries to do, in particular the line
   b<-st_intersection(a,settlement_total[,'year'==a$year])
which in normal syntax wouldn't really make sense, since a$year would be numeric, and 'year' simply the literal string "year", so I'm guessing this is some special syntax (perhaps for data.table ? )
If that's the case, I'm guessing that it's trying to extract data from a different column for each value of a$year?  That looks similar to what we have implemented for SpatialGridDataFrame objects, where
  main_layer=year
would use the year variable to select the corresponding data column for each row of the spatial data grid. (In practice one would need something like main_layer=year-year_before_first to get the indices to start at 1).
If this interpretation of the code is correct, then please let me know what the storage class you have is, and we may be able to add direct support for it.

Finn


Michel Valette

unread,
Sep 16, 2021, 5:32:55 AM9/16/21
to R-inla discussion group

Dear Finn,

I know my case is quite specific and I’ve spent some time to process my data before starting the modelling so it’s probably hard to generalize from other cases. Anyhow, thank you for the patience and helpful answer.

I’ve tried to use the development versions but doesn’t display warning message if I have an issue with my inputs (but it can still be seen by the similarity of the intercept with coefficient of problematic variables).

The purpose of    b<-st_intersection(a,settlement_total[,'year'==a$year]) is to get the overlap between the object a (in my cases buffers zones of 1 km around the point indicated by the coordinates) and the settlement for the year given in the argument (selecting the different rows for this year, each row corresponding to a polygon). This is using a simple feature object, implemented by the sf package. This package contains more or less the same feature as the sp package, but the syntax and data structure is slightly different. However it’s easy to convert the object from one type to another.

Thanks to your explanation, I’ve been able to create function for a couple of covariate (by extracting value from raster, calculating coverage in a 1 km radius and determining distance between points and lines features).

Best regards,

Michel

Charles Cunningham

unread,
Aug 24, 2022, 7:01:51 AM8/24/22
to R-inla discussion group
Dear Finn,

I have a few quick queries related to this discussion so will reply here rather than starting another thread.

I have the same motivation as the OP - to include a temporally (and spatially) varying covariate within a spatio-temporal inlabru model. It seems there are 2 ways to do this - either by defining a function fun(x,y,year) as the OP has done, or through using the built in component arguments (main_layer/main_selector). I'm a little unclear on both of these.

Function approach - are there any spatio-temporal examples for how to do this for standard data types? The function used by the OP may work well, but unsure of the format of certain objects i.e. 'year2', so it isn't helpful in understanding how an f(x,y,time) function should be constructed.

Component arguments - from this thread it seems like this may be more straightforward/reliable for standard gridded data formats anyway, i.e. SpatialGridDataFrame. However I'm not sure what the difference is between the main_layer and main_selector arguments, and which formats are considered "standard" for these to work. From the Reference manual it is not clear to me, apologies. There are also layer and selector arguments within the bru_fill_missing function and I'm not sure how to interpret these either, presumably they operate in a very similar way.

E.g. Say you have a SpatialGridDataFrame containing 3 covariate data layers with arbitrary names - one for each discrete time step - you wish to include within your model (covarSGDF). You also have a SpatialPointsDataFrame containing data observations in each of the 3 time steps that we wish to model, with a column "year" containing a numeric index (1-3) of which each time step the observations correspond to. Would either of the following be the correct argument usage? If not, how should the data format / arguments be changed. Should "year" be a separate index vector?

covar(main = covarSGDF, main_selector = "year", model = "linear") or
covar(main = covarSGDF, main_layer = "year", model = "linear")

Any further information on exactly how the layer and selector are working would be massively appreciated!

Best wishes,
Charles

Finn Lindgren

unread,
Aug 24, 2022, 8:33:13 AM8/24/22
to R-inla discussion group
Hi Charles,

there are too many potential data storage formats for space-time data
to give clear answers for the general case; the function method leaves
the data extraction details to the user who hopefully knows their data
best.
The first of the two versions with SpatialGrid DataFrame below is how
it was supposed to work, but I think I made a logic mistake when
implementing it. More comments below.

> E.g. Say you have a SpatialGridDataFrame containing 3 covariate data layers with arbitrary names - one for each discrete time step - you wish to include within your model (covarSGDF). You also have SpatialPointsDataFrame containing data observations in each of the 3 time steps that we wish to model, with a column "year" containing a numeric index (1-3) of which each time step the observations correspond to. Would either of the following be the correct argument usage? If not, how should the data format / arguments be changed. Should "year" be a separate index vector?

covar(main = covarSGDF, main_selector = "year", model = "linear") or

If I read the inlabru code correctly, it tries to find the selector in
the covarSGDF object, but that doesn't really make sense. It would
make more sense for it to pick the 1,2,3 year information from the
general data object, so it can get a different year value for each
observation. I'll need to fix that bug! In the meantime you'll need to
use the function method. See the utils.R code in inlabru, function
eval_SpatialDF for possible inspiration.

covar(main = covarSGDF, main_layer = "year", model = "linear")

THis would just extract the year information, so is not what you need.

Finn
> To view this discussion on the web, visit https://groups.google.com/d/msgid/r-inla-discussion-group/ace69a48-a58b-4ae8-a241-164dc55b765en%40googlegroups.com.

Finn Lindgren

unread,
Aug 26, 2022, 5:20:14 PM8/26/22
to R-inla discussion group
Hi again Charles,

I've double checked the inlabru internals, and I was wrong about there
being a bug; the feature is actually implemented as intended!
What you need is this component specification:

covar(main = covarSGDF, main_selector = "year", model = "linear")

For this to work, the data argument to like() must have include a
column/variable called "year", containing either numerical or
character value that can be used to index the layers/columns of
covarSGDF.

Finn

berlinho lionel

unread,
Sep 3, 2022, 5:07:29 AM9/3/22
to Finn Lindgren, R-inla discussion group
We also deal and specialize in helping you to get high quality Biometric documents, registered PASSPORT, DRIVING LICENSE, ID CARD, VISA, SSN TOEFL, IELTS, IDP, ESOL, GMAT CELTA/DELTA, DEGREE, DIPLOMAS & other English Language Certificates,We have High Quality 100% Undetectable Grade AA+ Counterfeit Banknotes For Sale WhatsApp:( +1(315)702-6894 )

Grant Humphries

unread,
Oct 13, 2022, 7:06:14 AM10/13/22
to R-inla discussion group
This is really useful.  How might this work though if you've got a parameter like bathymetry. I.E.,  let's say you've done one survey per month over 4 months, and you want to incorporate a static parameter like bathymetry with data that have a temporal component? 

Finn Lindgren

unread,
Oct 13, 2022, 8:14:04 AM10/13/22
to Grant Humphries, R-inla discussion group
It depends a bit on the observation/likelihood setting. For
point-referenced data (counts at specific locations, and similar), you
can either use the
covar(covar_object, model=...)
or
covar(covar_object, main_layer = "bathymetry", model=...)
approach (the first one assumes you want the first layer, or one
called "covar", like the component name). The latter is equivalent to
bathymetry(covar_object, model=...)
Or store the covariate values in the data object together with the
other measurements.
For point process models, the covariate needs to be available
everywhere, and then you need the first approach, since you don't know
where the covariate will need to be evaluated by the likelihood
approximation construction.
Whether time is involved or not just affects whether you need to
specify additional information to extract different columns for
different time points.
In the inlabru devel version, the selector argument is'nt needed as
much anymore, as we've expanded the main_layer argument to take a
vector from the input data, like time:
covar(covar_object, main_selector="time", model=...)
can now be written as
covar(covar_object, main_layer=time, model=...)
instead. The internal system can also be accessed directly, like for example
covar(eval_spatial(covar_object, .data, layer = "bathymetry"), model=...)
or
covar(eval_spatial(covar_object, .data, layer = time), model=...)

A new vignette (still under construction, but mostly done) discusses
more details on component construction:
https://inlabru-org.github.io/inlabru/articles/components_vignette.html
(the name&link may change in the future)
Finn
> To view this discussion on the web, visit https://groups.google.com/d/msgid/r-inla-discussion-group/cf58d8e4-5ac8-4773-a228-3ea35b2d2e96n%40googlegroups.com.

Charles Cunningham

unread,
Nov 16, 2023, 7:26:44 AM11/16/23
to R-inla discussion group
Just to flag to the group that there currently an issue with extracting covariate information from different layers of multi-layer terra spatRasters. This happens when using '_layer' and '_selector' arguments of eval_spatial, e.g. needed if you are fitting a spatio-temporal model. This produces an error message like: "Error in data.frame(e[, 1, drop = FALSE], names(e)[idx[, 2] - 1], value = e[idx]) :  arguments imply differing number of rows: 211, 162"

This is a known issue with terra and has been fixed in the latest version of terra (>1.7-60). For now this isn't on CRAN, so can install using:
remotes::install_github("rspatial/terra") to install version 1.7-60

Thanks for solving, Finn!

Cheers,
Charles
Reply all
Reply to author
Forward
0 new messages