prediction with smoother


Boris Espinasse

Oct 27, 2021, 6:36:28 AM
to R-inla discussion group

Hi all,

I am new to R-INLA. In the last few weeks I have developed a model for the North Atlantic that includes a term for spatial correlation via the definition of a mesh, mainly following the instructions in Zuur's books. I have then made predictions over a new grid system by combining two stacks, based on https://sites.stat.washington.edu/peter/591/INLA.htm.
For one of the covariates I wanted to try a smoother, so I followed Zuur's instructions for building smoothers based on mgcv and the lincombs functions. But now I am unsure how to process the new/extended version of this covariate so that it fits into the 'predictor' stack.
Should I keep the number and locations of the knots from the smoother of the 'fit' version of the covariate and force them onto the smoother of the 'predictor' version?
I know that another way to do it would be to use the rw2 smoother, but I am building a frequentist GAM model in parallel and like the possibility of specifying the number of knots.
Thanks
Boris

Finn Lindgren

Oct 27, 2021, 6:55:00 AM
to R-inla discussion group
Hi,

in the Bayesian setting it makes much more sense to have a process prior (like rw2) than to mix it with the mgcv spline constructions.
But if you want more control over the discretisation than rw2 gives, a more INLA-like way of doing it than what the Zuur book seems to do (that book seems to have some odd choices; I haven't read it myself though) would be to use a 1D SPDE model. With inla.mesh.1d() you can choose a B-spline basis of up to order 2 and choose the knot locations. When you then apply inla.spde2.pcmatern() to it, you get a process that behaves more like a rw2 model (if you set the range to a large, fixed value) and only uses the basis functions for dimension reduction in the computations, similarly to the finite-knot smoothers in mgcv (but with a proper continuous function distribution instead of just a spline penalty).
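For illustration only, a 1D SPDE component of the kind described above might be set up roughly like this (the knot locations and prior values are invented for the sketch, not taken from this thread):

```r
library(INLA)

# Hypothetical knot locations on the covariate scale (illustration only)
x.knots <- seq(0, 1, length.out = 15)

# 1D mesh with an order-2 B-spline basis on the chosen knots
mesh1 <- inla.mesh.1d(x.knots, degree = 2)

# Matern/SPDE model on the 1D mesh; a second prior entry of NA fixes
# the range at the first value, so a large fixed range gives
# rw2-like behaviour
spde1 <- inla.spde2.pcmatern(mesh1,
                             prior.range = c(10, NA),
                             prior.sigma = c(1, 0.01))
```

The resulting spde1 object can then be used as a model in an f() term, just like any other latent model in INLA.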

Smoother models where the result changes drastically when the knots are changed, like those with independent penalties on the coefficients, are problematic. Penalties and process distributions that are based on the continuous properties of the functions are much more stable (e.g. penalising the integral of the squared second derivative, as the spline equivalent of the rw2 model does); in theory, you should be able to add knots until the results no longer change. When the results do change substantially, it means the discretised, quasi-deterministic behaviour of the model dominates.
In practice, having too many knots can lead to numerical issues in the computations, but as long as the chosen knots allow the function to adapt properly to the posterior, it should be ok.

Finn


--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/r-inla-discussion-group/3dcb7f25-a986-46dc-b1f7-381b0596df71n%40googlegroups.com.


--
Finn Lindgren
email: finn.l...@gmail.com

Boris Espinasse

Oct 27, 2021, 9:51:39 AM
to R-inla discussion group
Hi Finn,

thanks for the rapid answer.

Yes, I understand that using a random walk would be more straightforward with INLA. It's just that for people like me who have worked with mgcv in the past, it's comforting to use an mgcv smoother with a set number of knots, etc.; but in the end it is quite tedious to adapt them to INLA, so it might not be worth it. I'll probably stick to a GLM then, where I have a better idea of, and more control over, what I am doing.

Best
Boris

Finn Lindgren

Oct 27, 2021, 11:03:17 AM
to Boris Espinasse, R-inla discussion group
Hi,

from what I can tell, it's the unusual approach taken in the Zuur book that is confusing, as it seems to mix different concepts in a way that requires much more coding effort than more conventional approaches to Bayesian GAMs. A GAM is a model, not an inference framework; GAMs are neither inherently frequentist nor Bayesian. The main differences between mgcv and INLA are in what types of smoothers are easily available, and in the Bayesian hierarchy added by INLA.
For spatial models, I would highly recommend taking a look at the inlabru package (see the examples at https://inlabru-org.github.io/inlabru/articles/ , which focus on distance sampling and related problems, but the 2D field example shows more of the general structure). It takes away nearly all the messy inla.stack etc. coding needed with plain INLA.

Finn

Boris Espinasse

Oct 28, 2021, 4:10:21 AM
to R-inla discussion group
Hi,
The thing is, without this kind of book it would be too difficult and too time-consuming for an ecologist like me to get a package such as INLA running correctly. But I didn't know about inlabru; I'll take a look at that, thanks for the tip.

Boris

high...@highstat.com

Oct 28, 2021, 10:10:03 AM
to R-inla discussion group

Hi Finn and Boris,

Finn......I can understand that our choice to use mgcv smoothers in INLA may look weird...but what you are missing is the general picture behind it.

The ecological data sets that we use in the book are more complicated than the classical textbook examples. The data sets that we use contain multiple collinear covariates (which all have non-linear effects + interactions), plenty of zeros in the response variable, and spatial(-temporal) dependency. I call those 'shitty data sets'.

Using either INLA or mgcv, it is easy to end up with a model in which the smoother(s), the spatial correlation, and the zero-inflation component of the statistical distribution all fight for the same information (e.g. zeros at sites close to each other, with similar covariate conditions). In such a situation, you quite often end up with a smoother that contains all kinds of bumps that do not make any ecological sense. One strategy to deal with this problem is to force the smoother to be simpler (in mgcv language: stick to smoothers with not more than 4 or 6 df). If you are lucky, you then end up with patterns that do make ecological sense (e.g. plateau-type relationships). But obviously, sometimes you need more variation in the smoothers.

So.....our approach in the book is to borrow the function `smoothCon` from mgcv, obtain the basis functions of a simple smoother (e.g. an unpenalised cubic regression spline) at a limited number of knots, and write the smoother as X * beta. It then becomes a linear regression or a GLM. The main idea is to extract a signal that is simple and easy to comprehend.

I do agree that the coding becomes slightly cumbersome....but at the end of the day, it forces you to know what you are doing. And yes, there are pitfalls (like knot positions and the number of knots).

I also agree that if rw1/rw2 gives you a decent working model in INLA, then that is considerably more convenient than going for unpenalised smoothers borrowed from mgcv. But for the shitty data sets that we use in the book, rw1/rw2 turned out to be a (too) big battle with priors. From an ecological point of view, it is much easier to force the smoothers to be simple and easy to understand.

Boris..as to your specific question....here is some example code.  I want a smoothing function of depth. The first block of code uses smoothCon from mgcv to obtain the basis functions.


# Depth smoother
library(mgcv)  # smoothCon() and s() come from mgcv
Lego.Depth <- smoothCon(s(Depth, bs = "tp", k = 6, fx = TRUE),
                        data = EU2, absorb.cons = TRUE)[[1]]

# Extract the X matrix
XDepth <- Lego.Depth$X  # X for the Depth smoother


XDepth contains the basis functions: it is just a set of artificial covariates that forms the backbone of the smoother. You let INLA (or lm, glm, lmer, or glmmTMB) estimate the corresponding regression parameters. One option for plotting the smoother in INLA is inla.make.lincombs.
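To make the X * beta idea concrete, here is a minimal frequentist sketch of the same construction (the response name 'Density' is made up for illustration, not from the thread):

```r
# Sketch only: fit the basis columns as ordinary covariates with lm();
# 'Density' is a hypothetical response in EU2
fit <- lm(Density ~ XDepth, data = EU2)

# The fitted (centred) smoother is X %*% beta, dropping the intercept
smooth.hat <- XDepth %*% coef(fit)[-1]
```

INLA estimates the same beta coefficients, just within the Bayesian hierarchy.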

If you want to do what you asked in your original post, then you have to follow the same steps that the predict function from mgcv performs. First, you make up the values of Depth at which you want to predict.

MyDataPred <- data.frame(
  Depth = seq(from   = min(EU2$Depth),
              to     = max(EU2$Depth),
              length = 100),
  SweptArea = mean(EU2$SweptArea))

Obviously, you can add more covariates. You now need to get the basis functions for these specific depth values. We can again borrow tools from mgcv:


# Basis function for the 100 artificial depth values.
XpDepth <- PredictMat(Lego.Depth, MyDataPred)

This XpDepth can go into the second stack (with NA for the response variable)....and you rerun the model (after combining the two stacks). If you get stuck, feel free to drop me an email.
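As a sketch of that last step (the stack name, tag, and fitted-object name below are illustrative, not from this thread): after refitting on the combined stack, the rows belonging to the prediction stack can be pulled out via their tag:

```r
# Illustrative names: combined stack 'Stack.Combined' built with
# tag = "pred" on the prediction stack, fitted INLA object 'I1'
idx  <- inla.stack.index(Stack.Combined, tag = "pred")$data
pred <- I1$summary.fitted.values[idx, c("mean", "0.025quant", "0.975quant")]
```

pred then holds the posterior mean and 95% interval of the predictor at the 100 artificial depth values.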


To come back to Finn's comments...it all comes down to how you want to use a smoother. Do you allow it to do whatever it wants, or do you want to force it to be something simple? And if you opt for the latter approach, how do you want to do that? Via priors on rw1/rw2 terms in INLA, by increasing the penalty for non-smoothness in mgcv, by limiting the number of knots in mgcv/INLA, or by specifying the amount of smoothing a priori in mgcv? I essentially adopted the old S-PLUS approach and use mgcv smoothers with 4 or 5 df. And if my residuals show a pattern, I will consider increasing it.

Feel free to disagree with this approach...but I call it pragmatic for shitty data sets.

Kind regards,

Alain

Finn Lindgren

Oct 28, 2021, 11:13:11 AM
to R-inla discussion group
Hi Alain,

thanks for the explanation!

So it's essentially used as a dimension-reduction/pre-smoothing technique and/or a constructed-covariate approach, which to me makes it easier to interpret. Models of similar structure, also with X * beta in the predictor for some externally defined X, came up on the list recently, and it's on my TODO list to add a more streamlined interface for that to inlabru; currently, though, one can use X %*% beta explicitly in the inlabru predictor formula for this. The challenge is mainly in automatically computing the X matrix needed for posterior prediction, which will be easier if the basis functions can be obtained within the INLA R code (such as via the inla.mesh.1d() interface).

Finn


high...@highstat.com

Oct 28, 2021, 12:15:13 PM
to R-inla discussion group
Hello Finn....yes, I think so. I personally think that GAMs (or smoothing techniques) are heavily misused in the literature. It is easy to end up with funny-looking shapes that are difficult to interpret.

Having said that.....I do think that they have merit (e.g. for estimating long-term trends, or plateau-type relationships)....if you use them with care. 

Keep up the good INLA work...highly appreciated.

Alain

Boris Espinasse

Oct 29, 2021, 7:58:54 AM
to R-inla discussion group
Hi Alain,

thanks for joining the discussion, and for the piece of code; it works well.
Yes, for me, keeping control of the wiggliness of the smoothers is essential, especially because I want to use the model to make predictions from a new covariate dataset. I don't want the model to be too specific to my original dataset, but rather to capture the main factors of variation.

Best
Boris