Gradient calculations in R-INLA?

Adam Howes

Oct 23, 2023, 5:49:42 AM
to R-inla discussion group
Hi all,

Would you be able to tell me how first and second derivatives are obtained in R-INLA? That is:

1) The first derivatives used to perform gradient-based optimisation to find modes.
2) The second derivatives used to calculate Hessian matrices.

Based on the logs, I know that central finite differences are used. Is this the case for both the first and second derivatives? Is there a way in which the set of models specified in R-INLA is utilised to (efficiently) precalculate any aspect of the derivatives? Were there particular design criteria for the R-INLA software that led you to prefer numerical methods over automatic differentiation (as in TMB, say)?

Apologies if I'm missing something in the documentation about these details!

Best wishes,
Adam




Helpdesk (Haavard Rue)

Oct 23, 2023, 6:05:33 AM
to Adam Howes, R-inla discussion group, Esmail Abdul Fattah
Hi,

The gradients and Hessian are computed as described here

https://www.aimsciences.org/article/doi/10.3934/fods.2021037

using finite differences. This 'smart gradient' approach really just works.
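
Roughly, to illustrate the idea (a minimal sketch only, not the actual R-INLA code): an ordinary central difference differentiates along the canonical axes, while the smart gradient differentiates along an orthonormal frame aligned with the previous gradient direction and maps the directional derivatives back. In R, with a generic objective f and an indicative QR-based frame construction:

  ## Minimal sketch only -- not R-INLA's implementation.
  ## Central finite differences along the columns of an orthonormal frame G.
  ## With G = diag(p) this is the ordinary central difference; the "smart
  ## gradient" idea is to align the first column of G with the previous
  ## gradient direction.
  fd_gradient <- function(f, x, G = diag(length(x)), h = 1e-4) {
    p <- length(x)
    dirs <- numeric(p)
    for (j in seq_len(p)) {
      d <- G[, j]
      dirs[j] <- (f(x + h * d) - f(x - h * d)) / (2 * h)  # directional derivative
    }
    drop(G %*% dirs)  # map back to the canonical basis (G is orthonormal)
  }

  ## One way to build an orthonormal frame whose first column points along
  ## the previous gradient (a stand-in for the construction in the paper).
  smart_frame <- function(prev_grad) {
    qr.Q(qr(cbind(prev_grad, diag(length(prev_grad)))))
  }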

Finite differences parallelise nicely, as discussed here

https://www.aimsciences.org/article/doi/10.3934/fods.2021037

and the number of hyperparameters is moderate, so it is ok and allows us
to have good parallel performance.
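
As a toy illustration of the parallelisation point (again a sketch, not the R-INLA code): the 2p function evaluations behind a central-difference gradient are independent, so they can be farmed out to separate cores, for example with the parallel package:

  ## Sketch: the 2p evaluations of a central-difference gradient are
  ## independent, so they can be computed on separate cores.
  library(parallel)
  fd_gradient_parallel <- function(f, x, h = 1e-4, cores = 2) {
    p <- length(x)
    ## the 2p evaluation points x +/- h * e_j
    pts <- c(lapply(seq_len(p), function(j) { z <- x; z[j] <- z[j] + h; z }),
             lapply(seq_len(p), function(j) { z <- x; z[j] <- z[j] - h; z }))
    vals <- unlist(mclapply(pts, f, mc.cores = cores))  # fork-based, unix-alikes
    (vals[seq_len(p)] - vals[p + seq_len(p)]) / (2 * h)
  }

The same applies to the extra evaluations needed for the finite-difference Hessian.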

We also make extensive use of external sparse matrix libraries, which
would need to be adapted to AD if that were the goal; you can read on
the Stan blogs how straightforward that is.

Hope this helps.
H

--
Håvard Rue
he...@r-inla.org

Adam Howes

Oct 23, 2023, 6:45:59 AM
to R-inla discussion group
Thanks Haavard, very helpful answer!

By "and the number of hyperparameters are moderate so it is ok" you mean that the smart gradients method doesn't scale that well with hyperparameter dimension, but that as the INLA method works best with <= 5 or so hyperparameters this isn't really an issue here?

Do you mean along the lines of this thread on the Stan forum? Do you have an idea about how much, for example, this work helps here? Or is this a separate issue from the integration of AD with sparse matrix libraries?

Best,
Adam

Helpdesk (Haavard Rue)

Oct 23, 2023, 2:26:01 PM
to Adam Howes, R-inla discussion group
On Mon, 2023-10-23 at 03:45 -0700, Adam Howes wrote:
> Thanks Haavard, very helpful answer!
>
> By "and the number of hyperparameters are moderate so it is ok" you
> mean that the smart gradients method doesn't scale that well with
> hyperparameter dimension, but that as the INLA method works best with
> <= 5 or so hyperparameters this isn't really an issue here?

Smart gradient scales well; the positive impact will be smaller in high
dimensions, see the paper.

For num.hyper up to 20, for example, it does well. It does not seem to
do worse even in high dimensions.

Also, the problem of getting an SPD Hessian at the mode, which could
occur years ago, is essentially gone using the 'smart' idea.

The paper describes this well. It's an easy fix and it really works...

>
> Do you mean along the lines of this thread on the Stan forum? Do you
> have an idea about how much, for example, this work helps here? Or is
> this a separate issue from the integration of AD with sparse matrix
> libraries?


I do not believe that AD for the hyperparameters will be beneficial in
the end, in a multicore environment. If you can prove me wrong, please
do ;-)

Best
H

--
Håvard Rue
he...@r-inla.org

Adam Howes

Oct 24, 2023, 10:21:43 AM
to R-inla discussion group
Great, thank you for clarifying that the smart gradient method scales well.


> I do not believe that AD for the hyperparameters will be beneficial in
> the end, in a multicore environment. If you can prove me wrong, please
> do ;-)

Is there a reason you have this intuition? When you write "in a multicore environment", is that because AD ultimately wouldn't perform as well in a parallel setting as other methods?

When you write "AD for the hyperparameters" I assume this refers to the gradient of the Laplace approximation. Do you have any thoughts about AD to calculate the Laplace approximation itself (as in, not AD for the hyperparameters, but AD for the latent field to arrive at its Gaussian approximation)? Is there a way in which calculation of these latent field precision matrices is sped up by knowledge of the structure (something like this is what I was aiming to get at when asking "Is there a way in which the set of models specified in R-INLA are utilised to (efficiently) precalculate any aspect of the derivatives?")?

And yes, I am sure anyone would have difficulty implementing the INLA algorithm more efficiently than R-INLA! A huge amount of work has gone into this software (some of it quite technical and difficult to grasp from the outside!), so thanks to you and the other developers for doing this. Part of my motivation for investigating AD is that the models used by the scientists I work with are often not compatible with R-INLA, much as they might like to use it.

Thanks again for your responsive comments about this,
Adam

Finn Lindgren

Oct 24, 2023, 10:33:33 AM
to R-inla discussion group
I can answer this from the inlabru perspective:

in "plain" INLA, all the likelihoods and their derivatives with
respect to the linear predictor are hand-coded (I believe). Since each
observation is linked to the latent field via a single element of the
linear predictor, the chain rule is explicit; just the individual
likelihood derivatives, combined with the fixed model matrix for the
linear predictor. For that AD wouldn't really add anything except the
possibility of simplifying some kind of user-defined likelihood
capability.
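
To spell this out with a generic sketch (not INLA's internal code): with a fixed model matrix A mapping the latent field x to the linear predictor eta = A x, the hand-coded per-observation derivatives with respect to eta combine with A directly, for example for a Poisson likelihood with a log link:

  ## Generic sketch of the explicit chain rule for a fixed linear predictor
  ## eta = A %*% x, using a Poisson likelihood with log link as an example.
  ## The per-observation derivatives wrt eta are hand-coded; A does the rest.
  loglik_derivs <- function(x, A, y) {
    eta <- drop(A %*% x)
    mu  <- exp(eta)
    d1  <- y - mu   # d log p(y_i | eta_i) / d eta_i
    d2  <- -mu      # d^2 log p(y_i | eta_i) / d eta_i^2
    list(gradient = drop(crossprod(A, d1)),   # t(A) %*% d1
         hessian  = crossprod(A, A * d2))     # t(A) %*% diag(d2) %*% A
  }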

For inlabru, however, we _would_ benefit from AD for models with
non-linear predictors, as it currently uses numerical derivatives to
evaluate the Jacobian and construct the sequence of linearised
predictor models that is passed on to INLA. Currently, this
linearisation step is slow for some models in ways that AD might help
with. The "bru_mapper" system used to construct the link between latent
variables, effects, and the predictor expression is set up to
facilitate some simple AD behaviour, but if it could "hook" into an
existing AD system for R expressions, that could speed things up
(though inlabru 2.9.0 saw some significant speedups just from dealing
with the most obvious bottlenecks indicated by the R profiler).
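
For concreteness, here is a sketch of the kind of linearisation meant above, using the numDeriv package purely as a stand-in for inlabru's own numerical machinery:

  ## Sketch: linearise a non-linear predictor eta(u) around u0, i.e.
  ## eta(u) ~= offset + J(u0) %*% u, with the Jacobian obtained numerically.
  library(numDeriv)
  linearise_predictor <- function(eta_fun, u0) {
    J <- jacobian(eta_fun, u0)                 # numerical Jacobian at u0
    offset <- eta_fun(u0) - drop(J %*% u0)
    list(offset = offset, A = J)               # linearised predictor model
  }

  ## Example: a predictor with a non-linear combination of latent effects
  eta_fun <- function(u) u[1] + exp(u[2]) * u[3]
  lin <- linearise_predictor(eta_fun, u0 = c(0, 0, 1))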

Finn

--
Finn Lindgren
email: finn.l...@gmail.com