INLA for genomic analysis.


Bruno da Costa Perez

Feb 13, 2016, 12:43:51 PM
to R-inla discussion group
Hello all,

I've been thinking about the possibility of implementing genomic analysis in INLA (via the R-INLA package), with methods such as Bayes A/B/A*B, the Bayesian Lasso, ridge regression and so on.

(As far as I understand, ShrinkBayes only copes with RNA data, right?)

Does anyone have any ideas, or has anyone perhaps already implemented Bayesian methods applied to genomic data in INLA?

Or is there anyone who might be willing to discuss this in depth, in order to come up with an "R-INLA implementation" for genomic analysis?

Is it feasible? Does INLA have any restrictions (that may have passed through my thoughts unnoticed) that could impair its ability to deal with this kind of model?

Thank you very much in advance.

Bruno Perez

Håvard Rue

Feb 13, 2016, 10:33:20 PM
to Bruno da Costa Perez, R-inla discussion group
Hi,

only the ridge regression you can do in INLA; this is due to the
assumption of a Gaussian prior. [Well, you could put these into the
set of hyperparameters, but then it's slower and you cannot have too
many of them.]
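A minimal sketch of the ridge case, on simulated data (this assumes the INLA package from www.r-inla.org is installed; it uses the "z" latent model, which puts a common iid Gaussian prior on the effects entering through the matrix Z, with the prior precision treated as a hyperparameter):

```r
## Ridge regression in R-INLA via the "z" model: all marker effects share
## one iid Gaussian prior; its precision is estimated as a hyperparameter.
## (Illustrative sketch with simulated data; requires the INLA package.)
library(INLA)

set.seed(1)
n <- 100; p <- 20
Z <- matrix(rnorm(n * p), n, p)   # marker matrix (simulated)
a <- rnorm(p, sd = 0.5)           # true marker effects
y <- drop(Z %*% a) + rnorm(n)     # phenotypes

dat <- list(y = y, idx = 1:n)
fit <- inla(y ~ 1 + f(idx, model = "z", Z = Z), data = dat)
summary(fit)
```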

sorry about that,
H

--
Håvard Rue
Department of Mathematical Sciences
Norwegian University of Science and Technology
N-7491 Trondheim, Norway
Voice: +47-7359-3533 URL : http://www.math.ntnu.no/~hrue
Mobile: +47-9260-0021 Email: havar...@math.ntnu.no

R-INLA: www.r-inla.org


Bruno da Costa Perez

Feb 14, 2016, 9:29:21 AM
to R-inla discussion group, bruno...@gmail.com, hr...@r-inla.org
How come I hadn't noticed the Gaussian prior assumption?
Indeed, there would be problems when trying to fit Bayes B/A*B/Lasso.

But OK.
Ridge regression is already interesting.

Thank you very much for the fast reply, Prof. Håvard.

Bruno da Costa Perez

Feb 14, 2016, 6:13:29 PM
to R-inla discussion group, bruno...@gmail.com, hr...@r-inla.org
Hi Prof. Rue,

Could you give me a (code-like) example of how I could include these into the set of hyperparameters?
(Bayes A - Student t; Bayes B - Student t with a point mass at zero; Lasso - double exponential)

I don't know if that's a really bad question, but here it goes...
Would it be possible to include a "variation" of a Gaussian-like prior that could get somewhere close to a Student t distribution for the parameter?

Sorry for being so insistent, but the possibility of applying INLA to genomic analysis is really awesome, as it could drastically reduce the time required for analysis
(even if it delivered slightly biased estimates compared to MCMC).
I'm just trying to cover all options.

Thank you very much again for your attention.




Håvard Rue

Feb 15, 2016, 1:44:29 PM
to Bruno da Costa Perez, R-inla discussion group
On Sun, 2016-02-14 at 15:13 -0800, Bruno da Costa Perez wrote:
> Hi Prof. Rue,
>
> Could you give me a (code-like) example of how I could include these
> into the set of hyperparameters?
> (Bayes A - Student t; Bayes B - Student t with a point mass at zero;
> Lasso - double exponential)
>
> I don't know if that's a really bad question, but here it goes...
> Would it be possible to include a "variation" of a Gaussian-like
> prior that could get somewhere close to a Student t distribution for
> the parameter?

Hi,

this is for priors on the linear effects, right?

Bruno da Costa Perez

Feb 15, 2016, 2:34:42 PM
to R-inla discussion group, bruno...@gmail.com, hr...@r-inla.org
Yes (I guess).

I mean, the differences between these models lie in the prior for the marker effects (which are to be estimated)
(if that's what you mean by "the linear effects").
So I understood from your first reply that, since INLA is built upon latent Gaussian models, the vector of random effects to be estimated (in this case, the marker effects) must follow a Gaussian distribution.
(as is the case for the random animal effects in animal models, for example, right?)

Am I right about that idea, or did I just miss something?

Thank you (REALLY) very much for your attention, Prof. Rue.

Bruno da Costa Perez

Feb 15, 2016, 8:13:14 PM
to R-inla discussion group, bruno...@gmail.com, hr...@r-inla.org
Just to add more information, in case it helps further discussion.

The mixed model is the same for all approaches:

y = Xb + Za + e

where,

y = observations (phenotypes)
X = design matrix for the fixed effects
b = vector of fixed effects
Z = matrix with the jth marker genotype of the ith animal
a = vector of marker effects
e = residual term (noise)

As I said, each approach assumes a different prior distribution for a.

So, in...

GBLUP -
ai ~ N(0, sigma[a]^2)

Bayes A -
ai ~ t(0, v, sigma[a]^2)
or equivalently, following Meuwissen,

ai | sigma[ai]^2 ~ N(0, sigma[ai]^2), with sigma[ai]^2 ~ scaled inverse chi-squared(v, sigma[a]^2)

Bayes B -
ai ~ pi * delta(0) + (1 - pi) * t(0, v, sigma[a]^2)
(a fraction pi of the markers has zero effect and the remaining fraction (1 - pi) follows a t distribution)

Lasso - ai ~ λ/2 exp (-λ|ai|)
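The setup above can be written down as a quick base-R simulation (all sizes and parameter values here are purely illustrative), using a Bayes B-style prior for a, i.e. a point mass at zero mixed with a t distribution:

```r
## Simulate y = Xb + Za + e with Bayes B-style marker effects:
## a fraction pi0 of markers has exactly zero effect, the rest are
## t-distributed. All sizes/values are illustrative.
set.seed(42)
n <- 50; p <- 200; pi0 <- 0.9             # animals, markers, P(effect == 0)

X <- cbind(1, rnorm(n))                   # fixed-effect design (intercept + covariate)
b <- c(10, 0.5)                           # fixed effects
Z <- matrix(rbinom(n * p, 2, 0.3), n, p)  # marker genotypes coded 0/1/2

zero <- runif(p) < pi0                    # which markers get exactly zero effect
a <- ifelse(zero, 0, 0.2 * rt(p, df = 4)) # non-zero effects are t-distributed

y <- drop(X %*% b + Z %*% a) + rnorm(n)   # phenotypes: y = Xb + Za + e
```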

Is it possible to implement such priors for "a"?
If not, is there another strategy I could use in order to obtain an approximation to these priors
(using Gaussian-like priors)?

I was thinking: maybe we could approximate the t distribution by the normal distribution, as in...

http://www.m-hikari.com/ams/ams-2015/ams-49-52-2015/zogheibAMS49-52-2015.pdf

Would it be feasible?

Another strategy would be applying weights to the marker effects (using ridge regression), which would be updated after each iteration.

I really think the INLA method is an opportunity to make the Bayesian approach more widely applied in genomics.

I don't really need to do Bayes A, B, etc. exactly as they are.
If there is a way to adapt INLA in order to implement different priors for "a", or to adapt a mixture in order to approximate them with a near-Gaussian form, then I want to try it.

Again, sorry for the long message, Prof. Rue.
I've been thinking about this and had to write my ideas down.

Best regards and thank you for the attention.

Elias T Krainski

Feb 16, 2016, 3:40:53 AM
to r-inla-disc...@googlegroups.com

On 16/02/16 02:13, Bruno da Costa Perez wrote:
> Bayes A -
> ai ~ t(0, v, sigma[a]^2)

Elias T Krainski

Feb 16, 2016, 3:51:01 AM
to r-inla-disc...@googlegroups.com

On 16/02/16 02:13, Bruno da Costa Perez wrote:
>
> If there is a way to adapt INLA in order to implement different priors
> for "a"

For this point, you can start by considering Section 4.5 of the following
paper:
http://arxiv.org/pdf/1403.4630v4.pdf

Elias

Bruno da Costa Perez

Feb 16, 2016, 8:42:46 AM
to R-inla discussion group

"Extending Integrated Nested Laplace Approximation to a Class of Near-Gaussian Latent Models."


No title could be more exciting than this one.

Thank you very much Elias.

Scott Foster

Feb 16, 2016, 4:13:05 PM
to r-inla-disc...@googlegroups.com
Hi,

This thread has got me thinking too.

Another interesting idea is that both the t-distribution and the Laplace distribution can be formulated as a scale mixture of normals; see the link
below. Basically, the t-distribution can be formulated as a normally distributed random variable whose precision is gamma distributed (equivalently,
whose variance is inverse-gamma), and the Laplace is a normal whose variance is exponentially distributed.

I'm not sure how this could relate to INLA though... Sure, some of the randomness is normal, but some of it is still non-normal. Unless the gamma can
be approximated by some sort of truncated normal (and a truncated normal is easier to deal with).
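Both mixture representations are easy to verify numerically in base R by integrating out the mixing variable (the degrees of freedom v, rate lambda and evaluation point x0 below are arbitrary):

```r
## Check the two scale-mixture identities numerically:
##   t_v:     x | w ~ N(0, 1/w), w ~ Gamma(v/2, rate = v/2)  =>  x ~ t_v
##   Laplace: x | s ~ N(0, s),   s ~ Exp(rate = lambda^2/2)  =>  x ~ Laplace(lambda)
v <- 5; lambda <- 2; x0 <- 1.3   # illustrative values

t_mix <- integrate(function(w)
  dnorm(x0, 0, 1 / sqrt(w)) * dgamma(w, shape = v / 2, rate = v / 2),
  0, Inf)$value

lap_mix <- integrate(function(s)
  dnorm(x0, 0, sqrt(s)) * dexp(s, rate = lambda^2 / 2),
  0, Inf)$value

abs(t_mix - dt(x0, df = v))                           # numerically ~ 0
abs(lap_mix - (lambda / 2) * exp(-lambda * abs(x0)))  # numerically ~ 0
```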

Also, for what it is worth (maybe not very much): I have seen adaptive-LASSO-type work in genetics (QTL mapping, although I can't remember the
reference). Here the penalty is weighted by the inverse of the *unpenalised* effects -- basically penalising small effects more than large effects.
This approach still has a number of desirable properties (although I can't remember exactly which). It may help maintain normal random
effects (just with different scales), which is just what the INLA machinery is designed for.

Cheers,

Scott

http://www.jstor.org/stable/2984774?seq=1#page_scan_tab_contents

Scott Foster
CSIRO
E scott....@csiro.au T +61 3 6232 5178
Postal address: CSIRO Marine Laboratories, GPO Box 1538, Hobart TAS 7001
Street Address: CSIRO, Castray Esplanade, Hobart Tas 7001, Australia
www.csiro.au

Bruno da Costa Perez

Feb 16, 2016, 8:52:28 PM
to R-inla discussion group, scott....@csiro.au
Thanks for your considerations Scott.

I've also thought about the t distribution and the Laplace (double exponential) being mixtures of normals.
This could be a path for implementing these prior distributions without losing "normality", in order to stay near the latent Gaussian model assumptions.

I would probably need someone to help me with coding these in proper R-INLA syntax.

Could I implement those priors by creating a prior "expression" directly in R-INLA?
Wouldn't I need a correction in the likelihood in order to keep the model stable?

Bob O'Hara

Feb 17, 2016, 4:32:47 AM
to Bruno da Costa Perez, R-inla discussion group, scott....@csiro.au
I've mused about similar issues (using continuous mixtures of
normals): one obvious problem is that you need a random effect for
every coefficient, so for most applications where you're interested in
doing this you would have far too many random effects to be
practical. There may also be problems with fitting, because the prior
has infinite density at zero (I dimly recall Håvard explaining this to
me a few years ago).

I can't see any good way around this. You could combine INLA with MCMC
(using MCMC for the indicators of whether a variable is in the model).
This might not be very quick, although at least you would only have to
visit each combination of indicators once. I don't know if this would
really be better, though.

Bob
Bob O'Hara

Biodiversity and Climate Research Centre
Senckenberganlage 25
D-60325 Frankfurt am Main,
Germany

Tel: +49 69 798 40226
Mobile: +49 1515 888 5440
WWW: http://www.bik-f.de/root/index.php?page_id=219
Blog: http://occamstypewriter.org/boboh/
Journal of Negative Results - EEB: www.jnr-eeb.org

Håvard Rue

Feb 17, 2016, 4:58:30 AM
to Bob O'Hara, Bruno da Costa Perez, R-inla discussion group, scott....@csiro.au
On Wed, 2016-02-17 at 10:32 +0100, Bob O'Hara wrote:
> I've mused about similar issues (using continuous mixtures of
> normals): one obvious problem is that you need a random effect for
> every coefficient, so for most applications where you're interested
> in doing this you would have far too many random effects to be
> practical. There may also be problems with fitting, because the prior
> has infinite density at zero (I dimly recall Håvard explaining this
> to me a few years ago).


that is correct. If you have a handful of these we can deal with it;
otherwise, it simply gets too much.

H

Bruno da Costa Perez

Feb 17, 2016, 8:50:56 AM
to R-inla discussion group, rni...@gmail.com, bruno...@gmail.com, scott....@csiro.au, hr...@math.ntnu.no
Thank you Håvard and Bob.

Using MCMC for detecting non-zero marker effects, to then proceed to estimation in INLA, is not a bad idea.
I don't know if it's doable, but certainly not bad at all.

I was trying to figure out a way to include 50K-800K random effects in the model.
And no matter how I think about it, this always seems like too much.
That's the same problem that comes up when trying to fit genomic selection models in the MCMCglmm R package: how to accommodate the marker effects.

But, just to clarify, when you say "it's too much", are you referring to the INLA method itself, or is it just too much for R-INLA to cope with, given the way it processes the model?

Thanks again.

rcdr...@gmail.com

Feb 19, 2016, 4:10:19 AM
to R-inla discussion group, rni...@gmail.com, bruno...@gmail.com, scott....@csiro.au, hr...@math.ntnu.no
Hi Bruno,

If your interest is related to speed, you might like to try a different approach, e.g. something like partial least squares (see, for example, Moser et al. 2009, Genet Sel Evol 41(1), 56). Most approaches give pretty similar accuracies of prediction, supposedly because so much of the improvement comes from the modelling of relationships. Unless you have markers with genuinely large effects - in which case you might want to handle them separately.

Alternatively, Mario Calus and colleagues at Wageningen UR have some good MCMC software (the name escapes me) written specifically for genomic analysis. I don't know what the licensing/distribution of that program is, though.

Ron.

Bruno da Costa Perez

Feb 19, 2016, 11:01:08 AM
to R-inla discussion group, rni...@gmail.com, bruno...@gmail.com, scott....@csiro.au, hr...@math.ntnu.no
Hi Ron,

Thank you for your considerations.

I think you're referring to the Beagle software, right?

I agree with you that most approaches yield pretty similar accuracies of prediction.
But two things come to my mind:
- First is sequencing, which will probably (in the future) rely much more on Bayesian assumptions than on the frequentist approach.
- Second is that if we can implement a faster Bayesian approach (without losing accuracy), Bayesian inference may become much more widely adopted in genomics.

I've been looking into automatic differentiation variational inference (ADVI), which can be implemented in Stan.
I don't know yet if genomic prediction models may actually succeed under the ADVI perspective, but it's worth trying.

Anyway, thanks a lot for the ideas.
It's always a pleasure to discuss such interesting topics.

All best,

Patrik Waldmann

Feb 19, 2016, 3:22:37 PM
to Bruno da Costa Perez, R-inla discussion group, scott....@csiro.au, rni...@gmail.com, hr...@math.ntnu.no

The R package BGLR does what you want.

Patrik Waldmann
