Missing values for continuous traits?

DJO

Jan 24, 2024, 10:53:41 AM
to beast-users
Hi all!

Is it possible (in BEAST 1 or 2) to have missing values for continuous traits and, if so, to specify a prior from which they are drawn (analogous to missing tip dates)?

For example, suppose I have body size and lifespan for each species. These probably correlate well, so I could have a partition with these two traits and their covariance, but I might be missing size for some species and lifespan for others (which could be estimated from the other trait).

Any suggestions gratefully received!

D

Gabriel Hassler

Jan 24, 2024, 12:21:32 PM
to beast...@googlegroups.com

Hi Darren,

Yes, BEAST 1 is certainly capable of handling missing values for continuous traits.

For many Gaussian trait models, such as multivariate Brownian diffusion (MBD) or Ornstein–Uhlenbeck (OU), you don’t actually need to sample the missing values to estimate the other model parameters (like the covariance between traits). BEAST just analytically integrates out the missing data.
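
To make "integrates out" concrete, here is a toy Python sketch (not BEAST code) for a single species with two Gaussian traits, body size and lifespan, where lifespan is missing. Numerically integrating the joint density over the missing trait gives the same number as evaluating the observed trait's marginal density, which for a Gaussian just means dropping the missing trait's row and column from the covariance. The means, covariance, and function names are made up for illustration; BEAST does the equivalent over the whole tree and all taxa, which this sketch does not attempt.

```python
import math


def normal_pdf(y, mean, var):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def bivariate_normal_pdf(y1, y2, mu, sigma):
    """2-D Gaussian density with mean mu=(m1, m2) and covariance ((s11, s12), (s12, s22))."""
    (m1, m2), ((s11, s12), (_, s22)) = mu, sigma
    det = s11 * s22 - s12 * s12
    d1, d2 = y1 - m1, y2 - m2
    # Quadratic form d^T Sigma^{-1} d via the closed-form 2x2 inverse
    q = (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))


def integrate_out_missing(y1, mu, sigma, lo=-50.0, hi=50.0, n=20000):
    """Numerically integrate the joint density over the missing trait y2 (trapezoid rule)."""
    h = (hi - lo) / n
    total = 0.5 * (bivariate_normal_pdf(y1, lo, mu, sigma) + bivariate_normal_pdf(y1, hi, mu, sigma))
    for k in range(1, n):
        total += bivariate_normal_pdf(y1, lo + k * h, mu, sigma)
    return total * h


mu = (0.0, 1.0)                    # toy means of (body size, lifespan)
sigma = ((2.0, 1.2), (1.2, 1.5))   # toy joint covariance
y1 = 0.7                           # observed body size; lifespan is missing

numeric = integrate_out_missing(y1, mu, sigma)
analytic = normal_pdf(y1, mu[0], sigma[0][0])  # marginal: drop the missing row/column
print(f"integrated: {numeric:.8f}  marginal: {analytic:.8f}")
```

The two printed values agree, which is why the missing entries never need to be sampled just to evaluate the likelihood of the other parameters.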

If you want to estimate the values of the missing data, then the prior on the missing observations is fully specified by the trait model. In other words, you don’t need to add an extra prior for the missing observations.
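
As a rough illustration of why no separate prior is needed, take the same toy two-trait setting (one species, known mean and covariance, all values hypothetical). The Gaussian model already implies a conditional distribution for the missing trait given the observed one; the sketch below computes it and takes one draw from it. In a real BEAST analysis this conditional also depends on the tree and on related species' traits, so treat it as intuition only.

```python
import math
import random


def conditional_missing(y_obs, mu, sigma):
    """Mean and variance of trait 2 given an observed value of trait 1 (bivariate Gaussian)."""
    (m1, m2), ((s11, s12), (_, s22)) = mu, sigma
    cond_mean = m2 + (s12 / s11) * (y_obs - m1)  # regression of trait 2 on trait 1
    cond_var = s22 - s12 * s12 / s11              # variance left after conditioning
    return cond_mean, cond_var


mu = (0.0, 1.0)                    # toy means of (body size, lifespan)
sigma = ((2.0, 1.2), (1.2, 1.5))   # toy joint covariance
mean, var = conditional_missing(0.7, mu, sigma)

rng = random.Random(1)
draw = rng.gauss(mean, math.sqrt(var))  # one draw of the missing lifespan from its conditional
print(f"lifespan | size = 0.7 ~ N({mean:.2f}, {var:.2f}); one draw: {draw:.3f}")
```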

Here’s an example XML file showing how to do this: https://github.com/suchard-group/incomplete_measurements/blob/master/xml/timing/mammals_newTiming.xml

Essentially, just code all missing values as “NA”, and BEAST should handle the rest.
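
If you build the XML by script, a small helper like the hypothetical one below writes missing entries as NA. It assumes the layout used in BEAST 1 continuous-trait files, where each taxon carries an attr element (here named traits) holding whitespace-separated values in a fixed trait order; check this against the linked example file before relying on it. The function name, taxon names, and values are placeholders.

```python
def taxon_trait_xml(taxon_id, values, attr_name="traits"):
    """Hypothetical helper: render one taxon's continuous traits, writing None as NA."""
    text = " ".join("NA" if v is None else str(v) for v in values)
    return f'<taxon id="{taxon_id}">\n  <attr name="{attr_name}">{text}</attr>\n</taxon>'


# Placeholder taxa and values in a fixed trait order: (body size, lifespan)
data = {
    "taxonA": [2.5, None],  # lifespan missing
    "taxonB": [None, 4.0],  # body size missing
    "taxonC": [1.8, 3.1],   # both observed
}
for taxon_id, values in data.items():
    print(taxon_trait_xml(taxon_id, values))
```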

You should also be able to create a similar xml in BEAUti by feeding it trait data with missing values coded as “NA”.

If you want to make sure that BEAST is treating the NAs as missing, and to see the estimates of the missing values, add the following to the file log portion of the XML:

  <traitLogger id="<any id you want>"
               traitName="<the name of the trait attribute, probably ‘traits’>"
               taxonNameExplicit="true"
               nodes="external">
    <treeModel idref="<your tree model id>"/>
    <traitDataLikelihood idref="<your traitDataLikelihood id>"/>
  </traitLogger>

Observed traits should have constant values in the log, while the missing values will change from sample to sample as BEAST samples them.
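
A quick way to run that check on the log is sketched below in Python. It assumes a tab-delimited log with # comment lines (the usual BEAST log layout) and that the logged trait columns share a common prefix; the column names in the toy log are hypothetical, so substitute whatever appears in your own log header. Columns that never change are the observed values; columns that vary are being sampled.

```python
import csv
import io


def classify_trait_columns(log_text, prefix):
    """Split logged trait columns into constant (observed) and varying (sampled) ones."""
    rows = [row for row in csv.reader(io.StringIO(log_text), delimiter="\t")
            if row and not row[0].startswith("#")]
    header, samples = rows[0], rows[1:]
    constant, varying = [], []
    for j, name in enumerate(header):
        if not name.startswith(prefix):
            continue  # skip state, posterior, etc.
        distinct = {row[j] for row in samples}
        (constant if len(distinct) == 1 else varying).append(name)
    return constant, varying


# Toy log with hypothetical column names; taxonA's second trait was coded NA
toy_log = (
    "# toy log\n"
    "state\ttraits.taxonA.1\ttraits.taxonA.2\n"
    "0\t2.5\t1.31\n"
    "1000\t2.5\t1.58\n"
    "2000\t2.5\t1.12\n"
)
observed, sampled = classify_trait_columns(toy_log, "traits")
print("observed:", observed)
print("sampled:", sampled)
```

For a real run, read the log file's contents into a string and pass it in place of toy_log.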

Please let me know if you have any more questions.

Best,
Gabe

DJO

Jan 24, 2024, 12:44:34 PM
to beast-users
Perfect, thank you!  

Your response pointed me to https://www.tandfonline.com/doi/full/10.1080/01621459.2020.1799812 and to a follow-on question not closely related to the start of the thread...

That paper fits a residual (co)variance matrix, allowing estimation of phylogenetic heritability. If my output from BEAST 1.10.4 includes an estimated lambda, is that the model that was fitted?

But what is lambda? Shouldn't there be a lambda for each continuous trait?

Are the estimates for the residual in the output? (I wasn't sure whether the precision matrix and the covariance matrix are just inverses of each other, or whether one is the phylogenetic effect and the other the residual.)

... or am I totally failing to understand?

Thanks!

D

Gabriel Hassler

Jan 24, 2024, 1:34:02 PM
to beast...@googlegroups.com

Hi Darren,

Glad that helped.

Short answer:

Pagel’s lambda transformation and the residual variance extension are not the same.

Long answer:

If you assume that the residual covariance matrix equals the evolutionary covariance matrix multiplied by a constant, then the likelihoods under Pagel’s lambda transformation and under the residual variance extension are roughly equivalent (the priors are different). In that case there is just one “heritability” value, fully determined by the constant multiplier on the residual covariance matrix.
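
A sketch of why the two line up, under assumptions and notation chosen here for illustration rather than taken from the paper or from BEAST's implementation: stack all species' trait vectors into one vector y; let V be the n × n covariance implied by an ultrametric tree of height T (so every diagonal entry of V is T), Σ the evolutionary covariance, and Γ the residual covariance; treat Pagel's λ (0 < λ ≤ 1) as multiplying the off-diagonal entries of V by λ; and take trait i's heritability to be the share of its tip variance that comes from the tree, h_i^2 = TΣ_ii / (TΣ_ii + Γ_ii).

```latex
\begin{aligned}
\text{residual model:}\quad & \operatorname{Var}(y) = V \otimes \Sigma + I_n \otimes \Gamma \\
\text{Pagel's } \lambda\text{:}\quad & V_\lambda = \lambda V + (1-\lambda)\,T\,I_n,
  \qquad \operatorname{Var}(y) = V_\lambda \otimes \Sigma \\
\text{with } \Sigma' = \lambda\Sigma\text{:}\quad & V_\lambda \otimes \Sigma
  = V \otimes \Sigma' + I_n \otimes \tfrac{(1-\lambda)T}{\lambda}\,\Sigma' \\
\Rightarrow\quad & \Gamma = c\,\Sigma' \ \text{ with } \ c = \tfrac{(1-\lambda)T}{\lambda} \\
\text{heritability:}\quad & h_i^2 = \frac{T\Sigma'_{ii}}{T\Sigma'_{ii} + c\,\Sigma'_{ii}}
  = \frac{T}{T + c} = \lambda \quad \text{for every trait } i
\end{aligned}
```

Under these assumptions the single residual multiplier c and λ carry the same information, which is why only one heritability value comes out.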

Generally, we don’t make that assumption. When it doesn’t hold, the residual variance model gives trait-specific heritability estimates, whereas Pagel’s lambda still gives a single value. I don’t recommend using both at the same time.
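
A toy Python illustration of that difference, assuming heritability of trait i is defined as the share of tip variance contributed by the tree, T·Σ_ii / (T·Σ_ii + Γ_ii), for an ultrametric tree of height T. That definition, the function name, and all numbers are assumptions for illustration. With Γ proportional to Σ every trait gets the same value; otherwise the values differ by trait.

```python
def heritabilities(sigma_diag, gamma_diag, tree_height):
    """Per-trait share of tip variance from the tree: T*Sigma_ii / (T*Sigma_ii + Gamma_ii)."""
    return [tree_height * s / (tree_height * s + g) for s, g in zip(sigma_diag, gamma_diag)]


T = 1.0                  # toy height of an ultrametric tree
sigma_diag = [2.0, 1.5]  # toy evolutionary variances for two traits

# Residual covariance proportional to Sigma (Gamma = 0.5 * Sigma): one shared value
proportional = heritabilities(sigma_diag, [0.5 * s for s in sigma_diag], T)
# Residual variances not proportional to Sigma: trait-specific values
trait_specific = heritabilities(sigma_diag, [0.5, 3.0], T)

print([round(h, 3) for h in proportional])    # both traits share one value
print([round(h, 3) for h in trait_specific])  # values differ between traits
```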

If you want to use the residual variance models, then here is an example: https://github.com/suchard-group/incomplete_measurements/blob/master/xml/hiv.xml

I recently implemented that model in BEAUti, but you’ll need to build BEAST and BEAUti from source if you want to use it. There are instructions for building them here (Java & BEAST section of the README): https://github.com/suchard-group/incomplete_measurements/tree/master

For the precision question, a precision matrix is just the inverse of a covariance matrix. In the residual variance model, there are two precision matrices corresponding to the two covariance matrices (evolutionary covariance and residual covariance).
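
A minimal Python check of that relationship, using toy 2×2 matrices and hypothetical variable names: each precision matrix is the inverse of its own covariance matrix, so multiplying a covariance by its matching precision gives the identity. Which of the two forms BEAST logs for a given model is not shown here.

```python
def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return ((d / det, -b / det), (-c / det, a / det))


def matmul_2x2(x, y):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(tuple(sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)) for i in range(2))


evolutionary_cov = ((2.0, 1.2), (1.2, 1.5))  # toy Sigma
residual_cov = ((0.5, 0.1), (0.1, 3.0))      # toy Gamma

# Each precision matrix is the inverse of its own covariance matrix
evolutionary_precision = inverse_2x2(evolutionary_cov)
residual_precision = inverse_2x2(residual_cov)

print(matmul_2x2(evolutionary_cov, evolutionary_precision))  # approximately the identity
print(matmul_2x2(residual_cov, residual_precision))          # approximately the identity
```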
