Question for TempEst

Emma Wang

Apr 9, 2024, 2:20:10 AM
to beast-users
Hello everyone,

I have a question regarding my analysis of the transmission pathways of an influenza outbreak across different states from April to August (in the same year). I input 296 sequences and their corresponding sampling times into TempEst and selected the best-fitting model. The results showed a correlation of 0.2969 and an R-squared value of 8.82E-2. Given these values, is it still feasible to proceed with the subsequent phylogeographic analysis?

If not, should I optimize the included sequences, such as by adding more data? Our current dataset already contains samples from multiple states, but the time frame is limited to April through August. I'm unsure if this limited time range might be the reason for the low R-squared value.

Could someone please provide some guidance on this matter? Thank you in advance for your help. Thanks a lot!!

Brandon Stark

May 10, 2024, 8:13:32 AM
to beast-users
What kind of tree did you use?

Artem B

May 16, 2024, 9:16:38 PM
to beast-users
Hello Emma,
If everything was done correctly (including using a maximum likelihood tree as input), the narrow time range (sampling window) of your data may be one of the main reasons for the low R2. However, a low R2 also indicates high rate variation among branches, in which case a relaxed clock is preferable. High R2 values, in turn, suggest that a strict clock is applicable.

Nevertheless, a wider sampling window might change nothing in the case of influenza. From what I see in the literature, the coefficient of rate variation for influenza data sets is above 1.0, so researchers applied a relaxed clock, as I expected (https://doi.org/10.1016/j.ympev.2019.01.019). When the coefficient of rate variation is high, R2 in TempEst tends to be low.
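
To make the R and R2 values concrete: the root-to-tip plot in TempEst is, in essence, a linear regression of each tip's distance from the root on its sampling date. Below is a minimal Python sketch of that regression. The function name and the toy numbers are made up for illustration, and this is not TempEst's own code. Note that R2 is simply the squared correlation, so 0.2969 squared is about 0.088, in line with the values reported above.

```python
import math


def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance on sampling date.

    dates: sampling dates in decimal years.
    distances: root-to-tip distances in substitutions per site.
    Returns (slope, x_intercept, r, r_squared); the slope is a rough
    rate estimate and the x-intercept a rough root date.
    """
    n = len(dates)
    if n != len(distances) or n < 3:
        raise ValueError("need at least three (date, distance) pairs")
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    s_tt = sum((t - mean_t) ** 2 for t in dates)
    s_dd = sum((d - mean_d) ** 2 for d in distances)
    s_td = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    if s_tt == 0 or s_dd == 0:
        raise ValueError("dates and distances must both vary")
    slope = s_td / s_tt
    intercept = mean_d - slope * mean_t
    r = s_td / math.sqrt(s_tt * s_dd)
    x_intercept = -intercept / slope if slope != 0 else math.nan
    return slope, x_intercept, r, r * r


# Toy data (hypothetical): six tips sampled between April and August.
toy_dates = [2024.27, 2024.35, 2024.41, 2024.50, 2024.58, 2024.63]
toy_dists = [0.0031, 0.0029, 0.0036, 0.0034, 0.0041, 0.0038]
slope, x_int, r, r2 = root_to_tip_regression(toy_dates, toy_dists)
print(f"slope={slope:.4g} subst/site/year, r={r:.3f}, R2={r2:.3f}")
```

With a sampling window of only a few months, the spread of dates is small compared with the scatter in distances, which is one way a low R2 can arise even when the data do carry some signal.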

Also, TempEst does not detect temporal signal in the data; it is just a tool for assessing how clock-like the data are. To test for temporal signal reliably, you should use either a permutation test, in which sampling dates are randomly permuted among sequences, or Bayesian evaluation of temporal signal (BETS): https://beast.community/bets_tutorial.
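
A regression-based version of the date-permutation idea can be sketched as follows. This is only a simplified illustration: published date-randomization tests usually re-estimate the clock rate (for example in BEAST) for each permuted data set, whereas this sketch only recomputes the root-to-tip correlation. The function names, the number of permutations, and the one-sided p-value are assumptions made for the example. Because tips share ancestry and are not independent, the p-value should be read as a rough guide at best.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if s_xx == 0 or s_yy == 0:
        raise ValueError("both sequences must vary")
    return s_xy / math.sqrt(s_xx * s_yy)


def date_permutation_test(dates, distances, n_perm=1000, seed=0):
    """Shuffle sampling dates among tips and compare correlations.

    Returns (observed_r, p_value), where p_value is the share of
    permutations whose correlation is at least as high as the observed
    one (one-sided, since temporal signal implies a positive slope).
    """
    rng = random.Random(seed)
    observed = pearson_r(dates, distances)
    shuffled = list(dates)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson_r(shuffled, distances) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): dates in decimal years, distances in subst/site.
toy_dates = [2024.27, 2024.35, 2024.41, 2024.50, 2024.58, 2024.63]
toy_dists = [0.0031, 0.0029, 0.0036, 0.0034, 0.0041, 0.0038]
obs_r, p = date_permutation_test(toy_dates, toy_dists, n_perm=500, seed=1)
print(f"observed r={obs_r:.3f}, permutation p={p:.3f}")
```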

Best,
Artem


On Tuesday, April 9, 2024 at 14:20:10 UTC+8, Emma Wang wrote:

Emma Wang

May 19, 2024, 3:30:56 PM
to beast-users
Hi Artem,

Thanks a lot! This is much clearer now!



Gaspary Eugene

Jun 6, 2024, 2:13:28 PM
to beast...@googlegroups.com
Hello Artem,

As part of this diagnosis, you can also try the following:

1. Run treetime to drop sequences that violate the assumptions of the molecular clock given their sampling times:

  treetime --aln <alignment.fasta> --tree <tree.nwk> --dates <dates (months).csv> --clock-rate <value for influenza subst/site/year> --reroot oldest --reconstruct-tip-states

An example molecular clock value for pH1N1: With 25 months of data, our estimates for both hemagglutinin and neuraminidase were virtually identical to the whole-genome estimate of 5.02 × 10^-3 subst./site/year with early pH1N1 data (). Therefore, 25 months of data were sufficient to capture the long-term evolutionary trends.

Which strain are you working on? Choose the appropriate molecular clock value for your case.

2. Reconstruct the ML tree again with iqtree2, using an appropriate substitution model.
3. Check the temporal signal in TempEst (R2) before estimating divergence times.
4. Note that, as described above, a high R2 means the residuals are small and a strict clock model may apply, whereas a low R2 with large residuals means a relaxed clock model is more appropriate, as is usually the case for influenza, a rapidly evolving virus.
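
For intuition about step 1, the sketch below flags tips whose root-to-tip residual falls far outside the bulk of the data, using interquartile-range fences. It is a simplified stand-in for residual-based clock filtering, not TreeTime's actual implementation; the function name, the threshold of 3 interquartile ranges, and the toy data are assumptions made for the example.

```python
import statistics


def flag_clock_outliers(names, dates, distances, n_iqd=3.0):
    """Return names of tips whose regression residual is an outlier.

    Fits distance = intercept + slope * date by least squares, then
    flags residuals below Q1 - n_iqd * IQR or above Q3 + n_iqd * IQR.
    """
    n = len(dates)
    if not (n == len(distances) == len(names)) or n < 4:
        raise ValueError("need at least four tips with matching inputs")
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    s_tt = sum((t - mean_t) ** 2 for t in dates)
    if s_tt == 0:
        raise ValueError("sampling dates must vary")
    s_td = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    slope = s_td / s_tt
    intercept = mean_d - slope * mean_t
    residuals = [d - (intercept + slope * t) for t, d in zip(dates, distances)]
    q1, _, q3 = statistics.quantiles(residuals, n=4)
    iqr = q3 - q1
    lower, upper = q1 - n_iqd * iqr, q3 + n_iqd * iqr
    return [name for name, res in zip(names, residuals)
            if res < lower or res > upper]


# Toy data (hypothetical): ten clock-like tips plus one divergent tip.
toy_names = [f"tip{i}" for i in range(10)] + ["divergent"]
toy_dates = [float(i) for i in range(10)] + [5.0]
toy_dists = [0.01 * i for i in range(10)] + [0.5]
print(flag_clock_outliers(toy_names, toy_dates, toy_dists))
```

Flagged tips are worth checking by hand (mislabelled dates, sequencing or alignment problems) before they are dropped.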

cheers,

Gaspary


--
You received this message because you are subscribed to the Google Groups "beast-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beast-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/beast-users/f6f4d00b-333d-412b-9238-e3530b2fc76an%40googlegroups.com.
