Interpret R squared


Minh Nguyễn

Jan 15, 2025, 1:25:24 PM
to beast...@googlegroups.com
Dear community,

I would like to ask: if the R squared calculated in TempEst is low, i.e. 0.05-0.1, does that mean my data are not reliable enough for a BEAST analysis?

Please excuse my probably basic question, I'm inexperienced with this analysis.

I read in the literature something like "We detected a strong temporal signal in the genome alignment (root-to-tip correlation, R2=0.9), sufficient to estimate evolutionary rates and dates for the most recent common ancestors (MRCAs) with BEAST."
However, I struggle to find out what is sufficient.

Please advise.
Many thanks.

Minh


Mamerto Jr Brina

Jan 17, 2025, 1:03:56 PM
to beast-users
Hi Minh. I have also experienced this issue with my sequences. If your R squared is low, one thing you can do is remove outliers and then check in TempEst whether it increases. I believe you need an R squared of at least 0.6 for good output.

Artem B

Jan 20, 2025, 2:46:52 AM
to beast-users
Hi Minh,

TempEst has no cut-off value for making this decision. A low R2 does not necessarily mean a lack of temporal signal. Root-to-tip regression implicitly assumes a strict clock model (the substitution rate is the same for all branches). If there is among-branch rate variation (as in a relaxed clock model), R2 decreases. Hence, a low R2 may also indicate extensive rate variation, which a relaxed clock model may be able to accommodate.
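
To illustrate the point about rate variation, here is a minimal Python sketch. It is not TempEst's code: the function names, dates, rate and root date are all hypothetical, and the simulation is a toy in which every tip gets its own independent rate multiplier (real root-to-tip paths share branches, so this is only an assumption made for illustration).

```python
import math
import random


def root_to_tip_r2(dates, distances):
    """Least-squares regression of root-to-tip distance on sampling date.

    Returns (slope, x_intercept, r_squared): the slope is the rate estimate
    and the x-intercept is the implied date of the root.
    """
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    syy = sum((d - mean_d) ** 2 for d in distances)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_t
    r_squared = (sxy * sxy) / (sxx * syy)
    x_intercept = -intercept / slope
    return slope, x_intercept, r_squared


def simulate_distances(dates, root_date, mean_rate, sigma, rng):
    """Toy model: one lognormal rate multiplier per tip (sigma = 0 is a strict clock)."""
    return [
        mean_rate * math.exp(rng.gauss(0.0, sigma)) * (t - root_date)
        for t in dates
    ]


rng = random.Random(1)
dates = [2000 + 0.5 * i for i in range(40)]  # hypothetical sampling dates
strict = simulate_distances(dates, 1990.0, 1e-3, 0.0, rng)
relaxed = simulate_distances(dates, 1990.0, 1e-3, 0.5, rng)

slope_s, root_s, r2_strict = root_to_tip_r2(dates, strict)
slope_r, root_r, r2_relaxed = root_to_tip_r2(dates, relaxed)
print(f"strict clock:   R^2 = {r2_strict:.3f}, rate = {slope_s:.2e}, root = {root_s:.1f}")
print(f"rate variation: R^2 = {r2_relaxed:.3f}, rate = {slope_r:.2e}, root = {root_r:.1f}")
```

Both data sets are generated with the same sampling dates and the same underlying time structure; only the rate variation differs, and that alone pulls R2 down.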

In your more complex case, a formal statistical analysis should be done. You can use the Bayesian evaluation of temporal signal (BETS), in which you calculate the log marginal likelihood for a model with tip dates and for a model without tip dates (isochronous model). The difference between the log marginal likelihoods of the two models is a Bayes factor. To calculate the log marginal likelihoods, I generally use path sampling in BEAST 2. I don't know of any other method to demonstrate temporal signal when R2 is low.

Read more about BETS here:


Cheers,
Artem

On Thursday, January 16, 2025 at 02:25:24 UTC+8, Minh Nguyễn wrote:

Artem B

Jan 20, 2025, 2:50:02 AM
to beast-users
My mistake: the difference between the log marginal likelihoods of the two models is a log Bayes factor.
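
As a small worked example of that arithmetic, here is a Python sketch. The two log marginal likelihood values are made up for illustration, and the function name is hypothetical; in practice the numbers would come from two marginal likelihood runs (for example path sampling), one with tip dates and one without.

```python
def log_bayes_factor(log_ml_dated, log_ml_isochronous):
    """Log Bayes factor in favour of the model that uses the sampling dates."""
    return log_ml_dated - log_ml_isochronous


# Hypothetical log marginal likelihoods from two separate runs.
log_ml_dated = -12345.6
log_ml_isochronous = -12360.2

log_bf = log_bayes_factor(log_ml_dated, log_ml_isochronous)
favoured = "dated (heterochronous)" if log_bf > 0 else "isochronous"
print(f"log BF = {log_bf:.1f}, favoured model: {favoured}")
```

A positive value favours the model with tip dates, i.e. it counts as evidence of temporal signal. How large the value must be to be convincing is a matter of convention that this thread does not specify.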

On Thursday, January 16, 2025 at 02:25:24 UTC+8, Minh Nguyễn wrote:
Dear community,

Minh Nguyễn

Jan 20, 2025, 2:54:02 PM
to beast...@googlegroups.com
Dear Mamerto,

Thank you very much for your response. 

Taking your advice, I have tried to remove outliers, up to 1/3 of my sequences, but could only improve R squared to 0.2 (T_T). Not sure what else I can do.
Do you happen to remember the source for the R squared threshold of 0.6?

Best regards,
Minh
 

--
You received this message because you are subscribed to the Google Groups "beast-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beast-users...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/beast-users/7c789084-2437-45af-b8a0-6b9e3aa57088n%40googlegroups.com.



Minh Nguyễn

Jan 20, 2025, 2:54:02 PM
to beast...@googlegroups.com
Hi Artem,

Many thanks for your explanation. Very very helpful!
I'll try BETS as you suggested. 

Best wishes,
Minh



Artem B

Jan 21, 2025, 10:05:17 PM
to beast-users
Also, I do not advise removing outliers just to increase R2. You should have a justified reason to do so (e.g., sequencing errors, date mislabeling, passage history, recombination, etc.). If you remove outliers just to increase R2, you force the data to fit your model instead of using a model to describe your data. It is a common mistake.
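
One way to act on this is to treat large regression residuals as a to-check list rather than a to-delete list. The Python sketch below is only an illustration under stated assumptions: the function name, the tip names, the data and the cutoff of 2 residual standard deviations are all hypothetical and arbitrary, and nothing here is taken from TempEst.

```python
def flag_for_inspection(names, dates, distances, cutoff=2.0):
    """List tips whose regression residual exceeds `cutoff` residual SDs.

    Returns (name, standardized residual) pairs to inspect by hand.
    """
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_t
    residuals = [d - (intercept + slope * t) for t, d in zip(dates, distances)]
    # Residual standard deviation with n - 2 degrees of freedom.
    sd = (sum(r * r for r in residuals) / (n - 2)) ** 0.5
    return [
        (name, r / sd)
        for name, r in zip(names, residuals)
        if abs(r) > cutoff * sd
    ]


# Hypothetical data: ten tips on a perfect clock, one with extra divergence.
dates = [2000.0 + i for i in range(10)]
names = [f"tip_{int(t)}" for t in dates]
distances = [1e-3 * (t - 1990.0) for t in dates]
distances[4] += 0.01  # e.g. a mislabelled date or a sequencing artefact

flagged = flag_for_inspection(names, dates, distances)
for name, z in flagged:
    print(f"{name}: residual = {z:+.2f} SD -> check date, passage history, recombination")
```

A flagged sequence would be removed only if one of the concrete problems listed above is actually found for it.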

Cheers,
Artem

On Tuesday, January 21, 2025 at 03:54:02 UTC+8, Minh Nguyễn wrote:

Minh Nguyễn

Jan 22, 2025, 12:14:27 PM
to beast...@googlegroups.com
Dear Artem,

Thank you for emphasizing this important point! 
I agree with you. My data indeed do not seem to have technical problems resulting in outliers. I have restored all my sequences and am rerunning the analysis, keeping in mind that there is significant rate variation.

Best regards,
Minh


Mamerto Jr Brina

Jan 22, 2025, 12:14:28 PM
to beast...@googlegroups.com
Yes, you should be able to justify removing sequences on grounds such as sequencing errors, date mislabeling, passage history, recombination, etc. I'm sorry I forgot to mention these. But if you want to test whether or not your sequences have a temporal signal, you can use BETS. I haven't used BETS myself, and I am hearing of it for the first time, too, so I am not much help there. Thank you, Artem, for sharing.

Minh Nguyễn

Jan 23, 2025, 12:38:18 PM
to beast...@googlegroups.com
No worries, Mamerto! I guess it's nice to always learn something new here.




--
Minh Nguyen

