gamma categories


Chrissen Gemmill

Jan 14, 2013, 8:44:55 PM
to beast...@googlegroups.com
Hello from New Zealand, and apologies for these very simple questions.
1. to estimate frequencies or to use empirical?
In Sites, is it best to "estimate" the base frequencies or to use "empirical", and what effect does this choice have on the analysis?

2. What are the 10 gamma categories, and how do you determine the number of gamma categories to use? What is the effect of selecting a higher number than you actually need (e.g. selecting 8 instead of 4)?

Thanks so much,
Chrissen

Eduardo Castro Nallar

Jan 15, 2013, 8:19:15 AM
to beast...@googlegroups.com
Hi Chrissen,

1. to estimate frequencies or to use empirical?

I'd recommend "estimate", since estimating the frequencies is a piece of cake for the MCMC. You will notice that even when you run an analysis for only a small number of steps, you get pretty decent results for those parameters. The weight on the related operators also tells you that they are not that hard to estimate. Choosing empirical could have a negative impact on your frequency estimates, I guess, if you have a lot of funky stuff going on, like composition bias and saturated sites.
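
For context, "empirical" frequencies are simply the observed proportions of each base in the alignment, held fixed during the run, whereas "estimate" treats them as parameters sampled by the MCMC. A minimal sketch of how the empirical values could be computed by hand, assuming plain DNA strings; the function name and the toy sequences are hypothetical and this is not BEAST code:

```python
from collections import Counter

def empirical_base_frequencies(sequences):
    """Return observed A/C/G/T proportions, ignoring gaps and ambiguity codes."""
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq.upper() if ch in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no unambiguous bases found")
    return {base: counts[base] / total for base in "ACGT"}

if __name__ == "__main__":
    # Toy alignment (hypothetical data, for illustration only).
    toy = ["AACG", "ACGT", "A-GN"]
    print(empirical_base_frequencies(toy))
```

Comparing these values with the posterior means of the estimated frequencies is one quick way to see whether the choice matters for a given data set.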

2. What are the 10 gamma categories, and how do you determine the number of gamma categories to use? What is the effect of selecting a higher number than you actually need (e.g. selecting 8 instead of 4)?

The number of gamma categories is the number of bins into which you discretize the gamma distribution that describes rate heterogeneity among sites. I think this has already been tested: having more than, say, 4 or 5 categories yields little improvement in the rate heterogeneity estimate but adds a large computational burden. So 4 is the de facto standard, though using more might improve your alpha parameter estimate.
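
To make the binning concrete, here is a rough sketch of one common discretization: cut a mean-1 gamma distribution (shape alpha, rate alpha) into k equal-probability bins and use the mean rate within each bin. This is an assumption about the scheme, written with the standard library only; the function names are hypothetical and BEAST's own implementation may differ in detail.

```python
import math

def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma P(a, x), computed from its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-16 * abs(total):
        term *= x / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))

def gamma_quantile(p, shape, rate):
    """Invert the gamma CDF by bisection (slow but dependency-free)."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(shape, hi * rate) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(shape, mid * rate) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def discrete_gamma_rates(alpha, k):
    """Mean rate of each of k equal-probability bins of a Gamma(alpha, alpha)."""
    bounds = [0.0] + [gamma_quantile(i / k, alpha, alpha) for i in range(1, k)]
    # E[r; r < b] for a mean-1 gamma equals P(alpha + 1, b * alpha).
    partial = [reg_lower_gamma(alpha + 1.0, b * alpha) for b in bounds] + [1.0]
    return [k * (partial[i + 1] - partial[i]) for i in range(k)]

if __name__ == "__main__":
    for k in (4, 8):
        print(k, [round(r, 4) for r in discrete_gamma_rates(0.5, k)])
```

The site likelihood is evaluated once per category, so the cost grows roughly in proportion to k, while the extra bins mostly refine the tails of the distribution.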

Hope this helps,

Eduardo


--
You received this message because you are subscribed to the Google Groups "beast-users" group.
To view this discussion on the web visit https://groups.google.com/d/msg/beast-users/-/vMp1Wq62O1cJ.
To post to this group, send email to beast...@googlegroups.com.
To unsubscribe from this group, send email to beast-users...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/beast-users?hl=en.

Chrissen Gemmill

Jan 15, 2013, 1:44:54 PM
to beast...@googlegroups.com, beast...@googlegroups.com
Thanks so much!!! That helps heaps. So the effect of using > 4 gamma categories is solely computational time? That is great. We started a new run with 4 :-)

All the best,
Chrissen

Chrissen Gemmill

Jan 17, 2013, 7:37:45 PM
to beast...@googlegroups.com, beast...@googlegroups.com
Hi all, is there anything known about how BI (and even ML) perform when trying to assess relationships among closely related or recently radiated taxa? We have low PP for some clades, and I think Alexi (or Dave) said that these methods didn't perform well in cases with short branch lengths and/or polytomies. Is there some way to improve performance in this situation?

All the best,
Chrissen

On 16/01/2013, at 2:19 AM, Eduardo Castro Nallar <castro...@gmail.com> wrote:

zqz...@hotmail.com

Jan 18, 2013, 8:45:23 PM
to beast...@googlegroups.com

Hi, Eduardo

What does "saturated site" mean? I have seen this term in many other places, but never with a definition.

By what means, or with what program (and which parameter), can I test whether sites are saturated? Any pointers or paper citations would be welcome.

Thanks for your help

Qian ZHANG

On Tuesday, January 15, 2013 at 9:44:55 AM UTC+8, Chrissen Gemmill wrote:

Santiago Sanchez

Jan 19, 2013, 10:08:40 AM
to beast...@googlegroups.com, beast...@googlegroups.com
Dear Qian,

A saturated site is a position in the alignment that has been subject to recurrent substitutions, so that the observed differences underestimate the actual number of changes. The idea comes up mainly with GTR-like models. One way to check for it is to plot "corrected" distances, for instance under a GTR model, against distances with little or no correction, for instance raw p-distances or a simple JC model. If the uncorrected distances level off while the corrected ones keep increasing, the sites are saturated. You can do this for each of the 3 codon positions to show, for instance, that the 3rd codon position is saturated. People have also used it to compare groups of taxa (e.g. ingroup / outgroup), sets of genes, or other portions within alignments.

Many papers use this method to show particular features of their data. Just search for "site saturation plot phylogeny GTR" at http://scholar.google.com and you will find many papers in journals like MPE and MBE.
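
As a rough illustration of the kind of numbers behind such a plot, the sketch below computes, for every pair of aligned sequences, the uncorrected p-distance and a JC69-corrected distance. JC69 is used here only because it has a simple closed form; a GTR-corrected distance would normally come from a phylogenetics package. The function names and the toy alignment are hypothetical.

```python
import math
from itertools import combinations

def p_distance(seq_a, seq_b):
    """Uncorrected p-distance: proportion of comparable sites that differ."""
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)

def jc69_distance(p):
    """JC69 correction: d = -3/4 * ln(1 - 4p/3); infinite once p reaches 0.75."""
    if p >= 0.75:
        return math.inf
    if p == 0.0:
        return 0.0
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def saturation_points(alignment):
    """Return (name_a, name_b, p, corrected) for every pair of sequences."""
    points = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(alignment.items(), 2):
        p = p_distance(seq_a, seq_b)
        points.append((name_a, name_b, p, jc69_distance(p)))
    return points

if __name__ == "__main__":
    # Toy aligned sequences (hypothetical data, for illustration only).
    toy = {"t1": "ACGTACGT", "t2": "ACGTACGA", "t3": "ACGAACGA"}
    for row in saturation_points(toy):
        print(row)
```

Plotting the third value against the fourth for each pair gives the saturation plot. To look at one codon position at a time, slice each sequence first, e.g. `seq[2::3]` for third positions, assuming the alignment starts in frame.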

Hope this helps,

Santiago

Santiago Sanchez-Ramirez
Ecology and Evolutionary Biology, University of Toronto
Natural History (Mycology), Royal Ontario Museum
100 Queen's Park
Toronto, ON
M5S 2C6
Canada

zqz...@hotmail.com

Jan 19, 2013, 8:35:58 PM
to beast...@googlegroups.com
Hi, Santiago.

Thanks for your help. You opened a door for me.

Qian

On Saturday, January 19, 2013 at 11:08:40 PM UTC+8, santiago wrote: