How to set the number of gamma categories under a GTRGAMMA or PROTGAMMA model?

Florent Lassalle

Apr 23, 2012, 9:11:11 AM
to raxml
Hi,

I am a rather new user of RAxML, and I find the diversity of
options for running the program a very good point, as the user is
free to use it for very precise tasks.
However, I was surprised that I could not set the number of
rate-heterogeneity categories when using GTRGAMMA or any PROTGAMMA
model; as I understood from the v7.0.4 documentation, this number is
hard-coded to 4.
Is there any way to make the -c option work with GAMMA models as
well as with CAT models?
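For readers unfamiliar with what the number of Gamma categories controls, here is a minimal sketch that discretizes a mean-1 Gamma distribution with shape `alpha` into `k` equal-probability categories and uses the mean rate within each category (the discrete-Gamma approach of Yang, 1994). It is an illustration written for this thread, not RAxML's code: the function names are hypothetical, and the incomplete-gamma routine follows the textbook series/continued-fraction method so that only the Python standard library is needed. In RAxML, `k` is fixed at 4.

```python
import math


def reg_lower_gamma(a: float, x: float) -> float:
    """Regularized lower incomplete gamma function P(a, x)."""
    if x <= 0.0:
        return 0.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series expansion: converges quickly when x < a + 1.
        term = total = 1.0 / a
        denom = a
        for _ in range(1000):
            denom += 1.0
            term *= x / denom
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz) for the upper tail Q(a, x) = 1 - P(a, x).
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return 1.0 - math.exp(log_prefactor) * h


def gamma_quantile(p: float, shape: float, rate: float) -> float:
    """Return x with P(X <= x) = p for X ~ Gamma(shape, rate), via bisection."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(shape, rate * hi) < p:
        hi *= 2.0  # grow the bracket until it contains the quantile
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(shape, rate * mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha: float, k: int) -> list[float]:
    """Mean rates of k equal-probability categories of a mean-1 Gamma(alpha, alpha)."""
    if alpha <= 0.0 or k < 1:
        raise ValueError("alpha must be > 0 and k >= 1")
    # Category boundaries are the i/k quantiles of the mean-1 Gamma distribution.
    bounds = [0.0] + [gamma_quantile(i / k, alpha, alpha) for i in range(1, k)] + [math.inf]
    rates = []
    for lo, hi in zip(bounds, bounds[1:]):
        # E[X; lo < X < hi] = P(alpha + 1, alpha * hi) - P(alpha + 1, alpha * lo)
        # for a mean-1 Gamma; dividing by the category mass 1/k multiplies by k.
        upper = 1.0 if math.isinf(hi) else reg_lower_gamma(alpha + 1.0, alpha * hi)
        lower = reg_lower_gamma(alpha + 1.0, alpha * lo)
        rates.append(k * (upper - lower))
    return rates
```

Calling `discrete_gamma_rates(0.5, 4)` returns four increasing rates whose mean is 1. Under a Gamma model, the likelihood of a site is the average of its likelihoods computed under each of these rates, which is why the cost of the likelihood function grows with `k`.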

Best Regards,

FL

Fernando Izquierdo

Apr 23, 2012, 9:19:05 AM
to ra...@googlegroups.com
Hi Florent,

Your understanding is correct: -c number_of_categories always refers
to the number of categories used in the CAT (aka PSR, per-site rates)
approximation of rate heterogeneity.
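To illustrate what `-c` controls, here is a deliberately simplified sketch of the idea behind per-site rate categories: given a rate for every site (assumed to be already estimated), sort the sites by rate, split them into at most `c` groups, and let every site in a group share that group's mean rate. The function `bin_site_rates` is hypothetical; RAxML's actual procedure for estimating and categorizing per-site rates differs in its details.

```python
def bin_site_rates(site_rates: list[float], c: int) -> tuple[list[float], list[int]]:
    """Group per-site rates into at most c categories of (nearly) equal size.

    Returns (category_rates, assignment), where assignment[i] is the category
    index of site i and category_rates[j] is the mean rate of category j.
    """
    if c < 1:
        raise ValueError("need at least one category")
    n = len(site_rates)
    order = sorted(range(n), key=lambda i: site_rates[i])  # sites from slowest to fastest
    c = min(c, n)  # cannot have more non-empty categories than sites
    assignment = [0] * n
    category_rates = []
    for cat in range(c):
        # Each category takes a contiguous slice of the rate-sorted sites.
        members = order[cat * n // c:(cat + 1) * n // c]
        category_rates.append(sum(site_rates[i] for i in members) / len(members))
        for i in members:
            assignment[i] = cat
    return category_rates, assignment


rates, assignment = bin_site_rates([0.1, 2.0, 0.15, 1.9, 1.0, 0.9], 3)
print(rates, assignment)
```

Here six sites collapse into three categories with mean rates of about 0.125, 0.95 and 1.95, and each site only stores the index of its category. A larger `c` tracks the individual site rates more closely at the cost of more parameters.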

Cheers,
Fernando

Alexandros Stamatakis

Apr 23, 2012, 9:22:13 AM
to ra...@googlegroups.com
Dear Florent,

This is not possible, and we do not plan on implementing it.
The number of Gamma rate categories is hard-coded to 4 in RAxML to make
the implementation of the phylogenetic likelihood function more efficient.

Besides, especially on large datasets, our CAT approximation of rate heterogeneity works
equally well, if not better, while requiring less memory and fewer CPU cycles, i.e., the CO2 footprint
of your analyses will be smaller ;-)

see, e.g.:

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0009490

and

http://www.biomedcentral.com/1471-2105/12/470
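As a back-of-the-envelope sketch of the memory argument (an assumption-laden estimate, not RAxML's actual memory accounting): with Gamma, every inner node of the tree stores a conditional likelihood vector with one entry per site, state, and rate category; with CAT, each site is evaluated under a single rate, so the category factor drops to 1. The function `clv_bytes` is hypothetical and assumes one 8-byte double per entry and one vector per inner node, ignoring scaling vectors, tips, and all other overhead.

```python
def clv_bytes(n_taxa: int, n_sites: int, n_states: int, n_rate_cats: int,
              bytes_per_entry: int = 8) -> int:
    """Rough memory for the conditional likelihood vectors of an unrooted binary tree.

    Assumes one vector per inner node with one double per (site, state, rate
    category) entry; ignores scaling vectors, tips and all other overhead.
    """
    inner_nodes = n_taxa - 2  # an unrooted binary tree with n leaves has n - 2 inner nodes
    return inner_nodes * n_sites * n_states * n_rate_cats * bytes_per_entry


# DNA alignment with 1,000 taxa and 1,000 sites (4 states).
gamma = clv_bytes(1000, 1000, 4, 4)  # GAMMA: 4 discrete rate categories per site
cat = clv_bytes(1000, 1000, 4, 1)    # CAT: one rate (category) per site
print(gamma, cat, gamma / cat)       # 127744000 31936000 4.0
```

Under these assumptions the Gamma vectors need about 128 MB versus about 32 MB for CAT. The factor of 4 carries over roughly to CPU time, since the likelihood kernel computes every vector entry on each tree traversal; for protein data (`n_states=20`) the absolute numbers are five times larger.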

Alexis

--
Dr. Alexandros Stamatakis
www.exelixis-lab.org

Florent Lassalle

Apr 23, 2012, 9:51:03 AM
to raxml
Dear Alexandros,

Thank you for clarifying this doubt I had. Indeed, I have heard a lot of
good things about the CAT approximation-based models when working on
large datasets, and I understand that when something good is found in
science, one should move to the next-generation technology and not
stay stuck in old-fashioned practices.
However, I'm not working on large datasets, at least not in the sense
of whole-genome phylogenies, as I intend to compute phylogenies of
potentially (very) large gene families, which usually have around
1,000 DNA sites. My feeling is that the CAT approximation would not work well
on so few sites. Do you agree? And, if you don't, would that be
a fair reason to implement flexible categories?

Cheers,

Florent

Alexandros Stamatakis

Apr 24, 2012, 1:51:31 PM
to ra...@googlegroups.com
Dear Florent,

> Thank you for clarifying this doubt I had. Indeed, I have heard a lot of
> good things about the CAT approximation-based models when working on
> large datasets, and I understand that when something good is found in
> science, one should move to the next-generation technology and not
> stay stuck in old-fashioned practices.

Well, I wasn't saying CAT is that good. Please keep in mind that the RAxML CAT
model is fundamentally different from the PhyloBayes CAT model; the naming was just very unfortunate, because I was not
aware of the PhyloBayes CAT model back then.

> However, I'm not working on large datasets, at least not in the sense
> of whole-genome phylogenies, as I intend to compute phylogenies of
> potentially (very) large gene families, which usually have around
> 1,000 DNA sites. My feeling is that the CAT approximation would not work well
> on so few sites. Do you agree?

No, the CAT model (the per-site rate category model in RAxML) should work quite well on this.
In fact, it mostly performs poorly only when you do not have many taxa (say, fewer than 100).

You may want to have a look at the original paper:

http://sco.h-its.org/exelixis/pubs/HICOMB2006.pdf


Alexis
