RAxML partition AA analysis

Haiwei

Apr 17, 2012, 8:49:22 AM
to raxml
Hello,

I am using RAxML 7.3.0 to run a partitioned protein analysis (140
partitions) of a super-alignment. With "PROTCATLG" it works, but with
"PROTGAMMALG" it does not.

The command line is "./raxmlHPC-PTHREADS -f a -x 12345 -m PROTCATLG -q
gene.part -n RoseoSAR11 -s concat.phy -T 4 -# 100 -p 2".

From the Google group, I learned that RAxML 7.2.8-ALPHA needs to be
recompiled to handle >128 partitions. So I also tried a recompiled
7.2.8 version, but "PROTGAMMALG" still does not work. Could you let me
know why this happens?

Best,

Haiwei Luo
University of Georgia

Alexandros Stamatakis

Apr 17, 2012, 2:10:20 PM
to ra...@googlegroups.com
Most probably you are running out of memory. Please check the memory requirements calculator at:

http://sco.h-its.org/exelixis/software.html

Gamma needs 4 times more memory than CAT.

Also have a look at the new options/techniques for memory saving implemented in RAxML-Light,
manuscript drafts available here:

http://sco.h-its.org/exelixis/pubs/Exelixis-RRDR-2012-3.pdf
http://sco.h-its.org/exelixis/pubs/Exelixis-RRDR-2012-4.pdf

Alexis

--
Dr. Alexandros Stamatakis
www.exelixis-lab.org

Haiwei

Apr 23, 2012, 10:36:16 AM
to raxml
Hi Alexis,

Thanks for the reply. For calculating the memory requirement, what
does "pattern (m)" refer to?

Haiwei


Fernando Izquierdo

Apr 23, 2012, 10:51:42 AM
to ra...@googlegroups.com
Hi Haiwei,

m refers to the number of alignment patterns, i.e. unique columns in
the alignment. There is no need to perform computations (or allocate
memory) for repeated columns because RAxML always assumes independence
among sites. The exact number is reported in the RAxML_info file as
"Alignment has m distinct alignment patterns".

If you need to calculate this in advance, you can also safely use the
number of columns in your alignment (which will always be greater than
or equal to the number of patterns).

Cheers,
Fernando
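
As a rough illustration of the pattern count described above, here is a minimal Python sketch that counts the distinct columns of an alignment held as a list of equal-length strings. The function name and the toy alignment are hypothetical. RAxML's own count may differ (for example, in a partitioned analysis identical columns in different partitions would presumably be counted separately), so treat the result as an approximation.

```python
def count_alignment_patterns(sequences):
    """Return (number of columns, number of distinct columns)."""
    if not sequences:
        return 0, 0
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    # zip(*sequences) yields one tuple per alignment column.
    columns = list(zip(*sequences))
    # Identical columns collapse to a single entry in the set.
    return len(columns), len(set(columns))


# Toy alignment: 3 taxa, 5 columns, columns 1-2 and 4-5 are repeats.
toy_alignment = ["MMKLL", "MMRLL", "MMKLL"]
n_columns, n_patterns = count_alignment_patterns(toy_alignment)
print(n_columns, n_patterns)
```

The number of columns is the safe upper bound mentioned above; the number of distinct columns is closer to the m that RAxML reports.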

Haiwei

Apr 23, 2012, 12:15:08 PM
to raxml
Hi Fernando,

Thanks so much for the nice explanation.

Haiwei


Haiwei

Apr 23, 2012, 1:20:05 PM
to raxml
Dear Alexis,

I just estimated the memory requirement. If using AA+CAT, it needs 300
MB; if using 1.2 GB. I am running a cluster with 16GB in each node
(which contains 8 cores).

When using the CAT model, RAxML 7.3.0 ran for several days and output
100 bootstrapped trees, but then stopped. It generated a "core" file
that contains the error information below.
-------------------------------------------------------------------------------------------------------------------------------------------
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fff569fd000
Core was generated by `/usr/local/raxml/latest/raxmlHPC-PTHREADS -f a -x 12345 -m PROTCATLG -q gene.pa'.
Program terminated with signal 6, Aborted.
#0 0x000000369b830285 in ?? ()
-------------------------------------------------------------------------------------------------------------------------------------------

The command line for the CAT model is:
./raxmlHPC-PTHREADS -f a -x 12345 -m PROTCATLG -q gene.part -n RoseoSAR11 -s concat.phy -T 4 -# 100 -p 2

When using the GAMMA model, RAxML 7.3.0 ran for a few minutes and then
crashed. Its output is below.

RAxML was called as follows:
-------------------------------------------------------------------------------------------------------------------------------------------
/usr/local/raxml/latest/raxmlHPC-PTHREADS -f a -x 12345 -m PROTGAMMALG -q gene.part -n RoseoSAR11 -s concat.phy -T 4 -# 100 -p 2

Testing which likelihood implementation to use
Standard Implementation full tree traversal time: 19.543052
Subtree Equality Vectors for gap columns full tree traversal time: 20.403099
... using standard implementation

raxmlHPC-PTHREADS: evaluateGenericSpecial.c:3217: evaluateIterative: Assertion `partitionLikelihood < 0.0' failed.
-------------------------------------------------------------------------------------------------------------------------------------------

Can you let me know where the problem is? If you want to test it with
my data, I can send you the alignment and the gene partition file.

Best regards,
Haiwei


Haiwei

Apr 23, 2012, 1:22:47 PM
to raxml
Sorry! The first paragraph was incomplete. Here is the corrected version.

I just estimated the memory requirement. With AA+CAT it needs 300 MB;
with AA+GAMMA it needs 1.2 GB. I am running on a cluster with 16 GB per
node (each node has 8 cores), so the problem should not be related to
memory.

Haiwei
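
For reference, estimates like the ones above can be approximated with a short Python sketch. It assumes the likelihood vectors dominate memory use: one vector per inner node (n - 2 for n taxa), m patterns, 20 amino acid states, 8 bytes per double, and 4 rate categories under GAMMA versus 1 under CAT, which is consistent with the factor of 4 mentioned earlier in this thread. The function name and the example sizes are hypothetical, and the calculator linked earlier remains the authoritative source.

```python
def raxml_memory_bytes(n_taxa, n_patterns, n_states=20, gamma=True):
    """Rough estimate of likelihood-vector memory in bytes (assumed model)."""
    # Assumption: GAMMA stores 4 discrete rate categories per site, CAT one.
    rate_categories = 4 if gamma else 1
    # Assumption: one likelihood vector per inner node of the tree.
    inner_nodes = n_taxa - 2
    bytes_per_double = 8
    return inner_nodes * n_patterns * n_states * rate_categories * bytes_per_double


# Hypothetical data set: 100 taxa, 50,000 distinct alignment patterns.
cat_bytes = raxml_memory_bytes(100, 50_000, gamma=False)
gamma_bytes = raxml_memory_bytes(100, 50_000, gamma=True)
print(f"AA+CAT:   {cat_bytes / 1e6:.0f} MB")
print(f"AA+GAMMA: {gamma_bytes / 1e6:.0f} MB")
```

With PTHREADS the vectors are shared among threads rather than duplicated, so under these assumptions the thread count should not multiply the estimate.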

Alexandros Stamatakis

Apr 24, 2012, 1:54:56 PM
to ra...@googlegroups.com
Please download the latest RAxML version from GitHub;
the error you are getting should be fixed there.

Also, please compile and use the SSE3-based version, which should
be much faster.

Alexis


Haiwei

Apr 25, 2012, 11:08:47 PM
to raxml
Thanks a lot, Alexis. It works very well.

Haiwei


Haiwei

Jul 30, 2012, 12:09:54 PM
to ra...@googlegroups.com
Hi Alexis

For a partitioned protein analysis, can I specify +I+G separately for each partition? For instance,

LG+I+G, p1 = 1-102, 103-221, 270-423, 474-577, 710-830, 3953-4130, 6658-6980, 6981-7468, 7868-8161, 8851-9005, 9093-9176, 9197-9304
LG+G, p2 = 222-269, 4955-5114
LG+I+G, p3 = 424-473, 1968-2457, 5445-5516
.....

Best,
Haiwei

Alexandros Stamatakis

Jul 31, 2012, 5:15:00 AM
to ra...@googlegroups.com
You cannot specify individual rate heterogeneity models for each
partition. One type of rate heterogeneity model will be applied to
all partitions. The rate heterogeneity model is specified via the -m
string on the command line. For example, with a partition file you could
specify -m PROTGAMMALG, in which case a GAMMA model will be applied to
all partitions, or -m PROTGAMMAILG, in which case a GAMMA+P-Invar
model will be applied to all partitions.

Note that, if you use a partition file, the LG part in the above string
will be ignored and RAxML will just extract the type of rate
heterogeneity model that shall be used.

Also, please have a look at the RAxML v7.0.4 manual regarding my thoughts
on using GAMMA+P-Invar.

Alexis
--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University
of Arizona at Tucson

www.exelixis-lab.org
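
Since the rate heterogeneity model is taken only from -m, a partition file drafted with per-partition suffixes such as +I+G would need those suffixes removed. Here is a minimal Python sketch of that cleanup, assuming each line has the layout `MODEL, name = start-end, start-end, ...` as in the question above. The helper names are hypothetical, and single-site ranges without a dash are not handled.

```python
def parse_partition_line(line):
    """Split 'MODEL, name = a-b, c-d' into (base model, name, ranges)."""
    model_part, rest = line.split(",", 1)
    name, ranges_part = rest.split("=", 1)
    # Drop any '+I' / '+G' suffix; keep only the substitution matrix name.
    base_model = model_part.strip().split("+")[0]
    ranges = []
    for chunk in ranges_part.split(","):
        start, end = chunk.strip().split("-")
        ranges.append((int(start), int(end)))
    return base_model, name.strip(), ranges


def format_partition_line(base_model, name, ranges):
    """Write the partition back without rate heterogeneity suffixes."""
    joined = ", ".join(f"{start}-{end}" for start, end in ranges)
    return f"{base_model}, {name} = {joined}"


draft = "LG+G, p2 = 222-269, 4955-5114"
cleaned = format_partition_line(*parse_partition_line(draft))
print(cleaned)
```

The choice between GAMMA and GAMMA+P-Invar then goes into the single -m string, as described above.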
