How to use RAxML EPA for a combined DNA + multistate morphology dataset


Martin Fikáček

Jul 7, 2013, 3:27:56 PM
to ra...@googlegroups.com
Hi everybody,

I am trying to use the RAxML evolutionary placement algorithm (EPA) to test the placement of a fossil taxon (for which I only have morphological data) into a tree of recent taxa for which I have both multigene DNA data and morphology.

I tried to use the RAxML EPA web server, but failed when uploading the data. I tried two options:

- upload it and run the single-gene algorithm (as it also offers to partition the data and upload a partition file), but in this case the multistate data cause problems: I only obtained an error message about the first occurrence of a non-amino-acid symbol

- upload it and run the multigene algorithm, but here I am not sure what to upload as the "query reads file", and without it the run ends in an error as well (moreover, I am not sure whether multistate data are supported even with this option).

Can anybody advise me how to run such an analysis, whether such combined data can in fact be run through the web server, and what the content and format of the "query reads file" should be, if it is necessary for the run?

Thanks a lot in advance!

With best regards

Martin

Alexandros Stamatakis

Jul 7, 2013, 5:18:28 PM
to ra...@googlegroups.com
I assume that you are trying to apply the techniques described in this
paper:

http://sco.h-its.org/exelixis/pubs/Exelixis-RRDR-2009-1.pdf

For this you will have to download RAxML and run it on the command line.

Alexis
--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University
of Arizona at Tucson

www.exelixis-lab.org

Luis Palazzesi

Sep 10, 2013, 7:36:19 AM
to ra...@googlegroups.com
Hi Alex,

Like Martin, I am trying to do the same: combine fossil morphology with a more complete data matrix, as in the paper you published.
The command-line option is "-f U"; I tried it, but I could not find a way to get any result. Do you have an example? (I could not find one in the manual.)

Cheers and thanks

Luis

Alexey Kozlov

Sep 11, 2013, 1:23:10 PM
to ra...@googlegroups.com
Hi Luis,

The source of the confusion may be that "-f u" (lowercase!) does not perform any placement by itself; it only computes the calibrated weights for the morphological sites. These weights can then be passed via the "-a" option to a standard EPA run ("-f v").

So let's say you have the following alignment (test_full.phy):

s1 1001AAA
s2 0110CTA
s3 1111GCA
s4 1000TGA
s5 0000CAA
s6 1101---    <--- fossil


then you can proceed as follows:

1. First, you need a reference tree for s1-s5. It could be either built by RAxML from the molecular part of data or constructed by other means. Let it be "reftree.tre"

2. Then, run the weights calibration routine as follows:

./raxmlHPC-SSE3 -s test_bin.phy -f u -n TST -p 12345 -m BINGAMMA -t reftree.tre

Please note that here we need a subalignment containing the morphological data only (test_bin.phy):

s1 1001
s2 0110
s3 1111
s4 1000
s5 0000
s6 1101    <--- fossil


Alternatively, you could try the exclusion option ("-E") to remove the DNA sites instead of preparing the subalignment by hand.

3. As a result, you will get a text file with the weights (RAxML_weights.TST). Now you can either leave the weights as they are or apply a cutoff as described in the paper. Please note, however, that the weights file must contain values for all alignment sites, so constant values for the DNA sites must be added, e.g.:

    12 31 31 80          100 100 100
    -----------          -----------
         ^                    ^
    morphological,           DNA
    computed by RAxML

Let's call this processed weights file "test.weights".

4. Run the standard EPA with the obtained weights:

./raxmlHPC-SSE3 -s test_full.phy -f v -n TST -p 12345 -q test.model -t reftree.tre -m GTRGAMMA -a test.weights

In this case, we're providing the full alignment and a file describing the partitions (test.model). The result will be the EPA placement of the fossil.

Hope this helps.

Alexey

Luis Palazzesi

Oct 21, 2013, 11:47:43 AM
to ra...@googlegroups.com
Hi everyone,

Thanks to all of you for always being there and answering every request (including the most stupid questions, like the one I am posting now!).
The example given by Alexey helped me a lot. However, I did not understand step 3. I got the file with the weights, and I added the values for the remaining DNA sites because, as Alexey said, every alignment site must have a value. My question is: how can I get the DNA weight values?

Cheers

Luis

Alexandros Stamatakis

Oct 22, 2013, 3:15:59 AM
to ra...@googlegroups.com
Hi Luis,

You don't need the DNA weight values.

Once you have the morphological weight values, you just run the
evolutionary placement algorithm (-f v option) with the weights to
place the fossils onto the molecular input tree using only the
morphological data partition.

Alexis

Alexey Kozlov

Oct 22, 2013, 7:28:15 AM
to ra...@googlegroups.com
Sorry, it was my mistake in the original reply. Here is the corrected version of the HOWTO:

Let's say you have the following alignment (test_full.phy):


s1 1001AAA
s2 0110CTA
s3 1111GCA
s4 1000TGA
s5 0000CAA
s6 1101---    <--- fossil


then you can proceed as follows:

1. First, you need a reference tree for s1-s5. It could be either built by RAxML from the molecular part of data or constructed by other means. Let it be "reftree.tre"

2. Then, run the weights calibration routine as follows:

./raxmlHPC-SSE3 -s test_bin.phy -f u -n TST -p 12345 -m BINGAMMA -t reftree.tre

Please note that here we need a subalignment containing the morphological data only (test_bin.phy):

s1 1001
s2 0110
s3 1111
s4 1000
s5 0000
s6 1101    <--- fossil


Alternatively, you could try the exclusion option ("-E") to remove the DNA sites instead of preparing the subalignment by hand.

3. As a result, you will get a text file with the weights (RAxML_weights.TST). Now you could either leave the weights as they are, or apply a cutoff as described in the paper. Let's call this processed weights file "test.weights".


4. Run the standard EPA with the obtained weights:

./raxmlHPC-SSE3 -s test_bin.phy -f v -n TST -p 12345 -t reftree.tre -m BINGAMMA -a test.weights
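
The file preparation around steps 2 and 3 could be sketched in shell roughly as follows. This is only an illustration under assumptions that are not stated in the thread: the alignment is a relaxed PHYLIP file with one taxon per line (a "6 7" header line is added here), the morphological characters are the first 4 columns, RAxML_weights.TST is a whitespace-separated list of integer weights (the numbers below are made up), and the cutoff of 30 with "weights below the cutoff become 0" is a placeholder rule, not the one from the paper.

```shell
#!/bin/sh
# Run this in an empty scratch directory: it writes toy files with the
# same names as in the HOWTO.
set -eu

# Toy full alignment from the HOWTO; the header line (taxa, sites) is an
# assumption added for this sketch.
cat > test_full.phy <<'EOF'
6 7
s1 1001AAA
s2 0110CTA
s3 1111GCA
s4 1000TGA
s5 0000CAA
s6 1101---
EOF

n_morph=4   # assumed: the morphological characters are the first 4 columns

# Input for step 2: keep only the morphological columns, rewrite the header.
awk -v n="$n_morph" '
    NR == 1 { print $1, n; next }           # header: same taxa, n sites
    NF >= 2 { print $1, substr($2, 1, n) }  # taxon name + first n characters
' test_full.phy > test_bin.phy

# Stand-in for the RAxML_weights.TST that the "-f u" run would write
# (made-up numbers, assumed format).
echo "12 31 31 80" > RAxML_weights.TST

cutoff=30   # hypothetical threshold; choose it as described in the paper

# Step 3: set weights below the cutoff to 0 and keep the others unchanged.
awk -v c="$cutoff" '{
    for (i = 1; i <= NF; i++) $i = ($i < c) ? 0 : $i
    print
}' RAxML_weights.TST > test.weights

cat test_bin.phy
cat test.weights
```

With real data, test_bin.phy would go into the "-f u" run of step 2, and test.weights would be derived from the actual RAxML_weights.TST before being passed to "-a" in step 4.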

Luis Palazzesi

Oct 23, 2013, 7:15:13 AM
to ra...@googlegroups.com
Hi there,

When I invoke the EPA method with bootstrapping (using the previously computed weight vector, the morphological data of all taxa, and the reference tree as input), all taxa for which only morphological data are available (fossils in my case) are plotted in a tree, but the resulting tree is not congruent with the reference tree (e.g. the outgroup species is misplaced and some well-supported clades are dispersed among several branches).

Thanks for your constant effort and support,

Luis

Alexandros Stamatakis

Oct 23, 2013, 7:18:29 AM
to ra...@googlegroups.com
Hi Luis,

the reference tree does not change, this must be an issue with the tree
visualization.

Alexis


Alexandros Stamatakis

Oct 23, 2013, 7:54:35 AM
to Luis Palazzesi, ra...@googlegroups.com
They look pretty congruent to me; please use Dendroscope and apply the
unrooted tree view,

alexis

On 10/23/2013 01:25 PM, Luis Palazzesi wrote:
> Alexis, please check the resulting trees if you have time.

Luis

Nov 12, 2013, 7:23:54 AM
to ra...@googlegroups.com, Luis Palazzesi
Hi there,

I've read the article "Morphology-based phylogenetic binning of the lichen genera Graphis and Allographa (Ascomycota: Graphidaceae) using molecular site weight calibration" and see that you run a post-analysis script, a Java tool, that parses the EPA output files to determine the phylogenetic bins for each of the morphologically defined species. I found the Java source at https://raw.github.com/sim82/java_tools/master/src/ml/EpaBinning.java, but I could not find the "eb.jar" file. Could you please upload or attach the file?

Cheers


Luis




Alexandros Stamatakis

Nov 12, 2013, 10:08:36 AM
to ra...@googlegroups.com
Hi Luis,

This was developed by a former PhD student of mine who has left the lab.

I'll try to dig it out with him.

Alexis


Alexandros Stamatakis

Nov 12, 2013, 10:55:11 AM
to ra...@googlegroups.com
Hi Luis,

I found it again, it got lost when we updated our web-pages.

You can download the code here:

http://sco.h-its.org/exelixis//resource/download/software/BinningTool.tar.bz2

Cheers,

Alexis


Luis Palazzesi

Nov 12, 2013, 11:34:03 AM
to ra...@googlegroups.com
Great, thanks Alexis!!



Luis

Nov 15, 2013, 8:46:48 AM
to ra...@googlegroups.com
Hi there (again!),

I am still trying (hard) to understand everything I can using RAxML for placing fossils.
When I invoke the EPA method with bootstrapping (using -f v) I obtain my gorgeous tree with the inserted fossils and (among other files) two tables: RAxML_classificationLikelihoodWeights and RAxML_classification. These tables have three columns; I assume that the first one is the placement positions of the fossils in the reference trees, but what about the other two columns?
Another short (and silly) observation: when I invoke the EPA method (-f v) with bootstrapping, even with 1000 replicates, the bootstrap analysis is (I think) never conducted (I mean, I don't see the bootstrap replicates running).

Thanks guys!!! You're saving my miserable life... =r

Luis


Alexandros Stamatakis

Nov 16, 2013, 8:42:23 AM
to ra...@googlegroups.com
Hi Luis,

> I am still trying (hard) to understand everything I can using RAxML for
> placing fossils.
> When I invoke the EPA method with bootstrapping (using -f v) I obtain my
> gorgeous tree with the inserted fossils and (among other files) two tables:
> RAxML_classificationLikelihoodWeights and RAxML_classification. These
> tables have three columns; I assume that the first one is the placement
> positions of the fossils in the reference trees, but what about the other
> two columns?

see here:

https://groups.google.com/forum/?hl=de#!searchin/raxml/EPA$20output/raxml/iIphbkoybOY/1GgpRbTGjLMJ

> Another short (and silly) observation: When I invoke EPA method (-f v) with
> bootstrapping, I tried even with 1000 replicates, and the bootstrap
> analysis (I think) is never conducted (I mean, I don't see the Bootstrap
> replicates running).

The bootstrap option in conjunction with the EPA has been deprecated
since it didn't make a lot of sense.

Alexis


Luis

Nov 21, 2013, 5:14:59 AM
to ra...@googlegroups.com
Hi there,

First of all, thanks again for your invaluable support. I wish I could do something for you guys... maybe some day =]
I was trying to calibrate site weights under parsimony, invoking "-f U", but it did not work. 
Until now, I have been using (under ML) "-f u", but I would like to compare both weights.

Looking forward to meeting you all in Crete!!!

Abrazo


Luis

Alexandros Stamatakis

Nov 21, 2013, 6:48:30 AM
to ra...@googlegroups.com
Hi Luis,

> First of all, thanks again for your invaluable support. I wish I could do
> something for you guys... maybe some day =]

You (and others on here) may send us chocolates, the address is:

Alexandros Stamatakis
Scientific Computing Group
Heidelberg Institute for Theoretical Studies
Schloss-Wolfsbrunnenweg 35
D-69118 Heidelberg

> I was trying to calibrate site weights under parsimony, invoking "-f U",
> but it did not work.

Apparently, I removed the parsimony option from RAxML, I don't remember
why, but I think it is because ML worked better.

> Until now, I have been using (under ML) "-f u", but I would like to compare
> both weights.
>
> Looking forward to meeting you all in Crete!!!

See you there :-)

un abrazo,

alexis
