How to construct maximum-likelihood tree of chloroplast genomes

wes

May 6, 2021, 12:27:09 PM

to raxml

Hi All,

I have obtained a complete chloroplast genome of a plant. Next, I aligned all the chloroplast CDS (as amino acid sequences) from 15 different organisms using the MAFFT online service.

May I know how to construct a maximum-likelihood tree using RAxML with 1000 bootstrap replicates? Is the command line below correct?

raxmlHPC -p 12345 -x 12345 -# 100 -m PROTGAMMAAUTO -s plastome_alignment_2.fasta -n AUTO -T 32

Alexandros Stamatakis

May 6, 2021, 2:44:26 PM

to ra...@googlegroups.com

That looks okay, but this command will compute 100 rapid BS replicates.

All this is specified in the RAxML manual.
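
For reference, a minimal sketch of how the command above could be adapted to 1000 replicates. It assumes the file name, seeds, and thread count from the question, and it adds `-f a`, which in RAxML 8 combines rapid bootstrapping with a search for the best-scoring ML tree in a single run. The binary name depends on how RAxML was built, and the guard only prevents the sketch from failing when the program or alignment is missing.

```shell
# Sketch only: file name, seeds, and thread count are taken from the question above.
ALN=plastome_alignment_2.fasta
REPS=1000   # number of rapid bootstrap replicates (the original command used 100)

# -f a : rapid bootstrap analysis plus best-scoring ML tree search in one run
# -x   : random seed that switches on rapid bootstrapping
# -p   : random seed for the parsimony starting trees
# -T   : number of threads (only available in the PTHREADS builds of RAxML)
if command -v raxmlHPC >/dev/null 2>&1 && [ -f "$ALN" ]; then
    raxmlHPC -f a -p 12345 -x 12345 -# "$REPS" \
        -m PROTGAMMAAUTO -s "$ALN" -n AUTO -T 32
else
    echo "raxmlHPC or $ALN not found; nothing was run" >&2
fi
```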

However, I would recommend that you switch to the new RAxML version
called RAxML-NG:

https://github.com/amkozlov/raxml-ng

Alexis

> --
> You received this message because you are subscribed to the Google
> Groups "raxml" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to raxml+un...@googlegroups.com
> <mailto:raxml+un...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/raxml/73a1e7ac-1560-4aff-9199-32a094236080n%40googlegroups.com
> <https://groups.google.com/d/msgid/raxml/73a1e7ac-1560-4aff-9199-32a094236080n%40googlegroups.com?utm_medium=email&utm_source=footer>.

--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Adjunct Researcher, Evolutionary Genetics and Paleogenomics (EGP) lab,
Institute of Molecular Biology and Biotechnology, Foundation for
Research and Technology Hellas

www.exelixis-lab.org

Ching Ching WEE

May 7, 2021, 1:52:25 PM

to ra...@googlegroups.com

Thanks, Alexandros, for pointing me to RAxML-NG. I have listened to the talk "RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference" on the YouTube channel.

However, I have several questions:

1. To perform a sanity check, the command below is suggested:

raxml-ng --check --msa bad.fa --model GTR+G

However, since my alignment consists of amino acid sequences, is one of the commands below correct?

raxml-ng --check --msa bad.fa --model LG

or

raxml-ng --check --msa bad.fa --model LG+G

2. I notice that most of the examples given are for DNA alignments. Does this mean that a DNA alignment is preferred over an amino acid alignment for constructing a maximum-likelihood tree?

3. To construct a maximum-likelihood tree with 1000 bootstrap replicates from the aligned chloroplast CDS (amino acid sequences) of 15 different organisms, is the command line below correct?

raxml-ng --all --msa plastome_alignment_3.fasta --model LG+G8+F --tree pars{10} --bs-trees 1000 --prefix chloro_tree

4. Is there a more detailed explanation of:

+G (discrete GAMMA with 4 categories, mean category rates, ML estimate of alpha)

+F or +FC (empirical)

Are the parameters +G, +F and --threads important to include, or can they be left out so that the program chooses the best settings?

Thanks!

Grimm

May 10, 2021, 4:38:03 AM

to raxml

Hi Wes,

Regarding questions 1 and 3 (and 4), as somebody new to the wonderful world of tree inference using RAxML, you first may want to check out the splendid documentation by Alexej, Alexi and friends:

The github-wiki: https://github.com/amkozlov/raxml-ng/wiki

The hands-on-manual: https://github.com/amkozlov/raxml-ng/wiki/Tutorial

Pretty much everything one needs to know about how to run an analysis is covered there, along with basic background on the optional parameters (e.g. why and how many threads to use: too many threads for a small data set with few tips, like yours, will slow down the analysis).
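
As a rough sketch of the first two steps from the tutorial applied to protein data: `--check` validates the alignment under an amino acid model, and `--parse` additionally reports the estimated memory use and a recommended thread count for the data set. The alignment name and model come from the question; the prefixes `chk` and `prs` are arbitrary labels chosen here, and the guard only keeps the sketch from failing when raxml-ng or the file is absent.

```shell
# Sketch only: alignment name and model are taken from the question; adjust as needed.
MSA=plastome_alignment_3.fasta
MODEL=LG+G   # a protein model; DNA models such as GTR+G do not apply to AA data

if command -v raxml-ng >/dev/null 2>&1 && [ -f "$MSA" ]; then
    # Sanity-check the alignment (format errors, duplicate sequences, empty columns).
    raxml-ng --check --msa "$MSA" --model "$MODEL" --prefix chk
    # Compress the alignment into a binary file and report estimated memory
    # requirements and a recommended number of threads for this data set.
    raxml-ng --parse --msa "$MSA" --model "$MODEL" --prefix prs
else
    echo "raxml-ng or $MSA not found; nothing was run" >&2
fi
```

The thread count reported by `--parse` can then be passed to the actual tree search via `--threads`.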

Regarding question 2: it entirely depends on the signal and the taxon sample, i.e. 1) whether you look at deep or shallow plastid relationships, and 2) which group of organisms is represented in your complete plastomes.

AA data will give you a deep-ish signal that is, to some degree, positively filtered (it involves non-neutral evolution, which is why one applies special AA substitution models). Nucleotides will give you up to three differently filtered signals. The 1st and, even more so, the 2nd codon positions are relatively conserved to maintain the protein's structure; hence they inform deeper splits but are invariable (or even somewhat noisy) when it comes to younger divergences. The 3rd codon position may be saturated in the case of deep splits, i.e. supporting arbitrary connections that conflict with the 1st and 2nd positions, or it may be the only one providing any discriminative signal at all in the case of young splits.
Regarding the taxonomic breadth of your data and a fully comprehensive tree inference (using different partitioning schemes), check out, for instance, this open access paper by Walker et al. in PeerJ. The complete-plastome-based all-angiosperm tree is effectively a two-gene tree: rpoC2 + matK (I highlighted some aspects in a mini-series on the Genealogical World of Phylogenetic Networks [pt 1: The Mighty matK][pt 2: A Thicket of Trees][pt 3: Conflict or not?]). In contrast, in e.g. extra-tropical trees at the genus level, where the plastomes are poorly sorted during speciation, the signal from complete plastomes may be much more complex. If one used protein-coding gene nucleotide sequences or their AA translation there, it would only lead to a so-called "star tree", as all phylogenetically discriminative information is concentrated in certain non-coding intergenic spacer regions (the best known and most variable is currently the trnH-psbA intergenic spacer).

In conclusion, since one often does not know the data situation, and RAxML is very speedy, one is well advised to always do both: infer an AA-based tree and a codon-position-sensitive nucleotide tree (the classic scheme is 1st and 2nd positions as one partition and the 3rd as a second partition; refer to the wiki/tutorial on how to define a partition file for RAxML inference). Then just compare the two outcomes (especially watch out for branches with BS support < 100).
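
A minimal sketch of that two-tree comparison with RAxML-NG, under several assumptions: `cds_aa.fasta` and `cds_nt.fasta` are hypothetical names for the AA alignment and the matching in-frame nucleotide alignment, `LEN` is a placeholder for the nucleotide alignment length (a multiple of 3), and the models are plain defaults rather than tested choices. The stride notation in the partition file follows my reading of the tutorial, so please verify it against the wiki before relying on it.

```shell
# Sketch only: file names, LEN, prefixes, and models are hypothetical placeholders.
AA_MSA=cds_aa.fasta
NT_MSA=cds_nt.fasta
LEN=30000   # length of the nucleotide alignment; must match the real data

# Partition 1: 1st and 2nd codon positions; partition 2: 3rd codon positions.
cat > codon.part <<EOF
GTR+G, codon12 = 1-${LEN}/3, 2-${LEN}/3
GTR+G, codon3 = 3-${LEN}/3
EOF

if command -v raxml-ng >/dev/null 2>&1 && [ -f "$AA_MSA" ] && [ -f "$NT_MSA" ]; then
    # AA-based tree: ML search plus 1000 bootstrap replicates.
    raxml-ng --all --msa "$AA_MSA" --model LG+G --bs-trees 1000 --prefix aa_tree
    # Nucleotide tree: the partition file is passed in place of a model name.
    raxml-ng --all --msa "$NT_MSA" --model codon.part --bs-trees 1000 --prefix nt_tree
else
    echo "raxml-ng or input alignments not found; only codon.part was written" >&2
fi
```

The two resulting trees with support values can then be compared branch by branch, as suggested above.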