Long branch attraction in Potyvirus phylogenetic tree reconstruction

Tatiana Brailovskaya

Jul 13, 2017, 7:43:49 PM
to raxml
Hello, 

I am trying to construct a phylogenetic tree of the genus Potyvirus using RAxML with amino acid sequences of the Potyvirus polyprotein. I have 142 species in my data set. I constructed multiple sequence alignments in PASTA and MAFFT, then trimmed the alignments in trimAl. I also tried removing some spurious sequences (i.e. sequences < 2000 residues long, which is shorter than the majority of the polyproteins), re-aligning the remaining sequences in MAFFT, and trimming that alignment in trimAl. From each of these alignments I constructed a tree with RAxML-HPC2 on XSEDE (8.2.10) through the CIPRES portal. For each alignment, the resulting tree has one set of branches that are much longer than the others. The MAFFT alignment of sequences > 2000 amino acids long performed best (i.e. it gave the least discrepancy in branch lengths between the longer branches and the rest of the tree). I believe this is an instance of long branch attraction and I am not sure how to fix this. I am attaching the multiple sequence alignment (done in MAFFT, trimmed, with spurious sequences removed) as well as the tree I got from this alignment. I used the following parameters:

raxmlHPC-HYBRID -T 4 -N autoMRE -n result -s infile.txt -p 12345 -m PROTGAMMAAUTO -f a -x 12345
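
The length filter described above can be sketched as follows. This is a minimal illustration in Python, not the script actually used; the function names, the default threshold argument, and the choice to ignore gap characters when counting residues are all assumptions.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_by_length(records, min_len=2000):
    """Keep records with at least min_len residues (gap characters are not counted)."""
    return [(h, s) for h, s in records if len(s.replace("-", "")) >= min_len]
```

The filtered records would then be written back to FASTA and re-aligned before tree inference.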

Thank you!
Tatiana

trimmed_algn_mafft_seq_longer_than_2000
ML_tree_mafft_algn_seq_len_more_than_2000.txt

Alexandros Stamatakis

Jul 14, 2017, 7:15:23 AM
to ra...@googlegroups.com
Dear Tatiana,

> I am trying to construct a phylogenetic tree of the Potyvirus genus
> using RAxML with amino acid sequences of the Potyvirus polyprotein. I
> have 142 species in my data set. I constructed a multiple sequence
> alignment in PASTA and MAFFT, then trimmed the alignments in trimAL.

Regarding trimming, it is debatable whether it is actually required,
see here:

https://academic.oup.com/sysbio/article/64/5/778/1685763/Current-Methods-for-Automated-Filtering-of

I also personally think that it is not required.

> I
> also tried removing some spurious sequences (i.e. sequences < 2000
> residues long, which is shorter than the majority of the polyproteins)
> and used MAFFT again to construct a multiple sequence alignment and then
> trimmed it in trimAL. Using each of these multiple sequence alignments I
> constructed a tree with RAxML-HPC2 on XSEDE
> <https://www.phylo.org/portal2/createTask!selectTool.action?selectedTool=RAXMLHPC2_TGB>
> (8.2.10) through CIPRES portal. The resultant tree has one set of
> branches that are significantly longer than the others, for each of
> these alignments.

Well, this may just indicate that there is some sort of subtype here or
that the taxon sampling was uneven.

You may try to infer the tree again with RAxML-NG
(https://github.com/amkozlov/raxml-ng), which is a from-scratch
re-implementation of RAxML, to see if you get similar results.
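
As a rough sketch of what such a re-run could look like, the snippet below assembles a RAxML-NG command line that mirrors the original settings (same input file, seed, and thread count, with bootstrapping until MRE convergence). The model string "LG+G4" and the prefix "result_ng" are placeholder assumptions: the model has to be named explicitly here, so the one actually selected for the data should be substituted.

```python
import shlex


def raxml_ng_command(msa="infile.txt", model="LG+G4", prefix="result_ng",
                     seed=12345, threads=4):
    """Build an argv list for an all-in-one RAxML-NG run (ML search plus bootstrapping).

    The default model and prefix are hypothetical placeholders, not recommendations.
    """
    return [
        "raxml-ng", "--all",
        "--msa", msa,
        "--model", model,
        "--prefix", prefix,
        "--seed", str(seed),
        "--threads", str(threads),
        "--bs-trees", "autoMRE",
    ]


# Print the command so it can be inspected before being run by hand.
print(shlex.join(raxml_ng_command()))
```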

Also, I don't really understand what exactly the problem might be with
this variance in branch lengths. Long branch attraction usually refers
to the fact that the topology might be incorrect because of long branches.

Is this the case for your dataset or is there just this variance in
branch lengths? If you were expecting more similar branches, then maybe
the density of your taxon sampling varies and is not uniform.
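
To see which taxa carry the unusually long branches, a quick check along these lines may help. It is a minimal sketch that only looks at terminal branches in a Newick string and flags those longer than a multiple of the median; the function name and the factor of 5 are arbitrary assumptions, and long internal branches would need a proper tree parser.

```python
import re
from statistics import median

# A leaf label (preceded by "(" or ",") followed by ":" and its branch length.
LEAF = re.compile(r"(?<=[(,])([^(),:;\[\]]+):([0-9.]+(?:[eE][+-]?[0-9]+)?)")


def long_terminal_branches(newick, factor=5.0):
    """Return (taxon, length) pairs whose terminal branch exceeds factor * median."""
    leaves = [(name.strip(), float(length)) for name, length in LEAF.findall(newick)]
    cutoff = factor * median(length for _, length in leaves)
    flagged = [(name, length) for name, length in leaves if length > cutoff]
    return sorted(flagged, key=lambda pair: -pair[1])


# Toy tree with made-up branch lengths, for illustration only.
toy_tree = "((A:0.1,B:0.12):0.05,(C:0.11,D:2.5):0.04,E:0.09);"
print(long_terminal_branches(toy_tree))
```

If the flagged taxa all fall in one clade, that points towards a divergent subgroup or sparse sampling around it rather than a topological artifact.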

You may also want to assess how many different ML trees you can infer
that don't substantially differ in their likelihood scores.
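
One way to start on this, as a minimal sketch: collect the final log-likelihood of each independent search and list the runs that come close to the best one. The scores below are made-up placeholders, and the tolerance is an arbitrary assumption rather than a statistical criterion; a proper comparison would use a likelihood-based topology test and also check whether the near-best trees actually differ in topology.

```python
def near_best(scores, tolerance=2.0):
    """Return the names of runs whose log-likelihood is within `tolerance` of the best."""
    best = max(scores.values())
    return sorted(name for name, lnl in scores.items() if best - lnl <= tolerance)


# Hypothetical log-likelihoods from four independent ML searches.
example_scores = {
    "run1": -250310.4,
    "run2": -250311.1,
    "run3": -250355.9,
    "run4": -250310.9,
}
print(near_best(example_scores))
```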

Finally, inferring support values would help in assessing this as well.

All the best,

Alexis

> Using MAFFT on sequences that are > 2000 amino acids
> performed best (i.e. the least discrepancy in branch lengths between the
> longer branches and the rest of the tree). I believe this is an instance
> of long branch attraction and I am not sure how to fix this. I am
> attaching the multiple sequence alignment (done in MAFFT, trimmed, with
> spurious sequences removed) as well as the tree I got from this
> alignment. I used the following parameters:
>
> raxmlHPC-HYBRID -T 4 -N autoMRE -n result -s infile.txt -p 12345 -m
> PROTGAMMAAUTO -f a -x 12345
>
> Thank you!
> Tatiana

--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology

www.exelixis-lab.org