Masking gappy alignment

Shyam Saladi

Jul 15, 2015, 5:33:27 PM
to ra...@googlegroups.com
Dear RAxML group,

I have an alignment produced by PASTA that is quite gappy. There seems to be a good deal of discussion in the group archives about how RAxML handles gaps, but I was wondering about best practices for dealing with the gaps before doing phylogenetic inference.

With a gappy alignment, should I mask it to retain only those positions that have fewer than X% gaps? If so, is there a recommended threshold?

Are there advantages/disadvantages to masking an alignment before phylogenetic inference calculations?

Thanks for the input,
Shyam

Alexandros Stamatakis

Jul 15, 2015, 5:36:31 PM
to ra...@googlegroups.com
Dear Shyam,

Have a look at this paper here:

http://sysbio.oxfordjournals.org/content/early/2015/07/02/sysbio.syv033.full

Alexis

--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University
of Arizona at Tucson

www.exelixis-lab.org

Grimm

Jul 16, 2015, 8:32:41 AM
to ra...@googlegroups.com
Hi Shyam,

In addition to the link Alexis sent: depending on how long the analysis takes, why not run a couple of analyses using different gap-exclusion proportions and see whether it matters at all? Length-polymorphic regions are notoriously difficult, and it doesn't hurt to experiment.
Ideally, the relationships you are interested in are equally well supported by the trees inferred from the reduced and the complete alignments.
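Before launching several runs, it can help to see how many columns each gap-exclusion proportion would actually leave. The following is a minimal Python sketch, not part of any tool mentioned in this thread. It assumes the alignment is already loaded as a dict of equal-length aligned strings, and it treats '-' and '?' as gap/missing characters (an assumption). The function names and the thresholds tried are hypothetical. A column is kept when its gap fraction does not exceed the threshold.

```python
from typing import Dict, List

# Assumption: '-' and '?' both count as gaps/missing data.
GAP_CHARS = frozenset("-?")


def column_gap_fractions(aln: Dict[str, str]) -> List[float]:
    """Fraction of sequences with a gap character in each alignment column."""
    seqs = list(aln.values())
    n_seqs = len(seqs)
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must all have the same aligned length")
    return [
        sum(1 for s in seqs if s[i] in GAP_CHARS) / n_seqs
        for i in range(length)
    ]


def columns_kept(fractions: List[float], max_gap_fraction: float) -> int:
    """Number of columns whose gap fraction does not exceed the threshold."""
    return sum(1 for f in fractions if f <= max_gap_fraction)


if __name__ == "__main__":
    # Toy alignment for illustration only.
    aln = {
        "taxonA": "ATG--CTA",
        "taxonB": "ATGAACTA",
        "taxonC": "AT---CTA",
        "taxonD": "ATGA-CT-",
    }
    fractions = column_gap_fractions(aln)
    for threshold in (0.0, 0.25, 0.5, 0.75, 1.0):
        kept = columns_kept(fractions, threshold)
        print(f"max gap fraction {threshold:.2f}: {kept} of {len(fractions)} columns kept")
```

Each threshold that leaves a meaningfully different number of columns would then be a candidate for its own masked alignment and tree search.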

See also this study

http://la-press.com/article.php?article_id=1734

for a case in which extremely divergent, length-polymorphic portions are inserted into an otherwise highly conserved core sequence. It also provides a set-up for testing how strongly the topology is influenced by the alignment.

Good luck hunting in the tree space,
Guido


Brian Foley

Jul 16, 2015, 10:33:39 AM
to ra...@googlegroups.com
The HIV Databases at LANL provide a quick and easy tool called "GapStreeze", which removes columns from an alignment in which one or more sequences have a gap character. It can also be set to remove only the columns in which more than x% of the sequences have a gap.

http://www.hiv.lanl.gov/content/sequence/GAPSTREEZE/gap.html
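For anyone who prefers to do the same kind of column stripping locally, here is a minimal Python sketch of the idea. It is not GapStreeze's code, and its options are not GapStreeze's. It drops every column in which more than a given fraction of the sequences have a gap, so a threshold of 0 removes any column containing a gap. It assumes the alignment is a dict of equal-length aligned strings and treats only '-' as a gap. `mask_gappy_columns` and `to_fasta` are hypothetical names.

```python
from typing import Dict


def mask_gappy_columns(aln: Dict[str, str], max_gap_fraction: float = 0.0,
                       gap: str = "-") -> Dict[str, str]:
    """Drop columns where more than max_gap_fraction of sequences have a gap.

    max_gap_fraction=0.0 removes every column containing at least one gap.
    """
    names = list(aln)
    seqs = [aln[n] for n in names]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must all have the same aligned length")
    # Indices of columns whose gap fraction is within the allowed limit.
    keep = [
        i for i in range(length)
        if sum(s[i] == gap for s in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {n: "".join(s[i] for i in keep) for n, s in zip(names, seqs)}


def to_fasta(aln: Dict[str, str]) -> str:
    """Format an alignment dict as FASTA text (one sequence line per record)."""
    return "".join(f">{name}\n{seq}\n" for name, seq in aln.items())


if __name__ == "__main__":
    aln = {"s1": "AC-GT", "s2": "ACTGT", "s3": "A--GT"}
    print(to_fasta(mask_gappy_columns(aln, 0.0)))  # strip every gappy column
    print(to_fasta(mask_gappy_columns(aln, 0.5)))  # strip columns >50% gaps
```

The FASTA text can be written to a file and passed to whichever inference program you use, provided it accepts FASTA input.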

The question of whether stripping or masking gappy sites is good or not depends on what questions you are asking about the phylogeny and evolution. Most programs will treat a string of six gap characters "------" as six mutations, whereas the more likely evolutionary history is a single deletion event. Including gap sites is probably good for getting the right tree topology but not so good for measuring accurate distances within the tree.
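To make the six-characters-versus-one-deletion point concrete, this minimal Python sketch counts two things for a single aligned sequence. It counts the gap characters, and it counts the contiguous gap runs, where each run could be explained by a single indel event. It is only an illustration. It does not reproduce how RAxML or any other particular program scores gaps. The function name is hypothetical, and only '-' is treated as a gap.

```python
import re
from typing import Tuple


def gap_chars_and_runs(seq: str, gap: str = "-") -> Tuple[int, int]:
    """Return (number of gap characters, number of contiguous gap runs)."""
    n_chars = seq.count(gap)
    # Each maximal stretch of consecutive gap characters counts as one run.
    n_runs = len(re.findall(f"{re.escape(gap)}+", seq))
    return n_chars, n_runs


if __name__ == "__main__":
    chars, runs = gap_chars_and_runs("ACGT------ACGT")
    print(f"{chars} gap characters, {runs} gap run(s)")  # 6 gap characters, 1 gap run(s)
```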

But anyway, tools like GAPSTREEZE make it so easy to test the question for yourself that it seems foolish not to try at least a couple of settings and see how much difference they make to your phylogeny. If leaving all gaps in and stringently removing all gaps both result in essentially the same tree, that is a useful thing to know.

Brian Foley, PhD