MASCOT with SNPs or a core alignment for phylogeographic analysis


Karine Durand

Feb 7, 2019, 4:07:17 AM
to beast-users
Hello,


A question I often ask myself: how should one choose genes for a BEAST analysis?

It is a pity to be limited to 10 genes when whole-genome data are available.

When bacteria recombine, there are often recombination hotspots in the genome, so picking a few genes at random can introduce bias.

To make the most of the available data, is it possible to run the analysis on a core alignment, or would that take too much computing time?

Or is it better to use SNPs only? In that case, the XML input file obviously has to be modified to provide an approximate count of constant sites.

Best,

KARINE


Nicola de maio

Feb 7, 2019, 10:15:53 AM
to beast-users
Hi,

If recombination is not too abundant, I would recommend using a tool like Gubbins or ClonalFrameML to filter out the recombinant regions, and then using the filtered alignment as input to BEAST.
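As an illustration of the filtering step, here is a minimal Python sketch that masks recombinant regions with N before the alignment goes into BEAST. It assumes the recombinant regions have already been extracted from the Gubbins or ClonalFrameML output into (sample, start, end) tuples with 1-based inclusive coordinates. That tuple format and the function name mask_regions are hypothetical, not something either tool produces directly.

```python
def mask_regions(seqs, regions, mask_char="N"):
    """Return a copy of the alignment with recombinant regions masked.

    seqs:    dict mapping sample name -> aligned sequence (str)
    regions: iterable of (sample, start, end), 1-based inclusive
             (hypothetical format, prepared from the recombination caller's output)
    """
    masked = {name: list(seq) for name, seq in seqs.items()}
    for name, start, end in regions:
        if name not in masked:
            raise KeyError(f"unknown sample: {name}")
        if not 1 <= start <= end <= len(masked[name]):
            raise ValueError(f"region out of range for {name}: {start}-{end}")
        # Convert 1-based inclusive coordinates to Python's 0-based slicing.
        for i in range(start - 1, end):
            masked[name][i] = mask_char
    return {name: "".join(chars) for name, chars in masked.items()}


# Toy example: mask positions 3-5 in sample s1 only.
toy = {"s1": "ACGTACGT", "s2": "ACGTACGT"}
toy_masked = mask_regions(toy, [("s1", 3, 5)])
```

Masking rather than deleting columns keeps the alignment coordinates unchanged, so the masked alignment can still be compared against the original genome positions.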
Yes, giving BEAST only the SNPs plus approximate counts of constant sites is faster than using the full alignment, although it is not strictly necessary.
A way to do it is described here: https://groups.google.com/forum/#!searchin/beast-users/SNPs/beast-users/V5vRghILMfw/jMtC_DwS5EYJ
The running time will depend on the number of SNPs and samples rather than on genome size.
I have done this before with a few hundred bacterial genomes and it was feasible, although I was not using MASCOT.
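A minimal sketch of the SNP-plus-constant-sites preparation, assuming the alignment is held in memory as a dict of equal-length sequences. It counts the constant A, C, G and T columns and keeps every other column as the SNP alignment. The function name is hypothetical. As far as I know, the four counts in A C G T order are what the constantSiteWeights attribute of a BEAST 2 FilteredAlignment expects, but check this against the thread linked above and the BEAST version you use.

```python
def split_constant_and_variable(seqs):
    """Split an alignment into constant-site counts and a SNP-only alignment.

    seqs: dict mapping sample name -> aligned sequence (str, equal lengths)
    Returns (counts, snps): counts is a dict of constant A/C/G/T columns,
    snps is a dict with only the non-constant columns of each sequence.
    """
    names = list(seqs)
    length = len(seqs[names[0]])
    if any(len(s) != length for s in seqs.values()):
        raise ValueError("sequences must all have the same length")

    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    variable_cols = []
    for i in range(length):
        column = {seqs[name][i].upper() for name in names}
        if len(column) == 1 and column <= set("ACGT"):
            counts[next(iter(column))] += 1
        else:
            # Columns containing gaps or ambiguity codes are kept as variable
            # here; that is a conservative choice, not a rule from the thread.
            variable_cols.append(i)

    snps = {n: "".join(seqs[n][i] for i in variable_cols) for n in names}
    return counts, snps


# Toy example with three samples and six sites.
example = {"s1": "AACGTA", "s2": "AACGTT", "s3": "AGCGTA"}
counts, snps = split_constant_and_variable(example)
# Space-separated counts in A C G T order, ready to paste into the XML.
weights = " ".join(str(counts[b]) for b in "ACGT")
```

The counts only need to be approximate, so running this on the core alignment (or simply using the genome's base composition scaled to the number of constant sites) should both be acceptable starting points.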

If recombination is considerable, it might make more sense to infer the tree directly in ClonalFrameML or Gubbins and then supply it as a fixed input tree in the BEAST analysis.

If recombination is very prevalent, as in H. pylori, it probably makes more sense to use a structure-like approach instead of a phylogenetic one like BEAST: https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1006546

Nicola

Alexei Drummond

Feb 7, 2019, 3:23:06 PM
to 'Gideon Pisanty' via beast-users
BTW: The BACTER package in BEAST2 can directly estimate gene conversion events using a full Bayesian implementation of the ClonalFrame model. Not sure how well it scales, but it is a nice model.



Karine Durand

Feb 8, 2019, 12:30:21 PM
to beast-users
Thank you for the answers.

I am not trying to infer the phylogeny itself, so I am keeping all the recombinant sites: what I want to infer is the migration matrix between the populations, with BASTA or MASCOT.
Cheers,
Karine

Nicola de maio

Feb 9, 2019, 12:34:52 PM
to beast-users
I see; however, I would still think about recombination, even if the phylogeny is not your primary aim of inference.
This is because MASCOT, BASTA, and phylogeographic models in general assume a single phylogeny shared by the whole alignment.
Recombination breaks this assumption, as it causes different parts of the genome to have different phylogenetic histories.
I am not aware of scientific papers showing that bacterial recombination biases phylogeographic inference, but it has been shown that not accounting for recombination causes biases in phylodynamics; see for example mbio.asm.org/content/5/6/e02158-14.short.
Of course, the amount of bias one would expect depends on the intensity of recombination, so, depending on the dataset, it might be negligible or strong.

Best wishes,
Nicola