recently diverged bacterial dataset


Davide

May 22, 2014, 8:49:26 AM
to dppdiv...@googlegroups.com
Dear Tracy and all,

In my current work I'm trying to estimate the origin of a bacterial 'clone' that is quickly spreading (estimated origin around 20 years ago).

My dataset is composed of 187 genomes of this bacterium.

I extracted the core SNPs (16853 SNPs) and inferred a phylogeny with FastTree.

I then dated one node for calibration (based on epidemiological data, it should have originated 4 years ago) and ran PPL-DPPDiv using the SNP alignment and the tree obtained with FastTree.

To do this analysis I used default settings:

./dppdiv-par-sse -in [SNP_alignment.phy] -tre [newick_rooted_tree_without_bootstraps] -out [output_prefix] -cal [calibration_file] -n 10000 -sf 10 

When the PPL-DPPDiv run is over (very quick, nice!), I get the final ultrametric tree, but I seem to have an underestimation of the divergence time (I get 8 years instead of the expected 18).

So my questions are:

1. Is PPL-DPPDiv a suitable tool for estimating divergence times in such a small timeframe?

2. Is a SNP alignment a good input for PPL-DPPDiv?

3. Are the default settings ok for my dataset?

4. I'm currently trying to detect possibly recombined sites to exclude them both from the phylogenetic analysis and from the PPL-DPPDiv analysis. To do so I'm using BRATNextGen. Does this sound like a good plan?


Thank you so much, best

D.

Alexandros Stamatakis

May 22, 2014, 10:31:09 AM
to dppdiv...@googlegroups.com, Tomas Flouri, Diego Darriba
Hi Davide,

There is one main concern I have with your approach:

For building phylogenies from SNPs you should not use the standard
likelihood model of DNA substitution, but one that corrects for
ascertainment bias, i.e., the fact that a SNP alignment contains only
variable sites while there are of course a lot of invariable sites as well
that are just not included in the alignment.

Thus, to build a tree you can use RAxML, which does offer that sort of
likelihood correction (just search for ascertainment bias in the RAxML
Google group).

Then, you should also use a divergence time program that has this sort
of correction. Accommodating asc. bias has mainly an effect on branch
lengths, hence your estimates might change.
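
As a rough illustration of the branch-length effect (not the likelihood correction that RAxML applies), here is a minimal sketch that compares a pairwise Jukes-Cantor distance computed on a SNP-only alignment with the same distance computed over all sites. The SNP count comes from the original post; the core-genome length and the number of pairwise differences are hypothetical values chosen only for the example.

```python
import math


def jc69_distance(p):
    """Jukes-Cantor (JC69) distance in substitutions per site,
    given the proportion p of sites that differ between two sequences."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


n_snps = 16853              # core SNPs reported in the original post
genome_length = 2_000_000   # HYPOTHETICAL core-genome length (variable + invariable sites)
pair_diffs = 500            # HYPOTHETICAL number of sites differing between two genomes

# Distance when the invariable sites are silently dropped (SNP-only alignment).
d_snp_only = jc69_distance(pair_diffs / n_snps)

# Distance when the same differences are spread over the whole core genome.
d_full = jc69_distance(pair_diffs / genome_length)

# Factor by which the SNP-only distance overstates the per-site distance.
inflation = d_snp_only / d_full

print(f"SNP-only distance:    {d_snp_only:.6f} substitutions/site")
print(f"Full-length distance: {d_full:.6f} substitutions/site")
print(f"Inflation factor:     {inflation:.1f}")
```

Under these assumed numbers the SNP-only distance is larger by roughly the ratio of genome length to SNP count, which is why rates and dates estimated from uncorrected SNP alignments can shift once the ascertainment is modelled.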

Alexis
--
Alexandros (Alexis) Stamatakis

Research Group Leader, Heidelberg Institute for Theoretical Studies
Full Professor, Dept. of Informatics, Karlsruhe Institute of Technology
Adjunct Professor, Dept. of Ecology and Evolutionary Biology, University
of Arizona at Tucson

www.exelixis-lab.org

Tracy Heath

May 22, 2014, 1:49:29 PM
to dppdiv...@googlegroups.com
Hi Davide,

Yes, Alexis is correct. You do not want to perform divergence-time estimation on a dataset of SNPs if you do not have a model that accounts for how the data were ascertained. DPPDiv does not have such a model. (It's possible that BEAST does, but I am not sure.)

I also have some comments about your analysis and questions. First, it is unlikely that the default settings are suitable for your dataset (or anyone's, for that matter). Many of the default settings are specific to test datasets and are not intended as general settings for users. Furthermore, running only 10,000 iterations (-n 10000) is not a good idea. This is far too few MCMC cycles to adequately sample the posterior, particularly for any complex model.

Second, the method does not give a "final tree". This is the same for any Bayesian method, which samples the posterior distribution of trees and parameters. Thus, the output you get is a set of files containing the MCMC samples of your parameters. You must then summarize these using a program that analyzes such things. I recommend using TreeAnnotator (from BEAST) or SumTrees (part of DendroPy) to obtain a summary tree and times from DPPDiv, and using Tracer or R to assess the marginal posterior densities of your other parameters.
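
To make "summarize the MCMC samples" concrete, here is a minimal sketch of what such a summary involves for a single parameter, for example the age of one node: discard a burn-in, then report the posterior mean and a 95% highest posterior density (HPD) interval. The `summarize` function, the burn-in fraction, and the simulated samples are all hypothetical; this does not read DPPDiv output files and is not a replacement for TreeAnnotator, SumTrees, or Tracer.

```python
import math
import random
import statistics


def summarize(samples, burnin_fraction=0.25, mass=0.95):
    """Return (mean, hpd_low, hpd_high) for a list of MCMC samples.

    The first `burnin_fraction` of the chain is discarded. The HPD interval
    is taken as the narrowest window of sorted samples that contains at
    least `mass` of the retained samples.
    """
    kept = samples[int(len(samples) * burnin_fraction):]
    ordered = sorted(kept)
    n = len(ordered)
    window = max(1, math.ceil(mass * n))
    # Slide a window of `window` samples and keep the narrowest one.
    start = min(range(n - window + 1),
                key=lambda i: ordered[i + window - 1] - ordered[i])
    return statistics.mean(kept), ordered[start], ordered[start + window - 1]


# HYPOTHETICAL chain: 1000 sampled node ages (in years), simulated here
# only so that the example runs without any input files.
random.seed(1)
node_ages = [random.gauss(18.0, 2.0) for _ in range(1000)]

mean_age, hpd_low, hpd_high = summarize(node_ages)
print(f"posterior mean age: {mean_age:.2f} years")
print(f"95% HPD interval:   [{hpd_low:.2f}, {hpd_high:.2f}] years")
```

A wide interval, or a summary that changes noticeably when the chain is run longer, is a sign that the run has not sampled the posterior adequately.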

Cheers!
Tracy





Davide

May 26, 2014, 4:43:00 AM
to dppdiv...@googlegroups.com
Dear Alexis and Tracy,

thank you for your quick and informative replies.

So I guess I'll have to go with RAxML and then BEAST.
I found this article: "Testing Spatiotemporal Hypothesis of Bacterial Evolution Using Methicillin-Resistant Staphylococcus aureus ST239 Genome-wide Data within a Bayesian Framework"

They use BEAST and test for the presence of ascertainment bias. From what I understand, they find that the bias is not too strong and that the most important parameter is the relaxed clock. I will try to replicate their analysis on my dataset.

Thank you again
D.
