How to set the Mkv model in BEAST

736 views
Skip to first unread message

cmh

unread,
Jul 26, 2013, 1:44:52 PM7/26/13
to beast...@googlegroups.com
Hello all,
Is there away to set the Mkv (Ref. 1) model in BEAST for using SNP datasets...where all the data are variable? I am a bit conflicted about the model, as it seems as it should be the same as JC69 for nucleotide datasets as long as the gamma and invariant sites parameters are not selected. Does this seem correct to anyone else? Recent publications (see below; 2 & 3) and at least one journal editor have suggested that this is the model of choice for SNP datasets and will correct for overestimation of evolutionary rates and branch lengths.

1) Lewis, PO. 2001. A Likelihood Approach to Estimating Phylogeny from Discrete Morphological Character Data. Systematic Biology 50:913-925.

2) Pearson, T, HM Hornstra, JW Sahl, et al. 2013. When Outgroups Fail; Phylogenomics of Rooting the Emerging Pathogen, Coxiella burnetii. Systematic Biology.

3) Yoder, JB, R Briskine, J Mudge, et al. 2013. Phylogenetic signal variation in the genomes of Medicago (Fabaceae). Systematic Biology.

Thank you,
Crystal


cmh

unread,
Jul 30, 2013, 5:43:42 PM7/30/13
to beast...@googlegroups.com
I wanted to share a response I got from Derrick Zwickl, Garli developer. It makes quite a bit of sense regarding use of the Mk and Mkv models versus JC. Now, is it a model that can be used implemented in BEAST?

Question:

In different software packages, there are options to select gamma and invariant sites models. If I do not select the invariant sites model, since there are no positions that are invariant in SNP datasets, wouldn't the JC69 alone result in the Mkv model for nucleotides? I have also posted a similar question in the BEAST user groups forum, as I would like to incorporate the Mkv model in a phylogenetic analysis, using SNP data, for the purpose of timing difference MRCA between bacterial strains. Again, thank you for your help.

Response:

I see your thought process, but no, not quite.  The invariant sites model models a class of invariable sites, i.e., ones with a rate of zero that cannot change.  Such sites can exist whether or not one includes invariant sites in their model.  What Mkv does is very different, it conditions on (~accounts for) a class of sites that _cannot_ be observed, which is different from sites that _are not_ observed.  In other words, not including a class of sites in your model (via invariant sites) is not the same as correcting for the unobservability of those sites.


Alexander Alekseyenko

unread,
Aug 1, 2013, 3:20:13 PM8/1/13
to beast...@googlegroups.com
Dear Crystal,

Accounting for "sites that _cannot_ be observed" is implemented to some degree in BEAST with ascertainment correction, which one can instantiate using <ascertainedPatterns> instead of regular <patterns> XML element. See example file in examples/testXML/testAscertainment.xml. Also see testLewisMk.xml for specification of the Mk model.

Cheers,
Alex

cmh

unread,
Aug 14, 2013, 3:45:09 PM8/14/13
to beast...@googlegroups.com
Hi Alex,
Thanks for your reply. Could you (or anyone) also provide an explanation on how to implement the testLewis.xml into my xml? I have an .xml that I have created in BEAUTI (GUI based) with the GTR substitution model, uncorrelated lognormal clock model, and BSC demographic models. I was guessing (probably naively) that I just needed to go into my starting xml, remove all instances where base frequencies occur or where GTR is mentioned, and paste in the testLewis.xml. This is not working though, as I BEAST keeps crashing.

Another question: In cases where I have stripped gapped positions from my alignment, am I correct in altering kst to 4 and setting the frequencies to 25% for each (rather than 20%)?

For accounting for "sites that _cannot_ be observed", I changed all instances of <patterns> to <ascertainedPatterns> as you suggested.

Thanks for all your help.
Crystal

cmh

unread,
Aug 21, 2013, 1:50:03 PM8/21/13
to beast...@googlegroups.com, talima....@nau.edu
Hello all,

I have been trying to incorporate the Lewis Mk(v) (from the testLewisMk.xml) model into my analysis, but it keeps crashing. I have attached the final xml that causes the major crash and while I have been trying to troubleshoot this for a good 1.5 weeks, I am not getting very far. There are multiple ongoing bacterial SNP genomic projects in my center that are relying on this model working, so I (and many others) will be grateful for and will acknowledge any helpful suggestions that can generate a successful run (with or without convergence). We do not get convergence using standard incorporated substitution models even when different clock models and demographic models are selected.

For reference, I started out with an xml that I made using the HKY substitution model, strict clock model and constant demographic model. Then, I removed the substitution model element and pasted in the testLewisMk.xml lines from the examples folder since the LewisMk model will be the custom substitution model for my analysis.

Then I restarted BEAST and not surprisingly got this error:
BEAST has terminated with an error. Please select QUIT from the menu...
and further down...
Parsing error - poorly formed BEAST file, Recoded2SNPs_LMk.xml:
Object with idref=kappa has not been previously declared.


Next, I found two instances where kappa was referred to (<parameter idref="kappa"/>) and removed the lines since I am no longer using the HKY model. Then I restarted BEAST and it crashed, but I was able to take a quick screen shot before it completely closed down. The screen shot is attached:

Again, any help is most welcome.
Thank you,
Crystal
 
Recoded2SNPs_LMk.xml

Alexander Alekseyenko

unread,
Aug 22, 2013, 10:36:22 AM8/22/13
to beast...@googlegroups.com, talima....@nau.edu
Dear Crystal,

It looks like the state-space of the lewis model you define doesn't match that of the alignment (patterns) that you provide. This is causing the errors you are seeing. The specification of the LewisMk model that you provide expects the patterns to have states 1,2,3,4,5, while your alignment is over A,C,G,T. It is not clear to me what you are trying to do here. Moreover, you *may* need to define wildcards (i.e. for ?) in the lewisMk specification.

Cheers,
Alex

Roxana Capper

unread,
Aug 28, 2013, 4:26:54 PM8/28/13
to beast...@googlegroups.com
Hi Crystal,

I too am trying to use the Mk model in BEAST with only SNP data (no invariant sites).  I have also had a difficult time adopting this model into something that plays nicely with BEAST!  As far as your troubles go, I had the same problem with the same crashes and the same error message that I also managed to capture via screen shot.  For me, if I remember correctly, the solution was to change the state count in the Mk model to this:

<kStateType id="kst" stateCount="4" startWith="1"/>

 <frequencyModel id="freqModel">
        <kStateType idref="kst"/>
        <frequencies>
            <parameter dimension="4" value="0.25 0.25 0.25 0.25" id="model.frequencies"/>
        </frequencies>
    </frequencyModel>

    <lewisMk id="lewisModel" totalOrder="false">
        <frequencies>
            <frequencyModel idref="freqModel"/>
        </frequencies>
        <order state="1" adjacentTo="2"/>
        <order state="3" adjacentTo="2"/>
        <order state="3" adjacentTo="4"/>
    </lewisMk>

I have no idea why this worked because in my data, I actually do have five states: A, C, G, T and N, but it fixed the crashing problem for me.  If anyone has comments about why this works or how to better code SNPs with the Mk model, I would love to hear them!  

In addition to the above, I've been having a problem of my own: I can begin a BEAST run using the Mk model on my local computer, which runs and finishes just fine.  However, when I upload the same xml file to a cluster which is running the same version of BEAST (1.7.4) and a higher version of java, I get the following error message:

[Ljava.lang.Object;@33bc3d08
Constructing 4-state datatype
Parsing error - poorly formed BEAST file, test.xml:
The '<lewisMk>' element with id, 'lewisModel', is incorrectly constructed.
The following was expected:
Exactly one ELEMENT of name frequencies REQUIRED containing
    Exactly one ELEMENT of type FrequencyModel REQUIRED

Why would my file run fine in one environment but terminate in another?  I've attached an example file if anyone would be so kind as to look over it for me...

Thanks!
Roxana
test.xml
Reply all
Reply to author
Forward
0 new messages