analysis of variable sites only in BEAST or MrBayes


HeidiS

Sep 19, 2013, 10:01:51 AM
to beast...@googlegroups.com

Is it appropriate to use ONLY variable sites to construct a Bayesian tree? I will have a large data set produced by GBS (genotyping by sequencing), so it will contain SNPs (variable sites) only, without any neighboring invariant sequence. It will also have lots of missing data. I am evaluating what options I have to reconstruct a phylogeny from such data. Is Bayesian analysis (BEAST or MrBayes) an option? I'd also appreciate references that address this topic. Many thanks for your input.

Heidi



Andrew Rambaut

Sep 19, 2013, 11:44:17 AM
to beast...@googlegroups.com
The answer to your question is generally no: it is not advisable to use only variable sites. However, there is an easy way to correct for the missing constant sites by specifying their number (the counts can even be relatively approximate, i.e., the overall frequency of each base multiplied by the number of constant sites, which is the sequence length minus the number of variable sites). Load the variable sites as normal in BEAUti, generate the BEAST XML and then edit it as follows:

Look for the pattern list:

<patterns id="patterns" from="1" every="1" >
<alignment idref="alignment"/>
</patterns>

and replace it with this:

<mergePatterns id="patterns">
<patterns from="1" every="1">
<alignment idref="alignment"/>
</patterns>

<constantPatterns>
<alignment idref="alignment">
<counts>
<parameter value="555 434 543 432"/>
</counts>
</constantPatterns>
</mergePatterns>

Where the numbers in value="..." are the counts of constant sites of A, C, G & T

Best,
Andrew
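
As a rough illustration of the approximation Andrew describes, the Python sketch below splits an assumed number of constant sites across A, C, G and T in proportion to assumed base frequencies, then formats the result for the value="..." attribute. The function names, the example frequencies and the alignment lengths are all hypothetical and not part of BEAST or BEAUti; in practice the base frequencies would ideally come from the reference genome or the full alignment.

```python
# Sketch only: helper names and example numbers are illustrative assumptions,
# not part of BEAST or BEAUti.

BASES = "ACGT"  # order used in the value="..." attribute: A, C, G, T


def approx_constant_counts(base_freqs, total_length, n_variable):
    """Approximate the number of constant sites for each of A, C, G, T.

    base_freqs   -- dict mapping each base to its overall frequency (sums to ~1)
    total_length -- length of the full alignment the variable sites came from
    n_variable   -- number of variable sites actually loaded into BEAUti
    """
    n_constant = total_length - n_variable
    if n_constant < 0:
        raise ValueError("more variable sites than total sites")
    # Share the constant sites out in proportion to the base frequencies.
    return [round(base_freqs[base] * n_constant) for base in BASES]


def parameter_element(counts):
    """Format the counts as one space-separated list inside a single pair of quotes."""
    return '<parameter value="{}"/>'.format(" ".join(str(c) for c in counts))


if __name__ == "__main__":
    # Hypothetical inputs: equal base frequencies, 10,000 sites, 2,000 of them variable.
    freqs = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    counts = approx_constant_counts(freqs, total_length=10_000, n_variable=2_000)
    print(parameter_element(counts))
```

Because each count is rounded separately, the four numbers may miss the intended total by a site or two, which should not matter given that the counts are only approximate.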

Schwaninger, Heidi

Sep 19, 2013, 12:00:41 PM
to beast...@googlegroups.com
Dear Andrew,
Thank you so much for this information. In the meantime I have also read a little about SNAPP, which is related to BEAST and is meant to be used with SNPs. Would you recommend it for my type of data? (I realize it is for species trees.)
Many thanks for your input.

Heidi
--
You received this message because you are subscribed to a topic in the Google Groups "beast-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/beast-users/V5vRghILMfw/unsubscribe.
To unsubscribe from this group and all its topics, send an email to beast-users...@googlegroups.com.
To post to this group, send email to beast...@googlegroups.com.
Visit this group at http://groups.google.com/group/beast-users.
For more options, visit https://groups.google.com/groups/opt_out.






Andrew Rambaut

Sep 19, 2013, 12:06:19 PM
to beast...@googlegroups.com
It depends what you mean by SNPs - if these are just variable sites obtained using normal sequencing but you have removed the invariant sites then you can use the method I described. However, if the SNPs are known polymorphic sites and you have collected them (either by sequencing or a SNP chip or something) then there will be an ascertainment bias towards shared polymorphisms (it will be missing polymorphisms unique to one taxon). I think SNAPP can help with this or else there is an Ascertained data option in BEAST but I am not familiar with how to use it - it was written by Alex Alekseyenko who might be able to respond?

Andrew

Schwaninger, Heidi

Sep 19, 2013, 1:37:58 PM
to beast...@googlegroups.com
GBS uses the Illumina platform to sequence fragment ends in a highly multiplexed way after digestion with a specific enzyme, barcoding and PCR. After aligning many very short reads to a reference genome sequence and a lot of filtering, a SNP genotype table is available. Thus, it appears that these data do not have an ascertainment bias (there may be a PCR bias). It is unlikely that information on the number of constant sites is available (I have not run the pipeline myself; I will inquire). Does this change your advice below (for both BEAST and SNAPP)?
Many thanks for your input

Alexei Drummond

Sep 19, 2013, 6:36:48 PM
to beast...@googlegroups.com
If the data are SNPs from multiple species, then SNAPP is appropriate. Unless the whole genome is evolving clonally (i.e., with no recombination between SNPs), you can't just concatenate the data and fit a single gene tree. SNAPP accounts for the fact that different SNPs have evolved on different gene trees.

Alexei


Schwaninger, Heidi

Sep 23, 2013, 8:02:32 AM
to beast...@googlegroups.com
Thanks so much for all of your feedback, it is very helpful! Heidi

BernieC

Mar 25, 2017, 12:23:38 PM
to beast-users


On Thursday, September 19, 2013 at 3:01:51 PM UTC+1, Andrew Rambaut answered a question about using only variable sites, by explaining how to modify the xml infile to provide an approximate count of constant sites without the burden of analysing them all. His text was: 

However there is an easy way to correct for constant sites by specifying the number of constant sites (these could even be relatively approximate, i.e., the overall base frequency times the sequence length minus the number of variable sites). Load the variable sites as normal in BEAUti, generate the BEAST XML and then edit it as follows: 

Look for the pattern list: 

       <patterns id="patterns" from="1" every="1" > 
               <alignment idref="alignment"/> 
       </patterns> 

and replace it with this: 

       <mergePatterns id="patterns"> 
               <patterns from="1" every="1"> 
                       <alignment idref="alignment"/> 
               </patterns> 

               <constantPatterns> 
                       <alignment idref="alignment"> 
                       <counts> 
                               <parameter value="555 434 543 432"/> 
                       </counts> 
               </constantPatterns> 
       </mergePatterns> 

Where the numbers in value="..." are the counts of constant sites of A, C, G & T 

I wish to implement Andrew's procedure but something is wrong with my attempt to do so. Can anyone familiar with XML please spot the error?

My text replacement, inserted as suggested, follows here, but attempts to run it bring the warning that "parameter value" should be followed by values or close tags. Yet those are present. I have tried enclosing each value in its own pair of " " and I have tried writing them as "A=692" etc., to no avail, and I have searched for missing slashes, etc., but without finding any. Very frustrating! It seems simple enough! It is difficult to believe that Andrew can have miswritten the necessary syntax, yet I can see no error in mine, although I am sure it will prove to be my fault somehow.

Please help!

Bernie Cohen

My replacement as per Rambaut:

<patterns from="1" every="1">
<alignment idref="alignment"/>
</patterns>
<constantPatterns>
<alignment idref="alignment">
<counts>
<parametervalue="692" "703" "562" "665"/>
</counts>
</constantPatterns>
</mergePatterns>

Andrew Rambaut

Mar 25, 2017, 4:00:50 PM
to beast...@googlegroups.com
Hi Bernie,

Change this line:

<parametervalue="692" "703" "562" "665"/>

to:

<parameter value="692" "703" "562" "665"/>

(note the space).

Best,
Andrew


Andrew Rambaut

Mar 25, 2017, 4:14:32 PM
to beast...@googlegroups.com
Actually you need to change it to:

<parameter value="692 703 562 665"/>

A.

BernieC

Mar 26, 2017, 11:38:49 AM
to beast-users


On Saturday, March 25, 2017 at 8:00:50 PM UTC, rambaut wrote:
Hi Bernie,

Change this line:

<parametervalue="692" "703" "562" "665"/>

to:

<parameter value="692" "703" "562" "665"/>

 Many thanks, Andrew. A great comfort to know that posts here are seen! Alas that did not solve the problem; it now complains that the closing Beast tag is absent, but it is present. Presumably something is stopping the read in the patterns block, but I cannot see what it might be and therefore give it here together with what immediately precedes and follows, in an attached file. NB the parameter values are altered.

While I have your attention may I remark that to run 1.8 for a partitioned analysis has been done in the past (a year or more ago) but it again will not run ?because? OS updates to 10.9.5 and 10.10.5 have happened meantime in different machines. (Beast 2 has never run in any of my machines as yet, but probably I don't need it as I am almost certainly working on my last-ever MS. Age withers!)

Thanks in advance for further help.
Bernie
extract

Andrew Rambaut

Mar 26, 2017, 11:48:12 AM
to beast...@googlegroups.com
Try this:

<constantPatterns>
<alignment idref="alignment"/>
<counts>
<parameter value="15621 15374 19667 14548"/>
</counts>
</constantPatterns>

It looks like my original posting was slightly wrong, not having the closing /> in the <alignment … element.

A.
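
Errors like the ones in this exchange (an unclosed <alignment> element, or several quoted strings after value=) can be caught before launching BEAST by checking that the edited fragment is well-formed XML. The sketch below is one way to do that with Python's standard xml.etree.ElementTree parser; the helper name well_formed_error is made up for illustration, and the check only tests XML syntax, not whether BEAST accepts the elements.

```python
import xml.etree.ElementTree as ET

# Fragment as first posted: the <alignment> inside <constantPatterns> is never closed.
ORIGINAL = """<constantPatterns>
  <alignment idref="alignment">
  <counts>
    <parameter value="555 434 543 432"/>
  </counts>
</constantPatterns>"""

# Fragment as corrected in the message above: <alignment .../> is self-closing.
CORRECTED = """<constantPatterns>
  <alignment idref="alignment"/>
  <counts>
    <parameter value="15621 15374 19667 14548"/>
  </counts>
</constantPatterns>"""


def well_formed_error(xml_text):
    """Return None if xml_text parses as XML, otherwise the parser's error message."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None


if __name__ == "__main__":
    for label, fragment in (("original", ORIGINAL), ("corrected", CORRECTED)):
        problem = well_formed_error(fragment)
        print(label, "->", "well-formed" if problem is None else problem)
```

The same helper rejects a line such as <parameter value="692" "703" "562" "665"/>, because an attribute may carry only one quoted value; the four counts have to share a single pair of straight double quotes.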


Andrew Rambaut

Mar 26, 2017, 11:50:43 AM
to beast...@googlegroups.com
Hi Bernie,

I am not sure I understand this paragraph. Are you using BEAST v1.8.4? When you say ‘will not run’, in what way precisely does it not run?

A.

On 26 Mar 2017, at 16:38, BernieC <bernar...@gmail.com> wrote:

Yaser Alsahafi

May 16, 2017, 1:40:38 AM
to beast-users
Hi Bernie and Andrew,
A BEAST warning keeps coming up in my run and the job gets terminated:
"Likelihood component, null, created but not used in the MCMC"

I am trying to run a BEAST job on a SNP alignment (variable sites extracted from core genomes). The alignment has 32 taxa and 41037 variable sites. I created an XML file in BEAUti v1.8.4 with the following parameters:
Substitution model: HKY, Base frequencies: Empirical, Site Heterogeneity Model: Gamma, no. of Gamma Categories: 4
Clocks: Strict
Tree Prior: Coalescent: Constant Size.

The data include traits (4 diagnostic statuses) associated with the taxa; their BEAUti settings were:
Site: Discrete trait substitution model: Symmetric substitution model (I chose "Infer social network with BSSVS")
The clock and tree settings were strict and coalescent constant size.

Length of chain: 500 million.

I followed the above suggestions for editing the BEAST XML file, and also tried a suggestion from another person. However, the error keeps coming back. The confusing thing is that this error does come up in a similar run when the clock is set to Relaxed. Also, when the traits are removed, everything is fine.

What might be the issue here? How can I solve it or deal with it?

I have attached the BEAST log screen output for your reference.

Yaser
BEAST_Error.txt

HS

May 6, 2019, 6:12:55 AM
to beast-users
Dear Andrew,

I am trying to use polymorphic sites only in BEAST 2. Unfortunately, I can't find the patterns block there. Can you please tell me whether the syntax for the patterns block has changed in BEAST 2?

Best,
Hovhannes

Andrew Rambaut

May 6, 2019, 10:31:57 AM
to beast...@googlegroups.com
I don't know how to do this in BEAST 2 but I am sure it has been mentioned on the list somewhere before. Generally there is no overlap between BEAST and BEAST2 XML.

Andrew

Remco Bouckaert

May 6, 2019, 4:34:32 PM
to beast...@googlegroups.com
Hi Hovhannes,

This may be what you were looking for: https://groups.google.com/forum/#!topic/beast-users/QfBHMOqImFE 

You can run the beast2_constsites program (https://github.com/andersgs/beast2_constsites) if you don’t want to edit the XML directly.

Cheers,

Remco

HS

May 7, 2019, 7:14:35 AM
to beast-users
Hi Remco,

Thank you so much for your kind help!

It runs now!

Best wishes,
Hovhannes

Mark Miller

Aug 14, 2019, 11:27:23 AM
to beast-users
Dear Yaser,
I am seeing the same error message. I wonder if you were able to solve this issue, and what worked for you?

Mark