Help - invariant sites


Sebastian van Hal

May 20, 2015, 7:40:40 AM
to beast...@googlegroups.com
Hi

I am new to BEAST and have a large dataset of bacterial isolates with 20,000 SNVs across the core genome. When I load the entire core genome, the program fails.
I understand one can specify the number of invariant sites and load only the SNP sequences, but how does one do this, or edit the XMFA file to indicate it?

Thanks
Sebastian 

Alexei Drummond

May 20, 2015, 7:45:51 PM
to beast...@googlegroups.com, Remco Bouckaert
Dear Sebastian,

In BEAST2 this is a little bit fiddly for the moment. But basically you should just load your variant-only alignment into BEAUti as normal and produce an XML. Then you will need to hand-edit the XML Alignment element as follows:

  <data spec="Alignment" statecount="2" weights="1517,141,1526,487,1,1,1,1,1,1,1,1,1...">
    <sequence id="seq_C1" taxon="C1" totalcount="4">ACGT ACACA</sequence>
    <sequence id="seq_C2" taxon="C2" totalcount="4">ACGT CACAC</sequence>
    ...
  </data>

In the Alignment element you should add "ACGT" to the front of every sequence as shown above. 

Then add the weights="X,Y,Z,W,1,1,1,1,1,1,…" attribute to the Alignment element as shown above.

X, Y, Z, W are the counts of the number of invariant sites in the genome of each of the 4 nucleotides A, C, G, T, respectively. These weights should be followed by a 1 for each of the variant sites in your alignment. Each weight says how many times the corresponding alignment column is counted, so the four prepended "ACGT" columns stand in for all of the invariant sites.
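
The hand edit described above can be sketched in Python. This is only an illustration, not part of BEAST2 or BEAUti: the function name `add_constant_site_columns` and the dictionary inputs are assumptions made for the example, and writing the result back into the XML is left to you.

```python
def add_constant_site_columns(sequences, counts):
    """Prefix every sequence with 'ACGT' and build the matching weights string.

    sequences: dict mapping taxon name -> variant-only sequence
    counts:    dict mapping 'A', 'C', 'G', 'T' -> number of invariant sites
    (Both inputs are hypothetical structures chosen for this sketch.)
    """
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    n_variant = lengths.pop()

    # One constant column per nucleotide, in the order A, C, G, T.
    padded = {taxon: "ACGT" + seq for taxon, seq in sequences.items()}

    # Four invariant-site counts, then a weight of 1 per variant site.
    weights = [counts[base] for base in "ACGT"] + [1] * n_variant
    return padded, ",".join(str(w) for w in weights)


padded, weights = add_constant_site_columns(
    {"C1": "ACACA", "C2": "CACAC"},
    {"A": 1517, "C": 141, "G": 1526, "T": 487},
)
print(padded["C1"])  # ACGTACACA
print(weights)       # 1517,141,1526,487,1,1,1,1,1
```

The number of weights must equal the padded sequence length (4 plus the number of variant sites), which is why the sketch checks that all sequences have the same length first.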

We are working on a better way to do this in the future...

Cheers
Alexei

--
You received this message because you are subscribed to the Google Groups "beast-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to beast-users...@googlegroups.com.
To post to this group, send email to beast...@googlegroups.com.
Visit this group at http://groups.google.com/group/beast-users.
For more options, visit https://groups.google.com/d/optout.


Alexei Drummond

May 21, 2015, 6:03:27 PM
to beast...@googlegroups.com, vanh...@gmail.com
Hey Sebastian,

A slightly easier recipe that Remco reminded me of:

- rename the original alignment, say from id="data" to id="origData"

- add a filtered alignment with id="data" below the original alignment, and refer to origData in its data attribute, so that it reads

<data id="data" spec="FilteredAlignment" filter="-" data="@origData" constantSiteWeights="1517 141 1526 487"/>

The attribute constantSiteWeights="1517 141 1526 487" means that you want to add constant sites made up of 1517 As, 141 Cs, 1526 Gs, and 487 Ts.
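
For anyone who would rather script the two steps above than edit by hand, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The function name `add_constant_sites` is made up for this example, and it assumes the alignment is a `<data>` element with id="data" nested somewhere inside the root element. Note that ElementTree drops XML comments and the XML declaration when re-serialising, so compare the output with your original file.

```python
import xml.etree.ElementTree as ET


def add_constant_sites(xml_text, counts, data_id="data", orig_id="origData"):
    """Rename the alignment and insert a FilteredAlignment right after it.

    counts: invariant-site counts in the order A, C, G, T.
    """
    root = ET.fromstring(xml_text)
    for parent in root.iter():
        for index, child in enumerate(list(parent)):
            if child.tag == "data" and child.get("id") == data_id:
                # Step 1: rename the original alignment.
                child.set("id", orig_id)
                # Step 2: add the filtered alignment under the old id.
                filtered = ET.Element("data", {
                    "id": data_id,
                    "spec": "FilteredAlignment",
                    "filter": "-",
                    "data": "@" + orig_id,
                    "constantSiteWeights": " ".join(str(c) for c in counts),
                })
                parent.insert(index + 1, filtered)
                return ET.tostring(root, encoding="unicode")
    raise ValueError(f'no <data id="{data_id}"> element found')


example = (
    '<beast><data id="data" spec="Alignment">'
    '<sequence id="seq_C1" taxon="C1" totalcount="4">ACACA</sequence>'
    '</data></beast>'
)
result = add_constant_sites(example, [1517, 141, 1526, 487])
print(result)
```

Because the new element takes over id="data", the rest of the XML that refers to the alignment by that id should not need to change.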

Cheers
Alexei

andersgs

Oct 26, 2017, 11:08:36 PM
to beast-users
I have created a Python script that will do this for you, assuming you have a VCF file with the positions of the variable sites in the reference genome used for identifying SNPs:


It is on PyPI:

pip3 install b2constsites
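
To illustrate the idea of counting constant sites from a reference plus the variable positions, here is a rough Python sketch. It is not the b2constsites implementation; the function name `constant_site_counts` and its inputs (a reference sequence string and a list of 1-based variable positions, such as the POS column of a VCF) are assumptions for the example. It also assumes every non-variable reference position is genuinely invariant across all isolates, which may not hold outside the core genome.

```python
from collections import Counter


def constant_site_counts(reference, variable_positions):
    """Count A, C, G, T at reference positions that are not variable.

    reference:          reference genome as a string
    variable_positions: 1-based positions of variable sites
    Returns the counts in the order A, C, G, T; other characters
    (for example N) are ignored.
    """
    variable = set(variable_positions)
    counts = Counter(
        base
        for position, base in enumerate(reference.upper(), start=1)
        if position not in variable
    )
    return [counts[base] for base in "ACGT"]


counts = constant_site_counts("AACGTTGCA", [2, 5])
print(" ".join(str(c) for c in counts))  # 2 2 2 1
```

The space-separated output has the same shape as the value used for constantSiteWeights earlier in this thread.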

Thank you Remco and Alexei for the suggestion.

Anders.

mark....@uky.edu

Aug 12, 2020, 12:42:57 AM
to beast-users
For this solution, will the weight counts be properly parsed out if one filters the primary alignment into partitions? I'm trying to figure out how to add constant site information to separate data partitions.

Cheers,
Mark

mark....@uky.edu

Aug 12, 2020, 1:33:29 AM
to beast-users
I guess I got my answer: 

Error 110 parsing the xml input file

validate and intialize error: Cannot handle site weights in FilteredAlignment. Remove "weights" from data input.

Error detected about here:
  <beast>
      <run id='mcmc' spec='MCMC'>
          <state id='state' spec='State'>
              <tree id='Tree.t:C2' name='stateNode' spec='beast.evolution.tree.Tree'>
                  <taxonset id='TaxonSet.C2' spec='TaxonSet'>
                      <alignment id='C2' spec='FilteredAlignment'>


Do I take it then, that it's not possible to do tip dating using partitions unless one uses the whole alignment? That's a bit of a problem.

Sebastien Calvignac-Spencer

Sep 9, 2020, 8:52:45 AM
to beast-users
Hey,
I have tried both fixes in this thread on BEAST1 XMLs and they do not work.
Adding a filtered alignment (i.e. <alignment id=... />, which is apparently equivalent to <data id=... /> in BEAST2) results in the filtered alignment block not being recognized as properly formatted (using exactly the options suggested by Alexei).
When adding site weights to the alignment block, the analysis starts but the root height estimates are unchanged, suggesting the invariant sites are not taken into account.
Maybe I am misunderstanding something, but if not, any additional suggestions for editing BEAST1 XMLs to add invariant sites would be very welcome.
Best,
Seb

Philippe Lemey

Sep 9, 2020, 8:58:29 AM
to beast...@googlegroups.com
Hi Seb,
In beast1 xmls you can add a patterns element like the following.

<mergePatterns id="patterns">
    <patterns from="1" every="1">
        <alignment idref="alignment"/>
    </patterns>
    <constantPatterns>
        <alignment idref="alignment"/>
        <counts>
            <parameter value=" 1131630 1240200 1241860 1132375 "/>
        </counts>
    </constantPatterns>
</mergePatterns>
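
If the counts come out of a script, a block like the one above can be generated rather than typed. The following Python sketch is only a convenience wrapper around the snippet shown here: the function name `merge_patterns_block` is made up, and it assumes the four counts are given in the order A, C, G, T and that the alignment id in your XML is "alignment".

```python
import xml.etree.ElementTree as ET


def merge_patterns_block(counts, alignment_id="alignment", patterns_id="patterns"):
    """Return a BEAST1 mergePatterns block with the given constant-site counts.

    counts: four invariant-site counts (assumed order: A, C, G, T).
    """
    if len(counts) != 4:
        raise ValueError("expected four counts, one per nucleotide")
    values = " ".join(str(c) for c in counts)
    lines = [
        f'<mergePatterns id="{patterns_id}">',
        '    <patterns from="1" every="1">',
        f'        <alignment idref="{alignment_id}"/>',
        '    </patterns>',
        '    <constantPatterns>',
        f'        <alignment idref="{alignment_id}"/>',
        '        <counts>',
        f'            <parameter value="{values}"/>',
        '        </counts>',
        '    </constantPatterns>',
        '</mergePatterns>',
    ]
    return "\n".join(lines)


block = merge_patterns_block([1131630, 1240200, 1241860, 1132375])
ET.fromstring(block)  # raises ParseError if the block is not well-formed XML
print(block)
```

The generated text is meant to be pasted into the BEAST1 XML by hand, in place of the existing patterns element.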

Best,
Philippe
