Correcting for constant sites in BEAST2

Sebastian van Hal

Jul 24, 2014, 7:03:12 AM
to beast...@googlegroups.com
I am new to BEAST and am trying to generate a tree from 139 genomes of about 3 megabases each. I do not have sufficient memory, even after changing the memory setting. An alternative that has worked is to provide only the variable sites; however, to get accurate output I need to provide the number of constant sites. I have found the previous entry on this (below), but that pattern does not appear in the current XML file, so I am not sure where or how to enter this data. Any suggestions or help would be great.

Thanks
Sebastian

Look for the pattern list:

<patterns id="patterns" from="1" every="1" >
<alignment idref="alignment"/>
</patterns>

and replace it with this:

<mergePatterns id="patterns">
<patterns from="1" every="1">
<alignment idref="alignment"/>
</patterns>

<constantPatterns>
<alignment idref="alignment"/>
<counts>
<parameter value="555 434 543 432"/>
</counts>
</constantPatterns>
</mergePatterns>

where the numbers in value="..." are the counts of constant sites for A, C, G and T, in that order.

Remco Bouckaert

Jul 24, 2014, 10:25:17 PM
to beast...@googlegroups.com
Hi Sebastian,

You can use a FilteredAlignment to insert constant sites by setting its constantSiteWeights attribute. Say your original alignment is called xyz, so the XML produced by BEAUti contains something like

    <data id="xyz" name="alignment">

It is easiest to rename this to, say, xyzOriginal:

    <data id="xyzOriginal" name="alignment">

then add another data element (just after the closing </data> tag of the alignment would be a good spot) that says

   <data id='xyz' spec='FilteredAlignment' filter='-' data='@xyzOriginal' constantSiteWeights='100 200 300 400'/>

Note that id='xyz' and data='@xyzOriginal' should match what you have in the XML.

The constantSiteWeights values at the end are the weights for DNA in the order A, C, G, T, so this example adds 100 constant sites with all As, 200 with all Cs, and so on.
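
As a rough sketch of how the two elements described above might sit together in the file: the taxon names and sequences below are hypothetical placeholders, not taken from any real alignment, and the weights are the example values from this reply rather than real counts. Only the attributes quoted in this thread are used on the FilteredAlignment element.

```xml
<!-- Variable-sites-only alignment, renamed from "xyz" to "xyzOriginal".
     Taxon names and sequence values are hypothetical placeholders. -->
<data id="xyzOriginal" name="alignment">
    <sequence id="seq_taxon1" taxon="taxon1" totalcount="4" value="ACGT"/>
    <sequence id="seq_taxon2" taxon="taxon2" totalcount="4" value="ACGA"/>
    <sequence id="seq_taxon3" taxon="taxon3" totalcount="4" value="ATGT"/>
</data>

<!-- New element that takes over the old id "xyz" and points at the renamed
     alignment. constantSiteWeights lists the constant-site counts in the
     order A, C, G, T (example values; substitute your own counts). -->
<data id='xyz' spec='FilteredAlignment' filter='-' data='@xyzOriginal'
      constantSiteWeights='100 200 300 400'/>
```

Because the new element reuses the id xyz, anything elsewhere in the XML that refers to the alignment by that id should then pick up the filtered alignment with the constant sites included.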

In the output to screen, it should report statistics of the xyzOriginal as something like:

6 taxa
768 sites
69 patterns

followed by statistics of the filtered alignment

Filter -
6 taxa
768 sites + 1000 constant sites
69 patterns

where the total number of constant sites added is reported as well (here 100 + 200 + 300 + 400 = 1000).

Hope this helps,

Remco