BEAST2 -- serial birth death skyline -- unexpected behavior


Mike Famulare

Aug 6, 2013, 1:38:59 PM
to beast...@googlegroups.com
Hi all,

In the serial birth-death-skyline model (run in v 2.0.2), the initial condition for the "origin" parameter behaves in a manner I find unexpected.  I hesitate to call it a bug in case there is a reason for the behavior I don't understand.

For example, I have 100 sequences and a user-specified starting tree with a root at 15 years. So, I expect that I can initialize the origin at some number greater than 15 years in the past, such as 30. Using BEAUti to set up the job and running it, I get the following error:

java.lang.RuntimeException: Error: origin (30.0) must be larger than tree height (99.0).
        at beast.evolution.speciation.BirthDeathSkylineModel.initAndValidate(Unknown Source)
        at beast.util.XMLParser.initPlugins(Unknown Source)
        at beast.util.XMLParser.parse(Unknown Source)
        at beast.util.XMLParser.parseFile(Unknown Source)
        at beast.app.BeastMCMC.parseArgs(Unknown Source)
        at beast.app.beastapp.BeastMain.main(Unknown Source)

Error 110 parsing the xml input file

validate and intialize error: Error: origin (30.0) must be larger than tree height (99.0).

Error detected about here:
  <beast>
      <run id='mcmc' spec='MCMC'>
          <distribution id='posterior' spec='util.CompoundDistribution'>
              <distribution id='prior' spec='util.CompoundDistribution'>
                  <distribution id='BirthDeathSkySerial.t:random100' spec='beast.evolution.speciation.BirthDeathSkylineModel'>


I've played with it, and the reported tree height (99.0) is the number of sequences minus one (the number of internal nodes in the tree), and has nothing to do with the actual tree height.

The consequence is the following. I have to set the origin to at least 100, even though the rest of the model correctly interprets the origin as a date (as far as I can tell). Because the origin has to start far from where it will converge, the burn-in period takes much longer than it needs to. During burn-in, the tree height gets pulled toward the initial origin, far from where it will converge (and where my priors should put it), and it takes a long time for the tree height and clock model to walk back to where they belong. The problem gets worse quickly as the number of sequences goes up, since tree heights don't scale especially fast, if at all, with the number of sequences.

If this isn't a bug, could someone explain to me why this behaves as it does?

Thanks,

--Mike

Denise

Dec 2, 2013, 9:31:59 AM
to beast...@googlegroups.com
Dear Mike,

I'm sorry that none of us replied to your post earlier and I assume you've solved your problem already, but since I just ran across your question:

The behaviour is expected since the model requires the origin to be larger than the tree height. However, it sounds like your starting tree is rather 'bad'. There are a few options for starting trees (random, UPGMA, NJ, ...) and whatever you are using seems to create a tree with tree height 99. I'd suggest you try a UPGMA tree instead, or something else you consider suitable. Alternatively, you can also specify a starting tree in Newick format.

Cheers

Denise

Mike Famulare

Dec 2, 2013, 7:34:07 PM
to beast...@googlegroups.com
Denise,

Thanks for the reply.  I ended up just tolerating the burn-in period and am not currently working on this topic, but I still think there is a potential issue. I saw the above problem with a few different data sets and I don't think it's me (but I don't doubt it still could be!).  

I tried the BD-Skyline (BDS) model on a few different data sets. For each, I constructed an NJ starting tree in MATLAB, output a Newick string, and edited the Newick string into the XML by hand as the starting tree. I've done that before, and for non-BDS models it has behaved as expected (the starting tree height is what it should be). I can view the Newick strings in FigTree and confirm that the trees have the root heights I think they do (15 years, let's say, as in the example in my original email).
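For anyone wanting to check a starting tree outside FigTree and BEAST, here is a minimal Python sketch that reads the root height straight from a Newick string. It assumes plain Newick (optional labels, `:length` branch lengths, no quoted labels or comments). The function name `newick_tip_depths` and the example tree are made up for illustration; nothing here is BEAST code.

```python
def newick_tip_depths(newick):
    """Return the root-to-tip distance of every tip in a plain Newick string.

    Assumptions (hypothetical helper, not part of BEAST): no quoted labels,
    no [comments], and missing branch lengths count as 0.
    """
    s = "".join(newick.split()).rstrip(";")  # drop whitespace and the final ';'
    stack = [[]]  # stack[-1] collects tip depths under the currently open node
    i = 0
    while i < len(s):
        c = s[i]
        if c == "(":            # open a new internal node
            stack.append([])
            i += 1
            continue
        if c == ",":            # sibling separator
            i += 1
            continue
        if c == ")":            # close a node: take the depths collected below it
            subtree = stack.pop()
            i += 1
        else:                   # a tip sits at depth 0 below itself
            subtree = [0.0]
        # Read the optional "label:length" that follows a tip or a ')'.
        j = i
        while j < len(s) and s[j] not in "(),":
            j += 1
        _, _, length = s[i:j].partition(":")
        branch = float(length) if length else 0.0
        stack[-1].extend(d + branch for d in subtree)
        i = j
    return stack[0]


depths = newick_tip_depths("((A:1,B:1):2,C:3);")
root_height = max(depths)      # height of the root above the youngest tip
n_tips = len(depths)
print(root_height, n_tips - 1)  # prints: 3.0 2
```

If the first number matches the intended root height while BEAST reports the second, the discrepancy is on the BEAST side rather than in the Newick string. Note that a branch length attached to the root itself would be included in the depths.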

What happens for me is that when I run the BDS model, it reports that the starting tree height is the number of sequences minus 1: 100 - 1 = 99 in the example above, or 1350 - 1 = 1349 for my biggest dataset. Using the same starting tree and data with different BEAST tree models, I don't have this problem.

If the issue isn't specific to my setup, it should be easy to reproduce. Make an example with 30 sequences. Make a starting tree with height 10. Set the initial origin somewhere between 10 and 29. It should throw an error. I'm being cagey about sharing XML because of data privacy concerns, but if you need more help to reproduce the problem, please let me know.
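The reproduction recipe above could be set up with a synthetic tree, so no private data is needed. The following Python sketch builds an ultrametric ladder (caterpillar) tree with a chosen number of tips and root height. The taxon labels `t1`, `t2`, ... and both function names are hypothetical; in a real XML the labels would have to match the taxa in the alignment.

```python
def ladder_newick(n_tips, root_height):
    """Build an ultrametric ladder tree as a Newick string.

    Hypothetical helper for illustration: tips are labelled t1..tN and the
    internal nodes are spaced evenly between 0 and root_height.
    """
    if n_tips < 2:
        raise ValueError("need at least two tips")
    step = root_height / (n_tips - 1)        # height gap between internal nodes
    tree = f"(t1:{step},t2:{step})"          # the first cherry, at height `step`
    for k in range(2, n_tips):
        # The new parent sits at height k*step: the existing subtree hangs on a
        # branch of length `step`, the new tip on a branch of length k*step.
        tree = f"({tree}:{step},t{k + 1}:{k * step})"
    return tree + ";"


def failing_origin_range(n_tips, root_height):
    """Origin values valid for the tree (> root height) that, per the report
    in this thread, would still be rejected (<= number of tips minus 1)."""
    return (root_height, n_tips - 1)


print(ladder_newick(3, 2.0))           # prints: ((t1:1.0,t2:1.0):1.0,t3:2.0);
print(failing_origin_range(30, 10.0))  # prints: (10.0, 29)
```

Pasting the output of `ladder_newick(30, 10.0)` into the XML as the starting tree and setting the initial origin to, say, 20 would exercise the case described: if the error reports a tree height of 29.0, the behaviour is reproduced.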

Thanks,

--Mike