Re: Low ESS values, and some general *Beast questions


pepster

Mar 26, 2013, 2:05:16 PM
to beast...@googlegroups.com


On Thursday, March 21, 2013 3:38:45 PM UTC+13, Kat Dawkins wrote:
Hi all

I am a new user to Beast and I have some questions specifically about a *Beast analysis I'm trying to run.

My data set contains 2 mtDNA genes (COI and 16S) and 3 nuDNA genes (GAPDH, H3, AK).  All of these were imported as separate, unlinked partitions; however, because I have 2 mtDNA genes, I've read that the trees for these two genes should be linked.  When I try to link them, it won't let me until I reduce my two sequence files to contain exactly the same individuals, which reduces my COI data set a lot. 

You probably want to add an "empty sequence" for the missing individuals instead. That way you could use all the data. It would have been nice if BEAST could do this automatically :)
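If it helps, the padding can be done with a short script before importing into BEAUti. This is only a sketch in plain Python (the file names in the usage example are hypothetical, and other tools can do the same job); it inserts an all-'?' sequence, which BEAST treats as completely missing data, for any taxon absent from an alignment:

```python
# Sketch only: pad an alignment so every taxon in a master list is
# present, inserting an all-'?' sequence (completely missing data as
# far as BEAST is concerned) for any taxon the file lacks.

def read_fasta(path):
    """Minimal FASTA reader: returns {taxon_name: sequence}."""
    seqs, name = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:]
                seqs[name] = ""
            elif name is not None:
                seqs[name] += line
    return seqs

def pad_alignment(seqs, all_taxa, missing_char="?"):
    """Return a copy of the alignment with an all-missing sequence
    added for every taxon in all_taxa that is not already present."""
    length = len(next(iter(seqs.values())))
    padded = dict(seqs)
    for taxon in all_taxa:
        if taxon not in padded:
            padded[taxon] = missing_char * length
    return padded
```

For example, `pad_alignment(read_fasta("16S.fasta"), coi_taxa)` (file name hypothetical) would give a 16S alignment covering every COI individual; write it back out and import into BEAUti as usual. The '?' columns contribute no likelihood information for that taxon, so the COI data stays intact.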
  
I'm not sure how much information I'm losing by doing this, but the only alternative I can think of is to put blank sequences in my 16S file which I think would be just as bad, if not worse. 

Why would you think this is bad? 
 
Is there any other way of keeping all of my COI data?  Does it matter if I don't link the trees?

Also, I've been using about 50 to 100 million as my chain length in *Beast and it has been working really well for almost everything - except that I am getting quite low ESS values (about 60-90) for treeModel.rootHeight for all of my genes and my species tree.  At first I thought it might be because I was using a GTR model for two of my genes and it was too complicated, so I changed these both to a TN93 model and re-ran it, but again I get quite low ESS values.  The only other thing I've read is that you can try running without +G or +I+G and then compare the results, but I'm not really sure that is appropriate when the model from ModelTest includes those parameters.  Someone also said you can reduce the stdev on the tree prior to 0.5 or 0.1, but I'm not exactly sure where to change the stdev - I think it's in the XML file but I have no idea where (this seems a lot easier to change in normal Beast).  Are there any alternative strategies I can try to increase my ESS values for treeModel.rootHeight?

Just in case it's important, here is some other information regarding my analysis:  all of my clock models are "lognormal relaxed clock" and I've set a rate for both COI and 16S, however I've left the "estimate" boxes ticked as several different rates have been suggested for these genes.  In the priors tab, I've set all of my ucld.mean priors to uniform for all genes, with a specified range for COI and 16S to incorporate the different rates that have been suggested, and a range of 0-100 for my other genes as no rate is known.  All of my trees are set to Yule, with a random starting tree (the initial root height box is blacked out here).

Any help would be greatly appreciated, and any comments if I've done something that seems to be unusual would be great as I've never used this program before.

You can't estimate *all* the rates. One rate should be fixed, or have a relatively tight prior for the analysis to converge nicely. If you send me the XML file I may be able to help. There is a bug in BEAUti which prevents it from emitting the updown operator in some cases, which can slow convergence.
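On the low rootHeight ESS: ESS is just the number of samples in the trace discounted by autocorrelation, so anything that reduces autocorrelation (better operators, tighter priors, longer chains) raises it. As a rough illustration, here is a plain-NumPy sketch of how an ESS can be computed from a logged trace (not the exact estimator Tracer uses, but the same idea):

```python
import numpy as np

def ess(trace):
    """Rough effective sample size: chain length divided by the
    integrated autocorrelation time, summing autocorrelations until
    the first non-positive lag (similar in spirit to Tracer's ESS)."""
    x = np.asarray(trace, dtype=float)
    n = len(x)
    x = x - x.mean()
    var0 = np.dot(x, x) / n
    if var0 == 0.0:
        return float(n)          # constant trace: every sample "independent"
    act = 1.0                    # integrated autocorrelation time, in samples
    for lag in range(1, n // 3):
        rho = np.dot(x[:-lag], x[lag:]) / ((n - lag) * var0)
        if rho <= 0.0:           # truncate at the first non-positive lag
            break
        act += 2.0 * rho
    return n / act
```

An independent trace gives an ESS near its length, while a strongly autocorrelated chain (what you are seeing for rootHeight) gives a much smaller one, which is why simply running longer is one valid fix.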

-Joseph
  

Thanks,
Kat

Kat Dawkins

Mar 28, 2013, 3:35:08 AM
to beast...@googlegroups.com
Hi Joseph,

I only just realised that you've answered some of my questions - I just posted another topic about my ESS values, but would love to follow up with you about some of the stuff you replied to as well. 

You said it would be okay to add blank sequences into my file rather than cutting out data from my COI gene.  The reason I thought this wouldn't be great is that it is introducing a lot of 'unknowns' and I thought this might reduce the ability to determine informative relationships between my species, but this was just a guess rather than based on some sort of information.  So do you believe that the blank data won't reduce the power of the analysis?

The other part was about estimating my rates.  The reason I have ticked all of the estimate boxes is that several different rates have been suggested for two of my genes (COI and 16S).  For COI my rates are between 0.007-0.01/myr and for 16S 0.00265-0.0045/myr, based on publications for similar organisms.  I have set these as the upper and lower limits in my "priors" tab, while putting the 'middle' rate (i.e. a rate in the middle of my upper and lower limits) in the "clocks" tab and leaving the estimate box ticked.  If I don't leave the box ticked I can no longer set upper and lower limits on the rate.  I'm not sure how to work around this.

My last question is a general one: is the set-up I currently have suitable for what I'm trying to do?  I've never used Beast before at all, so I've done as much reading as I can, but some things still seem like guesswork, so maybe you can help.  Basically I'm trying to build a combined-marker species tree to infer the relationships between closely related organisms.  I don't know where the species boundaries lie (this is one of the questions I'm trying to answer), I don't have any calibration points for divergence times (though I would like an estimate of when certain groups diverged), and not much is known about these organisms in general.  My set-up is:
- 5 partitions, with the two mtDNA trees linked
- "traits" are each population (which could potentially be a separate species, though not necessarily - this is what I need to find out)
- "sites" are all TN93 with one HKY (some with +I+G or just +G)
- "clocks" are all lognormal relaxed
- "trees" are all Yule Process with a Piecewise linear & constant root and a Random starting tree
- ucld.mean priors set to uniform for all genes
- MCMC chain length of 100 million (which takes at least 2 days to run)

I'm not sure if you can answer whether all of these parameters are appropriate for what I'm trying to do, but any insight would be great.  I've attached the xml file as well.

Thanks so much for your help already, and any more would be great :)

Kat
trialrun6StarBEASTLog.xml

Gustavo Sánchez

Dec 12, 2014, 11:08:23 AM
to beast...@googlegroups.com
Dear Kat
Have you fixed your problem? 
I am having the same trouble as you, but working with only one locus. My ESS values are very low, but I don't know how to increase them.

Thank you 

Kat Dawkins

Dec 15, 2014, 2:05:39 AM
to beast...@googlegroups.com
Hi Gustavo,

I ended up changing a lot of parameters in my *Beast analysis as new information came to light.  The only suggestions I can give you without knowing the details of your analysis are to:
- check that you've input the most appropriate model
- increase the chain length (I ended up running 150 million generations in total)
- change the prior on the stdev to represent a more probable range (an exponential distribution with initial value 2 and mean 0.5 has been suggested on the *Beast forums)
- check that the prior distribution used is the most suitable for your ucld.mean(s).  I started with a uniform distribution, but ended up changing to a lognormal.  I did this on advice from one of my colleagues, but it was so long ago that I can't remember the reasoning behind it - sorry!
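If it helps to see where the stdev prior lives in the XML: in the BEAST 1.x files BEAUti generated for me, the relevant pieces looked roughly like the fragment below. This is a sketch only - parameter ids and attribute names can differ between versions, so compare it against the XML your own BEAUti produces:

```xml
<!-- Sketch, BEAST 1.x syntax: check against your own generated XML. -->

<!-- Where the clock model defines the parameter, set the initial value: -->
<parameter id="ucld.stdev" value="2.0" lower="0.0"/>

<!-- Inside the prior block of the mcmc section, an exponential prior
     with mean 0.5 on the relaxed-clock standard deviation: -->
<exponentialPrior mean="0.5" offset="0.0">
    <parameter idref="ucld.stdev"/>
</exponentialPrior>
```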

Apart from that, you really just need to play around with the parameters and make sure the priors you've set are reasonable and correct for your data.

I hope this helps.  Let me know if you need any more specifics and I'll do what I can.

Kat

Gustavo Sánchez

Dec 15, 2014, 10:09:21 AM
to beast...@googlegroups.com
Hello Kat,
Thank you very much for your help. 
I have fixed my problem by setting the ucld.mean priors to lognormal.

I want to add something about choosing the best substitution model. I used the RB model implemented in BEAST 2, which, as far as I understand, is supposed to find the substitution model that best fits the data. However, I tried that model without success in my posterior results. 

I then used the PartitionFinder software, which gave me a different model from the one I had obtained with jModelTest. When I set the best model and best partitioning scheme according to this software (using BIC), my posterior and ESS values increased a lot, and I got ESS values of almost 200 with an MCMC chain of only 50 million. 

I hope this information helps others; if I am making a mistake, perhaps someone can offer their opinion.

pawan jayaswal

May 15, 2015, 6:09:24 AM
to beast...@googlegroups.com
Hello Gustavo,

I have been facing the same problem (low ESS values for the posterior, prior, and likelihood) for the last six months. I have tried different combinations of parameters: HKY+G+I for substitution (ModelTest found GTR+G+I to be the best substitution model, but after a suggestion from the forum I changed this to HKY), three partitions, an uncorrelated lognormal relaxed clock, and a birth-death speciation tree prior. I also changed the ucld.mean prior from exponential to lognormal, and used an initial value of 2 with mean 0.5 for the standard deviation. But after all these changes the ESS values have not improved.

Any suggestions or help would be greatly appreciated.

Thank You.

Pawan Kumar Jayaswal