ucld.stdev sampling from prior & low ESS for treemodel.rootheight


Kat Dawkins

Apr 11, 2013, 7:20:42 PM
to beast...@googlegroups.com
Hi all,

I've been working through some issues that I've been having using *Beast and I think I'm almost at the end but have one last thing to fix and really need some help!

I've run my analysis multiple times (100 million states) and was having trouble reaching convergence for my treemodel.rootheight parameters for each of the 5 genes included in my data. In addition to that, for one of my genes the ucld.stdev suggested that I was still sampling from the prior. I found a suggestion on the discussion board that said to try changing the ucld.stdev priors to an initial value of 2 and a mean of 0.5 (with an exponential distribution), which I then tried for all of my genes. This seems to have partially fixed my treemodel.rootheight problem, as I am now getting ESS values above 200 for 4 genes (but not for one mtDNA gene and not for my species tree). Unfortunately, changing the priors did not fix my problem with sampling from the prior for one of my genes (see picture below).

My questions are these:
1. What is the next step that I should try to improve the last of my treemodel.rootheight ESS values?
2. What parameters should I try altering to avoid sampling from my prior with regard to ucld.stdev?
3. Does it actually matter that I am sampling from the prior for this gene? What will it affect?

I have attached the xml file in case it will help.  I would really appreciate any little bit of information you can give me, even if you can't answer all the questions.


trialrun8StarBEASTLog.xml

Alexei Drummond

Apr 11, 2013, 8:18:59 PM
to beast...@googlegroups.com
Dear Kat,

Answer to question (3):

No, it doesn't matter if you are just sampling from the prior, so long as (i) the prior is reasonable and (ii) you are not interested in learning about the particular parameter in question.

Expanding on that and to answer question (2):

If the marginal posterior distribution matches its corresponding prior that just means that there is very little information about that particular parameter in that gene. It is therefore a feature of your data and can't be "fixed" by changing operators or starting values or anything like that. 
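
One rough way to put a number on "the posterior matches the prior" is to measure the largest gap between the empirical CDF of the logged ucld.stdev samples and the CDF of the exponential prior. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the samples are simulated rather than read from a BEAST log, and the prior mean of 0.5 is the value mentioned in the question. Because MCMC samples are autocorrelated, the distance is a descriptive summary, not a formal test.

```python
import math
import random


def exponential_cdf(x, mean):
    """CDF of an exponential distribution with the given mean."""
    return 1.0 - math.exp(-x / mean) if x > 0 else 0.0


def ks_distance_to_exponential(samples, mean):
    """Largest gap between the empirical CDF of the samples and the
    exponential CDF (the Kolmogorov-Smirnov distance)."""
    xs = sorted(samples)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = exponential_cdf(x, mean)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        gap = max(gap, abs(f - i / n), abs((i + 1) / n - f))
    return gap


if __name__ == "__main__":
    rng = random.Random(1)
    # Simulated stand-ins for logged ucld.stdev values (assumption, not real data).
    prior_like = [rng.expovariate(1 / 0.5) for _ in range(5000)]  # looks like the prior
    informed = [rng.gauss(0.3, 0.05) for _ in range(5000)]        # data moved the estimate
    print("prior-like gene:", ks_distance_to_exponential(prior_like, 0.5))
    print("informative gene:", ks_distance_to_exponential(informed, 0.5))
```

A distance close to zero says the gene has told you almost nothing about ucld.stdev beyond what the prior already assumed; a large distance says the data have pulled the estimate away from the prior.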

Incidentally, of the blue and black densities in your attached figure, which is the prior and which is the posterior? I might also try a strict clock on that gene to see whether you can get reasonable results, as the posterior has a lot of probability mass near zero.

Finally, answer to question (1):

For a start, try two analyses: 

(1) without the problematic gene and see if the rest of them play nicely together.
(2) with the problematic gene but with a strict clock.

The results of simpler analyses often provide clues about what is going on in the big bad all-singing all-dancing analysis.

Cheers
Alexei

Kathryn Dawkins

Apr 14, 2013, 1:57:25 AM
to beast...@googlegroups.com
Hi Alexei,

Thanks so much for your help! I have some follow-up questions if you don't mind, as I am new to Beast and want to make sure what I've done is correct. I would really appreciate it if I could ask you some questions, as this is basically the last major step I have to do in my PhD to finally be able to sort out what story I have - I hope that's okay!

Basically, what I want to do at the end of all of this is present a species tree and be able to put some divergence times on it.  The organism I'm working on has an unknown taxonomic framework, so basically what I have done is assign sequences to distinct populations that they came from and have used these as my "species" in *Beast to see what groupings I get on the overall species tree so I can then decide which populations make up a real species.  So this is my first question - is this approach reasonable, seeing as I don't know what my real species groupings are and this is a major question of my thesis?

To give you an idea of what I have done, I'll give you a quick summary of my data.
I have run a whole bunch of analyses, and so far I have come up with one fairly good one that I am working on now. In my analysis I have 5 genes in total, two of which I have a rough idea of their divergence rate from the literature. What I've done is assign a 'middle' rate in the clock section to these two genes, and put a higher and lower limit on these in the prior section under ucld.mean with a uniform distribution. So this basically represents the variability in divergence rates that has been suggested in the literature. I've left the estimate box ticked for all of my genes, and assigned them a uniform distribution and a max of 100, min of 0. For all ucld.stdev I have assigned an initial value of 2 and a mean of 0.5, with an exponential distribution. I have assigned models to all of my genes, and my two mtDNA genes have linked trees. Also, all clocks have been set as lognormal relaxed. These are all of the major things I have changed and played with along the way. Second question - do these all seem like reasonable limits to put on my data, considering what I want as my output (i.e. species tree + divergence estimates)?

As we discussed previously, I have one rogue gene that is giving me problems.  I have tried both of your suggestions (taking out the gene, and enforcing a strict clock on the gene) and it seems that taking it out makes no difference, and enforcing a strict clock doesn't improve things dramatically as far as I can tell; however, the gene no longer has a ucld.stdev when a strict clock is set so I'm not sure how to compare that part of it directly.  Because neither of the changes improved things much, I felt I should go back to my best analysis so far and have another look at what my problems are.

I went back to my 'best' analysis, and it's actually quite good except for 3 ESS values (speciesTree.rootheight, a treemodel.rootheight for a mtDNA gene, and a treeLikelihood value for my 'problem' gene), and may need nothing more than to be run longer as these are all high 'orange' values. This brings me to my third question - is running the analysis longer a valid approach to increase these ESS values, or could I be missing some underlying problem with my analysis by doing this? I'm currently running my analysis for 100 million states (which takes 2 days).
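
As a rough illustration of what an ESS value measures, the sketch below estimates it from the autocorrelation of a trace: the number of logged samples divided by the autocorrelation time. It is a minimal sketch under stated assumptions, not the exact calculation Tracer performs; the function name and the simulated traces are hypothetical, and the autocorrelation sum is simply truncated at the first non-positive lag.

```python
import random


def effective_sample_size(trace):
    """Estimate ESS as n / (1 + 2 * sum of positive-lag autocorrelations),
    truncating the sum at the first non-positive autocorrelation."""
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    variance = sum(c * c for c in centered) / n
    if variance == 0:
        return float(n)  # a constant trace carries no autocorrelation information
    rho_sum = 0.0
    for lag in range(1, n):
        autocov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = autocov / variance
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)


if __name__ == "__main__":
    rng = random.Random(7)
    n = 2000
    # Well-mixed trace: independent draws, so ESS should be close to n.
    independent = [rng.gauss(0, 1) for _ in range(n)]
    # Poorly mixed trace: each sample depends strongly on the previous one.
    sticky = [0.0]
    for _ in range(n - 1):
        sticky.append(0.9 * sticky[-1] + rng.gauss(0, 1))
    print("independent trace ESS:", effective_sample_size(independent))
    print("sticky trace ESS:", effective_sample_size(sticky))
```

If the chain is mixing steadily and the only problem is autocorrelation, the ESS grows roughly in proportion to the number of samples, so a longer run helps. A longer run does not help if the trace shows trends or jumps between modes, which point to a problem with the model or the operators.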

I also have a question about something you said about sampling the prior in my ucld.stdev - will this have an effect on the divergence estimate that I get for my species tree?

I think those are all of my major questions (at least for now!).  I would really appreciate any help you can give!!

Thanks,
Kat

P.S. The graphic that I posted last time for the ucld.stdev had the prior in purple and the other one in black.



 
 



--
----------------------------------------------------------------
Kathryn L. Dawkins
Griffith Crayfish Team
Griffith School of Environment
and
Australian Rivers Institute
Gold Coast campus
Griffith University
Queensland 4222
Australia
Mobile: 0422 928 429

E-Mail: k.da...@griffith.edu.au
--------------------------------------------------------------------------


Kat Dawkins

Apr 22, 2013, 1:36:22 AM
to beast...@googlegroups.com
Hi Alexei, I was just wondering if you could offer any help on what I've posted?

Thanks,
Kat

Maryam Zaman

Mar 26, 2020, 1:21:58 PM
to beast-users
Respected seniors! Please help me. I am working on a virus and want to estimate the divergence times of its sequences. The ESS value of my treemodel.rootheight parameter is above 200, but its mean value is 400. Please tell me how to reduce the mean value. Any help would be highly appreciated.