Hi Alexei,
Thanks so much for your help! I have some follow-up questions, if you don't mind, as I am new to BEAST and want to make sure what I've done is correct. I would really appreciate being able to ask them, as this is basically the last major step in my PhD before I can finally sort out what story I have. I hope that's okay!
What I want to do at the end of all of this is present a species tree and put some divergence times on it. The organism I'm working on has an unknown taxonomic framework, so I have assigned sequences to the distinct populations they came from and used these populations as my "species" in *BEAST, to see what groupings I get on the overall species tree. I can then decide which populations make up a real species. So this is my first question - is this approach reasonable, seeing as I don't know what my real species groupings are and this is a major question of my thesis?
To give you an idea of what I have done, I'll give you a quick summary of my data.
I have run a whole bunch of analyses, and so far I have come up with one fairly good one that I am working on now. In my analysis I have 5 genes in total, and for two of them I have a rough idea of the divergence rate from the literature. What I've done is assign a 'middle' rate in the clock section to these two genes, and put an upper and lower limit on these in the prior section under ucld.mean with a uniform distribution. So this basically represents the variability in divergence rates that has been suggested in the literature. I've left the estimate box ticked for all of my genes, and assigned them a uniform distribution with a min of 0 and a max of 100. For all ucld.stdev I have assigned an initial value of 2 and a mean of 0.5, with an exponential distribution. I have assigned models to all of my genes, and my two mtDNA genes have linked trees. Also, all clocks have been set as lognormal relaxed. These are all of the major things I have changed and played with along the way. Second question - do these all seem like reasonable limits to put on my data, considering what I want as my output (i.e. species tree + divergence estimates)?
As we discussed previously, I have one rogue gene that is giving me problems. I have tried both of your suggestions (taking out the gene, and enforcing a strict clock on the gene). Taking it out seems to make no difference, and enforcing a strict clock doesn't improve things dramatically as far as I can tell; however, the gene no longer has a ucld.stdev when a strict clock is set, so I'm not sure how to compare that part of it directly. Because neither of the changes improved things much, I felt I should go back to my best analysis so far and have another look at what my problems are.
I went back to my 'best' analysis, and it's actually quite good except for 3 ESS values (speciesTree.rootheight, a treemodel.rootheight for a mtDNA gene, and a treeLikelihood value for my 'problem' gene). It may need nothing more than a longer run, as these are all high 'orange' values. This brings me to my third question - is running the analysis longer a valid approach to increase these ESS values, or could I be missing some underlying problem with my analysis by doing this? I'm currently running my analysis for 100 million states (which takes 2 days).
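To make the ESS question concrete, here is a minimal Python sketch of how an effective sample size can be estimated from a logged trace. It is an illustration under stated assumptions, not the method Tracer or BEAST uses: the estimator is the textbook N / (1 + 2 * sum of autocorrelations), with the sum truncated at the first non-positive autocorrelation, and the trace is a simulated autocorrelated series (the helper `ar1_trace` and its parameters are hypothetical, not output from my analysis).

```python
import random


def effective_sample_size(trace):
    """Estimate ESS as N / (1 + 2 * sum(rho_k)).

    The sum over lags stops at the first non-positive autocorrelation.
    This is a simple textbook estimator, assumed here for illustration only.
    """
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    var = sum(c * c for c in centered) / n
    if var == 0:
        return float(n)
    rho_sum = 0.0
    for lag in range(1, n):
        acov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = acov / var
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


def ar1_trace(n, phi, seed):
    """Hypothetical stand-in for a logged parameter: an AR(1) series."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n):
        x = phi * x + rng.gauss(0, 1)
        samples.append(x)
    return samples


# Same simulated chain, logged for 1000 samples and for 4000 samples.
short_run = ar1_trace(1000, phi=0.9, seed=1)
long_run = ar1_trace(4000, phi=0.9, seed=1)

ess_short = effective_sample_size(short_run)
ess_long = effective_sample_size(long_run)
print(f"ESS short run: {ess_short:.0f}, ESS long run: {ess_long:.0f}")
```

In this toy case the longer run has the higher ESS because the simulated chain mixes the same way throughout. Whether that also holds for a real chain depends on whether it has actually converged, which this sketch cannot show.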
I also have a question about something you said about sampling the prior in my ucld.stdev - will this have an effect on the divergence estimate that I get for my species tree?
I think those are all of my major questions (at least for now!). I would really appreciate any help you can give!!
Thanks,
Kat
P.S. The graphic that I posted last time for the ucld.stdev had the prior in purple and the other one in black.
--
----------------------------------------------------------------
Kathryn L. Dawkins
Griffith Crayfish Team
Griffith School of Environment
and
Australian Rivers Institute
Gold Coast campus
Griffith University
Queensland 4222
Australia
Mobile: 0422 928 429
E-Mail: k.da...@griffith.edu.au
--------------------------------------------------------------------------