Hi all,
I probably have some more naive questions about SNAPP that I could not figure out by reading the manual and Bryant et al.'s 2012 paper.
1) In Bryant et al.'s paper I read that, as a prior implemented in SNAPP, "the stationary allele proportions are fixed at the observed frequencies of red and green alleles in the data", and in the manual ("A rough guide to SNAPP", October 11, 2012) I read that, "given an estimate of pi_0 and pi_1", u and v can be obtained as u = 1/(2*pi_0) and v = 1/(2*pi_1).
Does it mean that I can get a sensible prior by calculating the frequencies of 0 and 1 in the whole data matrix and using them as estimates of pi_0 and pi_1 to calculate u and v? In this case, mu = 2(u*v)/(u+v) = 1, right?
Moreover, am I wrong to think that, for SNP data, there is not much reason to expect u and v to differ (while with AFLP data losing a site would probably be easier than acquiring it)?
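The calculation asked about in point 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the matrix is a hypothetical toy example coded with the characters 0 and 1 (and `?` for missing data), and the function names are made up for this sketch, not part of SNAPP. Diploid data coded as 0/1/2 genotype counts would need the allele counts worked out differently.

```python
from collections import Counter


def allele_frequencies(matrix):
    """Return (pi_0, pi_1) pooled over every cell of a 0/1 matrix.

    Assumption: each row is a string of '0' and '1' characters;
    anything else (e.g. '?') is treated as missing and ignored.
    """
    counts = Counter(ch for row in matrix for ch in row if ch in "01")
    total = counts["0"] + counts["1"]
    if total == 0:
        raise ValueError("no 0/1 observations in the matrix")
    return counts["0"] / total, counts["1"] / total


def rates_from_frequencies(pi_0, pi_1):
    """u = 1/(2*pi_0) and v = 1/(2*pi_1), as quoted from the manual."""
    return 1.0 / (2.0 * pi_0), 1.0 / (2.0 * pi_1)


def overall_rate(u, v):
    """mu = 2*u*v/(u+v), the formula used in the question."""
    return 2.0 * u * v / (u + v)


# Hypothetical toy matrix: three samples, four SNPs.
toy_matrix = ["0010", "0110", "00?1"]
pi_0, pi_1 = allele_frequencies(toy_matrix)
u, v = rates_from_frequencies(pi_0, pi_1)
print(f"pi_0={pi_0:.3f} pi_1={pi_1:.3f} u={u:.3f} v={v:.3f} mu={overall_rate(u, v):.3f}")
```

Because pi_0 + pi_1 = 1, mu comes out as 1 whatever the frequencies are, which is the point made in the question.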
2) Assuming that mu = 1, would it make sense to scale the species trees obtained from RAD-sequencing SNP data by an average genomic point mutation rate? I.e., if, for example, the root height is estimated at 0.02 and the mutation rate is 10^-8, the absolute time would be 0.02/10^-8 = 2,000,000. Or would the actual rate be affected by the fact that I am only using variable positions? Or by anything else I did not think about?
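For reference, the conversion in point 2 is a single division. The sketch below assumes that, with mu = 1, the tree heights are in expected mutations per site and that the rate is given per site per generation, so the result is in generations. The function name is hypothetical.

```python
def height_to_generations(height, rate_per_site_per_generation):
    """Convert a height in expected mutations per site to generations.

    Assumption: the height and the rate refer to the same kind of site.
    """
    if rate_per_site_per_generation <= 0:
        raise ValueError("the mutation rate must be positive")
    return height / rate_per_site_per_generation


root_height = 0.02      # example value from the question
mutation_rate = 1e-8    # example value from the question
generations = height_to_generations(root_height, mutation_rate)
print(f"{generations:,.0f} generations")
```

This reproduces the 2,000,000 figure. Whether the rate should be adjusted for using only variable positions is exactly the open question.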
I hope this can be a useful topic, and not too stupid.
--
You received this message because you are subscribed to the Google Groups "beast-users" group.
- BACK AND FORTH RATES
You say that, with SNPs, one could "easily replace 0s by 1s and vice versa for a site", and that this would change the values of u and v. But I am confused by the idea of 'averaging' pi_0 and pi_1 (and thus u and v) across loci. For example, imagine I have a data matrix with just two loci. One is coded so that pi_0 = 0.75 and pi_1 = 0.25 (it could just as well be the opposite, since the coding is completely arbitrary). For this locus, the appropriate rates would be u = 1/(2*0.75) = 0.67 and v = 1/(2*0.25) = 2. For the other locus I could have pi_0 = 0.15 and pi_1 = 0.85, and the appropriate rates would be u = 3.33 and v = 0.59. I don't see why SNAPP should be told that the rates are the 'average' u = 1.11 and v = 0.91 (from the pooled pi_0 = 0.45 and pi_1 = 0.55), which is not the 'right' value for either locus. What am I missing?
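The arithmetic of the two-locus example can be checked with a short Python sketch. It assumes both loci have the same number of observations, so that the pooled pi_0 is the plain mean of the per-locus values. The function name is made up for illustration.

```python
def rates_from_pi0(pi_0):
    """Return (u, v) with u = 1/(2*pi_0) and v = 1/(2*pi_1), pi_1 = 1 - pi_0."""
    pi_1 = 1.0 - pi_0
    return 1.0 / (2.0 * pi_0), 1.0 / (2.0 * pi_1)


# pi_0 for the two loci in the example above.
loci_pi_0 = [0.75, 0.15]

# Rates each locus would get on its own.
per_locus = [rates_from_pi0(p) for p in loci_pi_0]

# Rates obtained from the pooled frequencies (equal weight per locus assumed).
pooled_pi_0 = sum(loci_pi_0) / len(loci_pi_0)
pooled = rates_from_pi0(pooled_pi_0)

for p, (u, v) in zip(loci_pi_0, per_locus):
    print(f"locus with pi_0={p:.2f}: u={u:.2f}, v={v:.2f}")
print(f"pooled pi_0={pooled_pi_0:.2f}: u={pooled[0]:.2f}, v={pooled[1]:.2f}")
```

With these inputs the pooled values come out at about u = 1.11 and v = 0.91, different from either per-locus pair, which is the discrepancy the question is about.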
- ABSOLUTE RATES
In fact I was just trying to understand how my results can be so 'clean' with regard to thetas and times. As I see it, disentangling time and population size is always tricky. Nonetheless, the trees I am getting (after the ESS for all parameters is > 100) have relatively narrow CIs on divergence times.
However, if I try to apply a 'typical' genomic rate (ca. 10^-8 subst./(site*generation)), I get very large estimates for both Ne and times. Actually, I have done a *BEAST run on the same samples, using sequences from 8 nuclear loci and a couple of mtDNA genes, and (applying 'standard' mtDNA rates) the estimated times should be at least one order of magnitude lower.
I have also noticed that some estimates of divergence time (especially the root height) seem sensitive to the number of samples I use (after removing a few samples I get overlapping but clearly different estimates for the root height).
--
It seems like the two of you have slightly different ideas: David said that u = v may be reasonable for SNPs, while Remco disagreed. I have already asked Remco one more question. I would add that setting u = v = 1 instead of two different rates (u = 3, v = 0.6) for my data resulted in estimates of divergence that are quite different (by a factor of ca. 2).
Please, David, have a look also at my reply to Remco about applying an absolute rate. One more question: what do you mean when you say that unchecking the "non-polymorphic" box gives the "rate averaged over variable and constant sites"? How does SNAPP know how many constant sites there are in my data, if I use a matrix of variable sites only?
--
2) I still need some clarification about the 'non-polymorphic' checkbox. As I understand it from the manual, it should simply tell SNAPP to ignore invariant sites (when UNCHECKED) or use them to compute the likelihood (when CHECKED). Since my matrix contains ONLY variable sites, I would expect it to have no effect. But David told me it is important to check it in order to have "correct" mutation rates. To get a feeling for this, I ran two otherwise identical input files with 'non-polymorphic' checked or unchecked. I got two very similar trees, differing only slightly in some population sizes, but with one (CHECKED) having branch lengths roughly double those of the other (UNCHECKED).
3) As to the use of genomic mutation rates, somebody suggested that I should take into account that I only used variable sites selected from 85 bp fragments where most sites were invariant. Therefore I should multiply the mutation rate by 85. Actually, this would bring my tree much closer to what I expected, but I am not fully convinced. What do you think?
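To see what the suggested correction does numerically, here is a small Python sketch. It only illustrates the size of the effect and says nothing about whether the factor of 85 is justified. The height of 0.02 and the rate of 10^-8 are the example values used earlier in this thread, and the function and parameter names are hypothetical.

```python
def generations(height, rate, rate_multiplier=1.0):
    """Convert a height in mutations per site to generations.

    rate_multiplier is the hypothetical correction factor applied to the
    per-site rate (85 under the suggestion above, 1 for no correction).
    """
    return height / (rate * rate_multiplier)


height = 0.02   # example root height
rate = 1e-8     # example genomic rate per site per generation

unscaled = generations(height, rate)
scaled = generations(height, rate, rate_multiplier=85)

print(f"no correction:  {unscaled:,.0f} generations")
print(f"rate times 85:  {scaled:,.0f} generations")
```

Multiplying the rate by 85 simply divides every time estimate by 85, so all divergence times shrink by almost two orders of magnitude.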