effective population size or population size (EBSP)


LP

Mar 29, 2015, 6:00:19 PM
to beast...@googlegroups.com
Hello everybody.
I have a simple question, it is more a doubt. 

When I run an EBSP analysis, is the result an estimate of the "effective population size" through time or of the "population size" through time? Here "population size" is the product of Ne (effective population size) and tau (generation length).

Thank you!

Santiago Sanchez

Mar 29, 2015, 9:52:18 PM
to beast...@googlegroups.com
Hi,

The y-axis should be the population size parameter (theta): Ne (effective population size) x mu (mutation rate). The ploidy scalar can be specified in BEAUti for each type of DNA region.

Cheers,
Santiago

Santiago Sanchez-Ramirez
Ecology and Evolutionary Biology, University of Toronto
Natural History (Mycology), Royal Ontario Museum
100 Queen's Park
Toronto, ON
M5S 2C6
Canada

LP

Apr 7, 2015, 5:33:24 AM
to beast...@googlegroups.com
Hi Santiago,

Thank you for your reply, but I don't think it is the mutation rate but rather the generation length. The point is that this is not specified in the manual or in other papers, and reading several papers I have seen that there is some confusion. Above all, I'm trying to understand whether I have to divide the estimated value by the generation time or not.

Thank you to everybody

Carlo Pacioni

Apr 7, 2015, 8:53:24 AM
to beast...@googlegroups.com
Hi Gabriele,
The units depend on whether or not you have calibrated your analysis. If you have used a calibration, the x-axis is expressed in time and the y-axis is Ne x generation time. Otherwise, the x-axis is expressed in mutations per site and the y-axis is Ne x u (mutation rate). As far as I'm aware, this applies to all coalescent-based demographic analyses in BEAST, with the exception of skygrid, where the y-axis is in log units.
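
To make the two cases above concrete, here is a minimal Python sketch of the conversion they imply. The function name `ne_from_skyline_y` and all the numbers are made up for illustration; nothing here is BEAST output or a BEAST API. It takes the interpretation in this post at face value (y = Ne x generation time with a calibration, y = Ne x u without) and ignores any ploidy scalar.

```python
def ne_from_skyline_y(y_value, calibrated, generation_time=None, mutation_rate=None):
    """Convert one skyline y-axis value to Ne under the interpretation above.

    calibrated=True  : y is assumed to be Ne * generation_time, so divide by
                       generation_time (same time unit as the calibration).
    calibrated=False : y is assumed to be Ne * mutation_rate, so divide by
                       mutation_rate (per site per generation).
    """
    if calibrated:
        if generation_time is None:
            raise ValueError("generation_time is required for a calibrated analysis")
        return y_value / generation_time
    if mutation_rate is None:
        raise ValueError("mutation_rate is required for an uncalibrated analysis")
    return y_value / mutation_rate


# Hypothetical values, chosen only to show the arithmetic.
print(ne_from_skyline_y(200_000.0, calibrated=True, generation_time=20.0))
print(ne_from_skyline_y(0.005, calibrated=False, mutation_rate=5e-7))
```

Either way an extra piece of information (generation time or mutation rate) has to come from outside the analysis, which is why the units of the y-axis matter.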

This is explained in Alexei's original paper on BSP (Drummond, A.J., Rambaut, A., Shapiro, B., and Pybus, O.G. (2005) Bayesian Coalescent Inference of Past Population Dynamics from Molecular Sequences. Molecular Biology and Evolution 22(5), 1185-1192.) and I think Alexei has also replied to a few posts on this topic a while back.

Hope this helps,
carlo
 


Santiago Sanchez

Apr 7, 2015, 9:01:50 AM
to beast...@googlegroups.com
The thing is that you can scale the time (x) axis in many different ways, but it all depends on the units that you use to scale the mutation rate. You can choose the mutation rate to be in substitutions per Myr, per year, or per generation. Once you have any of those, it's easy to rescale to the timeframe you are interested in. In any case, the population size parameter (theta) should not be affected by the scale. Let's say you ran your analysis with a mutation rate of (for the sake of an example):

2.5E-8 substitutions per site per year.
If you want your units to be in generations (let's say the generation time is 20 yr), I would just divide both your x-axis (time) and y-axis (population size theta) by a factor of 20.
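
The arithmetic in this example can be sketched in a few lines of Python. The time points and y-values are hypothetical placeholders, not results from any analysis, and the helper name `years_to_generations` is made up. Whether dividing the y-axis by the generation time is the right move depends on what units the y-axis is really in, which is exactly what this thread is trying to settle.

```python
GENERATION_TIME_YEARS = 20.0      # generation time from the example above
RATE_PER_SITE_PER_YEAR = 2.5e-8   # mutation rate from the example above


def years_to_generations(values_in_years, generation_time=GENERATION_TIME_YEARS):
    """Divide each value by the generation time (years -> generations)."""
    return [value / generation_time for value in values_in_years]


# A per-year rate becomes a per-generation rate by multiplying, not dividing.
rate_per_generation = RATE_PER_SITE_PER_YEAR * GENERATION_TIME_YEARS

# Hypothetical skyline coordinates, for illustration only.
times_years = [0.0, 1000.0, 5000.0]
y_values = [40000.0, 100000.0, 200000.0]

times_generations = years_to_generations(times_years)
y_rescaled = years_to_generations(y_values)

print(rate_per_generation)
print(times_generations)
print(y_rescaled)
```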

Your y-axis would still be the population size parameter. For a single-locus BSP it's easy to convert theta to Ne, especially if you are using a fixed mutation rate. But for EBSP, theta is probably scaled across the mutation rates of all loci, so I wouldn't know how to convert it precisely.

Cheers,
Santiago


carlo pacioni

Apr 19, 2015, 5:56:15 AM
to beast...@googlegroups.com
Hi Santiago,
sorry for the late reply. My understanding is that the y-axis is the population size parameter (theta) only if you don't have a calibration (and the problem remains, because I wouldn't know how to convert it to Ne when there are multiple loci either; yours is a good point that I hadn't thought about before). However, when you have a calibration, the y-axis is Ne (effective population size) x generation time. So, if I'm correct, in that case there is no need to rescale it by the mutation rate.

Cheers,
carlo

Santiago Sanchez

Apr 19, 2015, 10:22:47 AM
to beast...@googlegroups.com
Hi Carlo,

First, it would be awkward to have a calibration point in EBSP. Since it is a multilocus model, you would need to make a lot of assumptions. For instance, you might assume that two populations diverged X amount of time ago for gene Y, but maybe not for gene Z (this could be true for mtDNA, where you find a lot of structure, but probably not for nuclear loci). Also, one of the assumptions of EBSP is that there is no structure, so in most cases a calibration point in EBSP seems like the least feasible option. Second, for coalescent models in BEAST, calibration points would be used to scale a mutation rate, which would be in the units of time that you choose: years, generations, millions of years. The choice, I guess, depends on the time scale you're working with. So with that said, the only way the y-axis is Ne x generation time is if your mutation rate (or calibration point, as per your suggestion) is in per-generation units.

Cheers,
Santiago


carlo pacioni

Apr 20, 2015, 7:25:28 AM
to beast...@googlegroups.com
Hey Santiago,
I agree with you, I can't think of a situation where it would make sense to have a calibration on internal nodes when using a coalescent tree prior. However, I was thinking more of the use of heterochronous data (e.g. aDNA). I can't see anything wrong with using tip dates in a coalescent-based analysis. Another situation is that you may have information to calibrate the root.

As for the generation time, from Drummond et al 2002: "It is assumed genealogies are realized by the Kingman coalescent process. Our time units in this article are “calendar units before the present” [e.g., days before present (BP)], where the present is the time of the most recent leaf and set to zero. Let p denote the number of calendar units per generation and Ө=Ne p. The scale factor Ө converts “coalescent time” to calendar time and is one of two key objects of our inference. Note that we do not estimate p and Ne separately, only their product." [just to clarify, Ө is not theta, as theta is defined earlier in the paper as theta=2Ne*mu].

My understanding from this is that, while you provide a calibration in calendar units (days, years, Myr, etc.), as you say, and the coalescent time (i.e. the x-axis) is rescaled in those calendar units, the estimate of the population size is in Ne*p, where p is the generation time expressed in the unit you provided in your calibration. I wouldn't be able to explain to you how this is handled mathematically, but biologically it has always made a lot of sense to me, because genetic drift operates on a 'per generation' basis. With EBSP being an extension of BSP, I have always assumed the same approach is applied.
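
For reference, the two quantities mentioned in the bracketed note above can be written side by side. This is only a restatement of those definitions plus one substitution, not a derivation taken from the paper. It assumes that mu is a per-generation rate and that p is known from some independent source; both are assumptions of this sketch.

```latex
\begin{align}
\Theta &= N_e \, p        && \text{(scale factor in the quoted passage)} \\
\theta &= 2 N_e \mu       && \text{(theta as defined earlier in the paper)} \\
N_e &= \frac{\Theta}{p}   && \text{(needs an independent value of } p\text{)} \\
\theta &= \frac{2 \mu \, \Theta}{p} && \text{(substituting } N_e = \Theta / p\text{)}
\end{align}
```

In words: a y-axis in units of Ne*p only becomes Ne after dividing by the generation time p, expressed in the same calendar unit as the calibration.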

I'm not sure whether this helps or confuses even more things...

cheers,
carlo

Drummond, A.J., Nicholls, G.K., Rodrigo, A.G., and Solomon, W. (2002) Estimating Mutation Parameters, Population History and Genealogy Simultaneously From Temporally Spaced Sequence Data. Genetics 161(3), 1307-1320.

Santiago Sánchez

Apr 20, 2015, 8:29:35 AM
to beast...@googlegroups.com
Hi Carlo,

I agree that you could use heterochronous data to calibrate, and that a per-generation estimate makes more sense biologically. Without it there is no way to accurately measure a meaningful Ne. Still, in order for this to happen you would need to provide your units in number of generations (either the time units in your heterochronous dates or a per-generation mutation rate). Another way would be to rescale theta post-analysis by the number of generations. Not every organism has a known generation time, and generation times vary a lot from organism to organism, and even within species. So I'm really unconvinced that BEAST can figure this out without explicit information.

Cheers,
Santiago

Carlo Pacioni

Apr 21, 2015, 8:34:54 AM
to beast...@googlegroups.com
Hi Santiago,
as I said, I'm not able to explain how BEAST figures out the generation time. I only note that there are at least two papers that I'm aware of (Drummond et al. 2002 and Drummond et al. 2005) where data are provided in days/years and the y-axis is reported as Ne*p. Hopefully some of the knowledgeable people in this forum will chime in to clarify this point.

Cheers,
carlo
