41 views

Skip to first unread message

Sep 22, 2021, 5:32:07 AM9/22/21

to slim-discuss

Hi all,

I have two questions related to treeSeq recording and want to hear your opinion in case I’m violating some rules or assumptions

1)
When doing **nonWF** models we often want to
estimate the effective population size and figure out the N:Ne ratio. For this
I normally run a long-ish simulation to achieve near equilibrium and estimate
Ne=4/pi*mu. I want to do this step with treeseq instead, so I’m thinking:

a. Calculate generation time of simulation

b. Run a simulation with no mutations

c. Recapitate and mutate with (mu/generation time)

d. Calculate pi at time 0

e. Question, should I do:

i. Ne=4/pi*mu or

ii. Ne=4/pi*(mu/generation time)

**Do you think this is a valid approach?**

2)
My second question is about doing a bit of an unconventional
series of steps to do a **non-neutral nonWF **simulation. Neutral mutations normally take
much longer to reach equilibrium than deleterious mutations, and I want to
avoid running long simulations with both mutation types. So, I have been
doing

a. Run a simulation with no mutations

b. Recapitate and mutate with (mu/generation time)

c.
Wrote a script to take the tree at time 0 and
write a file in **slim output**

d. Go back to slim (without tree recording) and read the slim output with neutral mutations (in theory in equilibrium)

e. Include deleterious mutations from this time point and wait for those to reach ~equilibrium

**Is this a valid approach? **I’m aware I’m losing all the
genealogical information of the first run, but I’m not worried about that
because I care more about what happens from step “d” onwards.

cheers,

Hernán

Sep 22, 2021, 9:58:32 AM9/22/21

to slim-d...@googlegroups.com

Hi Hernán,

Regarding the calculation of Ne, the scheme that you propose seems to be

kind of circular: which Ne are you using for the recapitation step?

An alternative way to calculate Ne from the simulations will be from the

actual reproductive events at each time step (you can make an harmonique

mean of the Ne(t) at the end). I am not very familiar with the equations

to do this calculation in the case of overlapping generations but I

think Grimm et al (2018, doi:10.1111/2041-210X.12530) could be a good

place to look for them if you think this approach suits your needs.

Sorry, I am not really answering your first question...

For your second question, why not doing step (e) with deleterious

mutation until reach some sort of equilibrium and then doing

recapitation (if necessary) and adding neutral mutations on the tree

sequence? I do not see how your more complex approach will give

different results.

In any case, I think that if you want to start from a population with

neutral mutations, it would be more efficient and safe to simulate the

whole population with msprime >=1.0.0 (using "dtwf" model) and export

the tree sequence so it can used by SLiM as an input. This would also

allow to keep the genealogical information and that might be useful.

Miguel

On 22/09/2021 11:32, hern....@gmail.com wrote:

> Hi all,

>

> I have two questions related to treeSeq recording and want to hear your

> opinion in case I’m violating some rules or assumptions

>

> 1) When doing *nonWF* models we often want to estimate the effective

> output*

> SLiM forward genetic simulation: http://messerlab.org/slim/

> <http://messerlab.org/slim/>

> ---

> You received this message because you are subscribed to the Google

> Groups "slim-discuss" group.

> To unsubscribe from this group and stop receiving emails from it, send

> an email to slim-discuss...@googlegroups.com

> <mailto:slim-discuss...@googlegroups.com>.

> To view this discussion on the web visit

> https://groups.google.com/d/msgid/slim-discuss/02247bd2-1a4d-4838-87cd-7a8e47fcc0a8n%40googlegroups.com

> <https://groups.google.com/d/msgid/slim-discuss/02247bd2-1a4d-4838-87cd-7a8e47fcc0a8n%40googlegroups.com?utm_medium=email&utm_source=footer>.

--

Miguel de Navascués

UMR CBGP, INRAE

Centre de Biologie pour la Gestion des Populations

755 avenue du campus Agropolis

CS30016

34988 Montferrier-sur-Lez cedex (France)

phone: +33499623370

fax: +33499623345

e-mail: miguel.navascues AT inrae.fr

Regarding the calculation of Ne, the scheme that you propose seems to be

kind of circular: which Ne are you using for the recapitation step?

An alternative way to calculate Ne from the simulations will be from the

actual reproductive events at each time step (you can make an harmonique

mean of the Ne(t) at the end). I am not very familiar with the equations

to do this calculation in the case of overlapping generations but I

think Grimm et al (2018, doi:10.1111/2041-210X.12530) could be a good

place to look for them if you think this approach suits your needs.

Sorry, I am not really answering your first question...

For your second question, why not doing step (e) with deleterious

mutation until reach some sort of equilibrium and then doing

recapitation (if necessary) and adding neutral mutations on the tree

sequence? I do not see how your more complex approach will give

different results.

In any case, I think that if you want to start from a population with

neutral mutations, it would be more efficient and safe to simulate the

whole population with msprime >=1.0.0 (using "dtwf" model) and export

the tree sequence so it can used by SLiM as an input. This would also

allow to keep the genealogical information and that might be useful.

Miguel

On 22/09/2021 11:32, hern....@gmail.com wrote:

> Hi all,

>

> I have two questions related to treeSeq recording and want to hear your

> opinion in case I’m violating some rules or assumptions

>

> population size and figure out the N:Ne ratio. For this I normally run a

> long-ish simulation to achieve near equilibrium and estimate Ne=4/pi*mu.

> I want to do this step with treeseq instead, so I’m thinking:

>

> a. Calculate generation time of simulation

>

> b. Run a simulation with no mutations

>

> c. Recapitate and mutate with (mu/generation time)

>

> d. Calculate pi at time 0

>

> e. Question, should I do:

>

> i. Ne=4/pi*mu or

>

> ii. Ne=4/pi*(mu/generation time)

>

> *Do you think this is a valid approach?*
> long-ish simulation to achieve near equilibrium and estimate Ne=4/pi*mu.

> I want to do this step with treeseq instead, so I’m thinking:

>

> a. Calculate generation time of simulation

>

> b. Run a simulation with no mutations

>

> c. Recapitate and mutate with (mu/generation time)

>

> d. Calculate pi at time 0

>

> e. Question, should I do:

>

> i. Ne=4/pi*mu or

>

> ii. Ne=4/pi*(mu/generation time)

>

>

> 2) My second question is about doing a bit of an unconventional series

> of steps to do a *non-neutral nonWF *simulation. Neutral mutations
> 2) My second question is about doing a bit of an unconventional series

> normally take much longer to reach equilibrium than deleterious

> mutations, and I want to avoid running long simulations with both

> mutation types. So, I have been doing

>

> a. Run a simulation with no mutations

>

> b. Recapitate and mutate with (mu/generation time)

>

> c. Wrote a script to take the tree at time 0 and write a file in *slim
> mutations, and I want to avoid running long simulations with both

> mutation types. So, I have been doing

>

> a. Run a simulation with no mutations

>

> b. Recapitate and mutate with (mu/generation time)

>

> output*

>

> d. Go back to slim (without tree recording) and read the slim output

> with neutral mutations (in theory in equilibrium)

>

> e. Include deleterious mutations from this time point and wait for those

> to reach ~equilibrium

>

> *Is this a valid approach? *I’m aware I’m losing all the genealogical
> d. Go back to slim (without tree recording) and read the slim output

> with neutral mutations (in theory in equilibrium)

>

> e. Include deleterious mutations from this time point and wait for those

> to reach ~equilibrium

>

> information of the first run, but I’m not worried about that because I

> care more about what happens from step “d” onwards.

>

> cheers,

>

> Hernán

>

> --
> care more about what happens from step “d” onwards.

>

> cheers,

>

> Hernán

>

> SLiM forward genetic simulation: http://messerlab.org/slim/

> <http://messerlab.org/slim/>

> ---

> You received this message because you are subscribed to the Google

> Groups "slim-discuss" group.

> To unsubscribe from this group and stop receiving emails from it, send

> an email to slim-discuss...@googlegroups.com

> <mailto:slim-discuss...@googlegroups.com>.

> To view this discussion on the web visit

> https://groups.google.com/d/msgid/slim-discuss/02247bd2-1a4d-4838-87cd-7a8e47fcc0a8n%40googlegroups.com

> <https://groups.google.com/d/msgid/slim-discuss/02247bd2-1a4d-4838-87cd-7a8e47fcc0a8n%40googlegroups.com?utm_medium=email&utm_source=footer>.

--

Miguel de Navascués

UMR CBGP, INRAE

Centre de Biologie pour la Gestion des Populations

755 avenue du campus Agropolis

CS30016

34988 Montferrier-sur-Lez cedex (France)

phone: +33499623370

fax: +33499623345

e-mail: miguel.navascues AT inrae.fr

Sep 22, 2021, 11:08:46 AM9/22/21

to Miguel de Navascués, slim-discuss

Hi Miguel,

Thanks for your thoughts. All excellent points.

Re #1. Yes, I also thought calculating Ne based on the pedigree (or the genealogy in this case) was a good option. I want to simulate inbreeding later on and was worried it would distort too much, but should be fine to do it in the ancestral population. I agree with you this is the way to go. as a less circular approach, as you said. Although I think people simply use the initial population size as Ne for recapitation, right? I have never seen a huge effect of the Ne parameters on the recapitation function, I might be wrong.

Re #2. The reason is that I want to collect timepoints along the simulation while having both mutation types, so their different equilibrium points is a problem and also running treeseq recording with deleterious mutations + remembering individuals is incredibly slow and leads to huge tree files. So I rather doing it in two steps and avoid treeseq for the second step, this is why I came up with this solution. I gather you don't see a reason not to do it this way in terms of violating an assumption of the models.

cheers,

Hernán

To unsubscribe from this group and stop receiving emails from it, send an email to slim-discuss...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/slim-discuss/5b58115f-9db4-a735-86b9-3caa71803254%40inrae.fr.

Hernan

Sep 22, 2021, 5:29:22 PM9/22/21

to hern....@gmail.com, slim-discuss

Good questions!

> 1) When doing nonWF models we often want to estimate the effective population size and figure out the N:Ne ratio. For this I normally run a long-ish simulation to achieve near equilibrium and estimate Ne=4/pi*mu. I want to do this step with treeseq instead, so I’m thinking:

> a. Calculate generation time of simulation

> b. Run a simulation with no mutations

> c. Recapitate and mutate with (mu/generation time)

> d. Calculate pi at time 0

> e. Question, should I do:

> i. Ne=4/pi*mu or

> ii. Ne=4/pi*(mu/generation time)

The answer here comes from checking what the units are on all these

quantities. In the expression for neutral diversity in a randomly

mating population,

pi = 4 Ne mu,

pi is in units of mutations per bp, 4Ne is in units of generations

(since it's twice the mean coalescence time), and mu is in units of

mutations per bp per generation. Since msprime puts down mutations at

a rate of "mutations per time unit", your step (c) with mutation rate

equal to (mu / generation time) is the right one, so that mu is in

units of mutations per generation. So, I think the right answer is

(i). But, this is always confusing - you can verify this by doing this

procedure to a neutral WF model.

> 2) My second question is about doing a bit of an unconventional series of steps to do a non-neutral nonWF simulation. Neutral mutations normally take much longer to reach equilibrium than deleterious mutations, and I want to avoid running long simulations with both mutation types. So, I have been doing

> a. Run a simulation with no mutations

> b. Recapitate and mutate with (mu/generation time)

> c. Wrote a script to take the tree at time 0 and write a file in slim output

> d. Go back to slim (without tree recording) and read the slim output with neutral mutations (in theory in equilibrium)

> e. Include deleterious mutations from this time point and wait for those to reach ~equilibrium

> Is this a valid approach? I’m aware I’m losing all the genealogical information of the first run, but I’m not worried about that because I care more about what happens from step “d” onwards.

I'm not sure what your goal is? Or, what you mean by "valid"? I guess

that the goal is to run part of your simulation without tree sequence

recording turned on? This should work fine for that, I agree, but feel

like I'm missing something.

> 1) When doing nonWF models we often want to estimate the effective population size and figure out the N:Ne ratio. For this I normally run a long-ish simulation to achieve near equilibrium and estimate Ne=4/pi*mu. I want to do this step with treeseq instead, so I’m thinking:

> a. Calculate generation time of simulation

> b. Run a simulation with no mutations

> c. Recapitate and mutate with (mu/generation time)

> d. Calculate pi at time 0

> e. Question, should I do:

> i. Ne=4/pi*mu or

> ii. Ne=4/pi*(mu/generation time)

quantities. In the expression for neutral diversity in a randomly

mating population,

pi = 4 Ne mu,

pi is in units of mutations per bp, 4Ne is in units of generations

(since it's twice the mean coalescence time), and mu is in units of

mutations per bp per generation. Since msprime puts down mutations at

a rate of "mutations per time unit", your step (c) with mutation rate

equal to (mu / generation time) is the right one, so that mu is in

units of mutations per generation. So, I think the right answer is

(i). But, this is always confusing - you can verify this by doing this

procedure to a neutral WF model.

> 2) My second question is about doing a bit of an unconventional series of steps to do a non-neutral nonWF simulation. Neutral mutations normally take much longer to reach equilibrium than deleterious mutations, and I want to avoid running long simulations with both mutation types. So, I have been doing

> a. Run a simulation with no mutations

> b. Recapitate and mutate with (mu/generation time)

> c. Wrote a script to take the tree at time 0 and write a file in slim output

> d. Go back to slim (without tree recording) and read the slim output with neutral mutations (in theory in equilibrium)

> e. Include deleterious mutations from this time point and wait for those to reach ~equilibrium

> Is this a valid approach? I’m aware I’m losing all the genealogical information of the first run, but I’m not worried about that because I care more about what happens from step “d” onwards.

that the goal is to run part of your simulation without tree sequence

recording turned on? This should work fine for that, I agree, but feel

like I'm missing something.

Sep 27, 2021, 5:51:30 PM9/27/21

to Peter Ralph, slim-discuss

Hi Peter,

*import msprime, pyslim*

import numpy as np

from collections import Counter

TREE="pedigree_seed444_gen200_genTime.tree"

ts = pyslim.load(TREE)

nodes=ts.tables.nodes

parent_nodes = ts.tables.edges.parent

parentTime=nodes[parent_nodes].time

freq = Counter(parentTime)

Ne=np.array(list(freq.values())).mean()

N=len(ts.individuals_alive_at(0))

Ne/N

Thanks for the reply. Sorry for the slow reply I wanted to test a few things to report back. Yes, what you say makes sense and indeed the right match is (i), so mutating the tree with (mu/generation time) but calculating Ne=4/pi*mu.

What I tried was

(1) running the simulation with slim - result: **Ne/N = 0.6076236**

(2) with treeSeq without mutations and mutating in pyslim - result: **Ne/N = 0.5964987** (note that recapitate does not ask for Ne parameter anymore)

(3) recording 100 steps with treeSeq and asking how many parent nodes per step there are in average - result: **Ne/N = 0.5247862**

(4) recording the pedigree for 100 steps and using optiSel in R to get Ne - result: **Ne/N = 0.6283925**

So all results are sort of similar, I think solution #3 underestimates Ne because some steps come back with very few parents (effect of overlapping generations?), at least with my code, that might be wrong?:

import numpy as np

from collections import Counter

TREE="pedigree_seed444_gen200_genTime.tree"

ts = pyslim.load(TREE)

nodes=ts.tables.nodes

parent_nodes = ts.tables.edges.parent

parentTime=nodes[parent_nodes].time

freq = Counter(parentTime)

Ne=np.array(list(freq.values())).mean()

N=len(ts.individuals_alive_at(0))

Ne/N

Anyways, my question was about how appropriate was to use solution #2 and it seems that its good!

cheers,

Hernán

Hernan

Sep 27, 2021, 6:37:54 PM9/27/21

to Hernan Morales, slim-discuss

Hi, Hernán - thanks for reporting back! A few notes:

> (1) running the simulation with slim - result: Ne/N = 0.6076236

> (2) with treeSeq without mutations and mutating in pyslim - result: Ne/N = 0.5964987 (note that recapitate does not ask for Ne parameter anymore)

recapitate *does* take the Ne parameter; see the second paragraph at

https://pyslim.readthedocs.io/en/latest/python_api.html#pyslim.SlimTreeSequence.recapitate

> (3) recording 100 steps with treeSeq and asking how many parent nodes per step there are in average - result: Ne/N = 0.5247862

Counting number of parents doesn't actually get you an estimate of Ne.

(It should be in the right ballpark, though.) Really, what you want to

do is to find out how many children each individual had, and if the

vector of these number is C, compute

Ne = sum(C * (C-1))

(note that what you're computing is sum(C > 0).) To figure this out,

equate 1/2Ne to the probability that if you pick two genomes uniformly

at random that they inherit from the same parental genome in the

previous generation.

> So all results are sort of similar, I think solution #3 underestimates Ne because some steps come back with very few parents (effect of overlapping generations?), at least with my code, that might be wrong?:

*born* at each time, which is different than the number alive at each

time. (And, I'm assuming you remembered everyone in every generation;

otherwise you're missing a bunch.) Oh wait - you're counting the times

of birth of each parent, once for each distinct bit of genome they

passed on. I'm not sure what that will do.

- peter

> (1) running the simulation with slim - result: Ne/N = 0.6076236

> (2) with treeSeq without mutations and mutating in pyslim - result: Ne/N = 0.5964987 (note that recapitate does not ask for Ne parameter anymore)

https://pyslim.readthedocs.io/en/latest/python_api.html#pyslim.SlimTreeSequence.recapitate

> (3) recording 100 steps with treeSeq and asking how many parent nodes per step there are in average - result: Ne/N = 0.5247862

(It should be in the right ballpark, though.) Really, what you want to

do is to find out how many children each individual had, and if the

vector of these number is C, compute

Ne = sum(C * (C-1))

(note that what you're computing is sum(C > 0).) To figure this out,

equate 1/2Ne to the probability that if you pick two genomes uniformly

at random that they inherit from the same parental genome in the

previous generation.

> So all results are sort of similar, I think solution #3 underestimates Ne because some steps come back with very few parents (effect of overlapping generations?), at least with my code, that might be wrong?:

> parent_nodes = ts.tables.edges.parent

> parentTime=nodes[parent_nodes].time

Hm, I see that in your code you're computing the number of parents
> parentTime=nodes[parent_nodes].time

*born* at each time, which is different than the number alive at each

time. (And, I'm assuming you remembered everyone in every generation;

otherwise you're missing a bunch.) Oh wait - you're counting the times

of birth of each parent, once for each distinct bit of genome they

passed on. I'm not sure what that will do.

- peter

Dec 8, 2021, 8:59:38 AM12/8/21

to slim-discuss

Dear
Hernán
and SLiM users,

I was wondering how,
Hernán, what you ended up choosing to compute Ne in SLiM, and which Ne value you ended up using during the recapitation phase in pyslim/msprime?

I am currently dealing with similar issues and for now, I am using pedigree recording to get the demographic Ne from my nonWF population in SLiM.

(I am modelling a population that is in accordance with the Hill-Felsenstein model presented by Waples et al. (2011, doi: 10.1890/10-1796.1), so I could get the expected demographic Ne with the AgeNe software, the observed simulated Ne with Equation (2) from the Waples paper, and check the right concordance between the two values in order to be sure I am simulating everything the way I want to.)

However, I am trying to assess how accurate this would be to use that pedigree-based Ne to call pyslim/msprime for the recapitation phase.

My current thought is that it could be accurate under quite specific conditions (constant population size, low inter-individual variance in reproductive success, absence of strong subpopulation structure), otherwise I am not even sure that using the harmonic Ne across SLiMulated generations would be OK (maybe I am wrong?).

In your case, you said that you wanted to simulated inbreeding at some point: so, did you ended up using the pedigree-based Ne before inbreeding occurs to feed the recapitation model?

Another option I was thinking about is to choose recapitation-Ne myself (depending on the amont of diversity I want to generate for the ancestral msprime population) and then keep a "small burn-in" (e.g. ~2Ne) during the SLiM phase, Ne being the expected, pedigree-based Ne this time.

So, I would be very curious to know what you ended up doing in your case?

Thanks a lot :)

Chrys

Reply all

Reply to author

Forward

0 new messages

Search

Clear search

Close search

Google apps

Main menu