merging multiple tree sequences into a single demographic model

39 views
Skip to first unread message

Max Shpak

unread,
Jul 15, 2022, 5:30:00 PM7/15/22
to slim-discuss
I'm modeling independent selective sweeps in three subpopulations with genealogy {1,{2,3}} and conditioning on fixation of the beneficial allele in all three. Because this is done by rerunning the simulation each time the allele is lost, if the expected number of iterations for a single population is K, then it will be K^3 when conditioning on all three.

To get around this problem, I simulate three different tree sequences which I need to piece together into a single genealogy (followed by recapitation and mapping of neutral mutations). The difficulty I'm having is with the first step. Given three tree sequences ts1, ts2, ts3, how does one create a single ts consistent a specified demographic history?

The code below is what I would like to do, given the appropriate SOME_MERGE_FUNCTION(ts1,ts2,ts3) that would give me the merged ts. It seems that union() in tskit can do something like this, but I can't find an example of the correct syntax to implement what I need using union(). Is there another method that would work?

ts1 = tskit.load("T1.trees")
ts2 = tskit.load("T2.trees")
ts3 = tskit.load("T2.trees")

ts = SOME_MERGE_FUNCTION(ts1,ts2,ts3) ???

ts_new = pyslim.SlimTreeSequence(ts)

demog = msprime.Demography()
demog.add_population(name = "1", initial_size = 20000, initially_active=True)
demog.add_population(name = "2", initial_size = 10000, initially_active=True)
demog.add_population(name = "3", initial_size = 10000)
demography.add_population_parameters_change(time=1000, population="2", initial_size=5000)
demog.add_population_split(time=1000, derived=["3"], ancestral="2")
demog.add_population_split(time=2000, derived=["2"], ancestral="1")

rts = pyslim.recapitate(ts_new, recombination_rate = 1.14e-8, gene_conversion_rate = 5.9e-8, gene_conversion_tract_length = 518, ancestral_Ne=20000, random_seed = ancestry_seed)

mts = msprime.mutate(rts, rate = 1.22e-8, keep=True, random_seed = mutation_seed, model=msprime.SLiMMutationModel(type=0))

Peter Ralph

unread,
Jul 19, 2022, 5:24:32 PM7/19/22
to Max Shpak, slim-discuss
Hi, Max! Good question - and, I think it's answered here:
... in that example, we work through simulating independent branches of a phylogeny separately, then pasting them back together.

Peter


From: slim-d...@googlegroups.com <slim-d...@googlegroups.com> on behalf of Max Shpak <shpa...@gmail.com>
Sent: Friday, July 15, 2022 2:29 PM
To: slim-discuss <slim-d...@googlegroups.com>
Subject: merging multiple tree sequences into a single demographic model
 
--
SLiM forward genetic simulation: http://messerlab.org/slim/
---
You received this message because you are subscribed to the Google Groups "slim-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to slim-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/slim-discuss/7c7acfbd-d692-4781-a10d-9750b8b704d4n%40googlegroups.com.

Max Shpak

unread,
Jul 19, 2022, 5:33:31 PM7/19/22
to slim-discuss
I found that example, but I don't think that the approach will work in my case. Namely, the union() function in tskit requires a node_map input argument. In my case, I do not have any nodes, I'm generating ts1,ts2, and ts3 as separate populations in Slim and just want to joint them into a single tree sequence, subsequently mapping on the demographic history.

So using something like
ts = ts1.union(ts2, node_map)
ts = ts.union(ts3,node_map)

will not work for me, because I have no node map, and setting node_map = None throws an error. Is there any other way to combine three tree sequences into a single ts while keeping the labels as subpopulations, and then applying the demographic history via the recapitation step?

Peter Ralph

unread,
Jul 19, 2022, 5:41:35 PM7/19/22
to Max Shpak, slim-discuss
Hi, Max - ah, that is a bit different. Let's see - from the
documentation (https://tskit.dev/tskit/docs/stable/python-api.html#tskit.TreeSequence.union),
"the node_mapping argument should be an array of length equal to the
number of nodes in other and whose entries are the ID of the matching
node in self, or tskit.NULL if there is no matching node. ". So, in
your case you have no shared nodes, i.e., all entries of the node map
should be tskit.NULL. So, this works, for instance:

ts1.union(ts2, [tskit.NULL for _ in ts2.nodes()])

Perhaps this should be an example somewhere!

* peter
> To view this discussion on the web visit https://groups.google.com/d/msgid/slim-discuss/638a61a9-29c3-4709-ab5a-270b0f613ca5n%40googlegroups.com.
Message has been deleted

Peter Ralph

unread,
Jul 22, 2022, 5:08:16 PM7/22/22
to Max Shpak, slim-discuss
Hm - there will be in SLiM v4 - but, can't you just specify your
demography in msprime using the names they have, like
demog.add_population(name = "p1", initial_size = 20000, initially_active=True)
instead?

On Fri, Jul 22, 2022 at 1:18 PM Max Shpak <shpa...@gmail.com> wrote:
>
> I successfully implemented the union function with the null argument. However, I have one problem: I'm merging three subpopulations, e.g.
> tsnew = ts1.union(ts2, NULL...)
> tsnew = tsnew.union(ts3, NULL...)
>
> I named the populations corresponding to ts1,ts2,ts3 p1, p2, p3 in SLiM. However, when I specify the demographic history, I cannot use numeric designations,
>
> demog = msprime.Demography()
> demog.add_population(name = "A", initial_size = 20000, initially_active=True)
> demog.add_population(name = "B", initial_size = 10000, initially_active=True)
> demog.add_population(name = "C", initial_size = 10000)
> demog.add_population_parameters_change(time=1000, population="B", initial_size=5000)
> demog.add_population_split(time=1000, derived=["C"], ancestral="B")
> demog.add_population_split(time=2000, derived=["B"], ancestral="A")
>
> # recapitation
> rts = pyslim.recapitate(ts_new, recombination_rate = 1.14e-8, gene_conversion_rate = 5.9e-8, gene_conversion_tract_length = 518, ancestral_Ne=20000, random_seed = ancestry_seed)
>
> Is there some way to set the demography in such a way that ts1 gets assigned to population A, ts2 to B, ts3 to C? I need to keep track of this because different subpopulations experience distinct evolutionary histories.
> To view this discussion on the web visit https://groups.google.com/d/msgid/slim-discuss/d5348619-7d35-4c1b-bacb-671c82bba6c9n%40googlegroups.com.
Reply all
Reply to author
Forward
0 new messages