clarification on ancestry tracking with tree sequences (SLiM Manual 16.5)

156 views
Skip to first unread message

Bernard Kim

unread,
Jul 3, 2018, 3:33:32 AM7/3/18
to slim-discuss

Hi Ben,

Great new release of SLiM! There are a ton of new features that I am enjoying playing with.

One thing I've been fiddling around with is the admixture mapping with tree sequences. I was hoping to get a bit of clarification on its usage.

If I am understanding things correctly in the example in the manual (16.5), you generate two new separate subpopulations (p1, p2) and admix them into a new (p3) population. This means that all the trees in p3 have an origin from p1 or p2, or in other words all the roots are traceable back to p1 or p2, and we can use that to map local ancestry in an admixture model.

For many simulations that I'm hoping to do, I'm trying to simulate admixture between two recently divergent populations, where p1 splits into p1 and p2. After some time t I admix the two populations to form p3. If I call sim.treeSeqRememberIndividuals(sim.subpopulations.individuals) in the first generation, all genealogies are recorded as originating in p1, regardless of whether the ancestry originated in p1 or p2 at the time of admixture. For example, the following is a plot generated following the Python script in 16.5 with a 50-50 admixture proportion. All the local ancestry is tagged as originating in p1 because all the trees coalesce in p1, despite 50% of the ancestry coming from p2.


Is it possible to somehow use the tree sequences to get the previous population the lineages were in rather than the origin population? Being able to map local ancestry in this way would be amazing for a lot of simulations we are working on.


Thanks,

Bernard


Ben Haller

unread,
Jul 10, 2018, 4:58:38 PM7/10/18
to slim-discuss
Ah, I didn't notice that this question was sent to the list; I replied by private email.  Hmm, well, here are quotes from the email replies I sent to Bernard; we went back and forth a few times, and I won't try to quote all of it, but here are the most relevant bits that I wrote.  Sorry, the whole discussion ought to have been on-list.  :-O

If I understand correctly, you start with p1, it splits into p1/p2, goes for a while separately, then rejoins into p3, and you want to determine whether individuals in the final population trace their ancestry through p2 or not.  That should be quite straightforward to do.  First of all, you’d want to call treeSeqRememberAncestors() at a point in time when p1 and p2 are separate.  If they exchange no migrants, you could presumably make this call any time between when they split and just before they rejoin; I can’t see that it would make any difference precisely when.  This would ensure that each ancestry tree would contain a remembered node that would belong to either p1 or p2.  Then, when doing the final analysis, rather than looking at the subpopulation of each tree root (which might be before the p1/p2 split), you would do a walk upward through the ancestry tree, checking the subpopulation for each node visited along the way up.  If you see a p2 node on the way up, that location in that genome traces through p2; if not, not.  The appropriate APIs for doing this should all be documented in the msprime manual, I think.  If you ran long enough that coalescence occurred at every chromosome location after the time at which you remembered the p1/p2 nodes, you wouldn’t even need to do this walk upwards, I think; in that case the tree root would be directly usable, since it would be one of the remembered nodes.  So then the cookbook recipe would be more directly usable.

...

This is why you walk up the node chain rather than just looking at the subpopulation of the tree root.  The genome that descends from p2 might have a p1 node as its root, if it coalesces further back in time that the migration event, as you say.  But if you remembered all of the nodes in p2 for one generation, those remembered nodes will be in the ancestor chain.  As you walk up the chain, if you ever see a p2 node then the node you started walking upward from descends from p2 (at the given chromosome position); when you see that, you can stop the node chain walk, as you have your answer.  If you reach the root of the ancestry tree without ever seeing a p2 node, then the node you started walking upward from does not descend from p2 (at the given chromosome position).  Make sense?

...

The treeSeqRememberIndividuals() method actually causes the two genomes of each individual to be remembered, as “nodes” in msprime parlance (nodes == genomes).  The “individuals” themselves are not actually remembered; i.e., they will not appear as entries in the individuals table when you’re looking at the .trees file in Python with pyslim.  Perhaps the name of that method is therefore a bit unfortunate, in hindsight; at the time we chose that method name there was not even such a thing as an “individuals table” in msprime, and when we added the individuals table I guess it didn’t occur to us to revisit the name of that method.  But of course you don’t need the ancestral individuals to be remembered anyway, just the ancestral nodes.

  No doubt those quotes seem a bit disjointed without Bernard's end of the conversation, but they should provide all the relevant information for anyone else trying to do this sort of thing.  :->

Cheers,

Benjamin C. Haller
Messer Lab
Cornell University

dachun...@gmail.com

unread,
Dec 23, 2018, 3:03:17 AM12/23/18
to slim-discuss
Hi Ben,

I had the same question and I got an error ("ERROR (EidosInterpreter::Evaluate_Call): method treeSeqRememberAncestors() is not defined on object element type SLiMSim") after adding treeSeqRememberAncestors() with slim version 3.2
The script was shown below:

initialize() {
defineConstant("L", 1e6); 
initializeMutationRate(1e-8);
initializeMutationType("m1", 0.5, "f", 0.0);
initializeGenomicElementType("g1", c(m1), c(1));
initializeGenomicElement(g1, 0, L-1);
initializeRecombinationRate(3e-7);
        initializeTreeSeq(checkCoalescence=T);
}
1 late() { 
sim.addSubpop("p1", 5000);
}
50000 late() {
sim.treeSeqRememberAncestors();
sim.addSubpopSplit("p2", 2000, p1);
}


By the way, I did not find the Migrations in the tree sequence recording  tables

Thanks,
Chunyuan


在 2018年7月3日星期二 UTC+8下午3:33:32,Bernard Kim写道:

Peter Ralph

unread,
Dec 23, 2018, 12:18:25 PM12/23/18
to dachun...@gmail.com, slim-discuss


> I had the same question and I got an error ("ERROR (EidosInterpreter::Evaluate_Call): method treeSeqRememberAncestors() is not defined on object element type SLiMSim") after adding treeSeqRememberAncestors() with slim version 3.2

The function is treeSeqRememberIndividuals, not Ancestors, and its argument is the individuals, so you should have something like
  sim.treeSeqRememberIndividuals(p1.individuals);


> By the way, I did not find the Migrations in the tree sequence recording  tables

That's right; we have not implemented the ability to record migrations. We'd like to, and would accept help in doing this.

Ben Haller

unread,
Dec 24, 2018, 9:39:55 AM12/24/18
to slim-discuss
Yes; "treeSeqRememberAncestors()" was a typo in my reply above, sorry about that.  I think that was a name for the method that I considered originally, but in the end we decided "treeSeqRememberIndividuals()" was a better name.

Also note that the remark in my earlier post, "the “individuals” themselves are not actually remembered", is no longer accurate; as of SLiM 3.2, if I recall correctly, the individuals themselves, as well as their genomes, are in fact remembered in the individuals table in the .trees file.  Old threads like this can be useful in some ways, but should always be taken with a grain of salt since the software may have changed in the meantime.  :->

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University
Reply all
Reply to author
Forward
0 new messages