Is it possible to somehow use the tree sequences to get the previous population the lineages were in rather than the origin population? Being able to map local ancestry in this way would be amazing for a lot of simulations we are working on.
Thanks,
Bernard
If I understand correctly, you start with p1, it splits into p1/p2, goes for a while separately, then rejoins into p3, and you want to determine whether individuals in the final population trace their ancestry through p2 or not. That should be quite straightforward to do. First of all, you’d want to call treeSeqRememberAncestors() at a point in time when p1 and p2 are separate. If they exchange no migrants, you could presumably make this call any time between when they split and just before they rejoin; I can’t see that it would make any difference precisely when. This would ensure that each ancestry tree would contain a remembered node that would belong to either p1 or p2. Then, when doing the final analysis, rather than looking at the subpopulation of each tree root (which might be before the p1/p2 split), you would do a walk upward through the ancestry tree, checking the subpopulation for each node visited along the way up. If you see a p2 node on the way up, that location in that genome traces through p2; if not, not. The appropriate APIs for doing this should all be documented in the msprime manual, I think. If you ran long enough that coalescence occurred at every chromosome location after the time at which you remembered the p1/p2 nodes, you wouldn’t even need to do this walk upwards, I think; in that case the tree root would be directly usable, since it would be one of the remembered nodes. So then the cookbook recipe would be more directly usable.
This is why you walk up the node chain rather than just looking at the subpopulation of the tree root. The genome that descends from p2 might have a p1 node as its root, if it coalesces further back in time that the migration event, as you say. But if you remembered all of the nodes in p2 for one generation, those remembered nodes will be in the ancestor chain. As you walk up the chain, if you ever see a p2 node then the node you started walking upward from descends from p2 (at the given chromosome position); when you see that, you can stop the node chain walk, as you have your answer. If you reach the root of the ancestry tree without ever seeing a p2 node, then the node you started walking upward from does not descend from p2 (at the given chromosome position). Make sense?
The treeSeqRememberIndividuals() method actually causes the two genomes of each individual to be remembered, as “nodes” in msprime parlance (nodes == genomes). The “individuals” themselves are not actually remembered; i.e., they will not appear as entries in the individuals table when you’re looking at the .trees file in Python with pyslim. Perhaps the name of that method is therefore a bit unfortunate, in hindsight; at the time we chose that method name there was not even such a thing as an “individuals table” in msprime, and when we added the individuals table I guess it didn’t occur to us to revisit the name of that method. But of course you don’t need the ancestral individuals to be remembered anyway, just the ancestral nodes.