Recapitation with recombination map ends with multiple roots

72 views
Skip to first unread message

Guillaume Lan-Fong

unread,
Nov 3, 2021, 6:28:59 AM11/3/21
to slim-discuss

Hello everyone,

First of all, thanks a lot for the amazing software that is SLiM!
I’m having an issue, quite similar to this one (https://groups.google.com/g/slim-discuss/c/W8wEYj9UNnw/m/ozG3MM73BQAJ).

I’m running some forward SLiM simulation (neutral simulations for now, but I’ll add some beneficial mutation later on) using a recombination map in order to emulate chromosomes and recapitate the trees using pyslim. However, I can’t seems to get the trees to get down to a single root and I always end up with 2 roots after recapitation.

I’ve included both my SLiM script and my python script. As they’re both part of a bigger pipeline, they might need some tweaks in order to be used for test purpose – even though I don’t think they’re especially original or hard to modify.

Oh, and here's an example of a rec_map.txt :

200_rec_map.txt

chr_size    rec_rate
17000    1e-11
3000    1e-10
44000    1e-7
44000    1e-8

- SLiM script -

//// A HELPER FUNCTION FOR SETTING CONSTANTS THAT MIGHT BE CONFIGURED VIA COMMAND LINE.

function (void) defineCfgParam(string$ name, lifs value) {

if (!exists(name))

defineConstant(name, value);

}

// set up a simple neutral simulation

initialize() {

initializeTreeSeq();

defineCfgParam("sim_id", "test1");

defineCfgParam("Ne", 500);

defineCfgParam("mu", 1e-8);

defineCfgParam("s", 0.5);

initializeMutationRate(0);

// m1 mutation type: neutral

initializeMutationType("m1", 0.5, "f", 0.0);

// g1 genomic element type: uses m1 for all mutations

initializeGenomicElementType("g1", m1, 1);

// create "chromosomes" from "../parameters/sim_id_rec_map.txt" file

lines = readFile("../parameters/" + asString(sim_id) + "_rec_map.txt");

header = strsplit(lines[0], "\t");

if (header[0] != "chr_size"

| header[1] != "rec_rate") {

stop("Unexpected format!");

}

x = 0;

rates = NULL;

ends = NULL;

nchrs = length(lines) - 1;

for (line in lines[1:nchrs]) {

components = strsplit(line, "\t");

ends = c(ends, x + asInteger(components[0]), x + asInteger(components[0]) +1);

rates = c(rates, asFloat(components[1]), 0.5);

x = x + asInteger(components[0]) +1;

}

// remove the last base of the chr (if not, last base with 0.5 recombination rate)

rates = rates[0:(length(rates)-2)];

ends = ends[0:(length(ends)-2)];

// check the rates and ends

print(rates);

print(ends);

initializeGenomicElement(g1, 0, ends[length(ends)-1]);

initializeRecombinationRate(rates, ends);

}

// create a population of Ne individuals with selfing rate = s

1 {

sim.addSubpop("p1", Ne);

p1.setSelfingRate(s);

}

// output samples of 10 genomes periodically, all fixed mutations at end

//1000 late() { p1.outputSample(10); }

//2000 late() { p1.outputSample(10); }

2000 late() {

sim.treeSeqOutput("../results/" + asString(sim_id) + ".trees");

}

- python script -

# Keywords : python, tree-sequence recording, tree sequence recording, SLiM, ms, recombination, recombination map

# Input : SLiM simulation (.trees file, tree-seq recording)

# Output : sample of diploids in ms format

def trees2ms(sim_id):

# load the .trees file created by SLiM after the forward simulation

ts = pyslim.load("../results/" + str(sim_id) + ".trees")

ends = []

rates = []

x = 0

# use the recombination_map.txt file in order to create a recombination map "in ms format"

with open("../parameters/" + str(sim_id) + "_rec_map.txt", "r") as file:

header = file.readline().strip().split("\t")

assert(header[0] == "chr_size" and header[1] == "rec_rate")

for line in file:

components = line.split("\t")

ends.append(x + float(components[0]))

ends.append(x + float(components[0]) + 1)

rates.append(float(components[1]))

rates.append(0.5)

x = x + float(components[0]) + 1

# add a 0 at the beginning of positions lists : ms != SLiM when it comes to the way positions and ends are used

ends.insert(0, 0)

# add a 0 at the end of rates, as we've got one more "position"

rates.append(0.0)

# create the actual recombination map used by the .recapitate() function

recomb_map = msprime.RecombinationMap(ends, rates)

## see "https://pyslim.readthedocs.io/en/latest/tutorial.html#recapitation-with-a-nonuniform-recombination-map" for more info

rts = ts.recapitate(recombination_map=recomb_map, Ne=Ne)

rts.dump("../results/" + str(sim_id) + "_recap.trees") # save the recapitated trees

#assert(max([t.num_roots for t in rts.trees()]) == 1) # Il trouve 2 roots :/

rts = pyslim.SlimTreeSequence(msprime.mutate(rts, rate=mu, keep=True))

rts.dump("../results/" + str(sim_id) + "_recap_mut.trees") # save the recapitated & mutated trees

# Sample 20 individuals from the population

samp = 20

alive = rts.individuals_alive_at(0)

sample_inds = np.random.choice(alive, samp, replace=False)

# Simplify tree sequence to contain only 20 individuals

keep_nodes = []

for i in sample_inds:

keep_nodes.extend(rts.individual(i).nodes)

srts = rts.simplify(keep_nodes)

# Write the ms output

ms_output = open("../results/" + str(sim_id) + "_ms.txt", "w")

tskit.write_ms(srts, ms_output)

ms_output.close()

return()

if __name__ == '__main__' :

import msprime, tskit, pyslim, sys

import numpy as np

if (len(sys.argv) != 5) :

sys.exit("Argument missing")

else :

for arg in sys.argv:

if "=" in arg :

arg = arg.split("=")

if arg[0]== "sim_id":

sim_id = str(arg[1])

if arg[0] == "Ne":

Ne = int(arg[1])

if arg[0] == "mu":

mu = float(arg[1])

if arg[0]== "samp":

samp = int(arg[1])

trees2ms(sim_id)


I have tried to use the development version of msprime but it doesn’t seem to fix the issue (I had quite a difficult time setting this version up, so it might just be an error on my end regarding msprime) so here I am.

If that's not it, I really don’t see what might be the issue so if anyone could help, it would be very much appreciated !

Cheers,
G. L-F

Peter Ralph

unread,
Nov 5, 2021, 1:20:18 AM11/5/21
to Guillaume Lan-Fong, slim-discuss
Hi, Guillaume - I've run your code - as well as I can, since I've not
got all the required files (so, commenting out the recombination map
bits), and I don't get any trees with multiple roots. Are you sure? I
could debug more if you make an example that I can just run that
produces the problem and is as simple as possible?

- peter
> --
> SLiM forward genetic simulation: http://messerlab.org/slim/
> ---
> You received this message because you are subscribed to the Google Groups "slim-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to slim-discuss...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/slim-discuss/f9c17c46-cd98-4b46-84fb-604bdad11192n%40googlegroups.com.

Guillaume Lan-Fong

unread,
Nov 9, 2021, 5:18:14 AM11/9/21
to slim-discuss
Hi Peter,

Sorry for not answering sooner, I hadn't seen your response.

I think the issue might actually come from the recombination map bits, as I don't get any trees with multiple roots without it on my end either.

So I have created a little example folder for anyone interrested in trying to understand the issue.
While inside the "script" folder, you can simply launch it by running self.sh file and it should make the simulations and the recapitation (the scripts are certain

For example, if I run it with parameters like this : bash self.sh 100 0.5 1e-7 500 20, I get 2 roots about half of the time.
Oh, and there's an assertion in the trees2ms.py script which should output an error if more than 1 root is found, but I also print the number of roots before just in case.

G. L-F
debug_example.tar.gz

Peter Ralph

unread,
Nov 10, 2021, 11:56:06 AM11/10/21
to Guillaume Lan-Fong, slim-discuss
> For example, if I run it with parameters like this : bash self.sh 100 0.5 1e-7 500 20, I get 2 roots about half of the time.

Hm - I don't get more than one root, and I've run it about 100 times,
with the current stable releases of SLiM, msprime, tskit, and pyslim.
I think you've got an out-of-date package slipped in there somewhere.
My first guess is that it's an older SLiM (which is on 3.6 now - I
didn't think 3.5 would cause this to happen, but I may be wrong), but
making sure your python packages are updated and you're using the
versions you expect is a good idea.

Let us know if that doesn't fix the problem (probably off-list).

- peter

Guillaume Lan-Fong

unread,
Nov 16, 2021, 8:40:17 AM11/16/21
to slim-discuss
Hello Peter,

Thank you for you patience. Trying to update each package independently didn't solve the issue, but I have setup an entirely new environment with the lastest versions of SLiM, msprime, tskit and pyslim as you've suggested and everything seems to work perfectly now!

I'm still quite curious as to why things weren't running with SLiM 3.5 as the 3.6 update doesn't seems to add, change or remove anything related to my issue... plus I think that SLiM 3.6 is still supposed to be retrocompatible with SLiM 3.5 isn't it?

Anyway, thanks a lot for your help,

G. L-F

Ben Haller

unread,
Nov 16, 2021, 9:53:55 AM11/16/21
to slim-discuss
Hi Guillaume.  It will probably remain a mystery.  Of course updating each package independently ought to be exactly the same thing as setting up a new environment with new versions of everything, so something must have been out of sync with your previous setup, but who knows what.  It may be that you had multiple installs of something on your machine, and that the version you thought you were using was not, in fact, the version you were getting due to search path weirdness or who knows what; that seems to happen to me with Python packages with some regularity.  SLiM 3.6 is backward-compatible with 3.5, yes, but that doesn't mean they are exactly equivalent from, e.g., pyslim's perspective.  They do save different file format versions of .trees files, for example, which pyslim might interpret in different ways.  Of course a version mismatch that presents problems for interoperability ought to give an error message rather than producing incorrect results, so there is a bug somewhere, but without even knowing which package had the incorrect version, it is very hard to guess where!  :-O  It's very hard to deal with all the different internal file versions and assure that mixing and matching different versions within the software stack always behaves as it ought to; the safe thing for end users is to always update your whole setup to the latest version of everything simultaneously, and then don't touch it until you're ready to update your whole stack again.  Unfortunately this is a problem across software engineering, which nobody seems to have found a really good solution for.

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University


Peter Ralph

unread,
Nov 16, 2021, 4:06:24 PM11/16/21
to Ben Haller, slim-discuss
Glad it worked! In earlier versions of SLiM the entire first generation was retained, to allow recapitation, while in later versions we didn't need to do that any more, and only retain those that are ancestors of the final generation. This caused the sort of thing you are observing, so it's my guess you were using that version of SLiM. It *is* all backwards-compatible, in that the genotypes and relationships among the samples at the final generation would all be the same; it's just that in earlier versions of SLiM there were extra individuals hanging around that led to confusing things like this. (Notice that "the root of the tree" is affected by the presence of extra individuals, while "the MRCA of all my samples" is not.)

-peter

--
SLiM forward genetic simulation: http://messerlab.org/slim/
---
You received this message because you are subscribed to the Google Groups "slim-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to slim-discuss...@googlegroups.com.

Guillaume Lan-Fong

unread,
Nov 22, 2021, 9:17:58 AM11/22/21
to slim-discuss
Thanks to both of you for your detailled answers and your advices :)

Cheers,
G. L-F
Reply all
Reply to author
Forward
0 new messages