Question about the randomization of mating

Zuxi Cui

unread,

May 25, 2022, 7:57:26 AM5/25/22

to slim-discuss

Hi there,

I have some questions about the random pattern of mating within SliM.

1. With fixed random seed, sample size, and number of generation runs, does SliM use the same

mating pattern, even using nucleotide data of different chromosomes?

2. Is "setSeed" in the script the same function as "-seed" on the command line? From the manual, SliM seems to have this only option to set a global random seed. Is there a way to set seeds for mating, mutation, and recombination separately?

I'm working on a whole-genome simulation and need to use the same mating pattern for all chromosomes.

Best,

Terry

Ben Haller

unread,

May 25, 2022, 8:46:28 AM5/25/22

to Zuxi Cui, slim-discuss

Hi Terry!

If I understand your question (1) correctly, the answer is yes. SLiM is (like most software) completely deterministic; given the same random number seed, it will reproduce exactly the same run in every detail unless your script does something to change that (like setting the seed based upon the system clock, or some such external source of non-deterministic behavior).

As for (2), yes, setSeed() sets the same random number generator seed, in the same way, as the `-seed` command-line option. The only difference is that you can call setSeed() at any time, not just at the start of simulation.

And no, there is no way to set seeds for mating, mutation, and recombination separately. SLiM uses a single source of random numbers for all purposes. I'm not sure what that would even mean, in general; for example, if a change in the "mating seed" meant that an additional offspring was produced, compared to the original "mating seed", then that would mean that more random number draws would be needed to govern the mutation and recombination for that additional offspring, so the pattern of mutation and recombination would necessarily diverge as well. But since you say that your goal is to force a particular "mating pattern", perhaps section 16.12 of the manual will be of interest? If not, please be more specific about what it is you are trying to achieve; I'm not sure what you mean by a "mating pattern", since the pattern of mating – who mates with whom – would naturally be the same for all chromosomes in a multiple-chromosome SLiM simulation anyway.

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University

Zuxi Cui wrote on 5/25/22 7:57 AM:

--
SLiM forward genetic simulation: http://messerlab.org/slim/
---
You received this message because you are subscribed to the Google Groups "slim-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to slim-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/slim-discuss/CAB03vn8%2BFom1zVU2-05ZoEhpb_R9zPM4eDyRbFXpy3U%2BcxUJ4g%40mail.gmail.com.

Zuxi Cui

unread,

May 25, 2022, 9:41:54 AM5/25/22

to Ben Haller, slim-discuss

Hi Ben,

The mating pattern I mentioned is referring to the pedigree of each simulated sample

For example, if I use samples i1, i2...iN as input and simulate several offspring o1, o2... oM.

I want the mating to be fixed as

o1 being offspring of i1, i2

o2 being offspring of i1, i3

o3 being offspring of i4, i3

...

for separate runs with different numbers of mutations, recombinations, and within different chromosomes.

Is there an example of a multiple-chromosome simulation of nucleotide data? I haven't tried it before.

I used to simulate by chromosomes. My concern is that the data size is huge for whole-genome genotype simulations.

I was trying to have different runs of simulation by chromosomes but add some control to keep the same mating pattern.

I'm targeting up to 300 generations and 30,000 sample sizes, and I'm not sure if 16.12 does what I want.

Terry

Ben Haller

unread,

May 25, 2022, 9:59:26 AM5/25/22

to Zuxi Cui, slim-discuss

Hi Terry,

Ah, I see – you need to simulate each chromosome with a separate SLiM simulation to keep the memory footprint reasonable, and so you need to force the same mating pattern across all of those simulations (and, as you say, also for other parameter variations). I think the recipe of section 16.12 ought to do what you want; I don't see why not. The pedigree files emitted by the first stage will probably be very large, though, and the step where recipe 16.12 narrows down to the events of the current tick might be quite slow. You might need to redesign 16.12 to dump a separate file for each generation, or something, to alleviate those issues; but that shouldn't be hard to do.

An alternative approach would be to use the techniques of 16.12 to force a particular pattern of mating, but have that pattern come from some kind of algorithmic process that you can run in-script, rather than from a mating plan read from a file. This is where having a separate random number generator that you could use to control mating would come in handy; I see what you mean. It is conceivable that I could add a new Eidos class that would wrap a separate random number generator, with its own seed and a method for drawing new samples from it; you could use that class to control the mating process in your model, getting a reproducible mating pattern regardless of what SLiM's random number generator is doing. Interesting idea.

Multiple chromosomes with a nucleotide-based model would work just the same way as multiple chromosomes with a regular model, as far as SLiM's mechanics are concerned. You would just combine the techniques of section 6.1.4 with whatever else you are doing. :->

So. Why don't you explore adapting recipe 16.12 to your purposes; it is really intended to solve exactly the sort of problem that you are grappling with. If it turns out to be unworkable for a model of the scale you're simulating, even with a separate file for each generation (which would be surprising to me), then I'll consider adding an Eidos RNG class that you can use to reproducibly control the mating pattern; but that will be a fair bit of work, and I'm not sure it will be useful for anything else besides this specific problem really, so I'd prefer not to. (If anybody else reading this would also find a standalone RNG class in Eidos to be useful for some reason, please do weigh in.) Sound good?

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University

Zuxi Cui wrote on 5/25/22 9:41 AM:

Zuxi Cui

unread,

May 25, 2022, 10:03:35 AM5/25/22

to Ben Haller, slim-discuss

Sure. I'll play with 16.12 and reach back to you if I have further questions.

Terry

Zuxi Cui

unread,

Jul 6, 2022, 1:41:56 PM7/6/22

to Ben Haller, slim-discuss

Hi Ben,

I still cannot fully understand and make it work with mateChoice() callback. Can you show a simple example code so I can follow it?

Say, I have 5 samples at the beginning tagged as s0.1-s0.5, where s0 represents the sample in generation 0 (reference) and the number after the dot is the sample id in that generation.

I want to customize the mating by 3 generations with a mating matrix as following:

generation1:

s0.1+s0.2,s0.1+s0.3,s0.1+s0.4,s0.2+s0.5,s0.3+s0.5

generation2:

s1.1+s1.2,s1.1+s1.3,s1.1+s1.4,s1.2+s1.5,s1.4+s1.5

generation3:

s2.1+s2.2,s2.1+s2.3,s2.1+s2.4,s2.2+s2.5,s2.4+s2.5

where the first simulated sample in generation 1 is the offspring of sample 1 and 2 in the reference.

What would the code be with mateChoice() and how should I input the mating matrix?

Terry

Ben Haller

unread,

Jul 6, 2022, 5:22:58 PM7/6/22

to Zuxi Cui, slim-discuss

Hi Terry. Recipe 16.12, which I pointed you to before, is a nonWF model and does not use a mateChoice() callback; in fact, mateChoice() callbacks can only be used in WF models. In nonWF models, reproduction is instead handled by reproduction() callbacks. You might want to have another read through recipe 16.12 to understand all of this better; but here is an example model that arranges matings as you described:

initialize() {
    initializeSLiMModelType("nonWF");
    initializeMutationType("m1", 0.5, "f", 0.0);
    m1.convertToSubstitution = T;
    initializeGenomicElementType("g1", m1, 1.0);
    initializeGenomicElement(g1, 0, 99999);
    initializeMutationRate(1e-7);
    initializeRecombinationRate(1e-8);
}
function (void)mateTaggedIndividuals(i$ parent1Tag, i$ parent2Tag, i$ childTag) {
    parent1 = p1.subsetIndividuals(tag=parent1Tag);
    parent2 = p1.subsetIndividuals(tag=parent2Tag);
    child = p1.addCrossed(parent1, parent2);
    child.tag = childTag;
}
2 reproduction() {
    // generation1: s0.1+s0.2,s0.1+s0.3,s0.1+s0.4,s0.2+s0.5,s0.3+s0.5
    mateTaggedIndividuals(1, 2, childTag=1);
    mateTaggedIndividuals(1, 3, childTag=2);
    mateTaggedIndividuals(1, 4, childTag=3);
    mateTaggedIndividuals(2, 5, childTag=4);
    mateTaggedIndividuals(3, 5, childTag=5);
    self.active = 0;
}
3 reproduction() {
    // generation2: s1.1+s1.2,s1.1+s1.3,s1.1+s1.4,s1.2+s1.5,s1.4+s1.5
    mateTaggedIndividuals(1, 2, childTag=1);
    mateTaggedIndividuals(1, 3, childTag=2);
    mateTaggedIndividuals(1, 4, childTag=3);
    mateTaggedIndividuals(2, 5, childTag=4);
    mateTaggedIndividuals(4, 5, childTag=5);
    self.active = 0;
}
4 reproduction() {
    // generation3: s2.1+s2.2,s2.1+s2.3,s2.1+s2.4,s2.2+s2.5,s2.4+s2.5
    mateTaggedIndividuals(1, 2, childTag=1);
    mateTaggedIndividuals(1, 3, childTag=2);
    mateTaggedIndividuals(1, 4, childTag=3);
    mateTaggedIndividuals(2, 5, childTag=4);
    mateTaggedIndividuals(4, 5, childTag=5);
    self.active = 0;
}
1 early() {
    sim.addSubpop("p1", 5);
    p1.individuals.tag = 1:5;
}
survival() {
    // kill the parental generation
    if (individual.age > 0)
        return F;
    return NULL;
}
4 late() { }

Of course this could be modified so that the mating pattern is driven by tables instead of being hard-coded in the script itself; that is essentially what recipe 16.12 demonstrates. But I've made the mating code explicit above to make it more clear what is going on. Also, this code uses tag values, but one could design the code to use pedigree IDs instead, which might be particularly convenient if one were using tree-sequence recording. I used a user-defined function to do the tag lookups and such, but there's no particular need for that, and it's a bit inefficient; with a table-driven design you probably wouldn't do that any more. As usual with scriptability, there are many ways to achieve the same goal, but I hope this points you in the right direction.