Simulating two loci incompatibility

maud duranton

Apr 19, 2021, 5:08:33 AM

to slim-discuss

Hi everyone,

I am trying to implement an admixture model in which two populations hybridize to create a third one. Each population carries neutral mutations that are incompatible with a specific mutation present in the other population. The idea is to have pairs of incompatible mutations, so that a mutation is deleterious only when it is present in the same genome as its specific partner mutation. To do so, I am creating different mutation types using, for example:

     initializeMutationType("m2", 0.5, "f", 0);
     m2.mutationStackPolicy = "l";

     initializeMutationType("m3", 0.5, "f", 0);
     m3.mutationStackPolicy = "l";

Then I randomly place each mutation along the genome of my populations using:

     Indp2.genomes.addNewDrawnMutation(m2, rdunif(1, 22500002, 119000003));
     Indp1.genomes.addNewDrawnMutation(m3, rdunif(1, 22500002, 119000003));

Finally, to make the two mutations deleterious when they are present in the same individual, I am using a fitness callback that counts the number of mutations of type m2 and m3 present in that individual:

fitness(m2){
Nb_m2=asInteger(genome1.countOfMutationsOfType(m2)) + asInteger(genome2.countOfMutationsOfType(m2));
Nb_m3=asInteger(genome1.countOfMutationsOfType(m3)) + asInteger(genome2.countOfMutationsOfType(m3));

and then, depending on the types of the mutations and their numbers, the fitness of the individual is decreased by a certain amount.

I am using a script to write the SLiM code depending on the number of incompatibilities I want to simulate and the fitness reduction I want to have. I ran the simulations for 100 incompatibilities (meaning 200 different mutation types in total) and it worked fine. However, I am now trying to increase the number of incompatibilities, and I always get an error:

line 49: 975979 Segmentation fault (core dumped) slim Model.slim

If I understand this correctly, this is due to memory issues. I was wondering whether there is a limit to the number of mutation types that we can define, or whether there is another function that I could use instead of initializeMutationType().

Thanks in advance,

Best wishes,

Maud

Ben Haller

Apr 19, 2021, 1:20:59 PM

to slim-discuss

Hi Maud!

Your approach is not unreasonable, but probably doesn't scale well, yeah. You don't say how many mutation types you're trying to create. There's no limit hard-coded into SLiM (if there were, you'd probably get an error message rather than a segfault when you try to create a very large number), but certainly each new mutation type uses additional memory, and at some point, if your system has a process memory limit, you'd hit that limit. I tried a test model with a million mutation types, and the memory usage just for those mutation types was about 305 MB, so if you're trying to create 10 million or more, that might cause you to exceed a 4 GB memory limit all by itself. Then there will also be the memory usage of all of the mutations you create, of course; if you have a large population size and you're giving a large number of mutations to each individual, that will add up too. And there is other less obvious memory overhead to creating so many mutation types, such as the Eidos symbols for all of them (m1, m2, m3, ...), which does add up (looks like about 64 MB in my test with a million mutation types). You can see where memory is getting used with the profiling feature of SLiMgui, or by using the sim.outputUsage() method if you're running at the command line; see section 20.6 of the manual.
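As a rough, back-of-the-envelope sketch of the arithmetic above, and of where sim.outputUsage() might be called: the figures of 305 MB and 64 MB per million mutation types come from the test described above, while the target count nTypes, the choice of generation 1, and the decision to add the two figures together are assumptions for illustration (the message does not say whether the 64 MB is included in the 305 MB). It also assumes SLiM 3.x, where outputUsage() is a method of sim.

```eidos
// Sketch only: add this event to an existing SLiM 3.x model.
1 late() {
	nTypes = 1e7;                   // assumed number of mutation types
	mbPerMillionTypes = 305 + 64;   // MB per million types, from the test above
	estimateMB = (nTypes / 1e6) * mbPerMillionTypes;
	catn("Rough overhead of mutation types alone: " + estimateMB + " MB");

	// print SLiM's own accounting of current memory usage
	sim.outputUsage();
}
```

With these assumed numbers the estimate is 10 x 369 = 3690 MB, so roughly 3.7 GB before any mutations or individuals are counted, which is in line with the 4 GB concern above.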

So, the question is, what's a better approach?

I would recommend not using separate mutation types for each individual pair of epistatic interactions. That requires a huge number of mutation types, as you've found. It also makes calculating the epistatic interactions rather cumbersome; for each such interaction you're doing four calls to countOfMutationsOfType(), and each of those calls has to search through the whole target genome looking for mutations of the given type, so if you have a large number of these interactions you're doing an extraordinary number of these genome searches, and it will become extremely slow. (Also, incidentally, I don't think you want to be doing that work in a fitness(m2) callback; that will perform all of the work for *every* m2 mutation, over and over, for each m2 mutation in each individual. It will be massively slow, and I don't think it will give you the right answer, either, unless I'm misunderstanding how your code snippets fit together. See below for a better approach.)

Instead, I would define just two mutation types for all of your epistatic interactions: let's call them m2 and m3. Then use the tag property on the mutations you create, to designate which pairs of m2/m3 mutations "match up" and create epistasis. When you create your epistatic pairs, which I imagine you do in a for loop, you could just use the integer index of the for loop as the tag value. The addNewDrawnMutation() method returns the mutation object created; so you'd do something like (just typing into email here):

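     for (index in seqLen(10000))
     {
          // addNewDrawnMutation() returns the mutation object it created
          m2mut = Indp2.genomes.addNewDrawnMutation(m2, rdunif(1, 22500002, 119000003));
          m3mut = Indp1.genomes.addNewDrawnMutation(m3, rdunif(1, 22500002, 119000003));

          // the shared tag marks these two mutations as an epistatic pair
          m2mut.tag = index;
          m3mut.tag = index;
     }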

Now you have a matching pair that you can find by tag value. Next, you want to implement the fitness effect of all this. I'd use the fitnessScaling property to assign that fitness effect, in an early() or late() event as appropriate for your model type (you don't say whether your model is WF or nonWF). Assuming nonWF, it goes in an early() event, and might look something like this (again, just typing into email; test and debug this):

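     early() {
          for (ind in p3.individuals) {
               // tags of the m2 and m3 mutations carried by this individual
               m2tags = ind.genomes.mutationsOfType(m2).tag;
               m3tags = ind.genomes.mutationsOfType(m3).tag;

               // number of m2 tags that have a matching m3 tag
               epistaticEffects = sum(match(m2tags, m3tags) >= 0);

               // someFunctionOf() is a placeholder for your own fitness function
               ind.fitnessScaling = someFunctionOf(epistaticEffects);
          }
     }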

The call to match() will very efficiently (especially in SLiM 3.6) tell you which m2tags in the focal individual have a match in the m3tags. Doing >= 0 converts the result of match() into logical (see the documentation for match() in the Eidos manual for discussion of this). And then sum() gives you the total number of epistatic matches. Note that these are the *first* matches in the genomes of the individual, so I *think* it would be treating the epistatic mutations as fully dominant in this implementation (but double-check that). You could modify the code to have them work in other ways, with a bit of head-scratching, I think; you might want to count matches specifically between genome1 and genome1, versus matches between genome1 and genome2, versus matches between genome2 and genome1, versus matches between genome2 and genome2, I think? And maybe use the which() and tabulate() functions to summarize the results into a number of interactions for each epistatic pair, based on the zygosity of each? I leave reasoning all that out as an exercise for the reader; it will be more complicated, but should be reasonably tractable and fast, I think. :-> If you do want fully dominant epistatic mutations, the Individual method uniqueMutationsOfType() might also be worth using for greater efficiency; I think it would be better than ind.genomes.mutationsOfType(), for performance reasons (less work for match() to do).
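To make the behaviour of match() concrete, here is a small stand-alone Eidos sketch using made-up tag vectors (the values are hypothetical and not taken from any model). It suggests why the dominance question is worth double-checking: a tag that appears twice in m2tags, as it would for a homozygous mutation collected through ind.genomes, is counted twice unless duplicates are removed first.

```eidos
// hypothetical tags; tag 4 appears twice, as for a homozygous m2 mutation
m2tags = c(4, 4, 7, 9);
m3tags = c(4, 9, 12);

// index of the first match in m3tags for each m2 tag, or -1 if none
hits = match(m2tags, m3tags);
print(hits);                 // 0 0 -1 1
print(sum(hits >= 0));       // 3

// removing duplicates first counts each matched pair once
print(sum(match(unique(m2tags), m3tags) >= 0));   // 2
```

This can be pasted into the Eidos console in SLiMgui to experiment with other tag combinations.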

So, there's some work to be done to fill in the gaps, but I think this approach ought to work, and should scale up to a large number of mutations *much* better than your existing approach. Of course at some scale you will still run out of memory; if you're running on a cluster, there is usually some way to request a higher memory limit for your process.
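One possible way to fill in the gaps for the fully dominant case is sketched below. It assumes a nonWF SLiM 3.x model with the hybrid population named p3 and the mutation types m2 and m3 as above. The constant S (fitness cost per incompatible pair) and the multiplicative form (1 - S)^n are hypothetical choices standing in for someFunctionOf(); neither is specified in this thread. This is an untested fragment rather than a complete model, so treat it as a starting point.

```eidos
// Assumes S was defined in initialize(), e.g. defineConstant("S", 0.05);
// the name S and the value 0.05 are illustrative only.
early() {
	for (ind in p3.individuals) {
		// each mutation is listed once per individual, regardless of zygosity
		m2tags = ind.uniqueMutationsOfType(m2).tag;
		m3tags = ind.uniqueMutationsOfType(m3).tag;

		// number of incompatible pairs with both partners present
		nPairs = sum(match(m2tags, m3tags) >= 0);

		// assumed multiplicative cost: each complete pair scales fitness by (1 - S)
		ind.fitnessScaling = (1.0 - S) ^ nPairs;
	}
}
```

An individual carrying no complete pair gets nPairs equal to 0 and so keeps a fitnessScaling of 1.0. In a WF model the same logic would presumably go in a late() event instead, as noted above.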

If you get stuck with this, feel free to follow up on-list if you think it will interest others, or off-list if not. Happy modelling!

Cheers,

-B.

Benjamin C. Haller

Messer Lab

Cornell University
