assign gene conversion positions

34 views
Skip to first unread message

Sebastian Ramos-Onsins

unread,
Feb 24, 2025, 9:12:14 AM2/24/25
to 'Ben Haller' via slim-discuss
Dear Ben,

I want to assign new gene conversion events using the recombination() callback. 
I find that the pattern obtained using the simple definition 

initializeRecombinationRate(1e-7);

initializeGeneConversion(1.0,500,1.0); //only gene conversion, mean 500 and only simple gc events.


is very different that if I use recombination() callback to assign breakpoint positions (in that case, the pattern seems more compatible to crossing-over events than to gene conversion events).

I would like to know how can I obtain the same pattern using recombination() callback.

For example, if I use this code: (that is, I include the recombination() callback, but I use only the breakpoints assigned by the application to obtain the gene conversion events, in fact is the same that no using the recombination() callback)

initialize() {

//mutation rate

initializeMutationRate(1e-7);

// m1 mutation type: neutral

initializeMutationType("m1", 0.5, "f", 0.0);

// g1 genomic element type: uses m1 for all mutations

initializeGenomicElementType("g1", m1, 1.0);

// chromosome of length 1 Mb with uniform gene conversion

initializeGenomicElement(g1, 0, 999999);

initializeRecombinationRate(1e-7);

initializeGeneConversion(1.0,500,1.0);

}

recombination() {

n_new_breakpoints=rpois(1,2*1e-7*1e6);

new_breakpoints=asInteger(runif(n_new_breakpoints,0,999999));

//breakpoints=sort(new_breakpoints);

breakpoints=sort(breakpoints);

return T;

}

// create a population of 1000 individuals

1 early() {

sim.addSubpop("p1", 1000);

}

//end

10000 late() {

sim.simulationFinished();

}


the pattern of frequencies is very linked:


Screenshot 2025-02-24 at 14.55.13.png


Instead, If I use new breakpoints with the same rate and the same average number of events, the pattern is completely different:


initialize() {

//mutation rate

initializeMutationRate(1e-7);

// m1 mutation type: neutral

initializeMutationType("m1", 0.5, "f", 0.0);

// g1 genomic element type: uses m1 for all mutations

initializeGenomicElementType("g1", m1, 1.0);

// chromosome of length 1 Mb with uniform gene conversion

initializeGenomicElement(g1, 0, 999999);

initializeRecombinationRate(1e-7);

initializeGeneConversion(1.0,500,1.0);

}

recombination() {

n_new_breakpoints=rpois(1,2*1e-7*1e6);

new_breakpoints=asInteger(runif(n_new_breakpoints,0,999999));

breakpoints=sort(new_breakpoints);

//breakpoints=sort(breakpoints);

return T;

}

// create a population of 1000 individuals

1 early() {

sim.addSubpop("p1", 1000);

}

//end

10000 late() {

sim.simulationFinished();

}


The frequencies of the positions are very unlinked.
Screenshot 2025-02-24 at 14.59.54.png


Additionally, I would like to know (i) if there is a possibility to know (in the case of having a proportion of crossing-overs and gene conversion events) what position suffers gene conversion and what suffers recombination, and (ii) if there is a possibility to assign to a position a gene conversion and not a recombination event (or the inverse).

Best,
Sebastian

Ben Haller

unread,
Feb 24, 2025, 9:33:37 AM2/24/25
to Sebastian Ramos-Onsins, 'Ben Haller' via slim-discuss
Hi Sebastian!

OK, just to give a bit of background, section 26.6 (the reference section for recombination() callbacks) states:

In the present design, the callback receives “crossover breakpoints” information only, in the breakpoints pseudo-parameter; it receives no information about gene conversion. However, recombination() callbacks can still be used with the “DSB” recombination model; at the point when the callback is called, the pattern of gene conversion tracts will have been simplified down to a vector of crossover breakpoints. “Complex” gene conversion tracts, however, involving heteroduplex mismatch repair, are not compatible with recombination() callbacks, since there is presently no way for them to be specified to the callback.

I'm not exactly sure what you're trying to achieve – I had a bit of trouble following your email, perhaps because I'm presently sick and running a bit of a fever (!).  But the first model you provide uses gene conversion, and the recombination() callback makes no changes to the breakpoints vector, so you get a pattern of recombination that looks like what you'd expect from gene conversion.  The second model you provide replaces all of the breakpoints that SLiM drew with new breakpoints, and so you get a pattern of recombination that looks like what you'd expect without gene conversion – whatever SLiM might have done with gene conversion, you replace it with your own recombination breakpoints, effectively disabling SLiM's efforts.  That's not surprising, right?  So I'm a bit puzzled as to why you're puzzled, I guess :->.

Regarding your questions (i) and (ii), no, there is no way to get or modify this information in the present design.  (Feel free to open a new GitHub issue, but the reason this information is not exposed in recombination() callbacks right now is that I couldn't think of a good, clean design for doing so; so if you do open an issue on this, please don't just ask for the feature, but also propose a concrete design for how you think it ought to work.)

However, you are always free to just generate whatever breakpoints you want, with whatever algorithm you want, in your recombination() callback.  If you want more control over gene conversion than SLiM provides, you can implement whatever gene conversion algorithm you want in your recombination() callback.  The only thing you won't be able to get, with your own custom gene conversion algorithm, is complex gene conversion tracts with heteroduplex mismatch repair.  For that, you have to use SLiM's built-in gene conversion algorithm; but in your example models it looks like you're not trying to use that feature anyway.  If you want to replicate SLiM's gene conversion algorithm, but with your own custom modifications, the algorithm is described in pretty complete detail in section 1.5.6.

I hope this helps; happy modeling!

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University


Sebastian Ramos-Onsins wrote on 2/24/25 8:11 AM:
--
SLiM forward genetic simulation: http://messerlab.org/slim/
---
You received this message because you are subscribed to the Google Groups "slim-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to slim-discuss...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/slim-discuss/21B3FC08-D8B6-4114-82D7-0945344E070C%40gmail.com.

Sebastian Ramos-Onsins

unread,
Feb 24, 2025, 10:39:59 AM2/24/25
to 'Ben Haller' via slim-discuss

Thank you very much! I hope you have a fast recovery!

Simplifying:

1. The recombination() callback only provides new crossover breakpoints. Once the callback finishes, any new breakpoints that were not initially defined by SLiM will result in crossover events.

2. I don’t fully understand how to implement the following: "If you want more control over gene conversion than SLiM provides, you can implement any gene conversion algorithm you want in your recombination() callback."

Could you provide a small example or some help? It is unclear to me how this callback interacts with SLiM's functions. It seems that the only modifiable value inside recombination() is breakpoints. I don’t see a way to implement gene conversion within it. For example, functions like removeMutations() or addMutations(), which could be useful for transferring mutations between strands, are not allowed inside the recombination() callback.

Thanks!

Sebastian


<image.png>
<image.png>


Additionally, I would like to know (i) if there is a possibility to know (in the case of having a proportion of crossing-overs and gene conversion events) what position suffers gene conversion and what suffers recombination, and (ii) if there is a possibility to assign to a position a gene conversion and not a recombination event (or the inverse).

Best,
Sebastian

Ben Haller

unread,
Feb 24, 2025, 11:06:12 AM2/24/25
to Sebastian Ramos-Onsins, 'Ben Haller' via slim-discuss
Hi Sebastian!

Apart from complex gene conversion tracts, which as I noted before cannot be implemented by the user with a recombination() callback, the DSB model – SLiM's "gene conversion" model – reduces down to a particular way of drawing crossover breakpoints.  Those crossover breakpoints are then used to recombine the two parental genomes.  The way in which that works is described in detail in section 1.5.6.  It shows an example wherein the recombination rate map is used to draw initial breakpoints; it says:

As in the “crossover breakpoints” model, in the “DSB” model when SLiM is generating a gametic haplosome it copies from a “copy strand”, one of the two homologous parental strands.  Similarly, too, breakpoints are chosen based upon the rate or rate map; in a hypothetical example that we will follow through the recombination process, the breakpoints might look like this:

Then is described how each of these breakpoints – places where a double-stranded break or DSB occurred – is turned into a gene conversion tract by drawing tract lengths before and after the DSB point:

A gene conversion tract is constructed around each DSB by drawing an extent for the tract to the left of the DSB, and independently drawing an extent for the tract to the right of the DSB.  These two draws are each taken from a geometric distribution with mean λ/2; the expected length of the gene conversion tract as a whole, then, is λ (where λ is a model parameter), but the length distribution is not geometric (it is negative binomial, in fact), and the tract may not be centered on the DSB.  The gene conversion tracts around the DSBs shown above might look like this:

And then a new set of crossover breakpoints is generated that reflects those gene conversion tracts, producing the correct pattern of recombination that reflects the presence of the tracts, as shown:

Continuing our example, then, we can now actually construct the resulting gametic haplosome, which looks like this:...Note that the first DSB, which was designated as a crossover, switched the copy strand at its beginning and did not switch it back.  The other two DSBs, designated as non-crossovers, switched the copy strand at both their beginning and their end.

That section provides a complete algorithmic description of how SLiM's gene conversion facility (except for complex gene conversion tracts with heteroduplex mismatch repair) is implemented based upon the more fundamental crossover-breakpoints model.  If you wish to replace that algorithm, you will need to replicate it in Eidos code in a recombination() callback.  The model code you posted already takes the first step towards doing that, by generating its own set of DSB points using rpois() and runif().  Next you would decide, for each of those DSB points, what type of gene conversion tract forms around that DSB point, and draw the lengths of the tract to the left and the right, and check for collisions between the tracts, and so forth – precisely the algorithm described in section 1.5.6.  Once you have that working, you can then modify the algorithm in whatever way you wish.

For example, suppose your initial rpois()/runif() calls generates a single DSB somewhere in the middle.  You then decide, probabilistically, that that DSB is a simple non-crossover gene conversion tract, like the middle tract shown in section 1.5.6.  You would therefore then draw a length to the left and generate a crossover breakpoint there, and draw a length to the right and generate a crossover breakpoint there, resulting in a short tract that switches the copy strand and then switches it back – a simple non-crosser gene conversion tract.  You'd return that new vector of crossover breakpoints to SLiM, and SLiM would do what you told it, resulting in the pattern of gene conversion you specified.  Again, you can follow this logic through section 1.5.6; I can't explain it any better than I did when I wrote the manual.

I can't really help you any more than that without doing all of the work for you, I think.  You need to read section 1.5.6 very carefully and fully understand every step that it describes, and then implement that algorithm – or whatever algorithm you want! – in your own recombination() callback.  It won't be particularly easy – the code for SLiM's DSB model is somewhat complex – but since it is your script, you can write whatever code you want to write, and make it do whatever you want it to do.  Happy modeling!


Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University



Sebastian Ramos-Onsins wrote on 2/24/25 9:39 AM:

Sebastian Ramos-Onsins

unread,
Feb 24, 2025, 11:58:53 AM2/24/25
to 'Ben Haller' via slim-discuss
Thanks Ben,

Well. I know how to do a gene conversion function (that is, outside the recombination() callback). Here you have the function that I did -simplified by only doing gene conversion from right to left and including a threshold to avoid overlapping-. The function works. 

The problem is the speed (is too slow). That’s why I was asking for using another way (from the manual I thought that my code was able to be included inside the recombination() callback). The function I did is slow because the calls to assign the individual and the mutations that have to be replaced are very slow.

Sorry for the missunderstanding.

Sebastian

// set up a simple neutral simulation

initialize() {

setSeed(123456);

defineConstant("L",1e6);

defineConstant("mut_rate",1e-6);

defineConstant("gc_rate",1e-7);

defineConstant("gc_rate_genome",gc_rate*L);

defineConstant("mean_len_gc",500);

defineConstant("Ne1",500);

initializeSLiMOptions(nucleotideBased=T);

initializeAncestralNucleotides(randomNucleotides(L));


// m1 mutation type: neutral

initializeMutationTypeNuc("m1", 0.5, "f", 0.0);

// g1 genomic element type: uses m1 for all mutations

initializeGenomicElementType("g1", m1, 1.0, mmJukesCantor(mut_rate/3));

// uniform chromosome of length 100 kb with uniform recombination

initializeGenomicElement(g1, 0, L-1);

//rec map separated in three regions: high-no-high recombination

rates = c(1e-7,0,1e-7);

ends = c(25e4,75e4,10e5-1);

initializeRecombinationRate(rates, ends);

initializeGeneConversion(0,0,1.0);


//writeFile(filePath="breakpoints_v4.txt",contents=(""),append=F);

//writeFile(filePath="gene_conversion_v4.txt", contents=(""), append=F);

}


function (void)do_uniform_gene_conversion(numeric Nep) {

//calculate the number of possible gene conversion events per generation

num_gc_events = rpois(Nep,2*gc_rate_genome); //number of simple gc events at each individual

ngc_no0=which(num_gc_events>0); //Look only at individuals that have gc events

//look at gene conversion per individual

for(i in ngc_no0) {

// code to check that gc positions are not overlapping (assuming threshold 10x mean len)

if(num_gc_events[i]>1) {

ml=0;

do {

pos_gc_events_sort = sort( asInteger(runif(num_gc_events[i],0,L))); // sorted positions of gc/individual

x=0;

ml = pos_gc_events_sort[x+1]-pos_gc_events_sort[x] <= mean_len_gc*10; //1 means overlapped

while(x<num_gc_events[i]-1 & ml == 0) {

ml = pos_gc_events_sort[x+1]-pos_gc_events_sort[x] <= mean_len_gc*10 ;

x=x+1;

}

}

while(ml==1);

}

else pos_gc_events_sort = sort(asInteger(runif(num_gc_events[i],0,L))); // sorted positions of gc/individual

//define length of gc events (simply one step, only forward), define donor and receptor genomes

len_gc = rgeom(num_gc_events[i],1/mean_len_gc);//lengths gc events

dc = asInteger(rbinom(num_gc_events[i],1,0.5)); //donor chain for each gc event

ndc=1-dc; //receptor

//writeFile(filePath="gene_conversion_v4.txt", contents=(paste(asInteger(pos_gc_events_sort),sep="\n")),append=T);


//replace mutations from one chain to another. Look at each gc event (and per individual)

//This is many times slower than recombination() callback!

//But, how to separate rec from gc using recombination()?

for( x in 0:(num_gc_events[i]-1)) {

//find donor and receptor, mutations and positions:

donor=sim.subpopulations.individuals[i].genomes[dc[x]]; //very slow!

receptor=sim.subpopulations.individuals[i].genomes[ndc[x]]; //very slow!

mut_donor=donor.mutations;

mut_receptor=receptor.mutations;

mut_donor_position=mut_donor.position;

mut_receptor_position=mut_receptor.position;

init_position=pos_gc_events_sort[x];

max_position=pos_gc_events_sort[x] + len_gc[x];

//positions of mutations implicated in gc from donor //very slow!

mut_gc_donor = mut_donor[mut_donor_position>=init_position & mut_donor_position < max_position];

//positions of mutations implicated in gc from receptor //very slow!

mut_gc_receptor = mut_receptor[mut_receptor_position >=init_position & mut_receptor_position < max_position];

if(length(mut_gc_receptor)) {

receptor.removeMutations(mut_gc_receptor);

}

if(length(mut_gc_donor)) {

receptor.addMutations(mut_gc_donor);

}

}

}

}


//recombination() {

// breakpoints=sort(c(breakpoints));

// writeFile(filePath="breakpoints_v4.txt", contents=(paste(asInteger(breakpoints),sep="\n")),append=T);

// return T;

//}


// create a population of Ne1 individuals

1 early() {

sim.addSubpop("p1", Ne1);

}

// add ONLY uniform gene conversion and not recombination

1:2000 early() {

do_uniform_gene_conversion(Ne1);

}

2000 late() {

sim.simulationFinished();

}


Ben Haller

unread,
Feb 24, 2025, 12:13:40 PM2/24/25
to Sebastian Ramos-Onsins, 'Ben Haller' via slim-discuss
Hi Sebastian,

Yes – I just told you how to do it in a recombination() callback, and yes, it would be faster (and would be compatible with tree-sequence recording, and other benefits).  You do not need to add/remove mutations yourself.  All you need to do is write a recombination() callback that returns the appropriate set of crossover breakpoints to implement the gene conversion tracts that you have drawn.  SLiM does the stitching together of the parental genomes for you.  As I wrote, the existing model code that you already posted does the first step.  You just need to return a *different* set of crossover breakpoints that represent the pattern of recombination that you want, based upon the gene conversion tracts that you have drawn.  Again, read section 1.5.6 for a complete description of such an algorithm.  If you implement that algorithm in your recombination() callback, you will have your own customizable gene conversion algorithm in SLiM, and it will be fast (or much faster than the approach you've posted just now, anyway).  I'm sorry, I don't know how to explain this any other way.  The script in your recombination() callback is *your* script.  Write it to do whatever you want it to do, by returning the crossover breakpoints you want, including (but not limited to) crossover breakpoints that represent gene conversion tracts!  :->


Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University


Sebastian Ramos-Onsins wrote on 2/24/25 10:58 AM:

Sebastian Ramos-Onsins

unread,
Feb 24, 2025, 12:23:49 PM2/24/25
to 'Ben Haller' via slim-discuss
Ah! OK!, thanks!!
Sebastian
Reply all
Reply to author
Forward
0 new messages