Questions on treeseq and application on genome with different genomic element types

32 views
Skip to first unread message

Rose Huang

unread,
Mar 12, 2025, 2:22:48 AM3/12/25
to slim-discuss
Hi slim developers and users, 

I am wondering how to implement tree-seq when I have specific setting of genomic elements and ratios of neutral vs deleterious mutations for my simulations (as in recipe 7.3 with intron and exon). In 17.3, the recipe provided way to only simulate non-neutral mutations, but there is only one genomic element type with a single type of mutation. From my shallow understanding is that tree-seq can reduce burn-in runtime by avoiding the simulation of neutral mutations, but it seems impossible (to me as a beginner in slim) to generate the non-coding g3 or intron g2 without defining and simulating neutral m1 based on the recipe 7.3.

Here is my simulation design for the background of my question:
• A simple bottlenecked WF model to study the ROH pattern of populations in different time snaps.
• The genome consists only neutral and slightly deleterious mutations. 
• The genome was build to mimic the chromosome organization (non-coding region, exon-intron pair) as in recipe 7.3.
• A massive burn-in period to build the pre-bottleneck population. Here is where I want the tree-seq to come in if I understand this function correctly.

Thank you for your time and help!

Best
Rose

Ben Haller

unread,
Mar 12, 2025, 10:03:45 AM3/12/25
to Rose Huang, slim-discuss
Hi Rose!

You can remove neutral mutations even from a model with a complex genetic structure like recipe 7.3.  Basically, for each genomic element type, you (1) calculate the fraction of the mutations that are neutral, (2) remove the neutral mutation component from that mutation type, and (3) construct a mutation rate map, following the genomic element structure, that compensates for the removal of the neutral mutations.

A simple example: suppose you gave three genomic element types, g1, g2, and g3.  g1 is non-coding – 100% neutral mutations.  g2 is 50% neutral mutations (and 50% something else; it doesn't matter what, for the purposes of the example).  g3 is 0% neutral mutations (100% something else).  That's step 1.  Step 2: recode the model without the neutral mutations.  So now g1 is removed from the model entirely (or unused), whereas g2 is recoded to have 100% non-neutral mutations, removing its neutral component, and g3 is unchanged.  Step 3: adjust the mutation rate map.  So suppose you initially had a constant mutation rate of 2e-8 across the whole chromosome, and you had a genomic element structure like this:

[g1][g2][g3][g2][g1]

I'm leaving out the lengths of the genomic elements, but they have lengths, of course, and those lengths can be whatever you want them to be.  So originally, your mutation rate map was constant, 2e-8, so for each of those genomic elements it looked like:

[2e-8][2e-8][2e-8][2e-8][2e-8]

Yes?  And now we need to make a new mutation rate map that produces the same mutation rate for the non-neutral mutations as before, compensating for the fact that we have removed neutral mutations from the model.  That mutation rate map would look like:

[0.0][1e-8][2e-8][1e-8][0.0]

In the g1 regions we now use a mutation rate of 0.0, since there are no non-neutral mutations to be generated there.  In g2 regions the rate is half what it was (because half of the original rate was for neutral mutations, which we're no longer generating).  In g3 regions the rate is unchanged (because there were no neutral mutations being generated there in the first place).  This should produce the correct simulation dynamics in SLiM.

When you go over to Python to recapitate and then do neutral mutation overlay, note that you will want to use the *opposite* mutation rate map for the overlay step, because at that point you're *only* generating neutral mutations – the opposite of SLiM.  So then, for neutral mutation overlay in Python, you'll want to use a mutation rate map of:

[2e-8][1e-8][0.0][1e-8][2e-8]

where, as before, the length of each region in the mutation rate map follows the lengths of the genomic elements in your model.

You can do this even if the original mutation rate map was not a uniform rate, it just gets more complicated.  :->  I hope this makes sense.  Happy modeling!

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University


Rose Huang wrote on 3/12/25 12:22 AM:
--
SLiM forward genetic simulation: http://messerlab.org/slim/
---
You received this message because you are subscribed to the Google Groups "slim-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to slim-discuss...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/slim-discuss/40de5c02-cc30-44b2-a69c-d60c0fb8c319n%40googlegroups.com.

Message has been deleted
Message has been deleted

Curro Campuzano

unread,
Mar 14, 2025, 9:46:09 PM3/14/25
to slim-discuss
Hi, 
I think I did something similar at some point. Here you have some example code for recipe 7.3. It's not double-tested, but perhaps it's a useful starting point for you. 

In the SLiM code, I would suggest keeping track of the mutation rates (as suggested by Ben) while defining the genomic regions and encoding them in the metadata of the tree sequence. 
```slim
initialize() {
initializeTreeSeq();
defineConstant("MU", 1e-7);
initializeMutationType("m1", 0.5, "f", 0.0); // neutral
initializeMutationType("m2", 0.1, "g", -0.03, 0.2); // deleterious
initializeMutationType("m3", 0.8, "e", 0.1); // beneficial
//initializeGenomicElementType("g1", c(m1,m2,m3), c(2,8,0.1)); // exon
initializeGenomicElementType("g1", c(m2,m3), c(8,0.1)); // exon
defineConstant("MU_G1", MU * sum(c(8,0.1)) / sum(c(2,8,0.1)));
//initializeGenomicElementType("g2", c(m1,m2), c(9,1)); // intron
initializeGenomicElementType("g2", c(m2), 1); // intron
defineConstant("MU_G2", MU * 9 / sum(c(9, 1)));
initializeGenomicElementType("g3", c(m1), 1); // non-coding
defineConstant("MU_G3", 0.0);
// Generate random genes along an approximately 100000-base chromosome
base = 0;
ends = c();
mutRates = c();
while (base < 100000) {
// make a non-coding region
nc_length = rdunif(1, 100, 5000);
pos = base + nc_length - 1;
ends = c(ends, pos);
mutRates = c(mutRates, MU_G3);
initializeGenomicElement(g3, base, pos);
base = base + nc_length;
// make first exon
ex_length = asInteger(rlnorm(1, log(50), log(2))) + 1;
pos = base + ex_length - 1;
ends = c(ends, pos);
mutRates = c(mutRates, MU_G1);
initializeGenomicElement(g1, base, pos);
base = base + ex_length;
// make additional intron-exon pairs
do
{
in_length = asInteger(rlnorm(1, log(100), log(1.5))) + 10;
pos = base + in_length - 1;
ends = c(ends, pos);
mutRates = c(mutRates, MU_G2);
initializeGenomicElement(g2, base, pos);
base = base + in_length;
ex_length = asInteger(rlnorm(1, log(50), log(2))) + 1;
pos = base + ex_length - 1;
ends = c(ends, pos);
mutRates = c(mutRates, MU_G1);
initializeGenomicElement(g1, base, pos);
base = base + ex_length;
}
while (runif(1) < 0.8); // 20% probability of stopping
}
// final non-coding region
nc_length = rdunif(1, 100, 5000);
pos = base + nc_length - 1;
ends = c(ends, pos);
mutRates = c(mutRates, MU_G3);
initializeGenomicElement(g3, base, pos);
// single recombination rate
initializeRecombinationRate(1e-8);
// Landscape of mutation rate
defineConstant("POSITIONS", c(0, ends));
defineConstant("RATES", mutRates);
initializeMutationRate(mutRates, ends);
}
1 early() { sim.addSubpop("p1", 5000); }
10000 early() {
sim.treeSeqOutput(
'out.trees',
metadata=Dictionary(
"POSITIONS", POSITIONS,
"RATES", RATES
)
);
}
```

Then, in the python script, you do as usual but with a custom mutation rate map:

```python
import msprime, tskit, pyslim
import numpy as np
orig_ts = tskit.load("out.trees")
ts = pyslim.recapitate(orig_ts,
            recombination_rate=1e-8,
            ancestral_Ne=5000, random_seed=5
)
counts = [0, 0, 0]
for m in ts.mutations():
    for md in m.metadata["mutation_list"]:
        counts[md["mutation_type"]-1] += 1
print("Before overlaying mutations:")
print(
    f"Mutations: neutral={counts[0]}, deleterious={counts[1]}, beneficial={counts[2]}"
)
metadata = ts.metadata["SLiM"]["user_metadata"]
positions = np.array(metadata["POSITIONS"])
positions[-1] += 1
# Here, we want the 'opposite', so we add mutations to regions
# where we did not add mutations before
rates_slim = np.array(metadata["RATES"])
baseline = 1e-7
rates_msprime = baseline - rates_slim
mut_map = msprime.RateMap(position=positions, rate=rates_msprime)
# Overlay neutral mutations
next_id = pyslim.next_slim_mutation_id(ts)
ts = msprime.sim_mutations(
           ts,
           rate=mut_map,
           model=msprime.SLiMMutationModel(type=1, next_id=next_id),
           keep=True,
)
# Print the number of mutations of each type
counts = [0, 0, 0]
for m in ts.mutations():
    for md in m.metadata["mutation_list"]:
        counts[md["mutation_type"]-1] += 1
print("After overlaying mutations:")
print(
    f"Mutations: neutral={counts[0]}, deleterious={counts[1]}, beneficial={counts[2]}"
)
# Locate selected sites
sel_coeffs = np.array([
        sum(md["selection_coeff"] for md in m.metadata["mutation_list"])
        for m in ts.mutations()
])
which_max = np.argmax(sel_coeffs)
m = ts.mutation(which_max)
print(ts.site(m.site))
```

Best,
Curro

Rose Huang

unread,
Mar 14, 2025, 9:46:13 PM3/14/25
to slim-discuss
Thank you so much Ben, this is very helpful and exactly what I need!

Best,
Rose
Reply all
Reply to author
Forward
0 new messages