HELP! Simulating Inbreeding and Evolution with Full Chromosome Data

58 views
Skip to first unread message

Francisco Ceballos

unread,
Feb 27, 2025, 10:44:11 AMFeb 27
to slim-discuss

I am simulating long-term inbreeding using SLiM’s non-Wright-Fisher model. The goal is to track the effects of extreme inbreeding over 100 generations and generate complete chromosome 1 sequences in VCF format for downstream analysis in PLINK.

Simulation Design
  • Step 1: Initialize a nonWF model, enable tree sequence recording, and define chromosome 1 (248,956,422 bp).
  • Step 2: Create two outbred siblings (founders) with neutral mutations at distinct locations.
  • Step 3: Force sibling mating to produce an inbred descendant.
  • Step 4: Introduce the inbred descendant into three new populations (p3 = 10, p4 = 100, p5 = 1000 individuals).
  • Step 5: Allow random mating while capping population sizes.
  • Step 6: Every 25 generations, export VCF files from tree sequences for PLINK analysis.
Issues Encountered
  1. Mutations and Positioning Issues

    • SLiM was throwing errors related to float vs integer positions when defining mutations.
    • Fixed using asInteger() to ensure valid mutation positions.
  2. Tree Sequence to VCF Conversion Issues

    • The VCF export process required Python’s tskit, but position transformation was incorrect.
    • Fixed by applying np.maximum(1, x.astype(int)) to ensure valid genomic coordinates.
  3. Reproduction and Population Structure Errors

    • Errors due to incorrect population indexing (subscript -1 out of range).
    • Fixed by properly referencing POPULATION_IDS[i].

Here you can find the script:

// Step 1: Initialize the model and parameters
initialize() {
    initializeSLiMModelType("nonWF"); // Non-Wright-Fisher model
    initializeTreeSeq(); // Enable tree sequence recording

    defineConstant("VCF_INTERVAL", 25); // Generation interval for exporting VCF
    defineConstant("TOTAL_GENERATIONS", 100); // Total number of generations
    defineConstant("CHROMOSOME_LENGTH", 248956422); // Length of human chromosome 1 in bp

    // Define population names and sizes
    defineConstant("POPULATION_IDS", c("p3", "p4", "p5"));
    defineConstant("POPULATION_SIZES", c(10, 100, 1000));

    initializeMutationRate(1e-8); // Mutation rate
    initializeMutationType("m1", 0.5, "f", 0); // Neutral mutations
    initializeGenomicElementType("g1", m1, 1.0);
    initializeGenomicElement(g1, 0, CHROMOSOME_LENGTH - 1); // Define chromosome 1
    initializeRecombinationRate(1e-8); // Recombination rate
}

// Step 2: Create two sibling individuals without inbreeding
1 early() {
    sim.addSubpop("p1", 2); // Create a population with two individuals (siblings)

    // Assign mutations at specific positions (ensuring they are integers)
    p1.individuals[0].genomes[0].addNewMutation(m1, 1, 0);
    p1.individuals[0].genomes[1].addNewMutation(m1, asInteger(CHROMOSOME_LENGTH / 4) + 1, 0);
    p1.individuals[1].genomes[0].addNewMutation(m1, asInteger(CHROMOSOME_LENGTH / 2) + 1, 0);
    p1.individuals[1].genomes[1].addNewMutation(m1, asInteger((3 * CHROMOSOME_LENGTH) / 4) + 1, 0);

    // Create a subpopulation for inbred descendants
    sim.addSubpop("p2", 0);
}

// Step 3: Generate the inbred descendant through forced reproduction
2:2 reproduction(p1) {
    inds = p1.individuals;
    brother1 = inds[0];
    brother2 = inds[1];

    // Mate the siblings and add their descendant to subpopulation p2
    p2.addCrossed(brother1, brother2);
}

// Step 4: Create populations of different sizes and migrate the inbred descendant
3 early() {
    for (i in 0:(size(POPULATION_IDS) - 1)) {
        pop_name = POPULATION_IDS[i];
        pop_size = POPULATION_SIZES[i];
        sim.addSubpop(pop_name, pop_size);

        // Transfer the inbred descendant if it exists
        if (p2.individualCount > 0) {
            sim.subpopulations[i + 2].takeMigrants(p2.individuals);
        }
    }
}

// Step 5: Control reproduction to prevent excessive growth
4:TOTAL_GENERATIONS reproduction() {
    for (i in 0:(size(POPULATION_IDS) - 1)) {
        pop = sim.subpopulations[i + 2]; // p3, p4, p5
        target_size = POPULATION_SIZES[i];

        // If the population has reached its target size, stop reproduction
        if (pop.individualCount >= target_size) {
            next;
        }

        inds = pop.individuals;
        num_offspring = min(asInteger(pop.individualCount / 2), target_size - pop.individualCount); // Do not exceed the limit

        for (j in seqLen(num_offspring)) {
            parent1 = inds[asInteger(runif(1, 0, size(inds)))];
            parent2 = inds[asInteger(runif(1, 0, size(inds)))];

            if (parent1 != parent2) {
                pop.addCrossed(parent1, parent2);
            }
        }
    }
}

// Step 6: Export VCF and clean up mutations
4:TOTAL_GENERATIONS late() {
    // Clean up unnecessary mutations every 10 generations
    if (sim.cycle % 10 == 0) {
        sim.treeSeqSimplify();
    }

    // Export VCF every 25 generations ensuring all positions are valid
    if (sim.cycle % VCF_INTERVAL == 0) {
        for (pop in sim.subpopulations) {
            if (pop.individualCount > 0) {
                filename = "output_pop" + pop.id + "_gen" + sim.cycle + ".trees";
                sim.treeSeqOutput(filename);

                // Create a Python script in a single line
                python_script = "convert_to_vcf.py";
                script_content = "import tskit, numpy as np\n"
                               + "ts = tskit.load('" + filename + "')\n"
                               + "with open('" + filename + ".vcf', 'w') as vcf_file:\n"
                               + "    ts.write_vcf(vcf_file, position_transform=lambda x: max(1, int(x)))\n";

                // Save the script to a file in the system
                writeFile(python_script, script_content);

                // Execute the script
                system("python3 " + python_script);
            }
        }
    }
}

Could you help me to fix it????

Francisco Ceballos

unread,
Feb 27, 2025, 10:51:36 AMFeb 27
to slim-discuss
When I run the above script I get this error:

File "/home/rembukai/BIOSOFT/SIMULATIONS/convert_to_vcf.py", 
line 4, in <lambda> 
ts.write_vcf(vcf_file, position_transform=lambda x: max(1, int(x))) 
TypeError: only length-1 arrays can be converted to Python scalars Traceback (most recent call last): 

File "/home/rembukai/BIOSOFT/SIMULATIONS/convert_to_vcf.py", 
line 4, in <module> ts.write_vcf(vcf_file, position_transform=lambda x: max(1, int(x))) 

File "/home/rembukai/miniconda3/lib/python3.12/site-packages/tskit/trees.py", 
line 6427, in write_vcf writer = vcf.VcfWriter()

 File "/home/rembukai/miniconda3/lib/python3.12/site-packages/tskit/vcf.py", 
line 87, in __init__ position_transform(tree_sequence.tables.sites.position), dtype=int

Ben Haller

unread,
Feb 27, 2025, 12:52:43 PMFeb 27
to Francisco Ceballos, slim-discuss
Hi Francisco!

In your first email, you list three issues, all of which you say are fixed.  Commenting briefly on those:

1) Yes, positions along the chromosome in SLiM must be integers.  You're getting a float because you're using the / operator, which produces a float rather than an integer.  Using asInteger() is fine; a cleaner design would be to use the integerDiv() function to do the division in the first place.  It is documented in the Eidos manual.

2) See below.

3) Sounds like you just had a bug and have fixed it; nothing for me to comment on?

In your second email, you provide some errors from the tskit function write_vcf().  I have no idea what might be going on there; I don't work on tskit, just SLiM :->.  I'd recommend that you file an issue with them on GitHub, I suppose.  This applies also to (2) in your first email.

But more broadly: I wonder whether you need to be using tree-sequence recording and tskit at all?  It sounds like the final output you want is VCF for PLINK.  SLiM can output VCF directly; why not just do that?  See the methods outputVCF() and outputVCFSample().  That would greatly simplify your model, and it would run much faster, too.

Let me know if you have other questions.  I didn't look closely at your script, since it seemed like what I've written above addressed the questions you raised.  Happy modeling!

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University




Francisco Ceballos wrote on 2/27/25 9:51 AM:
--
SLiM forward genetic simulation: http://messerlab.org/slim/
---
You received this message because you are subscribed to the Google Groups "slim-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to slim-discuss...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/slim-discuss/40903448-d143-436f-ae8c-e28bac4fa06bn%40googlegroups.com.

Francisco Ceballos

unread,
Feb 27, 2025, 2:11:48 PMFeb 27
to Ben Haller, slim-discuss
Hi Ben,
Thank you so much for your help—I really appreciate it!
I wanted to provide details on the issues I encountered and how I attempted to fix them, as I'm not entirely sure if my solutions are correct. Despite my efforts, I'm still having trouble running the script and can't get it to execute properly.
Regarding the VCF output, I used outputVCFSample() to generate VCF files for the consanguineous individual and all its descendants (but not the entire population). However, I'm encountering the following error:

// RunInitializeCallbacks():
initializeSLiMModelType(modelType = 'nonWF');
initializeTreeSeq();
initializeMutationRate(1e-08);
initializeMutationType(1, 0.5, "f", 0);
initializeGenomicElementType(1, m1, 1);
initializeGenomicElement(g1, 0, 248956421);
initializeRecombinationRate(1e-08);
#WARNING (Species::RunInitializeCallbacks): with tree-sequence recording enabled and a non-zero mutation rate, a neutral mutation type was defined and used; this is legal, but usually undesirable, since neutral mutations can be overlaid later using the tree-sequence information.

// Starting run at tick <start>:
1

ERROR (EidosInterpreter::Evaluate_Call): method outputVCFSample() is not defined on object element type Species.

Error on script line 70, character 16:

            sim.outputVCFSample(filename, pop.individuals);


Any insights you could share would be greatly appreciated!
Best,
Francisco.



From: Ben Haller <bha...@mac.com>
Sent: Thursday, February 27, 2025 18:52
To: Francisco Ceballos <ceballo...@gmail.com>
Cc: slim-discuss <slim-d...@googlegroups.com>
Subject: Re: HELP! Simulating Inbreeding and Evolution with Full Chromosome Data
 
                               + "with open('" + filename + ".vcf', 'w') as vcf_[file:\n]file:\n"

                               + "    ts.write_vcf(vcf_file, position_transform=lambda x: max(1, int(x)))\n";

                // Save the script to a file in the system
                writeFile(python_script, script_content);

                // Execute the script
                system("python3 " + python_script);
            }
        }
    }
}

Could you help me to fix it????

Ben Haller

unread,
Feb 27, 2025, 2:20:54 PMFeb 27
to Francisco Ceballos, slim-discuss
Hi Francisco!

Yes, outputVCFSample() is a method defined on Subpopulation, and outputVCF() is defined on Genome, neither is a method of Species.  See the documentation in the manual (or the online help in SLiMgui) for more details.  There are plenty of examples of the usage of these methods, or of closely related methods like outputSample(), outputMS(), etc.

Perhaps it's time to take a step back and ask: have you done the SLiM Workshop yet?  It's the best way to learn basic, foundational stuff like this for SLiM, and it's available for free online (and is offered in person from time to time, as well).  If you haven't done it yet, I'd recommend that you do it now; it will save you a lot of struggle with stuff like this.


Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University


Francisco Ceballos wrote on 2/27/25 1:11 PM:

Francisco Ceballos

unread,
Feb 27, 2025, 2:24:10 PMFeb 27
to Ben Haller, slim-discuss

Hi Ben,

Thank you very much! I’ll take a step back and revisit the basics as you suggested. To be honest, I didn’t attend the SLiM workshop, so that’s probably a good place to start. I really appreciate your advice!

Best regards,
Francisco

Francisco Ceballos

unread,
Mar 10, 2025, 1:45:37 PMMar 10
to slim-discuss

Dear Ben,

Thank you for all your suggestions! I thoroughly reviewed the SLiM materials and successfully completed the script.

However, I’ve encountered a problem that I haven’t been able to resolve on my own. I managed to fix all previous issues, but now, when I try to run the simulation, it gets killed due to insufficient memory (a typical OOM error). I have allocated 48 GB of RAM and 32 GB of swap to my WSL2 environment, but it’s still not enough.

I am simulating 300K SNPs on chromosome 1 over 100 generations, and I believe I need to make my script more memory-efficient.

Could you provide any advice on optimizing it? Here is my script:

// Step 1: Initialize the model and parameters
initialize() {
    initializeSLiMModelType("nonWF");

    defineConstant("VCF_INTERVAL", 50);
    defineConstant("TOTAL_GENERATIONS", 100);
    defineConstant("CHROMOSOME_LENGTH", 248956422);
    defineConstant("NUM_INITIAL_SNPS", 300000);


    // Define population names and sizes
    defineConstant("POPULATION_IDS", c("p3", "p4", "p5"));

    defineConstant("POPULATION_SIZES", c(10));
    // Mutation and recombination settings
    initializeMutationRate(1e-8);

    initializeMutationType("m1", 0.5, "f", 0);

    initializeGenomicElementType("g1", m1, 1.0);
    initializeGenomicElement(g1, 0, CHROMOSOME_LENGTH - 1);

    initializeRecombinationRate(1e-8);
}
// Step 2: Create the initial population and introduce 300K SNPs efficiently


1 early() {
    sim.addSubpop("p1", 2); // Create a population with two individuals (siblings)

    positions = asInteger(runif(NUM_INITIAL_SNPS, 1, CHROMOSOME_LENGTH));
    for (pos in positions) {
    p1.individuals.genomes.addNewMutation(m1, pos, 0);


    }
    // Create a subpopulation for inbred descendants
    sim.addSubpop("p2", 0);
}

// Step 3: Generate the inbred descendant through forced reproduction
2:2 reproduction(p1) {
    inds = p1.individuals;
    brother1 = inds[0];
    brother2 = inds[1];
    // Mate the siblings and add their descendant to subpopulation p2
    p2.addCrossed(brother1, brother2);
}

// Step 4: Create populations of different sizes and migrate the inbred descendant
3 early() {
    for (i in 0:(size(POPULATION_IDS) - 1)) {
        pop_name = POPULATION_IDS[i];
        pop_size = POPULATION_SIZES[i];
        sim.addSubpop(pop_name, pop_size);

        // Transfer only a limited number of inbred descendants
        if (p2.individualCount > 0) {
            migrating_inds = sample(p2.individuals, min(10, p2.individualCount), replace=F);
            sim.subpopulations[i + 2].takeMigrants(migrating_inds);


        }
    }
}

// Step 5: Control reproduction to prevent excessive growth
4:TOTAL_GENERATIONS reproduction() {
    for (i in 0:(size(POPULATION_IDS) - 1)) {
        pop = sim.subpopulations[i + 2]; // p3, p4, p5
        target_size = POPULATION_SIZES[i];
        // If the population has reached its target size, stop reproduction
        if (pop.individualCount >= target_size) {
            next;
        }
        inds = pop.individuals;
        num_offspring = min(asInteger(pop.individualCount / 2), target_size - pop.individualCount);

        for (j in seqLen(num_offspring)) {
            parent1 = inds[asInteger(runif(1, 0, size(inds)))];
            parent2 = inds[asInteger(runif(1, 0, size(inds)))];
            if (parent1 != parent2) {
                pop.addCrossed(parent1, parent2);
            }
        }
    }
}

// Step 6: Efficient VCF Export (single file)
4:TOTAL_GENERATIONS late() {
    // Export VCF every 100 generations


    if (sim.cycle % VCF_INTERVAL == 0) {

        all_inds = c();
        for (pop in sim.subpopulations) {
            all_inds = c(all_inds, pop.individuals);
        }
        // Single file export for all individuals
        sim.outputVCF(filePath="/home/rembukai/BIOSOFT/SIMULATIONS/VCF/combined_output.vcf");
    }
}

Best regards,

Francisco.

Ben Haller

unread,
Mar 10, 2025, 2:18:35 PMMar 10
to Francisco Ceballos, slim-discuss
Hi Francisco!  I'm happy to hear of your progress!

Looking at your script, a side note is that you should use the rdunif() function instead of asInteger(runif()).  :->

Also, instead of "2:2 reproduction()" you can simply write "2 reproduction()" to schedule that callback only in tick 2.

But the main issue, and the reason your script takes so long to run, is that you're using the "big bang" reproduction strategy to reproduce the entire population in a single callout to your reproduction() callback, but you're not doing "self.active = 0;" at the end to tell SLiM that that's what you're doing.  So it's calling out to your reproduction() callback for each individual in the population, and for each individual in the population, you're reproducing the entire population.  For a population size of N, rather than making N offspring you're making N*N offspring.  And so your simulation is taking a very long time!

The right way to do "big bang" reproduction is shown in section 15.3 of the current manual, with a long explanation of what exactly the "self.active = 0;" line does.  But this is such a common point of confusion (you are FAR from the first!) that for the next version of SLiM I've written a new subsection for the manual that is only about big bang reproduction, and explains it in even more detail.  If you'd like me to, I can send you a draft PDF of that subsection off-list.  (Actually, I'd really appreciate someone giving it a read-through to tell me if it makes sense or not, if you're willing!)

If you fix that issue, I suspect your script will work much better, although I stopped reading it when I saw this error, since it is clearly the problem.  :->  Happy modeling!


Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University


Francisco Ceballos wrote on 3/10/25 11:45 AM:

Francisco Ceballos

unread,
Mar 10, 2025, 6:19:28 PMMar 10
to Ben Haller, slim-discuss
Dear Ben,
Thank you so much for your invaluable help, I truly appreciate it!
I will carefully study all your comments.
I'm really excited about the new SLiM version and would love to read the updated manual. However, I may not be able to provide feedback right away, as I’m currently swamped with work.
Thanks again for your support!
Best
Francisco.




From: Ben Haller <bha...@mac.com>
Sent: Monday, March 10, 2025 19:18
Reply all
Reply to author
Forward
0 new messages