Using tree sequence file to start a nucleotide-based SLiM simulation

58 views
Skip to first unread message

John Michael Basaca

unread,
Apr 11, 2025, 6:09:34 PMApr 11
to slim-discuss
Hello everyone!

I want to consult regarding using a tree-sequence file of a coalescent simulation to start a nucleotide-based non-neutral SLiM simulation for my PhD research project. I was able to successfully do it with a non-nucleotide-based nonWF simulation using the guide scripts in the SLiM manual but unfortunately, I am not able to figure out how to modify said script so it can work on a nucleotide-based nonWF simulation. I have went over pyslim documentation and I reckon it has something to do with the following:
  • using pyslim.annotate to modify the default "nucleotide_based=False" into True; and 
  • using "pyslim.generate_nucleotides(ts, reference sequence=None, keep=True)" to return a modified tree sequence with nucleotide information.
However, I think there are still missing pieces as I cannot figure out how the tree file can successfully connect to the nucleotide-based simulation, particularly with respect to calling the nucleotide sequence information of the population in the the modified tree sequence file as the "ancestral nucleotide sequence" of the nucleotide-based SLiM simulation. 

I hope you can help me on this matter.

Sincerely,

John Michael Basaca
PhD Biology student, Institute of Biology
University of the Philippines - Diliman

Peter Ralph

unread,
Apr 14, 2025, 11:00:40 AMApr 14
to John Michael Basaca, slim-discuss
Hi, John - good question. There's just one minor thing to do, and you're almost there, but the way to do it is hard to figure out. The additional step is setting the "nucleotide_based" property in top-level metadata.


# generate an msprime simulation
demog = msprime.Demography()
demog.add_population(initial_size=1000)
ts = msprime.sim_ancestry(
            samples=200,
            demography=demog,
            recombination_rate=1e-8,
            sequence_length=1e6,
            random_seed=5)
# annotate, add mutations
ts = pyslim.annotate(ts, model_type="nonWF", tick=1)
ts = msprime.sim_mutations(
                ts, rate=1e-8,
                model=msprime.SLiMMutationModel(type=0),
                random_seed=9
)
# generate nucleotides
ts = pyslim.generate_nucleotides(ts)
# set top-level metadata property
tables = ts.dump_tables()
md = tables.metadata
md['SLiM']['nucleotide_based'] = True
tables.metadata = md
ts = tables.tree_sequence()
# write out
ts.dump("initialize_nonWF_nuc.trees")

and then the following SLiM script:

initialize()
{
   initializeSLiMModelType("nonWF");
   initializeSLiMOptions(nucleotideBased=T);
   // we must generate ancestral nucleotides, but they will be unused,
   // replaced by the reference sequence in the .trees file loaded below
   initializeAncestralNucleotides(randomNucleotides(1e6));
   initializeTreeSeq();
      initializeMutationTypeNuc("m0", 0.5, "f", 0.0);
      initializeGenomicElementType("g1", m0, 1.0, mmJukesCantor(1e-7));
   initializeGenomicElement(g1, 0, 1e6-1);
   initializeRecombinationRate(1e-8);
   defineConstant("K", 1000);
}

reproduction(p1) {
   subpop.addCrossed(individual,
                     subpop.sampleIndividuals(1));
}

1 early() { 
   sim.readFromPopulationFile("initialize_nonWF_nuc.trees");
   catn("Loaded " + length(sim.subpopulations)
        + " populations from a file; now in generation " + sim.cycle);
   catn("Population sizes: " + paste(sim.subpopulations.individualCount));
}

2: early() {
   p0.fitnessScaling = K / p0.individualCount;
}

10 early() {
   catn("Done.");
   sim.simulationFinished();
}

I've noticed this would not have been an issue if `generate_nucleotides`had set that top-level metadata property; I've filed an issue for this: https://github.com/tskit-dev/pyslim/issues/368

Thanks, and happy slimulations!
--peter


From: slim-d...@googlegroups.com <slim-d...@googlegroups.com> on behalf of John Michael Basaca <john....@gmail.com>
Sent: Friday, April 11, 2025 3:09 PM
To: slim-discuss <slim-d...@googlegroups.com>
Subject: Using tree sequence file to start a nucleotide-based SLiM simulation
 
--
SLiM forward genetic simulation: http://messerlab.org/slim/
---
You received this message because you are subscribed to the Google Groups "slim-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to slim-discuss...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/slim-discuss/2688bf43-8bc2-4378-8819-b6fed04e8749n%40googlegroups.com.
Message has been deleted

Peter Ralph

unread,
Jul 28, 2025, 12:06:02 AMJul 28
to John Michael Basaca, slim-discuss
Ah ha - you've encountered an incompatibility between the way that msprime adds SLiM mutations (which always stacks mutations) and how SLiM outputs VCF files (which, reasonably, does not want stacked mutations). So - the problem mutations are - I'm pretty sure - those added by msprime not SLiM, so modifying the SLiM model set-up won't change anything. There's good reasons msprime does it this way, FYI. I think to resolve this you'd need to do some lower-level adjusting of mutations, either with pyslim or possibly in SLiM. It's possible that writing out the VCF from python instead of SLiM would do what you want (but I'd have to check how that code path deals with stacked mutations).

-peter

From: John Michael Basaca <john....@gmail.com>
Sent: Sunday, July 27, 2025 5:40 PM
To: Peter Ralph <p...@uoregon.edu>
Cc: slim-discuss <slim-d...@googlegroups.com>
Subject: Re: Using tree sequence file to start a nucleotide-based SLiM simulation
 
Thanks for this, Peter! 

Using this script as a guide, I was able to make the simulation work until I tested the code to the actual size of the genome I wanted to simulate (i.e., increased the size from 100 to 5000 independent loci for SNPs) . Particularly, I encountered this code when I saved the output of the simulation as a VCF file: " ERROR (Population::PrintGenomes_VCF): more than one nucleotide-based mutation encountered at the same position (97353851) in the same genome; the nucleotide cannot be called." I tried addressing this problem by specifying "m0.mutationStackPolicy = "l";" in the initialize part of the code or even added the "simplifyNucleotides=T" in the VCFoutput codes but the error still persists. 

I was wondering if there was an alternative way to address the error that I have not previously considered. It might be related to how I coded for the genomic elements to come up with the independent loci but I'm not entirely sure. Here is how I coded for it: "for (index in 0:4999)
initializeGenomicElement(g1, index*25000, index*25000 + 999);"

Let me know if there are any parts of the simulation that you need.

Sincerely,

JM

P.S. I apologize for the late response. Got caught up with my other deliverables so I had to park this one for a while. Thank you again for always entertaining my inquiries!

Message has been deleted

John Michael Basaca

unread,
Jul 28, 2025, 4:27:08 AMJul 28
to slim-discuss
Thank you, Peter! I'm not that well versed in pyslim especially without any script to look into that I can study and make modifications but I did try prompting to generative AI how your suggestion can be done and it gave the code below and suggested to run it after adding the mutations from msprime. 

import tskit

# Remove stacked mutations from the tree sequence
def remove_stacked_mutations(ts):
    tables = ts.dump_tables()
    sites = tables.sites
    mutations = tables.mutations

    new_mutations = tskit.MutationTable()
    seen = dict()  # key = (site, node)

    for i, mut in enumerate(mutations):
        key = (mut.site, mut.node)
        if key not in seen:
            seen[key] = i
            new_mutations.append(mut)
        # else: skip the duplicate mutation

    tables.mutations.clear()
    for mut in new_mutations:
        tables.mutations.append(mut)
   
    return tables.tree_sequence()

# Apply de-stacking
ts = remove_stacked_mutations(ts)

I am not sure how to navigate the codes as it is from tskit but it purportedly scans all mutations in the tree sequence produced, keeps the first mutation (via discarding consecutive mutations) for each genome position per haploid, mimicking the "first-wins" stacking policy in SLiM. I am hesitant to apply such modification without consulting you. Do you think this would be adequate? If not, do you have any further tips that could be helpful for me?

Sincerely,

JM

Peter Ralph

unread,
Jul 28, 2025, 1:50:12 PMJul 28
to John Michael Basaca, slim-discuss
Hm, well, that code looks like it keeps the first mutation *on each branch*. (I assume that you checked that it runs; I haven't.) However, that doesn't fix the problem - that's not what stacked mutations are. If a mutation appears above node A; and then another mutation appears at the same site in a descendant of A, then these two will be stacked, and are each on separate branches. Have a read of
for description of what is going on with stacked mutations.

We're not going to have a good suggestion for what to do, or be able to answer "will this work", without a careful explanation of what you need to do (and why) - for instance, this code just plain removes some mutations, and you ask "Do you think this would be adequate?"  A good question (if a bit vague), but you're the one who can answer that question, based on what it is you're trying to do.

But, my first step if I were you would be to take my previous suggestion and try writing out the tree sequence from SLiM and using the write_vcf method of the tree sequence, and investigating whether that does what you want.

Happy slimulating,
-peter


From: slim-d...@googlegroups.com <slim-d...@googlegroups.com> on behalf of John Michael Basaca <john....@gmail.com>
Sent: Monday, July 28, 2025 1:11 AM
To: slim-discuss <slim-d...@googlegroups.com>

Subject: Re: Using tree sequence file to start a nucleotide-based SLiM simulation

Peter Ralph

unread,
Jul 30, 2025, 12:33:04 PMJul 30
to John Michael Basaca, slim-discuss
Follow-up for the list: offline, we determined that the missing step was applying pyslim.convert_alleles( ), which copies the nucleotide state (stored in metadata) into the ancestral/derived state positions (where it needs to be to end up in the right columns in the VCF). This is necessary for annoying technical reasons (too-briefly described here: https://tskit.dev/pyslim/docs/latest/tutorial.html#writing-out-genotypes-to-vcf); the tldr; is that to write out a valid VCF you need to (a) do pyslim.generate_nucleotides( ) if you've not got a nucleotide-based simulation, and then (b) pyslim.convert_alleles( ).

-peter

From: John Michael Basaca <john....@gmail.com>
Sent: Tuesday, July 29, 2025 7:30 PM
To: Peter Ralph <p...@uoregon.edu>
Cc: slim-discuss <slim-d...@googlegroups.com>

Subject: Re: Using tree sequence file to start a nucleotide-based SLiM simulation
 
Thank you for your guidance, Peter! You're right! I was able to successfully produce a VCF file via write_vcf method from the SliM tree sequence. However, the nucleotide information (i.e., ANC/REF) is absent already and was planning to use that one for downstream analyses. I checked the SLiM tree metadata though and it was still recognized as nucleotide-based. I think this has something to do with the stacked mutation condition again but could not think of a way to resolve this.

John Michael Basaca

unread,
Jul 31, 2025, 10:31:29 AMJul 31
to Peter Ralph, slim-discuss
Thank you for your guidance, Peter! You're right! I was able to successfully produce a VCF file via write_vcf method from the SliM tree sequence. However, the nucleotide information (i.e., ANC/REF) is absent already and was planning to use that one for downstream analyses. I checked the SLiM tree metadata though and it was still recognized as nucleotide-based. I think this has something to do with the stacked mutation condition again but could not think of a way to resolve this.


On Tue, Jul 29, 2025 at 1:50 AM Peter Ralph <p...@uoregon.edu> wrote:
Reply all
Reply to author
Forward
0 new messages