how to read in empirical SNPs for multiple chromosomes

22 views
Skip to first unread message

Natalie van Dis

unread,
Oct 14, 2025, 7:35:42 AMOct 14
to slim-discuss
Hi Ben,

I've been working on a SLiM script to simulate neutral drift with empirical SNPs, started during your Helsinki workshop last summer! Because I also have SNPs on the Z sex chromosome, you recommended me to simulate multiple chromosomes.

However, I'm running into the problem of how to read in the empirical SNPs for multiple chromosomes, and I haven't found a solution yet in the SLiM manual or google group.

Using 
// Read in haplosomes from the VCF file

p1.haplosomes.readHaplosomesFromVCF(VCF, mutationType = m1);


Throws error: "readHaplosomesFromVCF() requires that all target haplosomes are associated with the same chromosome"


How would I go about populating my multiple chromosomes (31 in total) with empirical SNPs from a VCF? I copy-paste the model initiation (which runs) and first tick cycle (that throws the error) below for reference.


Thanks in advance for your help!

Natalie


// SLiM script to run Wright-Fisher neutral simulation to simulate null hypothesis (H0) for empirical butterfly populations

// to obtain allele frequency (AF) trajectories when no selection and unequal drift due to Ne fluctutations between timepoints


/// **Set up the model**

initialize() {

// define some variables

defineConstant("Seed_number", getSeed()); // keep track of which simulation is which

defineConstant("POP_NAME", "pop1"); // population being simulated

defineConstant("VCF", "my_file.vcf")); // VCF containing the observed SNPs for generation 1 individuals

defineConstant("OUTPUT_FILE", paste0("results/SLiM_neutral_", POP_NAME, "_sim", Seed_number, ".vcf")); // output file to record final state of each mutation in each individual


// will have two sexes, because have SNPs on Z chromosome

initializeSex();


// m1 mutation type: neutral

initializeMutationType("m1", 0.5, "f", 0.0); // will be SNPs read in from VCF below

m1.convertToSubstitution = F; // want to keep track of all SNPs, also when fix


// g1 genomic element type: uses m1 for all mutations

initializeGenomicElementType("g1", m1, 1.0); // assume all SNPs are neutral, no selection


// define chromosomes

my_chroms = readFile("chrom_info.csv"); // file format = each row contains one variable

for (id in c(0:30), 

chrom_len in strsplit(my_chroms[1], ",")[0:30],

type in strsplit(my_chroms[2], ",")[0:30], // two autosomes and one sex chroms (type = "Z")

symbol in strsplit(my_chroms[3], ",")[0:30] )

{

chrom_len_int = asInteger(chrom_len);


initializeChromosome(id = id, length = chrom_len_int, type = type, symbol = symbol);

initializeGenomicElement(g1);

initializeMutationRate(0); // no mutation rate, because only looking at standing genetic variation

initializeRecombinationRate(4e-8); // Heliconius based genome average rate of recombination for males ❗still need to model no recombination in females

}


// read in empirical Ne measures to use in the simulation

defineConstant("Ne", readFile( paste0("src/Ne/", POP_NAME, "_Ne.txt") ) ); // vector storing the sizes to use below, first row has the number of individuals sampled for generation 0

// NB: first line is header

}



///

/// **Create the initial population**

1 early() {

Ne_size1 = asInteger( strsplit(Ne[1], "\t")[3] );

sim.addSubpop("p1", Ne_size1); // start with number of sampled individuals


// Read in haplosomes from the VCF file

p1.haplosomes.readHaplosomesFromVCF(VCF, mutationType = m1); // ❗DOESN'T WORK YET, error "readHaplosomesFromVCF() requires that all target haplosomes are associated with the same chromosome"


// create population at first sample point

Ne_size2 = asInteger( strsplit(Ne[2], "\t")[3] );

p1.setSubpopulationSize(Ne_size2); // NB: SNP frequencies will be taken from vcf file!

}

Ben Haller

unread,
Oct 14, 2025, 8:07:32 AMOct 14
to Natalie van Dis, slim-discuss
Hi Natalie!

The SLiM manual is the resource you need here, I think.  The documentation for readHaplosomesFromVCF() says:

These restrictions boil down to the fact that readHaplosomesFromVCF() only reads data for a single chromosome.  If you wish to read multi-chromosome VCF data into a multi-chromosome SLiM model, the readIndividualsFromVCF() method provided by the Individual class supports that functionality (because it can work at the level of individuals, rather than haplosomes, making it possible to match calls to the corresponding haplosomes in a reasonable way).  Alternatively, you can call readHaplosomesFromVCF() multiple times to read data for different chromosomes one by one.

And the doc for readIndividualsFromVCF() provides additional detail on how it works.  Let me know if you have followup questions!  :->

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University


Natalie van Dis wrote on 10/14/25 7:35 AM:
--
SLiM forward genetic simulation: http://messerlab.org/slim/
Before posting, please read http://benhaller.com/slim/bugs_and_questions.html
---
You received this message because you are subscribed to the Google Groups "slim-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to slim-discuss...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/slim-discuss/ba67b6f5-9117-458d-aee5-dc8334ea1a25n%40googlegroups.com.

Natalie van Dis

unread,
Oct 17, 2025, 9:27:19 AM (13 days ago) Oct 17
to Ben Haller, slim-discuss

Ben Haller

unread,
Oct 17, 2025, 9:43:24 AM (13 days ago) Oct 17
to Natalie van Dis, slim-discuss
Hi Natalie,

Glad to hear you've got your script working.  Regarding your questions:

- I would guess that the SNPs that are missing are ones that were lost, not one that were fixed.  (Let me know if you are convinced otherwise, though!)  If you need the information about the lost SNPs, you'll need to do that yourself; SLiM doesn't keep mutations that have been lost, in general, and even in the cases where it does, it would not write them out to VCF.  But it would probably be reasonably straightforward for you to detect which SNPs have been lost, and perhaps even add empty call lines to the VCF yourself, with a Python script or similar.

- I'd recommend that, similar to reading with readIndividualsFromVCF(), you use outputIndividualsToVCF() for your output.  That will output all of your chromosomes to a single VCF file.  The outputVCFSample() method you are trying to use is much more limited; it is mostly provided for backward compatility, really, although sometimes it remains useful in simple models.  The beginning of chapter 28 summarizes all of the output methods available, and their pros and cons; it has gotten a bit complicated over the years!  I'd recommend that you read that to get a more complete picture.

Happy modeling!


Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University


Natalie van Dis wrote on 10/17/25 9:27 AM:

Natalie van Dis

unread,
Oct 17, 2025, 9:51:44 AM (13 days ago) Oct 17
to Ben Haller, slim-discuss
Hi Ben,

Thanks for the quick reply and the helpful tips! 

Ah yes I see the point about the difference between fixed and lost SNPs. It made me realize that I have to use outputIndividualsToVCF() in every generation that I also sampled in my empirical dataset to get the allele frequency trajectories I need to construct the null hypothesis. Then it will indeed be relatively straightforward to pick out the alleles that were lost in between (and get some idea of where in the timeline they got lost too!).

Thanks again!
Natalie
Reply all
Reply to author
Forward
0 new messages