Using population-level sequencing data in SLiM for neutral evolution simulations


Burak Demirbas

Jan 6, 2026, 11:22:27 AM
to slim-discuss

Hi! I have a question regarding using population sequence data in SLiM.

My initial goal for a project is to model neutral evolution in a population of ~500 individuals over N generations using standard Wright-Fisher simulations. The project is supposed to complement experimental work, and hence I want to use real sequencing data in the form of .vcf files containing information on SNPs.
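For readers unfamiliar with the model, the neutral Wright-Fisher process for a single biallelic site can be sketched in a few lines of plain Python. This is only an illustration of the sampling scheme, not a replacement for SLiM; the function name `wright_fisher_neutral` and the default values are assumptions made for the example.

```python
import random

def wright_fisher_neutral(p0, n_individuals=500, generations=100, seed=0):
    """Return the allele-frequency trajectory of one neutral biallelic site.

    Each generation, all 2N gene copies are drawn independently from the
    previous generation's allele frequency (binomial resampling).
    """
    rng = random.Random(seed)
    n_copies = 2 * n_individuals  # diploid population
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Binomial(2N, p) draw written as a sum of Bernoulli trials
        count = sum(rng.random() < p for _ in range(n_copies))
        p = count / n_copies
        trajectory.append(p)
    return trajectory
```

Running this with several seeds from the same starting frequency gives different trajectories, which is the drift that the SLiM simulations are meant to capture.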

The problem is that I only have population-level sequence data (for a population of ~500 individuals). It does contain information on average allele frequencies, but there is no individual-level data. As far as I can tell from the SLiM manual, there is no straightforward way to load and use population-level .vcf files directly in SLiM (it seems to always assume individual-level data). Is this correct?

What I do now as a ‘workaround’ is to use the allele frequency information from the population-level data to estimate how many individuals are expected to carry each allele in a given population. I generate a number of artificial individuals in a new .vcf file (in Python) and then load this into SLiM to run simulations. This seems to work, in that the simulations do successfully run.
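A workaround of this kind might look roughly like the sketch below. It is not the actual script used here: the function name `synthesize_vcf_lines`, the input layout (a list of `(chrom, pos, ref, alt, alt_freq)` tuples), and the sample names are all assumptions for illustration. It places the rounded expected number of ALT copies at each site and shuffles them independently per site, so the artificial individuals carry no linkage information and the phase written to the file is arbitrary.

```python
import random

def synthesize_vcf_lines(sites, n_individuals=500, seed=0):
    """Build individual-level VCF lines from per-site ALT allele frequencies.

    sites: iterable of (chrom, pos, ref, alt, alt_freq) tuples (hypothetical layout).
    """
    rng = random.Random(seed)
    n_copies = 2 * n_individuals  # diploid individuals
    samples = [f"i{k}" for k in range(n_individuals)]
    lines = [
        "##fileformat=VCFv4.2",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL",
                   "FILTER", "INFO", "FORMAT", *samples]),
    ]
    for chrom, pos, ref, alt, alt_freq in sites:
        n_alt = round(alt_freq * n_copies)  # expected number of ALT copies
        copies = [1] * n_alt + [0] * (n_copies - n_alt)
        rng.shuffle(copies)  # independent per site: no linkage between sites
        genotypes = [f"{copies[2 * k]}|{copies[2 * k + 1]}"
                     for k in range(n_individuals)]
        lines.append("\t".join([chrom, str(pos), ".", ref, alt,
                                ".", "PASS", ".", "GT", *genotypes]))
    return lines
```

Writing `"\n".join(lines)` to a file gives a small individual-level VCF. Fixing the ALT count at its expectation reproduces the observed frequencies as closely as the sample size allows; drawing the count at random instead would add sampling noise on top.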

My main question is, does my workaround make sense (i.e. am I making some kind of mistake)? And is there an easier way to implement this in SLiM that I am missing? 

Ben Haller

Jan 6, 2026, 11:40:30 AM
to slim-d...@googlegroups.com
Hi Burak!

I think what you're doing makes sense (of course your simulated genetic configuration will not capture the phasing/linkage patterns present in the real system, but if you don't have that information, that's life).  And no, at present there is no easier way to implement what you're doing; your approach sounds exactly right.

I'm intrigued by the idea of a VCF file that contains population-level data rather than individual-level data; I was not aware that the VCF format supported that.  Could you possibly open a new issue on SLiM on GitHub (https://github.com/MesserLab/SLiM/issues/new/choose) that describes what you're trying to do and provides the VCF file that you want to import?  This question of how to start off from a particular genetic state comes up fairly often (understandably), and I haven't been sure exactly how SLiM might support it; but if there is such a thing as a population-level VCF file, that would provide a clear path forward for SLiM to provide better built-in support for this sort of genetic configuration.

Thanks!

Cheers,
-B.

Benjamin C. Haller
Messer Lab
Cornell University
--
SLiM forward genetic simulation: http://messerlab.org/slim/

Burak Demirbas

Jan 6, 2026, 1:22:49 PM
to slim-discuss
Hi Ben, thank you for your quick reply! It's reassuring to hear that my approach makes sense.

That sounds like a good idea. I will double check with my experimental collaborator to make sure I am describing the format and the problem exactly right and will then open a new issue on GitHub. It would be great if this information is useful for future iterations of SLiM!

Kind regards,
Burak



Gregor Gorjanc

Jan 7, 2026, 9:12:35 AM
to slim-discuss
You could try to estimate a demographic model from such data and simulate the past, say using msprime, generate a VCF and import that. In this way you would get chromosomes with linkage, instead of random/no-linkage chromosomes. 
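As a rough sketch of that idea, assuming msprime 1.x is installed: the example below uses a constant-size population with placeholder values (the population size, sequence length, and recombination and mutation rates are made up, not estimated from any data), and the function name `simulate_start_vcf` is hypothetical. A real application would substitute a demographic model fitted to the observed allele frequencies.

```python
def simulate_start_vcf(path, n_individuals=500, pop_size=500,
                       seq_length=1_000_000, rec_rate=1e-8,
                       mut_rate=1e-8, seed=1):
    """Simulate diploid individuals with msprime and write them to a VCF file."""
    # Imported here so the function can be defined without msprime installed.
    import msprime

    # Coalescent ancestry with recombination for n_individuals diploid samples
    ts = msprime.sim_ancestry(
        samples=n_individuals,
        population_size=pop_size,      # placeholder constant-size model
        sequence_length=seq_length,
        recombination_rate=rec_rate,
        random_seed=seed,
    )
    # Neutral mutations placed on the simulated genealogies
    mts = msprime.sim_mutations(ts, rate=mut_rate, random_seed=seed)
    with open(path, "w") as vcf_file:
        mts.write_vcf(vcf_file)  # one diploid sample column per individual
    return mts
```

Calling this with different seeds gives different starting populations, each with linkage between nearby sites.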

Ben Haller

Jan 7, 2026, 9:32:51 AM
to slim-d...@googlegroups.com
I love that suggestion, thanks Gregor!

I have a naive followup question.  Is there a way to do the sort of procedure you suggest, but such that the end result exactly fits the present-day mutational data that Burak has in VCF?  I.e., with mutations at the same positions, at the same frequencies as specified by the VCF?  Can one start with the known VCF data at the present day, and in some way do a coalescence-with-mutations process building plausible ancestry backwards from that anchor?  Or would the end result of your suggested procedure necessarily be new data drawn from the estimated model, and thus having mutations at different positions/frequencies than the original dataset (but statistically "similar" in some sense)?  Sorry – I know very little about such things!  :->


Cheers,
-B.



Gregor Gorjanc

Jan 7, 2026, 1:56:35 PM
to slim-discuss
What I proposed would generate a completely new dataset, which would only share the loose demographic model with the observed data. The upside is that one can simulate multiple such starting points, possibly generalising the study to more settings, but the downside is that it is not generating the exact observed data.

I am not familiar with any way of inferring a demographic model from allele frequencies (which is surely possible using tools like dadi/moments/momi/...) and also "ensuring" that the model produces the observed data. That seems impossible, since any simulation from such a demographic model is random.

gg 
