Expected 2D-SFS missing high-frequency corners for specific population pairs in a ring species model

13 views
Skip to first unread message

fred jesse

unread,
Feb 26, 2026, 10:36:33 AM (13 days ago) Feb 26
to fastsimcoal2
Dear Laurent and Fastsimcoal2 users,

I am seeking advice on a persistent issue regarding the expected (simulated) 2D-SFS in my demographic modeling.

I have built a 5-population model (Pop 0 to Pop 4, representing different subspecies along the ring) that includes historical isolation and a recent secondary contact event between the terminal populations (Pop 0 and Pop 4).

Because I don't have a reliable outgroup, I am using the folded SFS (Minor Allele Frequency). My observed 2D-SFS heatmaps are complete and continuous, showing valid SNPs in the top-left and bottom-right corners (representing high-frequency minor alleles / private high-frequency alleles).

The Problem: While the overall likelihood scores seem reasonable, the expected (simulated) 2D-SFS heatmaps show completely empty/blank corners for specific population pairs, failing to capture the high-frequency alleles present in my observed data.

Interestingly, this missingness follows a very strict and consistent pattern:

  • Missing corners (top-left / bottom-right blank): Pop 0-1, 0-2, 1-2, 0-3, 1-3, and 2-3.

  • Complete/Normal heatmaps: Pop 0-4, 1-4, 2-4, and 3-4.

In other words, any pair involving Pop 4 is simulated completely fine. But any pair among the other four populations completely fails to simulate high-frequency alleles. I also tested a 4-population model and found the exact same structural missingness for pairs not involving the terminal western population.

Data Context: To achieve comprehensive geographic sampling, my dataset is a merged VCF of WGS (Whole Genome Sequencing) and GBS data. Specifically, Pop 1 and Pop 2 contain 10 WGS individuals each, while the rest of the individuals and populations are sequenced via GBS.

What I have checked:

  1. I am aware that folded SFS naturally has a blank half (since MAF > 0.5 is impossible). The missingness I am referring to is in the valid half of the heatmap where the observed data clearly has SNPs.

  2. I have rigorously constrained the .est and .tpl files to ensure that size changes (T_CHG) strictly occur during the historical isolation period and do not overlap with divergence times (TDIV) or the secondary contact period (T_SC_END).

My Questions:

  1. Why would the optimization algorithm completely fail to generate high-frequency alleles for specific population pairs, while perfectly simulating them for pairs involving Pop 4?

  2. Could the massive number of low-frequency alleles (singletons) from the WGS samples in Pop 1 and Pop 2 be "hijacking" the likelihood optimization, forcing the model to overestimate recent migration rates among Pop 0, 1, 2, and 3, thereby wiping out the corners?

  3. Is there a recommended way in fastsimcoal2 to handle this?

I have attached my .tpl and .est files, as well as the PDF of the 2D-SFS heatmaps for your reference.

Any insights into the underlying mechanics of this issue or suggestions for parameterization would be greatly appreciated!

Best regards,

Lei

final_pure_snps_no_outgroup.tpl
final_pure_snps_no_outgroup_2DMAFs.pdf
final_pure_snps_no_outgroup_maxL.par.pdf
final_pure_snps_no_outgroup.est
Reply all
Reply to author
Forward
0 new messages