2D data SFS with very low entries - problems with input for inference?

Maria Paula Rodriguez

Mar 3, 2026, 3:59:10 PM

to dadi-user

Hi Ryan,

I've only been working with dadi for a short while, so thanks for this Google group! It has been useful in clarifying many things.

I would like to hear your thoughts about the 2D SFS of my dataset (attached figures a and b were plotted with vmin=1 and 1e-4, respectively; the 1D spectra of each population are also attached, with log(SNP count) included to ease visualization). My main concern is whether the data look good enough for demographic modeling, since my reading of the 2D spectrum is that most frequency bins have very low entries. Most of the variants are singletons (which is what we expect), but the remaining frequency bins have very low values, so I don't know whether the dataset might be too "uninformative" for accurate inference. I couldn't find an SFS similar to mine in the group, but apologies if I missed a related post.

A bit of background of my study system:

I'm working with two fish populations in which the divergence is suspected to be very recent (within a century), although the Fst calculated in dadi is 0.117. Divergence between them is suggested by PCA, structure-type and phylogenetic analyses. I’m working with ddRAD data.

I generated the spectra with easySFS using -a --proj=224,239. The input VCF contains 73,536 strictly variable SNPs retained after the following filtering pipeline: biallelic sites only, min. site quality of 20, min. GT quality of 20, max. mean depth of 40 (as a way to exclude paralogs; I also had to exclude sites with a mean heterozygosity > 0.6, since my initial model runs in dadi showed the 0.5 frequency bin populated), and a max. allowed site missingness of 30%. Regarding this last point, the mean site missingness was 0.18, and it was even lower after applying the filter, so I don't think missingness is a strong factor behind the low SFS entries after down-projection (but I might be wrong).
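For readers unfamiliar with down-projection: it is usually described as hypergeometric subsampling of each site to a smaller number of chromosomes. The sketch below illustrates that idea for a single site in one population using only the Python standard library. It is not easySFS's or dadi's actual code, and the function name `project_site` and the toy numbers are hypothetical.

```python
from math import comb


def project_site(k, n, m):
    """Distribution of derived-allele counts after subsampling.

    A site with k derived alleles among n sampled chromosomes is projected
    down to m chromosomes (m <= n) by drawing without replacement, so the
    count j in the subsample follows a hypergeometric distribution.
    """
    if not (0 <= k <= n and 0 < m <= n):
        raise ValueError("need 0 <= k <= n and 0 < m <= n")
    total = comb(n, m)
    # Entry j is the probability of seeing j derived alleles in the subsample.
    return [comb(k, j) * comb(n - k, m - j) / total for j in range(m + 1)]


# Toy numbers (hypothetical): a singleton observed in 10 chromosomes,
# projected down to 4 chromosomes.
probs = project_site(k=1, n=10, m=4)
print([round(p, 3) for p in probs])
```

In this toy case the singleton stays a singleton with probability m/n = 0.4 and otherwise falls into the monomorphic class, which is one way projection can thin out SFS entries when the projection size is much smaller than the sample.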

For calculating L, I'm considering all initial variant as well as invariant sites, scaled by the proportion of final variant sites relative to all the variants present before filtering, as suggested in previous posts.
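The L calculation described above can be sketched as a one-line scaling. Only the 73,536 retained SNPs come from this post; the total number of sites and the pre-filter SNP count below are hypothetical placeholders, and `effective_length` is an illustrative helper, not a dadi function.

```python
def effective_length(total_sites, snps_before_filtering, snps_after_filtering):
    """Scale the total number of assayed sites (variant + invariant) by the
    fraction of SNPs that survived filtering."""
    if snps_before_filtering <= 0 or snps_after_filtering > snps_before_filtering:
        raise ValueError("SNP counts are inconsistent")
    return total_sites * snps_after_filtering / snps_before_filtering


total_sites = 5_000_000   # hypothetical: all variant + invariant sites
snps_before = 150_000     # hypothetical: SNPs before filtering
snps_after = 73_536       # retained SNPs reported in this post

L = effective_length(total_sites, snps_before, snps_after)
print(f"effective L = {L:,.0f} sites")
```

The assumption behind this scaling is that filtering removes invariant sites at roughly the same rate as variant sites.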

What I've tried so far in dadi

I've just run the split with migration model from the dadi website (split_diff_mig_model), since I want to infer the divergence time, sizes of both populations, and migration rates (which I'm allowing to be asymmetrical). Population sizes could have changed, but I decided to start with a basic model to see how it went.

The inference was really fast (~18 min), but I received the 'Model is < 0 where data is not masked' warning throughout the optimization. In another post you said it means that "dadi is struggling to compute the SFS for a given parameter value. Early in parameter optimization, that’s okay because it typically happens in corners of parameter space that are unlikely", and because I'm still getting it at the end I imagine that extrapolation didn't work. In fact, the model and data look very different: the model overestimates shared variation (the diagonal) and underestimates the differences between populations (that is my interpretation; I’d really appreciate it if you could correct me in case I'm wrong). Although I know I cannot trust the results from this initial run, the optimized parameters didn't hit the bounds, which makes me think that the bounds are not the problem, at least for now:

params= [nu1, nu2, T, m12, m21]

optimized params= [1614.99, 597.12, 0.0294, 3.33, 3.36]

low.bound= [1e-2, 1e-2, 1e-5, 1e-5, 1e-5]

upp.bound= [2000, 2000, 2, 10, 10]
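A quick way to double-check the "didn't hit the bounds" claim is to compare each optimized value against its bounds with a small tolerance. This is a minimal sketch using the numbers above; the helper name, the 1% tolerance, and the labels for the two migration parameters are assumptions made for illustration.

```python
def params_near_bounds(names, values, lower, upper, tol=0.01):
    """Return (name, side) pairs for parameters within a relative
    tolerance `tol` of their lower or upper bound."""
    flagged = []
    for name, value, lo, hi in zip(names, values, lower, upper):
        if value <= lo * (1 + tol):
            flagged.append((name, "lower"))
        elif value >= hi * (1 - tol):
            flagged.append((name, "upper"))
    return flagged


# Migration labels are assumed; the values and bounds are from this post.
names = ["nu1", "nu2", "T", "m12", "m21"]
popt = [1614.99, 597.12, 0.0294, 3.33, 3.36]
lower = [1e-2, 1e-2, 1e-5, 1e-5, 1e-5]
upper = [2000, 2000, 2, 10, 10]

flagged = params_near_bounds(names, popt, lower, upper)
print(flagged if flagged else "no parameter is within 1% of a bound")
```

Nothing is flagged here, although nu1 sits at about 81% of its upper bound, so the bound itself may still be far wider than is useful.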

Based on this result, I was thinking about changing the initial guess of the parameters, as I might be in a very improbable region of parameter space. In particular, I thought about decreasing the allowed migration and increasing the divergence time (as the model is overestimating shared variation between the populations), and maybe decreasing the population sizes to allow more drift in each (as the differences are being underestimated).

However, before tweaking the parameters, I want to know whether the input itself looks odd or uninformative. Although the divergence might be small and recent, I thought that the number of SNPs could be enough for demographic inference, and the 1D spectra of each population look fine to me. I'm particularly interested in estimating the divergence time between the two populations, but maybe my dataset is too limited for this(?).

I am happy to provide more code/information as needed. Thank you in advance for your help.

Maria

a.Down224.Ups239_vmin1.png

b.Down224.Ups239_vmin1e-4.png

c. 1D.folded.downs.PNG

d. 1D.folded.ups.PNG

Maria Paula Rodriguez

Mar 3, 2026, 4:18:58 PM

to dadi-user

I forgot to attach the model output, but here it is: 2. Down-Up_SplitDifMigMod_allParamsFree_noGPU.png

Ryan Gutenkunst

Mar 5, 2026, 4:49:07 PM

to dadi...@googlegroups.com

Hello Maria,

I don’t see any obvious issues with your data plots. I’m not sure why that model run struggled. Was that the result of a single fit or multiple? It would probably be best to set nu1 and nu2 upper bounds to be smaller, like 10ish. It’s likely that the model got stuck in a very “flat” region of the likelihood landscape, because very high nus imply very slow drift, so the SFS doesn’t change at all after divergence.
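To put rough numbers on the "very slow drift" point: assuming dadi's usual scaling (T in units of 2*N_ref generations, nu relative to N_ref), heterozygosity within an isolated population decays approximately as exp(-T/nu). The sketch below ignores mutation and migration, so it is only an order-of-magnitude illustration; the function name is hypothetical.

```python
import math


def heterozygosity_lost(T, nu):
    """Approximate fraction of heterozygosity lost to drift after time T
    (units of 2*N_ref generations) in a population of relative size nu.
    Mutation and migration are ignored."""
    if T < 0 or nu <= 0:
        raise ValueError("need T >= 0 and nu > 0")
    # 1 - exp(-T/nu), computed stably for very small T/nu.
    return -math.expm1(-T / nu)


T = 0.0294  # divergence time from the single fit reported in this thread
for nu in (1614.99, 597.12, 10.0, 1.0):
    print(f"nu = {nu:>8}: fraction lost ~ {heterozygosity_lost(T, nu):.2e}")
```

With the fitted nus the loss is on the order of 1e-5, so the two populations barely differentiate; with nu around 10 or below the same T produces a loss hundreds of times larger.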

Best,

Ryan


Maria Paula Rodriguez

Mar 6, 2026, 11:15:32 AM

to dadi-user

Hi Ryan,

Thanks for your fast reply!

That was the result of a single fit. What you say about the nu's and drift makes a lot of sense. I will conduct more runs and decrease the sizes as you suggested.
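A minimal sketch of how the starting points for those repeated runs could be generated, with the nu upper bounds tightened to 10 as suggested. It imitates the idea behind dadi.Misc.perturb_params (random up-to-2^fold perturbation, kept inside the bounds) using only the standard library; the central guess `p0`, the number of runs, and the helper name are hypothetical, and each start would then be passed to dadi's optimizer.

```python
import random


def perturb_start(params, lower, upper, fold=1.0, rng=random):
    """Multiply each parameter by a random factor between 2**-fold and
    2**fold, then pull it just inside the bounds if it stepped outside."""
    new = []
    for p, lo, hi in zip(params, lower, upper):
        q = p * 2 ** (fold * rng.uniform(-1.0, 1.0))
        q = min(max(q, lo * 1.01), hi * 0.99)
        new.append(q)
    return new


# Order assumed to be nu1, nu2, T, and the two migration rates.
lower = [1e-2, 1e-2, 1e-5, 1e-5, 1e-5]
upper = [10, 10, 2, 10, 10]        # nu upper bounds reduced from 2000 to 10
p0 = [1.0, 1.0, 0.05, 1.0, 1.0]    # hypothetical central starting guess

rng = random.Random(0)             # fixed seed so the starts are reproducible
starts = [perturb_start(p0, lower, upper, fold=1.0, rng=rng) for _ in range(20)]
for s in starts[:3]:
    print([round(x, 4) for x in s])
```

Comparing the log-likelihoods across the independent runs, and checking that the best few agree on the parameter values, is the usual way to judge whether the optimization has converged.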

Maria
