Split Time Interpretation

Sarah Babaei

Mar 9, 2026, 4:56:54 PM
to dadi-user
Hello,

I was wondering if I could get some feedback on how I'm interpreting the optimized parameter for time since split. 

I used the Portik 2D (2 population) pipeline, and the parameter is described as: T1: Time in the past of split (in units of 2*Na generations). The value is 0.1759. 

Based on other threads in this Google group and Github questions, it seems that Time (years) = T1(parameter) x 2Nref x mutationrate x L.

Nref seems to be calculated using 4Nref=theta(from dadi optimized model)/(mutationrate x L).

To calculate L, I did the following: My data are from RADseq. After running Stacks de novo and populations, filtering to write only 1 SNP per locus and for a maximum heterozygosity of 0.8, I retained 14,249 SNPs. My loci are on average 60 bp long, so I calculated L = 14,249 x 60. I then down-projected the data in easySFS before plugging it into dadi, but as I understand it this doesn't affect my L value.

Apologies for the long question, any advice on whether I'm doing this right or wrong would be super helpful!

Thank you,
Sarah

Ryan Gutenkunst

Mar 10, 2026, 7:20:52 PM
to dadi...@googlegroups.com
Hello Sarah,

Corrections…

Time (years) = T1(parameter) x 2Nref x generation time

The calculation of Nref is correct.

Down projecting doesn’t affect your L value, but filtering to select only 1 SNP per locus does. If that filter, for example, removes 1/2 of your SNPs, then that reduces L by 1/2.

Best,
Ryan
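
As a rough sketch of the corrected conversion, the two formulas above can be written in Python. The function names are made up for illustration. T1 = 0.1759 is the value quoted in the question; Nref = 1.91e5 and a generation time of 6.5 years are the values quoted later in this thread, and combining them with this T1 is an assumption made only to show the arithmetic.

```python
def nref_from_theta(theta, mu, L):
    """Nref from dadi's optimized theta, using theta = 4 * Nref * mu * L.

    mu is the per-site, per-generation mutation rate and L is the
    effective sequence length in sites.
    """
    return theta / (4.0 * mu * L)


def split_time_years(T, nref, generation_time):
    """Convert a dadi time parameter (units of 2*Nref generations) to years."""
    generations = T * 2.0 * nref
    return generations * generation_time


# Values quoted in the thread; pairing them is an illustrative assumption.
T1 = 0.1759
nref = 1.91e5
generation_time = 6.5  # years per generation

years = split_time_years(T1, nref, generation_time)
print(f"Split time: about {years:,.0f} years")
```

Note that the mutation rate and L enter only through Nref; they do not multiply the time a second time.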


Sarah Babaei

Mar 11, 2026, 12:23:33 PM
to dadi-user
Hi Ryan,

Thank you for the reply, I really appreciate it! 

I'm still a little confused with calculating L, apologies. If I retained 14,249 SNPs post filtering, should I calculate L by multiplying it by the average size of my loci (assembled during de novo)? I see how filtering would reduce my L, but wouldn't that be taken into consideration since I'm using the final number of SNPs in the calculation? Or would I have to figure out how many SNPs I lost during all my filtering steps and divide it, like so: L=(size of locus) * ((final snps)/(total snps, before filtering))? 

Additionally, I wasn't sure if I should use the average size of locus or the average genotyped sites per locus, as my stacks populations output states:
Kept 555806 loci, composed of 33558642 sites; 10460 of those sites were filtered, 14249 variant sites remained.
Mean genotyped sites per locus: 24.65bp (stderr 0.04).
Would 33558642 be the (total snps, before filtering) to use in my L calculation?

Apologies again for the many questions!

Thank you,
Sarah

Ryan Gutenkunst

Mar 11, 2026, 7:43:06 PM
to dadi-user
Hi Sarah,

For L, what we’re trying to calculate is the amount of sequence from which SNPs could have entered the SFS you’re analyzing. So we want to use genotyped sites. And then we account for any additional filtering on top of that.

I’ve not used Stacks before, but my interpretation is that the total number of bases that were genotyped was 24.65 * 555806 = 13.7e6. Then 14249 SNPs remained, after filtering out 10460. So the effective L would be 13.7e6 * 14249/(14249+10460) = 7.9e6. It’s a bit confusing, but I’m assuming that those 10460 sites that were filtered were variant sites. You should check the Stacks documentation to confirm that.

Best,
Ryan
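
The effective-L arithmetic above can be checked with a short Python sketch. The numbers come from the Stacks populations output quoted earlier in the thread; the variable names are illustrative, and the code carries over the same unconfirmed assumption that the 10460 filtered sites were all variant sites.

```python
# Numbers from the Stacks populations log quoted in the thread.
loci_kept = 555806
mean_genotyped_sites_per_locus = 24.65
variant_sites_kept = 14249
variant_sites_filtered = 10460  # ASSUMPTION: all of these were variant sites

# Total sequence from which SNPs could have entered the SFS.
genotyped_bases = loci_kept * mean_genotyped_sites_per_locus

# Fraction of variant sites that survived filtering.
kept_fraction = variant_sites_kept / (variant_sites_kept + variant_sites_filtered)

# Scale the genotyped sequence by the fraction of SNPs retained.
effective_L = genotyped_bases * kept_fraction

print(f"Genotyped bases: {genotyped_bases:.3e}")
print(f"Kept fraction:   {kept_fraction:.3f}")
print(f"Effective L:     {effective_L:.3e}")
```

If the Stacks documentation shows that some of the filtered sites were invariant, the kept fraction would need to be recomputed from variant sites only.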

Sarah Babaei

Apr 10, 2026, 1:25:34 PM
to dadi-user
Hi Ryan,

Thank you for your advice! So far I've run 1 and 2 population models on my data, and the values for times and population sizes seem very reasonable/biologically plausible. The values even line up quite nicely with my 1 population stairway plots (done using stairwayplot2). That is all to say that everything about the fits and models seems to make sense.

For my 2 population models, I'm having trouble calculating the number of migrants per generation, as the way I'm doing it right now gives extremely high values (almost equal to the population sizes). For reference, the model is sec_contact_asym_mig_size, which estimates m12 and m21. The model description gives m12 as: Migration from pop 2 to pop 1 (2*Na*m12).

If my Nref is 1.91E+05, generation time is 6.5 years, and the optimized m12 parameter is 0.462083052, how should I calculate the number of migrants? Based on the 2*Na*m12, the value is upwards of 100,000 individuals, which doesn't make sense. 

Thank you so much again for all your help thus far, it's really made a difference in my work!

Sarah

Ryan Gutenkunst

Apr 14, 2026, 4:54:11 PM
to dadi...@googlegroups.com
Hi Sarah,

The migration parameter is confusing… You’ve inferred M12 = 0.462, which is 2*Nref*m12, where m12 is the proportion of individuals in population 1 that are new migrants each generation. So if Nref = 2e5 and M12 = 0.462, then m12 = M12/(2*Nref) is roughly 1e-6; that is, roughly 1 in a million individuals is a new migrant each generation. You can multiply m12 by the actual estimated size of population 1 to get the estimated *number* of new migrants.

Best,
Ryan
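
A minimal Python sketch of the migration conversion described above. M12 and Nref are the values quoted in the question. The thread does not give the relative size of population 1, so `nu1 = 1.0` is a purely hypothetical placeholder, and the function name is made up for illustration.

```python
def migrants_per_generation(M, nref, pop_size):
    """Number of new migrants per generation into a population.

    M is the scaled dadi parameter (M = 2 * Nref * m), so the
    per-generation migrant proportion is m = M / (2 * Nref).
    """
    m = M / (2.0 * nref)
    return m * pop_size


M12 = 0.462083052  # optimized parameter quoted in the thread
nref = 1.91e5      # Nref quoted in the thread

# Proportion of population 1 made up of new migrants each generation.
m12 = M12 / (2.0 * nref)

# HYPOTHETICAL: relative size of population 1 (not given in the thread).
nu1 = 1.0
N1 = nu1 * nref

migrants = migrants_per_generation(M12, nref, N1)
print(f"m12 = {m12:.3e} per generation")
print(f"Migrants into pop 1 per generation (with nu1 = {nu1}): {migrants:.3f}")
```

With these inputs the migrant count is nu1 * M12 / 2, well below one individual per generation unless population 1 is several times larger than Nref. The "upwards of 100,000" figure in the question comes from multiplying by 2*Nref instead of dividing.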
