2D SFS and results interpretation


Sarah Babaei

Jun 24, 2025, 8:43:33 AM
to dadi-user
Dear Ryan and group,

Apologies for the potentially naive questions; I am very new to dadi and to SFS-based modeling in general.

I'm using dadi through the dadi-pipeline wrapper by Portik et al. 2017 to estimate time since divergence and find the best model for two recently diverged Amazonian fish species. Based on clustering analyses, there are hybrids in the dataset and some population structure within one of the species. I have 583 SNPs from 3RAD sequencing (1 SNP per locus), obtained using Stacks de novo and further filtered using populations and vcftools. I made my SFS using easySFS (https://github.com/isaacovercast/easySFS).

I've run the pipeline for a few models now, and the best one seems to be two epochs with symmetrical migration. However, I'm having a very difficult time interpreting my SFS and results, and determining whether the model actually fits well. I can't seem to find an SFS online that looks at all like mine, which makes me think I may have done something wrong.

For reference, I've included the results summary for all the models I've tested and the result plots for 3 of the models. These include the SFS of the data and the simulated ones under the different models.

Apologies again and thank you for any advice you can provide,
Sarah
Upstream_Downstream_no_mig.pdf
Results_Summary_Extended.txt
Upstream_Downstream_sym_mig_twoepoch.pdf
Upstream_Downstream_sym_mig.pdf

Ryan Gutenkunst

Jul 1, 2025, 2:05:48 PM
to dadi...@googlegroups.com
Hello Sarah,

Your spectra are unusual in that they show a deficit of rare variants in each population compared to intermediate-frequency variants. It’s very difficult to get a non-monotonic SFS from population genetics. What is the coverage like in your data set? Low coverage can cause biases in inferring rare variants. We recently published a paper on accounting for those biases in dadi: https://doi.org/10.1093/molbev/msaf002 .
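A minimal sketch of how one might check a 2D spectrum for this pattern, assuming an unfolded SFS stored as a plain nested list. The array `toy_fs` is made-up illustration data (not the spectrum from this thread), and the helper names are hypothetical. Under the standard neutral model the expected count of sites with i derived copies is proportional to 1/i, so the marginal spectrum should decrease from singletons onward.

```python
def marginal_sfs(fs2d, axis):
    """Collapse a 2D SFS onto one population by summing over the other.

    axis=0 keeps population 1 (row index), axis=1 keeps population 2.
    """
    if axis == 0:
        return [sum(row) for row in fs2d]
    return [sum(col) for col in zip(*fs2d)]


def neutral_expectation(n_chrom, total):
    """Expected counts for classes 1..n_chrom-1 under the standard
    neutral model (proportional to 1/i), scaled to `total` SNPs."""
    weights = [1.0 / i for i in range(1, n_chrom)]
    norm = sum(weights)
    return [total * w / norm for w in weights]


def is_monotonic_decreasing(counts):
    """True if no frequency class is larger than the one before it."""
    return all(a >= b for a, b in zip(counts, counts[1:]))


# Made-up 2D SFS for 4 chromosomes per population (5 x 5 entries).
# toy_fs[i][j] = number of SNPs with i derived copies in pop 1, j in pop 2.
# The two corners are non-segregating classes and are left at 0.
toy_fs = [
    [0, 3, 5, 2, 0],
    [2, 4, 6, 3, 1],
    [4, 7, 9, 6, 3],
    [1, 3, 5, 4, 2],
    [0, 1, 3, 2, 0],
]

for axis in (0, 1):
    # Drop the first and last entries: those sites are not
    # segregating within this population's sample.
    interior = marginal_sfs(toy_fs, axis)[1:-1]
    expected = neutral_expectation(len(interior) + 1, sum(interior))
    print("pop", axis + 1, "observed:", interior)
    print("pop", axis + 1, "neutral expectation:", expected)
    print("pop", axis + 1, "monotonic:", is_monotonic_decreasing(interior))
```

For a folded spectrum the same idea applies to minor-allele counts, but the expectation would need to be folded as well.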

Best,
Ryan

--
You received this message because you are subscribed to the Google Groups "dadi-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dadi-user+...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/dadi-user/4a294643-d331-45b6-9806-49747fe20062n%40googlegroups.com.

Sarah Babaei

Jul 1, 2025, 2:20:54 PM
to dadi-user
Hi Ryan,

Thanks for your reply! This is 3RAD data from a species without a reference genome, so we aren't sure about coverage. However, I believe it's safe to assume it's very low coverage. I will look into using the low-pass correction.

There actually turned out to be a third population in the dataset, so I will have to update my dataset to include 3 populations (2 upstream, 1 downstream) for the rest of my dadi analyses. Could this also have been a reason for the odd SFS? 

I also have a quick question regarding hybrids. Ordination and admixture analyses seem to suggest 3 populations and 2 hybrid zones (1: mix of pops 1+2, 2: mix of pops 2+3) in my dataset. I have read some conflicting information about whether to include or remove hybrids when running demographic modeling. Since I want to estimate direction of gene flow/migration, I was thinking that including the hybrids would be important, but I am unsure about how to assign populations to these hybrids and was hoping to get your thoughts and recommendations. 

Thank you again for taking the time to read my questions and apologies again for how naive they may be! 

Kindest regards,
Sarah

Ryan Gutenkunst

Jul 1, 2025, 3:07:34 PM
to dadi...@googlegroups.com
Hi Sarah,

Replies below

On Jul 1, 2025, at 11:20 AM, Sarah Babaei <sarahba...@gmail.com> wrote:

Thanks for your reply! This is 3RAD data from a species without a reference genome, so we aren't sure about coverage. However, I believe it's safe to assume it's very low coverage. I will look into using the low-pass correction.

There actually turned out to be a third population in the dataset, so I will have to update my dataset to include 3 populations (2 upstream, 1 downstream) for the rest of my dadi analyses. Could this also have been a reason for the odd SFS? 

Yes, if there is substructure in your samples, that can also cause an excess of intermediate-frequency alleles and violate model assumptions.
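A toy sketch of why pooling substructured samples can inflate intermediate frequencies, assuming two hidden subpopulations sampled at 4 chromosomes each. The per-site counts are made up for illustration and the function name is hypothetical. Sites that are fixed in one subpopulation and absent in the other would not be segregating within either one, but they appear at 50% frequency once the samples are pooled.

```python
def pooled_sfs(site_counts, n_a, n_b):
    """Build the SFS of a pooled sample.

    site_counts: list of (derived copies in subpop A, derived copies in subpop B)
    n_a, n_b: number of chromosomes sampled from each subpopulation
    """
    sfs = [0] * (n_a + n_b + 1)
    for a, b in site_counts:
        sfs[a + b] += 1
    return sfs


# Made-up sites: a few rare variants private to each subpopulation,
# plus sites fixed in one subpopulation and absent from the other.
sites = (
    [(1, 0)] * 3    # singletons in A
    + [(0, 1)] * 3  # singletons in B
    + [(4, 0)] * 4  # fixed in A, absent in B
    + [(0, 4)] * 2  # fixed in B, absent in A
)

sfs = pooled_sfs(sites, n_a=4, n_b=4)
print(sfs)  # index = derived-allele count in the pooled sample of 8
```

In this toy case the pooled spectrum has as many sites at count 4 of 8 as it has singletons, even though no site is at intermediate frequency within either subpopulation.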

I also have a quick question regarding hybrids. Ordination and admixture analyses seem to suggest 3 populations and 2 hybrid zones (1: mix of pops 1+2, 2: mix of pops 2+3) in my dataset. I have read some conflicting information about whether to include or remove hybrids when running demographic modeling. Since I want to estimate direction of gene flow/migration, I was thinking that including the hybrids would be important, but I am unsure about how to assign populations to these hybrids and was hoping to get your thoughts and recommendations. 

Interesting question about hybrids. My instinct would be to remove recent hybrids from the population samples, to get better estimates of long-term rates of gene flow. The models assume random mating within populations, which will be violated by recent hybrids.
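One way the removal step could be sketched, assuming per-sample ancestry proportions are available from the admixture analysis. The sample names, population labels, Q values, and the 0.9 cutoff are all hypothetical placeholders; an appropriate cutoff depends on the data and is not specified in this thread.

```python
def assign_non_hybrids(q_values, pop_names, threshold=0.9):
    """Keep samples whose largest ancestry proportion meets the threshold.

    q_values: dict mapping sample name -> list of ancestry proportions,
              one per population, in the same order as pop_names
    Returns (assignments, dropped): a dict sample -> population for the
    retained samples, and a list of samples treated as hybrids.
    """
    assignments = {}
    dropped = []
    for sample, q in q_values.items():
        best = max(range(len(q)), key=lambda k: q[k])
        if q[best] >= threshold:
            assignments[sample] = pop_names[best]
        else:
            dropped.append(sample)
    return assignments, dropped


# Made-up ancestry proportions for four samples and three populations.
q_values = {
    "fish01": [0.98, 0.01, 0.01],
    "fish02": [0.55, 0.44, 0.01],  # looks like a pop1 x pop2 hybrid
    "fish03": [0.02, 0.95, 0.03],
    "fish04": [0.01, 0.48, 0.51],  # looks like a pop2 x pop3 hybrid
}
pop_names = ["pop1", "pop2", "pop3"]  # hypothetical labels

assignments, dropped = assign_non_hybrids(q_values, pop_names)

# Print a two-column sample/population table for the retained samples.
for sample, pop in assignments.items():
    print(sample + "\t" + pop)
print("dropped as putative hybrids:", dropped)
```

The retained assignments could then be used to rebuild the SFS without the putative hybrids.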

Best,
Ryan
