dadi-cli 3D model plotting issue

60 views
Skip to first unread message

Skye Davis

unread,
May 20, 2024, 10:47:47 AMMay 20
to dadi-user
Hi Ryan and dadi team,

I'm using dadi-cli (very streamlined, thanks for bringing this out!) but I'm encountering an issue when trying to plot results for a 3d model (split_nomig).

I used an unfolded SFS generated with ANGSD/realSFS - then folded it in dadi (generating folded in ANGSD gave errors with dadi-cli, it did not recognize it as folded. The plot is the same though, so I'm not sure what the issue was).
all_pop_3d_sfs_folded_plot.png
Next in dadi-cli I ran:
>dadi-cli InferDM --fs ./all_pop_3d_sfs_folded.fs --output-prefix EP_WP_AI --optimizations 100 --check-convergence 10 --global-optimization --maxeval 400 --nomisid --model split_nomig --lbounds 1e-3 1e-3 1e-3 1e-3 0 0 --ubounds 100 100 100 100 1 1 --output ./Output/3pop_split_nomig_optimization.params

I added 'global-optimization' and 'maxeval' because convergence was failing without those and it was suggested in the manual.

This works (though with lots of 'model is <0 where data is not masked' errors) and I get the # Converged results and # Top 100 results in the output file. It converges, although the first T parameter is at the lower boundary (0), for all the results.

E.g., top result:
# Converged results
# Log(likelihood) nu1 nuA nu2 nu3 T1 T2 theta
-31490.36291078844 1.0973973171374933 0.004730289727100075 8.198105924385347 0.465853564914095 0.0 0.2366751079720142 4519.67105582473

But then I try to run the code to plot the model against the data:
>dadi-cli Plot --fs ./all_pop_3d_sfs_folded.fs --demo-popt ./Output/3pop_split_nomig_optimization.params.InferDM.bestfits --output ./Output/3pop_data_vs_model_split-nomig.pdf --model split_nomig

For which I get this error:
[...] portik_models_3d.py", line 28, in split_nomig
nu1, nuA, nu2, nu3, T1, T2 = params
ValueError: not enough values to unpack (expected 6, got 5)

I'm new to dadi-cli, so not sure how to fix this issue? Is it a bug, or an issue with my data?

Thanks so much for all your help!

Cheers,

Skye

Ryan Gutenkunst

unread,
May 21, 2024, 2:54:32 PMMay 21
to dadi-user, Struck, Travis Jared - (tjstruck)
Hello Skye,

This is almost certainly a bug in dadi-cli. We’ll look into it and get back to you. If you’re comfortable with it, it would be helpful for you to add it as an issue here, with the Label “Bug”: https://github.com/xin-huang/dadi-cli/issues .

Best,
Ryan
all_pop_3d_sfs_folded_plot.png
all_pop_3d_sfs_folded_plot.png

Travis Struck

unread,
May 21, 2024, 7:25:29 PMMay 21
to dadi...@googlegroups.com
Hi Skye,

For now, you need to add "--nomisid" to your Plot subcommands when a model is involved (vs. just data). By default, dadi-cli assumes a model will be for unfolded data, and excepts an extra parameter for misidentification.

As part of the error, it should suggest adding/removing --nomisid when there is a mismatch in the number of parameters given vs. expected. I'll look into why it didn't.

I hope this helps,
Travis

--
You received this message because you are subscribed to the Google Groups "dadi-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dadi-user+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dadi-user/c5228ff0-2636-49c1-8060-ab16878e3a60n%40googlegroups.com.

Skye Davis

unread,
May 28, 2024, 9:28:22 AMMay 28
to dadi-user
Thanks so much Travis and Ryan, and for the speedy reply! Adding --nomisid worked.
I'm curious - are my spectra truly folded? I thought the upper diagonal should be empty - but whether I generate folded in angsd, or unfolded in angsd and fold in dadi, I get the same output as the plot I sent earlier. Is this normal?

Cheers,

Skye

Ryan Gutenkunst

unread,
May 28, 2024, 10:07:20 PMMay 28
to dadi-user
Hello Skye,

Yes, your spectra are folded. It’s a little confusing because what is plotted here are marginals coming from the full 3D spectrum. In your case, with 46*3 = 138 total haplotypes between all three populations, the maximum minor allele frequency is 138/2 = 69. So in these 2D marginal spectra, you could see alleles that had, for example, the minor allele at frequency 46 in WP, 0 in EP, and 23 in ASC. But you could not have 46 in WP, 0 in EP, and 24 in ASC (because then it wouldn’t be the minor allele). So you see pure white in the upper corner of the plots, but it’s not 1/2 as you might expect, because we’re considering the minor allele across all three populations, not a particular population pair.

Best,
Ryan

> On May 27, 2024, at 9:41 PM, Skye Davis <k.skye...@gmail.com> wrote:
>
> Thanks so much Travis and Ryan, and for the speedy reply! Adding --nomisid worked.
> I'm curious - are my spectra truly folded? I thought the upper diagonal should be empty - but whether I generate folded in angsd, or unfolded in angsd and fold in dadi, I get the same output as the plot I sent earlier. Is this normal?
>
> Cheers,
>
> Skye
>
> On Wednesday, May 22, 2024 at 9:25:29 AM UTC+10 Travis Struck wrote:
> Hi Skye,
>
> For now, you need to add "--nomisid" to your Plot subcommands when a model is involved (vs. just data). By default, dadi-cli assumes a model will be for unfolded data, and excepts an extra parameter for misidentification.
>
> As part of the error, it should suggest adding/removing --nomisid when there is a mismatch in the number of parameters given vs. expected. I'll look into why it didn't.
>
> I hope this helps,
> Travis
>
> On Mon, May 20, 2024 at 7:47 AM Skye Davis <k.skye...@gmail.com> wrote:
> Hi Ryan and dadi team,
>
> I'm using dadi-cli (very streamlined, thanks for bringing this out!) but I'm encountering an issue when trying to plot results for a 3d model (split_nomig).
>
> I used an unfolded SFS generated with ANGSD/realSFS - then folded it in dadi (generating folded in ANGSD gave errors with dadi-cli, it did not recognize it as folded. The plot is the same though, so I'm not sure what the issue was).Next in dadi-cli I ran:
> >dadi-cli InferDM --fs ./all_pop_3d_sfs_folded.fs --output-prefix EP_WP_AI --optimizations 100 --check-convergence 10 --global-optimization --maxeval 400 --nomisid --model split_nomig --lbounds 1e-3 1e-3 1e-3 1e-3 0 0 --ubounds 100 100 100 100 1 1 --output ./Output/3pop_split_nomig_optimization.params
>
> I added 'global-optimization' and 'maxeval' because convergence was failing without those and it was suggested in the manual.
>
> This works (though with lots of 'model is <0 where data is not masked' errors) and I get the # Converged results and # Top 100 results in the output file. It converges, although the first T parameter is at the lower boundary (0), for all the results.
>
> E.g., top result:
> # Converged results
> # Log(likelihood) nu1 nuA nu2 nu3 T1 T2 theta
> -31490.36291078844 1.0973973171374933 0.004730289727100075 8.198105924385347 0.465853564914095 0.0 0.2366751079720142 4519.67105582473
>
> But then I try to run the code to plot the model against the data:
> >dadi-cli Plot --fs ./all_pop_3d_sfs_folded.fs --demo-popt ./Output/3pop_split_nomig_optimization.params.InferDM.bestfits --output ./Output/3pop_data_vs_model_split-nomig.pdf --model split_nomig
>
> For which I get this error:
> [...] portik_models_3d.py", line 28, in split_nomig
> nu1, nuA, nu2, nu3, T1, T2 = params
> ValueError: not enough values to unpack (expected 6, got 5)
>
> I'm new to dadi-cli, so not sure how to fix this issue? Is it a bug, or an issue with my data?
>
> Thanks so much for all your help!
>
> Cheers,
>
> Skye
>
>
> --
> You received this message because you are subscribed to the Google Groups "dadi-user" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to dadi-user+...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/dadi-user/c5228ff0-2636-49c1-8060-ab16878e3a60n%40googlegroups.com.
>
> --
> You received this message because you are subscribed to the Google Groups "dadi-user" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to dadi-user+...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/dadi-user/69a797b3-1604-493b-8a90-6903a32af9d0n%40googlegroups.com.

Skye Davis

unread,
May 30, 2024, 10:33:16 PMMay 30
to dadi-user

Amazing, thank you so much for explaining that Ryan - your continued support on this forum is really appreciated! I have a few questions about how to proceed post optimization - happy to create a new forum question if that's better, otherwise posting here because the context for my analysis is above.

1) Should I be concerned one of the parameters of divergence time is hitting the lower bound of 0?

2) Since I generated an sfs in ANGSD - I don't have a vcf file to do the stats tests in dadi-cli described here (https://dadi-cli.readthedocs.io/en/latest/userguide/stat/). I'm wanting confidence intervals around my parameter estimates, but assuming my loci are linked so can't use the straightforward Fisher method etc.

Ryan Gutenkunst

unread,
Jun 4, 2024, 7:12:11 PMJun 4
to dadi-user
Hello Skye,

> On May 30, 2024, at 5:20 PM, Skye Davis <k.skye...@gmail.com> wrote:
>
> Amazing, thank you so much for explaining that Ryan - your continued support on this forum is really appreciated! I have a few questions about how to proceed post optimization - happy to create a new forum question if that's better, otherwise posting here because the context for my analysis is above.
> 1) Should I be concerned one of the parameters of divergence time is hitting the lower bound of 0?

I see that as biologically interpretable, that you don’t have any evidence for a time above zero. So I wouldn’t be too concerned.

> 2) Since I generated an sfs in ANGSD - I don't have a vcf file to do the stats tests in dadi-cli described here (https://dadi-cli.readthedocs.io/en/latest/userguide/stat/). I'm wanting confidence intervals around my parameter estimates, but assuming my loci are linked so can't use the straightforward Fisher method etc.

I’m not sure there’s a real easy way here. You could do the genomic region bootstrapping on your mapped BAM files, then generate each bootstrap sfs in ANGSD. That would be pretty bioinformatics-intensive, but it might be the only way through ANGSD.

Best,
Ryan
> To view this discussion on the web visit https://groups.google.com/d/msgid/dadi-user/8238401e-5b61-4d7a-a2a0-fd06f2875449n%40googlegroups.com.

Skye Davis

unread,
Jul 3, 2024, 5:11:16 PM (14 days ago) Jul 3
to dadi-user
Hi Ryan,

I have generated the bootstrapped sfs in ANGSD as you suggested, to use with the StatDM function in dadi-cli, but all my parameters are 'nan' in the resulting godambi.ci files - for every 1d model I optimized params for, regardless of whether they hit bounds or not (example below is with a model that didn't hit bounds). I can't figure out why all the uncertainties and 95% CI's are nan.

This is what I've done so far:

1) Optimize model parameters - did 1000 optimizations across 5 independent runs (also tried 5000 opts across 10 independent runs - results converge on the same values as in the 1000 opts runs)

dadi-cli InferDM --fs ./ALL_pop_n60_1d_sfs_folded_unmaskSingle.fs --output-prefix ALL_pop_n60 --optimizations 1000 --check-convergence 500 --global-optimization --nomisid --model bottlegrowth_1d --lbounds 1e-3 1e-3 0 --ubounds 100 100 1 --output ./Output/ALL_pop_n60_1d_bottlegrowth_optimize.params

# Example of bestfits output file:
# Converged results
# Log(likelihood) nuB nuF T theta
-444.7261843940661 44.64392900366994 0.6642586638175025 0.3763656605291387 9746.460184023845
-444.7261843940739 44.643891257931124 0.6642586650636733 0.37636548689043875 9746.460562884124
-444.7261843941201 44.64398034780311 0.6642587381623472 0.37636577122046017 9746.45955746676

2) Generate 100 bootstrap sfs in ANGSD (exact same way as empirical sfs - generate as unfolded in angsd, then fold in dadi). Empirical and bootstrap sfs are very similar and have same dimensions and masks.

3) Run StatDM for each model, e.g.: (I have 60 samples, so currently set grids to 120 140 160 - but have tried up to 200 240 280)

dadi-cli StatDM --fs ./ALL_pop_n60_1d_sfs_folded_unmaskSingle.fs --model bottlegrowth_1d --grids 120 140 160 --nomisid --demo-popt ./Output/ALL_pop_n60_1d_bottlegrowth_optimize.params.InferDM.bestfits  --bootstrapping-dir ./ALL_pop_n60_1d_boostrap_sfs/ --output ./Output/ALL_pop_n60_1d_bottlegrowth.bestfit.params.godambe.ci

# Example godambe.ci output file"

Estimated 95% uncerts (theta adj), with step size 0.1): [nan nan nan]
Lower bounds of 95% confidence interval : [nan nan nan]
Upper bounds of 95% confidence interval : [nan nan nan]

Estimated 95% uncerts (theta adj), with step size 0.01): [nan nan nan]
Lower bounds of 95% confidence interval : [nan nan nan]
Upper bounds of 95% confidence interval : [nan nan nan]

Estimated 95% uncerts (theta adj), with step size 0.001): [nan nan nan]
Lower bounds of 95% confidence interval : [nan nan nan]
Upper bounds of 95% confidence interval : [nan nan nan]
Reply all
Reply to author
Forward
0 new messages