dadi-cli 3D model plotting issue

233 views
Skip to first unread message

Skye Davis

unread,
May 20, 2024, 10:47:47 AM5/20/24
to dadi-user
Hi Ryan and dadi team,

I'm using dadi-cli (very streamlined, thanks for bringing this out!) but I'm encountering an issue when trying to plot results for a 3d model (split_nomig).

I used an unfolded SFS generated with ANGSD/realSFS - then folded it in dadi (generating folded in ANGSD gave errors with dadi-cli, it did not recognize it as folded. The plot is the same though, so I'm not sure what the issue was).
all_pop_3d_sfs_folded_plot.png
Next in dadi-cli I ran:
>dadi-cli InferDM --fs ./all_pop_3d_sfs_folded.fs --output-prefix EP_WP_AI --optimizations 100 --check-convergence 10 --global-optimization --maxeval 400 --nomisid --model split_nomig --lbounds 1e-3 1e-3 1e-3 1e-3 0 0 --ubounds 100 100 100 100 1 1 --output ./Output/3pop_split_nomig_optimization.params

I added 'global-optimization' and 'maxeval' because convergence was failing without those and it was suggested in the manual.

This works (though with lots of 'model is <0 where data is not masked' errors) and I get the # Converged results and # Top 100 results in the output file. It converges, although the first T parameter is at the lower boundary (0), for all the results.

E.g., top result:
# Converged results
# Log(likelihood) nu1 nuA nu2 nu3 T1 T2 theta
-31490.36291078844 1.0973973171374933 0.004730289727100075 8.198105924385347 0.465853564914095 0.0 0.2366751079720142 4519.67105582473

But then I try to run the code to plot the model against the data:
>dadi-cli Plot --fs ./all_pop_3d_sfs_folded.fs --demo-popt ./Output/3pop_split_nomig_optimization.params.InferDM.bestfits --output ./Output/3pop_data_vs_model_split-nomig.pdf --model split_nomig

For which I get this error:
[...] portik_models_3d.py", line 28, in split_nomig
nu1, nuA, nu2, nu3, T1, T2 = params
ValueError: not enough values to unpack (expected 6, got 5)

I'm new to dadi-cli, so not sure how to fix this issue? Is it a bug, or an issue with my data?

Thanks so much for all your help!

Cheers,

Skye

Ryan Gutenkunst

unread,
May 21, 2024, 2:54:32 PM5/21/24
to dadi-user, Struck, Travis Jared - (tjstruck)
Hello Skye,

This is almost certainly a bug in dadi-cli. We’ll look into it and get back to you. If you’re comfortable with it, it would be helpful for you to add it as an issue here, with the Label “Bug”: https://github.com/xin-huang/dadi-cli/issues .

Best,
Ryan
all_pop_3d_sfs_folded_plot.png
all_pop_3d_sfs_folded_plot.png

Travis Struck

unread,
May 21, 2024, 7:25:29 PM5/21/24
to dadi...@googlegroups.com
Hi Skye,

For now, you need to add "--nomisid" to your Plot subcommands when a model is involved (vs. just data). By default, dadi-cli assumes a model will be for unfolded data, and excepts an extra parameter for misidentification.

As part of the error, it should suggest adding/removing --nomisid when there is a mismatch in the number of parameters given vs. expected. I'll look into why it didn't.

I hope this helps,
Travis

--
You received this message because you are subscribed to the Google Groups "dadi-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dadi-user+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dadi-user/c5228ff0-2636-49c1-8060-ab16878e3a60n%40googlegroups.com.

Skye Davis

unread,
May 28, 2024, 9:28:22 AM5/28/24
to dadi-user
Thanks so much Travis and Ryan, and for the speedy reply! Adding --nomisid worked.
I'm curious - are my spectra truly folded? I thought the upper diagonal should be empty - but whether I generate folded in angsd, or unfolded in angsd and fold in dadi, I get the same output as the plot I sent earlier. Is this normal?

Cheers,

Skye

Ryan Gutenkunst

unread,
May 28, 2024, 10:07:20 PM5/28/24
to dadi-user
Hello Skye,

Yes, your spectra are folded. It’s a little confusing because what is plotted here are marginals coming from the full 3D spectrum. In your case, with 46*3 = 138 total haplotypes between all three populations, the maximum minor allele frequency is 138/2 = 69. So in these 2D marginal spectra, you could see alleles that had, for example, the minor allele at frequency 46 in WP, 0 in EP, and 23 in ASC. But you could not have 46 in WP, 0 in EP, and 24 in ASC (because then it wouldn’t be the minor allele). So you see pure white in the upper corner of the plots, but it’s not 1/2 as you might expect, because we’re considering the minor allele across all three populations, not a particular population pair.

Best,
Ryan

> On May 27, 2024, at 9:41 PM, Skye Davis <k.skye...@gmail.com> wrote:
>
> Thanks so much Travis and Ryan, and for the speedy reply! Adding --nomisid worked.
> I'm curious - are my spectra truly folded? I thought the upper diagonal should be empty - but whether I generate folded in angsd, or unfolded in angsd and fold in dadi, I get the same output as the plot I sent earlier. Is this normal?
>
> Cheers,
>
> Skye
>
> On Wednesday, May 22, 2024 at 9:25:29 AM UTC+10 Travis Struck wrote:
> Hi Skye,
>
> For now, you need to add "--nomisid" to your Plot subcommands when a model is involved (vs. just data). By default, dadi-cli assumes a model will be for unfolded data, and excepts an extra parameter for misidentification.
>
> As part of the error, it should suggest adding/removing --nomisid when there is a mismatch in the number of parameters given vs. expected. I'll look into why it didn't.
>
> I hope this helps,
> Travis
>
> On Mon, May 20, 2024 at 7:47 AM Skye Davis <k.skye...@gmail.com> wrote:
> Hi Ryan and dadi team,
>
> I'm using dadi-cli (very streamlined, thanks for bringing this out!) but I'm encountering an issue when trying to plot results for a 3d model (split_nomig).
>
> I used an unfolded SFS generated with ANGSD/realSFS - then folded it in dadi (generating folded in ANGSD gave errors with dadi-cli, it did not recognize it as folded. The plot is the same though, so I'm not sure what the issue was).Next in dadi-cli I ran:
> >dadi-cli InferDM --fs ./all_pop_3d_sfs_folded.fs --output-prefix EP_WP_AI --optimizations 100 --check-convergence 10 --global-optimization --maxeval 400 --nomisid --model split_nomig --lbounds 1e-3 1e-3 1e-3 1e-3 0 0 --ubounds 100 100 100 100 1 1 --output ./Output/3pop_split_nomig_optimization.params
>
> I added 'global-optimization' and 'maxeval' because convergence was failing without those and it was suggested in the manual.
>
> This works (though with lots of 'model is <0 where data is not masked' errors) and I get the # Converged results and # Top 100 results in the output file. It converges, although the first T parameter is at the lower boundary (0), for all the results.
>
> E.g., top result:
> # Converged results
> # Log(likelihood) nu1 nuA nu2 nu3 T1 T2 theta
> -31490.36291078844 1.0973973171374933 0.004730289727100075 8.198105924385347 0.465853564914095 0.0 0.2366751079720142 4519.67105582473
>
> But then I try to run the code to plot the model against the data:
> >dadi-cli Plot --fs ./all_pop_3d_sfs_folded.fs --demo-popt ./Output/3pop_split_nomig_optimization.params.InferDM.bestfits --output ./Output/3pop_data_vs_model_split-nomig.pdf --model split_nomig
>
> For which I get this error:
> [...] portik_models_3d.py", line 28, in split_nomig
> nu1, nuA, nu2, nu3, T1, T2 = params
> ValueError: not enough values to unpack (expected 6, got 5)
>
> I'm new to dadi-cli, so not sure how to fix this issue? Is it a bug, or an issue with my data?
>
> Thanks so much for all your help!
>
> Cheers,
>
> Skye
>
>
> --
> You received this message because you are subscribed to the Google Groups "dadi-user" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to dadi-user+...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/dadi-user/c5228ff0-2636-49c1-8060-ab16878e3a60n%40googlegroups.com.
>
> --
> You received this message because you are subscribed to the Google Groups "dadi-user" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to dadi-user+...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/dadi-user/69a797b3-1604-493b-8a90-6903a32af9d0n%40googlegroups.com.

Skye Davis

unread,
May 30, 2024, 10:33:16 PM5/30/24
to dadi-user

Amazing, thank you so much for explaining that Ryan - your continued support on this forum is really appreciated! I have a few questions about how to proceed post optimization - happy to create a new forum question if that's better, otherwise posting here because the context for my analysis is above.

1) Should I be concerned one of the parameters of divergence time is hitting the lower bound of 0?

2) Since I generated an sfs in ANGSD - I don't have a vcf file to do the stats tests in dadi-cli described here (https://dadi-cli.readthedocs.io/en/latest/userguide/stat/). I'm wanting confidence intervals around my parameter estimates, but assuming my loci are linked so can't use the straightforward Fisher method etc.

Ryan Gutenkunst

unread,
Jun 4, 2024, 7:12:11 PM6/4/24
to dadi-user
Hello Skye,

> On May 30, 2024, at 5:20 PM, Skye Davis <k.skye...@gmail.com> wrote:
>
> Amazing, thank you so much for explaining that Ryan - your continued support on this forum is really appreciated! I have a few questions about how to proceed post optimization - happy to create a new forum question if that's better, otherwise posting here because the context for my analysis is above.
> 1) Should I be concerned one of the parameters of divergence time is hitting the lower bound of 0?

I see that as biologically interpretable, that you don’t have any evidence for a time above zero. So I wouldn’t be too concerned.

> 2) Since I generated an sfs in ANGSD - I don't have a vcf file to do the stats tests in dadi-cli described here (https://dadi-cli.readthedocs.io/en/latest/userguide/stat/). I'm wanting confidence intervals around my parameter estimates, but assuming my loci are linked so can't use the straightforward Fisher method etc.

I’m not sure there’s a real easy way here. You could do the genomic region bootstrapping on your mapped BAM files, then generate each bootstrap sfs in ANGSD. That would be pretty bioinformatics-intensive, but it might be the only way through ANGSD.

Best,
Ryan
> To view this discussion on the web visit https://groups.google.com/d/msgid/dadi-user/8238401e-5b61-4d7a-a2a0-fd06f2875449n%40googlegroups.com.

Skye Davis

unread,
Jul 3, 2024, 5:11:16 PM7/3/24
to dadi-user
Hi Ryan,

I have generated the bootstrapped sfs in ANGSD as you suggested, to use with the StatDM function in dadi-cli, but all my parameters are 'nan' in the resulting godambi.ci files - for every 1d model I optimized params for, regardless of whether they hit bounds or not (example below is with a model that didn't hit bounds). I can't figure out why all the uncertainties and 95% CI's are nan.

This is what I've done so far:

1) Optimize model parameters - did 1000 optimizations across 5 independent runs (also tried 5000 opts across 10 independent runs - results converge on the same values as in the 1000 opts runs)

dadi-cli InferDM --fs ./ALL_pop_n60_1d_sfs_folded_unmaskSingle.fs --output-prefix ALL_pop_n60 --optimizations 1000 --check-convergence 500 --global-optimization --nomisid --model bottlegrowth_1d --lbounds 1e-3 1e-3 0 --ubounds 100 100 1 --output ./Output/ALL_pop_n60_1d_bottlegrowth_optimize.params

# Example of bestfits output file:
# Converged results
# Log(likelihood) nuB nuF T theta
-444.7261843940661 44.64392900366994 0.6642586638175025 0.3763656605291387 9746.460184023845
-444.7261843940739 44.643891257931124 0.6642586650636733 0.37636548689043875 9746.460562884124
-444.7261843941201 44.64398034780311 0.6642587381623472 0.37636577122046017 9746.45955746676

2) Generate 100 bootstrap sfs in ANGSD (exact same way as empirical sfs - generate as unfolded in angsd, then fold in dadi). Empirical and bootstrap sfs are very similar and have same dimensions and masks.

3) Run StatDM for each model, e.g.: (I have 60 samples, so currently set grids to 120 140 160 - but have tried up to 200 240 280)

dadi-cli StatDM --fs ./ALL_pop_n60_1d_sfs_folded_unmaskSingle.fs --model bottlegrowth_1d --grids 120 140 160 --nomisid --demo-popt ./Output/ALL_pop_n60_1d_bottlegrowth_optimize.params.InferDM.bestfits  --bootstrapping-dir ./ALL_pop_n60_1d_boostrap_sfs/ --output ./Output/ALL_pop_n60_1d_bottlegrowth.bestfit.params.godambe.ci

# Example godambe.ci output file"

Estimated 95% uncerts (theta adj), with step size 0.1): [nan nan nan]
Lower bounds of 95% confidence interval : [nan nan nan]
Upper bounds of 95% confidence interval : [nan nan nan]

Estimated 95% uncerts (theta adj), with step size 0.01): [nan nan nan]
Lower bounds of 95% confidence interval : [nan nan nan]
Upper bounds of 95% confidence interval : [nan nan nan]

Estimated 95% uncerts (theta adj), with step size 0.001): [nan nan nan]
Lower bounds of 95% confidence interval : [nan nan nan]
Upper bounds of 95% confidence interval : [nan nan nan]

Ryan Gutenkunst

unread,
Jul 22, 2024, 4:10:56 PM7/22/24
to dadi-user
Hello Skye,

My apologies for the slow reply. First it was SMBE, then it was coming back with Covid…

It’s hard to know what could be happening without seeing the data and bootstraps. If you’re still having this issue, you can send me the relevant files off-list and I could give it a shot.

Best,
Ryan

Skye Davis

unread,
Aug 8, 2024, 9:22:40 PM8/8/24
to dadi-user
Hi Ryan, no worries at all and thank you so much for helping me with this - what's the best way to send you the files?

Cheers,
Skye

Ryan Gutenkunst

unread,
Aug 8, 2024, 10:08:58 PM8/8/24
to dadi...@googlegroups.com
Hello Skye,

If they’re small enough, email will be fine. Otherwise you can drop them in a Google Drive or some other service and send me a link. (All off list.)

Best,
Ryan

Skye Davis

unread,
Sep 24, 2024, 10:57:38 PM9/24/24
to dadi-user
Hi Ryan,

I'm not sure if I've used the right email - this one is from your University of Arizona page: rgu...@email.arizona.edu is this correct? Let me know if I should resend the email, or use a different email address.

Cheers,

Skye

Ryan Gutenkunst

unread,
Sep 27, 2024, 5:52:49 PM9/27/24
to dadi-user
Hello Skye,

You got the right email address. My apologies for not responding sooner; it fell of my plate. I’ll get back to you as soon as I can.

Best,
Ryan

Reply all
Reply to author
Forward
0 new messages