Interpreting mu and sigma for DFE PDFs

Emma Howell

Mar 24, 2026, 12:49:09 PM
to dadi-user
Hi dadi team,

I have a question about interpreting the mu and sigma parameters of the lognormal DFE PDFs (DFE.PDFs.biv_lognormal and DFE.PDFs.lognormal). These parameters are meant to represent the mean and standard deviation of the lognormal distribution, correct? In other words, mu=1 and sigma=2 should correspond to a lognormal distribution with E[gamma]≈20, right?

In the Huang et al. (2021) joint DFE paper, I noticed that you generate cached spectra for gamma in the range [1e-4, 2000] (the default gamma_bounds for the Cache1D and Cache2D functions). However, some of the simulated "truth" datasets you describe in the paper are generated with mu=3.6 and sigma=5.1. If mu and sigma are the lognormal mean and standard deviation, then this should give E[gamma]≈16272710, which far exceeds the upper bound of the cached spectra. Am I misunderstanding how these variables are related to each other?

Thanks for the help!

Best,
Emma

Ryan Gutenkunst

Mar 24, 2026, 3:06:32 PM
to dadi...@googlegroups.com
Hi Emma,

Your understanding is correct.

For mu=3.6, sigma=5.1, the distribution is extremely long-tailed, so the mean is large, but about 80% of the weight of the distribution is within the simulated gamma bounds (see calculation below). Mutations with gamma > 2000 are treated as if they had gamma=2000, and thus contribute almost nothing to the observed SFS. (On an evolutionary timescale, they’re essentially lethal.)

Best,
Ryan

In [1]: import scipy.stats.distributions as ssd; from numpy import exp

In [2]: ssd.lognorm.mean(2, scale=exp(1))
Out[2]: np.float64(20.085536923187668)

In [3]: ssd.lognorm.mean(5.1, scale=exp(3.6))
Out[3]: np.float64(16272709.519083133)

In [4]: ssd.lognorm.cdf(2000, 5.1, scale=exp(3.6))
Out[4]: np.float64(0.7836238752525825)
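
To illustrate why the very large mean matters little once gamma is capped, here is a minimal standard-library sketch that compares the untruncated mean E[gamma] with E[min(gamma, cap)], the mean when every gamma above the cap is counted as the cap itself. It assumes ln(gamma) is normal with mean mu and standard deviation sigma, which matches the scipy calls above. The helper names (norm_cdf, lognormal_mean, capped_mean) are hypothetical and are not part of dadi; this is not a quantity dadi reports.

```python
import math


def norm_cdf(z):
    """Standard normal CDF, written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def lognormal_mean(mu, sigma):
    """Untruncated mean of gamma when ln(gamma) ~ Normal(mu, sigma)."""
    return math.exp(mu + 0.5 * sigma**2)


def capped_mean(mu, sigma, cap):
    """E[min(gamma, cap)]: values above the cap are counted as the cap."""
    log_cap = math.log(cap)
    # Contribution from gamma < cap (partial expectation of a lognormal).
    below = lognormal_mean(mu, sigma) * norm_cdf((log_cap - mu - sigma**2) / sigma)
    # Contribution from gamma >= cap, each counted as exactly cap.
    above = cap * norm_cdf(-(log_cap - mu) / sigma)
    return below + above


mu, sigma, cap = 3.6, 5.1, 2000.0
print("untruncated mean:", lognormal_mean(mu, sigma))  # same quantity as Out[3]
print("capped mean:     ", capped_mean(mu, sigma, cap))
```

The capped mean can never exceed the cap, so it stays within the cached gamma range even when the untruncated mean does not.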


Emma Howell

Mar 24, 2026, 3:53:24 PM
to dadi-user
Hi Ryan,

Thanks for your reply! I suppose I wasn't thinking about the fact that the mean is perhaps not the best measure of central tendency for the lognormal distribution.

Part of what spurred this question is that I'm trying to identify the best bounds to use for mu and sigma. Previously, I had been using the constrained optimization function dadi.Inference.optimize_cons() to ensure that the inferred mu and sigma yielded an E[gamma] within the simulated boundaries. I realize now that, given the long tail of the lognormal distribution, this may not be the best approach.

When it comes to constraining mu and sigma (or simply comparing inferred parameters to those in the literature where E[gamma] or E[s] are common summaries of the DFE), do you think it is more appropriate to use the median? 

Best,
Emma
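
For reference on the median question, a small sketch of how the median and other quantiles follow from mu and sigma, again assuming ln(gamma) is normal with mean mu and standard deviation sigma. The function name lognormal_quantile is hypothetical, and this is only an illustration of the arithmetic, not a recommendation about which summary to constrain.

```python
import math
from statistics import NormalDist


def lognormal_quantile(q, mu, sigma):
    """Quantile q of gamma when ln(gamma) ~ Normal(mu, sigma)."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(q))


mu, sigma = 3.6, 5.1
median = lognormal_quantile(0.5, mu, sigma)  # equals exp(mu)
mean = math.exp(mu + 0.5 * sigma**2)         # grows with sigma**2

print("median:", median)
print("mean:  ", mean)
print("25%-75% range:", lognormal_quantile(0.25, mu, sigma),
      lognormal_quantile(0.75, mu, sigma))
```

The median depends only on mu, whereas the mean also grows with sigma squared, which is why the two can differ by many orders of magnitude for a long-tailed DFE.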

Ryan Gutenkunst

Mar 24, 2026, 4:49:15 PM
to dadi...@googlegroups.com
Hi Emma,

We typically just constrain sigma to be positive and not huge (maybe < 10).

In regard to literature comparisons, I’m fond of plotting out the weight of the DFE in various bins like 0 < 1e-2 < 1e-1 < 1e0 < 1e1 < 1e2 < infinity. That binning might be appropriate for gamma, whereas the right bins for s will depend on your species.

Best,
Ryan
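
The binned summary described above can be sketched with the lognormal CDF alone. This is a minimal standard-library illustration, assuming ln(gamma) is normal with mean mu and standard deviation sigma and treating gamma as a positive magnitude, as in the earlier calculation in this thread. The names lognormal_cdf and dfe_bin_weights are hypothetical and not part of dadi; the example values of mu and sigma are the ones discussed above.

```python
import math


def lognormal_cdf(x, mu, sigma):
    """P(gamma <= x) when ln(gamma) ~ Normal(mu, sigma)."""
    if x <= 0:
        return 0.0
    if math.isinf(x):
        return 1.0
    z = (math.log(x) - mu) / sigma
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def dfe_bin_weights(mu, sigma, edges):
    """Probability mass of the DFE between each pair of consecutive edges."""
    cdf = [lognormal_cdf(e, mu, sigma) for e in edges]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]


# Bin edges from the message above: 0, 1e-2, 1e-1, 1e0, 1e1, 1e2, infinity.
edges = [0.0, 1e-2, 1e-1, 1e0, 1e1, 1e2, math.inf]
weights = dfe_bin_weights(3.6, 5.1, edges)

for lo, hi, w in zip(edges, edges[1:], weights):
    print(f"{lo:g} <= gamma < {hi:g}: {w:.3f}")
```

Because the weights are differences of a CDF, they sum to one and can be plotted directly as a bar chart or compared bin by bin with published DFEs.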
