[Numpy-discussion] Fitting Log normal or truncated log normal distibution to three points

1,453 views
Skip to first unread message

Sachin Kumar Sharma

unread,
Jun 30, 2011, 4:03:09 AM6/30/11
to Discussion of Numerical Python

Hi,

 

I have three points 10800, 81100, 582000.

 

What is the easiest way of fitting a log normal and truncated log normal distribution to these three points using numpy.

 

I would appreciate your reply for the same.

 

Cheers

 

Sachin

 

************************************************************************
Sachin Kumar Sharma

Senior Geomodeler

 

Christoph Deil

unread,
Jun 30, 2011, 5:25:49 AM6/30/11
to Discussion of Numerical Python

On Jun 30, 2011, at 10:03 AM, Sachin Kumar Sharma wrote:

Hi,
 
I have three points 10800, 81100, 582000.
 
What is the easiest way of fitting a log normal and truncated log normal distribution to these three points using numpy.

The lognormal and maximum likelihood fit is available in scipy.stats. It's easier to use this than to implement your own fit in numpy, although if you can't use scipy for some reason you can have a look there on how to implement it in numpy.

The rest of this reply uses scipy.stats, so if you are only interested in numpy, please stop reading.

>>> data = [10800, 81100, 582000]
>>> import scipy.stats
>>> print scipy.stats.lognorm.extradoc


Lognormal distribution

lognorm.pdf(x,s) = 1/(s*x*sqrt(2*pi)) * exp(-1/2*(log(x)/s)**2)
for x > 0, s > 0.

If log x is normally distributed with mean mu and variance sigma**2,
then x is log-normally distributed with shape paramter sigma and scale
parameter exp(mu).


>>> scipy.stats.lognorm.fit(data)   
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/optimize.py:280: RuntimeWarning: invalid value encountered in subtract
  and max(abs(fsim[0]-fsim[1:])) <= ftol):
/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/optimize.py:280: RuntimeWarning: invalid value encountered in absolute
  and max(abs(fsim[0]-fsim[1:])) <= ftol):
(1.0, 30618.493505379971, 117675.94879488947)
>>> scipy.stats.lognorm.fit(data, floc=0, fscale=1)   
[11.405078125000022, 0, 1]

See http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html for further info on how the location and scale parameter are handled, basically the x in the lognormal.pdf formula above is x = (x_actual - loc) / scale.

Now how to fit a truncated lognorm?
First note that because log(x) can only be computed for x>0, scipy.stats.lognorm is already truncated to x > 0.

>>> scipy.stats.lognorm.cdf(0, 1) # arguments: (x, s)
0.0
>>> scipy.stats.lognorm.cdf(1, 1) # arguments: (x, s)
0.5
>>> scipy.stats.lognorm.cdf(2, 1) # arguments: (x, s)
0.75589140421441725

As documented here, you can use parameters a and b to set the support of the distribution (i.e. the lower and upper truncation points):
>>> help(scipy.stats.rv_continuous)
 |  a : float, optional
 |      Lower bound of the support of the distribution, default is minus
 |      infinity.
 |  b : float, optional
 |      Upper bound of the support of the distribution, default is plus
 |      infinity.

However when I try to use the a, b parameters to call pdf() (as a simpler method than fit() to check if it works)  I run into the following problem:

>>> scipy.stats.lognorm.pdf(1, 1)
0.3989422804014327
>>> scipy.stats.lognorm(a=1).pdf(1, 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: pdf() takes exactly 2 arguments (3 given)
>>> scipy.stats.lognorm(a=1).pdf(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/stats/distributions.py", line 335, in pdf
    return self.dist.pdf(x, *self.args, **self.kwds)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/stats/distributions.py", line 1113, in pdf
    place(output,cond,self._pdf(*goodargs) / scale)
TypeError: _pdf() takes exactly 3 arguments (2 given)

For a distribution without parameters besides (loc, scale), setting a works:
>>> scipy.stats.norm(a=-2).pdf(3)
0.0044318484119380075

Is this a bug or am I simply using it wrong?
It would be nice if http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html contained an example or two of how to use the a, b, xa, xb, xtol parameters of scipy.stats.rv_continuous .

Christoph

josef...@gmail.com

unread,
Jun 30, 2011, 8:49:36 AM6/30/11
to Discussion of Numerical Python

in your last example a=-2 is ignored

>>> stats.norm.pdf(3)
0.0044318484119380075

a, b, xa, xb are attributes of the distribution that are set when the
distribution instance is created.

changing a and b doesn't make much sense since you would truncate the
support of the distribution without changing the distribution

>>> stats.norm.pdf(-3)
0.0044318484119380075
>>> stats.norm.a = -2
>>> stats.norm.pdf(-3)
0.0

stats.norm and the other distributions is an instances, and if you
change a in it it will stay this way, and might mess up other
calculations you might want to do.

xa, xb are only used internally as limits for the inversion of the cdf
(limits to fsolve) and I don't think they have any other effect.

what's a truncated lognormal ? left or right side truncated.
I think it might need to be created by subclassing.

there might be a direct way of estimating lognormal parameters from
log(x).mean() and log(x).std() if x is a lognormal sample.

(that's a scipy user question)

Josef


> Christoph
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Di...@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
_______________________________________________
NumPy-Discussion mailing list
NumPy-Di...@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Reply all
Reply to author
Forward
0 new messages