It would need a new method for each distribution, e.g. _loglike, _logpdf
So, this is work, and for some distributions the log wouldn't simplify much.
I proposed this once together with other improvements (but without response).
The second useful method for estimation would be _fitstart, which
provides distribution specific starting values for fit, e.g. a moment
estimator, or a simple rules of thumb
http://projects.scipy.org/scipy/ticket/808
Here are some of my currently planned enhancements to the distributions:
but I just checked, it looks like I forgot to copy the _loglike method
that I started from my experimental scripts.
For a few distributions, where this is possible, it would also be
useful to add the gradient with respect to the parameters, (or even
the Hessian). But this is currently mostly just an idea, since we need
some analytical gradients in the estimation of stats models.
>
> If there is not is it possible for me to suggest that this feature is added.
> There is such an excellent range of distributions, each with such an
> impressive range of options, it seems ashame to have to mostly manually code
> up the log of pdfs and often call the log of CDFs from R.
So far I only thought about log pdf, because I wanted it for Maximum
Likelihood estimation.
Do you have a rough idea for which distributions log cdf would work?
that is, for which distribution is an analytical or efficient
numerical expression possible.
I also think that scipy.stats.distributions could be one of the best
(broadest, consistent) collection of univariate distributions that I
have seen so far, once we fill in some missing pieces.
As a way forward, I think we could make the distributions into a
numerical encyclopedia by adding private methods to those
distributions where it makes sense, like log pdf, log cdf and I also
started to add characteristic functions to some distributions in my
experimental scripts.
If you have a collection of logpdf, logcdf, we could add a trac ticket for this.
However, this would miss the generic broadcasting part of the public
functions, pdf, cdf,... but for estimation I wouldn't necessarily call
those because of the overhead.
I'm working on and off on this, so it's moving only slowly (and my
wishlist is big).
(for example, I was reading up on extreme value distributions in
actuarial science and hydrology to get a better overview over the
estimators.)
So, I really love to hear any ideas, feedback, and see contributions
to improving the distributions.
Josef
>
> Thanks,
> Chris.
>
> _______________________________________________
> SciPy-User mailing list
> SciPy...@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
>
_______________________________________________
SciPy-User mailing list
SciPy...@scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user
It would need a new method for each distribution, e.g. _loglike, _logpdf
So, this is work, and for some distributions the log wouldn't simplify much.
I proposed this once together with other improvements (but without response).
The second useful method for estimation would be _fitstart, which
provides distribution specific starting values for fit, e.g. a moment
estimator, or a simple rules of thumb
http://projects.scipy.org/scipy/ticket/808
Here are some of my currently planned enhancements to the distributions:
http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head:/scikits/statsmodels/sandbox/stats/distributions_patch.py
but I just checked, it looks like I forgot to copy the _loglike method
that I started from my experimental scripts.
For a few distributions, where this is possible, it would also be
useful to add the gradient with respect to the parameters, (or even
the Hessian). But this is currently mostly just an idea, since we need
some analytical gradients in the estimation of stats models.
So far I only thought about log pdf, because I wanted it for Maximum
>
> If there is not is it possible for me to suggest that this feature is added.
> There is such an excellent range of distributions, each with such an
> impressive range of options, it seems ashame to have to mostly manually code
> up the log of pdfs and often call the log of CDFs from R.
Likelihood estimation.
Do you have a rough idea for which distributions log cdf would work?
that is, for which distribution is an analytical or efficient
numerical expression possible.
I also think that scipy.stats.distributions could be one of the best
(broadest, consistent) collection of univariate distributions that I
have seen so far, once we fill in some missing pieces.
As a way forward, I think we could make the distributions into a
numerical encyclopedia by adding private methods to those
distributions where it makes sense, like log pdf, log cdf and I also
started to add characteristic functions to some distributions in my
experimental scripts.
If you have a collection of logpdf, logcdf, we could add a trac ticket for this.
I don't think for many use cases log(stats.t.pdf) or many other
distributions the performance and accuracy hit would be large enough
to make it useless. At least, I haven't seen any other comments in
this direction.
On of the main use cases for me of stats.distributions are all the
statistical test distributions, t, F, chi2 and so on. Howver, in
statsmodels we have a mixture of calls to the pdf/cdf of
stats.distributions and reimplementations of loglikelhood functions,
where the scipy version is also just used for testing.
>
>
>>
>> The second useful method for estimation would be _fitstart, which
>> provides distribution specific starting values for fit, e.g. a moment
>> estimator, or a simple rules of thumb
>> http://projects.scipy.org/scipy/ticket/808
>>
>>
>> Here are some of my currently planned enhancements to the distributions:
>>
>>
>> http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head:/scikits/statsmodels/sandbox/stats/distributions_patch.py
>>
>> but I just checked, it looks like I forgot to copy the _loglike method
>> that I started from my experimental scripts.
>>
>> For a few distributions, where this is possible, it would also be
>> useful to add the gradient with respect to the parameters, (or even
>> the Hessian). But this is currently mostly just an idea, since we need
>> some analytical gradients in the estimation of stats models.
>>
> This certainly would be nice as well.
>>
>> >
>> > If there is not is it possible for me to suggest that this feature is
>> > added.
>> > There is such an excellent range of distributions, each with such an
>> > impressive range of options, it seems ashame to have to mostly manually
>> > code
>> > up the log of pdfs and often call the log of CDFs from R.
>>
>> So far I only thought about log pdf, because I wanted it for Maximum
>> Likelihood estimation.
>>
> It is also necessary for MCMC.
pymc has many distributions with loglike in fortran for speed, but for
most distributions only loglike and rvs are defined, if I remember
correctly.
>
>>
>> Do you have a rough idea for which distributions log cdf would work?
>> that is, for which distribution is an analytical or efficient
>> numerical expression possible.
>
> Not sure off the top of my head as I mainly require the only the pdf. I was,
> however, doing a little survival analysis the other day though and it was
> required. The log of the survival and hazard functions would be nice also.
> So far I have only required the exponential (analytical), weibull
> (analytical), normal (numerical) and powernormal (analytical function of the
> log of the normal cdf). I just had a peak at the R source code for pnorm
> (R's code for the normal cdf). The function is not big and also licensed
> under the GNU public licence. I assume it could be fairly easily ported to
> scipy.
R's license, GPL, is incompatible with the license of scipy, BSD.
While they are allowed to look at our code, code that goes into scipy
cannot be based on GPL licensed code.
If never seen it mentioned before that there is a direct function for
log(norm.cdf). Which functions and packages in R implement the
logarithm of the cdf of these distributions?
The cdf for several distributions (including normal) is implement in
Fortran or C in scipy.special, and I've never seen a log version for
them.
>>
>> I also think that scipy.stats.distributions could be one of the best
>> (broadest, consistent) collection of univariate distributions that I
>> have seen so far, once we fill in some missing pieces.
>>
>> As a way forward, I think we could make the distributions into a
>> numerical encyclopedia by adding private methods to those
>> distributions where it makes sense, like log pdf, log cdf and I also
>> started to add characteristic functions to some distributions in my
>> experimental scripts.
>> If you have a collection of logpdf, logcdf, we could add a trac ticket for
>> this.
>
> I could fairly easy whip up a collection of functions to compute the logpdf
> for a large number of distributions. Not sure about the CDFs but I can look
> into it as well. The pdf's are definitely far more urgent for my own work. I
> am a bit busy at work though for the next three weeks so it would have to be
> after that.
I looked at some of the distributions, and logpdf could be more
efficiently calculated in many of them and very often also logcdf
I opened a ticket for this
http://projects.scipy.org/scipy/ticket/1184
I also saw that there are still smaller, numerical improvements
possible in several distributions.
Thanks,
Josef
>>
>> However, this would miss the generic broadcasting part of the public
>> functions, pdf, cdf,... but for estimation I wouldn't necessarily call
>> those because of the overhead.
>>
>>
>> I'm working on and off on this, so it's moving only slowly (and my
>> wishlist is big).
>> (for example, I was reading up on extreme value distributions in
>> actuarial science and hydrology to get a better overview over the
>> estimators.)
>>
>>
>> So, I really love to hear any ideas, feedback, and see contributions
>> to improving the distributions.
>>
>> Josef
>>
>>
>
"Useless" is a value judgement, of course, but it doesn't seem *too*
far off to me either. I myself basically always find myself wanting
log-space values, and even if you're just doing statistical tests,
numerical precision in the tails can become very practically relevant
when doing multiple hypothesis correction.
> R's license, GPL, is incompatible with the license of scipy, BSD.
> While they are allowed to look at our code, code that goes into scipy
> cannot be based on GPL licensed code.
You mean, they're allowed to copy our code, and we're allowed to look
at their code for reference but can't use it directly :-).
> If never seen it mentioned before that there is a direct function for
> log(norm.cdf). Which functions and packages in R implement the
> logarithm of the cdf of these distributions?
>
> The cdf for several distributions (including normal) is implement in
> Fortran or C in scipy.special, and I've never seen a log version for
> them.
Yet R does in fact use specialized code for computing the log-cdf for
the normal distribution... at least over some parts of its range. I'm
not sure how much difference it makes or anything, I'm just reporting
on the existence of 'if' statements in the source :-). See the base R
distribution, src/nmath/pnorm.c (which also contains references).
-- Nathaniel
I don't think for many use cases log(stats.t.pdf) or many other
distributions the performance and accuracy hit would be large enough
to make it useless. At least, I haven't seen any other comments in
this direction.
On of the main use cases for me of stats.distributions are all the
statistical test distributions, t, F, chi2 and so on. Howver, in
statsmodels we have a mixture of calls to the pdf/cdf of
stats.distributions and reimplementations of loglikelhood functions,
where the scipy version is also just used for testing.
R's license, GPL, is incompatible with the license of scipy, BSD.
While they are allowed to look at our code, code that goes into scipy
cannot be based on GPL licensed code.
If never seen it mentioned before that there is a direct function for
log(norm.cdf). Which functions and packages in R implement the
logarithm of the cdf of these distributions?
The cdf for several distributions (including normal) is implement in
Fortran or C in scipy.special, and I've never seen a log version for
them.
I looked at some of the distributions, and logpdf could be more
efficiently calculated in many of them and very often also logcdf
http://projects.scipy.org/scipy/ticket/1184
I opened a ticket for this
I also saw that there are still smaller, numerical improvements
possible in several distributions.
Thanks,
Josef
We are allowed to look at their manuals but not their code.
(Life ain't fair.)
>
>> If never seen it mentioned before that there is a direct function for
>> log(norm.cdf). Which functions and packages in R implement the
>> logarithm of the cdf of these distributions?
>>
>> The cdf for several distributions (including normal) is implement in
>> Fortran or C in scipy.special, and I've never seen a log version for
>> them.
>
> Yet R does in fact use specialized code for computing the log-cdf for
> the normal distribution... at least over some parts of its range. I'm
> not sure how much difference it makes or anything, I'm just reporting
> on the existence of 'if' statements in the source :-). See the base R
> distribution, src/nmath/pnorm.c (which also contains references).
pnorm is the cdf not the log of the cdf, that's what I thought,
but I just saw that they have a "log.p" option.
from the R manual:
"""
For pnorm, based on
Cody, W. D. (1993) Algorithm 715: SPECFUN – A portable FORTRAN package
of special function routines and test drivers
"""
this sounds similar to the fortran or c code that scipy.special has. I
never tried to read that one, except for the doc comments.
accuracy doesn't seem to be a problem
np.log(stats.norm.cdf(np.linspace(-20,20,21))) - [r.pnorm(x,
log_p=True) for x in np.linspace(-20,20,21)]
array([ -2.84217094e-14, -2.84217094e-14, -2.84217094e-14,
0.00000000e+00, -1.42108547e-14, -7.10542736e-15,
-7.10542736e-15, -3.55271368e-15, -1.77635684e-15,
-4.44089210e-16, 0.00000000e+00, 0.00000000e+00,
-5.42101086e-20, -5.53867815e-17, -4.40377573e-17,
7.61985302e-24, 1.77648211e-33, 7.79353682e-45,
6.38875440e-58, 9.74094892e-73, 2.75362412e-89])
except the small numbers in the tail look much better in R
>>> np.log(stats.norm.cdf(np.linspace(-20,20,21)))
array([ -2.03917155e+02, -1.65812373e+02, -1.31695396e+02,
-1.01563034e+02, -7.54106730e+01, -5.32312852e+01,
-3.50134372e+01, -2.07367689e+01, -1.03601015e+01,
-3.78318433e+00, -6.93147181e-01, -2.30129093e-02,
-3.16717434e-05, -9.86587701e-10, -6.66133815e-16,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00])
>>> np.array([r.pnorm(x, log_p=True) for x in np.linspace(-20,20,21)])
array([ -2.03917155e+02, -1.65812373e+02, -1.31695396e+02,
-1.01563034e+02, -7.54106730e+01, -5.32312852e+01,
-3.50134372e+01, -2.07367689e+01, -1.03601015e+01,
-3.78318433e+00, -6.93147181e-01, -2.30129093e-02,
-3.16717434e-05, -9.86587646e-10, -6.22096057e-16,
-7.61985302e-24, -1.77648211e-33, -7.79353682e-45,
-6.38875440e-58, -9.74094892e-73, -2.75362412e-89])
except if we use a branch cut
>>> np.log1p(-stats.norm.sf(np.linspace(-20,20,21)))
array([ -Inf, -Inf, -Inf,
-Inf, -Inf, -Inf,
-3.49450411e+01, -2.07367689e+01, -1.03601015e+01,
-3.78318433e+00, -6.93147181e-01, -2.30129093e-02,
-3.16717434e-05, -9.86587646e-10, -6.22096057e-16,
-7.61985302e-24, -1.77648211e-33, -7.79353682e-45,
-6.38875440e-58, -9.74094892e-73, -2.75362412e-89])
I have no idea about speed.
Josef
I agree that it would be very good to have this generally available,
and I will appreciate it for maximum likelihood.
For MCMC (where I know only little about the details), it might,
however, always be faster to work with dedicated code as in pymc.
>
>>
>> R's license, GPL, is incompatible with the license of scipy, BSD.
>> While they are allowed to look at our code, code that goes into scipy
>> cannot be based on GPL licensed code.
>>
> Fair enough. Still at least for the normal cdf we could simply use the
> references in the R code to write a Scipy version.
If it's the C or Fortran implementation, then it is out of my
competence, I'm a pure scripting language person.
Another idea for this would be to see if any of the pymc code for this
would fit into scipy. Since I leave Fortran to others, I never looked
at it.
I think if we get the easier cases, logpdf and logcdf that don't
require compiled versions, we would be able to cover already a
considerable range of the distributions.
However, I also agree now, having norm.logcdf would also be useful for
many other distributions.
>
>>
>> If never seen it mentioned before that there is a direct function for
>> log(norm.cdf). Which functions and packages in R implement the
>> logarithm of the cdf of these distributions?
>
> pnorm it is in the stats package for the log of the normal CDF. Kind of
> essential for distributions like the powernormal as well that use the normal
> cdf as a part of their pdf.
see previous message, I never paid enough attention to see the log.p option
Josef
>>
>> The cdf for several distributions (including normal) is implement in
>> Fortran or C in scipy.special, and I've never seen a log version for
>> them.
>>
>> I looked at some of the distributions, and logpdf could be more
>> efficiently calculated in many of them and very often also logcdf
>>
>> I opened a ticket for this
>> http://projects.scipy.org/scipy/ticket/1184
>>
>> I also saw that there are still smaller, numerical improvements
>> possible in several distributions.
>>
>> Thanks,
>>
>> Josef
>>
>> ______________________________________________
>> SciPy-User mailing list
>> SciPy...@scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>
>
> _______________________________________________
> SciPy-User mailing list
> SciPy...@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
>
_______________________________________________
I'm contradicting and confusing myself, I don't think pymc has any cdf
code, only pdf.
Josef
Josef
>
> I think if we get the easier cases, logpdf and logcdf that don't
> require compiled versions, we would be able to cover already a
> considerable range of the distributions.
>
> However, I also agree now, having norm.logcdf would also be useful for
> many other distributions.
>
I'm busy for another month.
Just a warning in case you don't know: scipy is still stuck at fortran
77 because some platforms (e.g. Windows with mingw - which I use)
support only g77. I don't know when the upgrade will happen.
cython would be an alternative that's easier to maintain.
Josef
I'm busy for another month.On Sat, May 29, 2010 at 12:32 AM, Chris Strickland
<christopherm...@gmail.com> wrote:
>
>
> On Sat, May 29, 2010 at 2:20 PM, <josef...@gmail.com> wrote:
>>
>> Josef
>>
>> >
>> > I think if we get the easier cases, logpdf and logcdf that don't
>> > require compiled versions, we would be able to cover already a
>> > considerable range of the distributions.
>> >
>> > However, I also agree now, having norm.logcdf would also be useful for
>> > many other distributions.
>> >
>
> I can write in C and Fortran (I prefer Fortran with Python) so I could
> easily write code ( in around three weeks when my workload reduces) for
> cases where compiled languages are required.
Just a warning in case you don't know: scipy is still stuck at fortran
77 because some platforms (e.g. Windows with mingw - which I use)
support only g77. I don't know when the upgrade will happen.
cython would be an alternative that's easier to maintain.
Josef
Thanks, I will have a look at the gradient branch
To get started, I added a test script to the ticked that makes it
easier to add and test new methods for lnpdf and lncdf. It's adapted
from the scipy tests and tests all distributions that have a _lnpdf or
_lncdf method.
The new methods can just be added to the script to monkey patch
scipy.stats.distributions.
I monkey patched 13 easy cases so far, mainly to check that the script
works. (Still far too go for full coverage of cases where this makes
sense.)
The tests use nosetests and test for (almost) equality of the new
methods with the log of the old methods, and check a simple
broadcasting case.
Everyone is invited to add new cases, and to report any problems with
the script.
I hope that helps to get the ball rolling.
Josef
>
> Hope that is useful.
Presumably it would be easy to add a method to all distributions that
called the pdf and took its log. This could be over-riden for those
distributions for which a specialised log_pdf implementation was
available. This would make the entry cost of providing the functionality
lower.
Yes, I haven't thought about it yet for this case, but that's how the
system for the current methods works, only _pdf or _cdf is required,
all other methods have generic substitutes (which are sometimes very
slow.)
For testing, not having the generic version is easier, I have to
figure out again how to check whether a method was defined in the
super or the sub class (instead of using hasattr).
a naming question
_lnpdf or _logpdf ? _lncdf or _logcdf ?
Josef
My vote would be for log over ln since np.log(np.e) == 1.
Skipper
Yes, I don't know what I was thinking early in the morning, nobody
seems to use ln anymore
I edited the script
rename ln ->log
make print optional
add generic method
replace hasattr by (not sure this is the best way)
def isnotgeneric(distfn, methodname):
sub = getattr(distfn, methodname)
gen = getattr(stats.distributions.rv_continuous, methodname)
return not sub.im_func is gen.im_func
(generic methods don't pass the tests for all distributions, there are
some problems with broadcasting in the current _pdf, _cdf
implementations for some distributions)
Josef
It sounds like you guys have this well in hand, but just a point here
-- you certainly are allowed to look at their code, just not copy the
"expressive aspects" of it. (Saying you can't *look* at it because of
the license is like saying writers can't read other people's novels!)
"Expressive" is a tricky term, of course -- IIUC it's basically
anything that could be changed while preserving functionality (because
the functionality, the algorithm itself, is not covered by copyright).
So, say, variable names certainly count as expressive, decisions about
which way to lay out the code, etc. If one wants to be really safe,
one can write down a textual description of the algorithm and then ask
someone else to translate back to code (the "clean room" method).
So you do have to be a bit careful, but when you have code that
contains valuable information that isn't really written down anywhere
else then I'd say it's worth it.
> On Fri, May 28, 2010 at 7:29 AM, Chris Strickland
> <christopherm...@gmail.com> wrote:
>> Hi,
>>
>> When using any of the distributions of scipy.stats there does not seem to be
>> the ability (or at least I cannot figure out how) to have the function
>> return
>> the log of the pdf, cdf, sf, etc. For statistical analysis this is
>> essential.
>> For instance suppose we are interested in an exponential distribution for a
>> random variable x with a hyperparameter lambda there needs to be an option
>> that returns -log(lambda)-x/lambda. It is not sufficient (numerically) to
>> calculate log(scipy.stats.expon.pdf(x,lambda)).
>>
>> Is there a way to do this using the distributions in scipy.stats?
>
> It would need a new method for each distribution, e.g. _loglike, _logpdf
> So, this is work, and for some distributions the log wouldn't simplify much.
>
> I proposed this once together with other improvements (but without response).
>
> The second useful method for estimation would be _fitstart, which
> provides distribution specific starting values for fit, e.g. a moment
> estimator, or a simple rules of thumb
> http://projects.scipy.org/scipy/ticket/808
>
>
> Here are some of my currently planned enhancements to the distributions:
>
> http://bazaar.launchpad.net/~scipystats/statsmodels/trunk/annotate/head:/scikits/statsmodels/sandbox/stats/distributions_patch.py
Hey Josef,
I've been playing with distributions.py today and added logpdf, logcdf, logsf methods (based on _logpdf, _logcdf, _logsf methods in each distribution).
I also added your _fitstart suggestion. I would like to do something like your nnlf_fit method that allows you to fix some parameters and only solve for others, but I haven't thought through all the issues yet.
Do you have updated code I could look at. These are relatively easy adds that I would like to put in today. Do you have check-in rights to SciPy?
Thanks,
-Travis
---
Travis Oliphant
Enthought, Inc.
olip...@enthought.com
1-512-536-1057
http://www.enthought.com
I just committed the changes for the _logpdf, ..., I didn't see any
changes of yours in the timeline, nor in svn changes, plus a fix to
internal wrapcauchy_cdf
generic _logpdf, logcdf and the 13 cases of my test script are in svn
Josef
Josef
I would like to get the private _logpdf in a useful (vectorized or
broadcastable) version because for estimation and optimization, I want
to avoid the logpdf overhead. So, my testing will be on the underline
versions.
>
> I also added your _fitstart suggestion. I would like to do something like your nnlf_fit method that allows you to fix some parameters and only solve for others, but I haven't thought through all the issues yet.
I have written a semi-frozen fit function and posted to the mailing
list a long time ago, but since I'm not sure about the API and I'm
expanding to several new estimators, I kept this under
work-in-progress.
Similar _fitstart might need extra options, for estimation when some
parameters are fixed, e.g. there are good moment estimators that work
when some of the parameters (e.g. loc or scale) are fixed. Also
_fitstart is currently used only by my fit_frozen.
I was hoping to get this done this year, maybe together with the
enhancements that Per Brodtkorb proposed two years ago, e.g. Method of
Maximum Spacings.
I also have a Generalized Method of Moments estimator based on
matching quantiles and moments in the works.
So, I don't want yet to be pinned down with any API for the estimation
enhancements.
Josef
Thanks, this is useful to know. I've always erred on the side of
caution and just compared the results of functions/algorithms that
*should* be the same vs, say, R, but if I could do this and then look
at implementation details this could relieve substantial headaches.
It still seems like such a fine line though.
Skipper
This is exactly the problem. I don't think the R community is
particularly litigious, but as a rule of thumb, doing something that
is technically legal but for which the legality is subtle opens one up
to lawsuits. The problem is that even when you are right, a lawsuit is
tremendously destructive. So things that are legal but subtle should
probably be avoided by a group as penniless as the community as scipy
developers. So it's probably better to just not read their source
code.
Anne
On Sat, May 29, 2010 at 4:51 PM, Travis Oliphant <olip...@enthought.com> wrote:I would like to get the private _logpdf in a useful (vectorized or
>
> Hey Josef,
>
> I've been playing with distributions.py today and added logpdf, logcdf, logsf methods (based on _logpdf, _logcdf, _logsf methods in each distribution).
broadcastable) version because for estimation and optimization, I want
to avoid the logpdf overhead. So, my testing will be on the underline
versions.
I have written a semi-frozen fit function and posted to the mailing
>
> I also added your _fitstart suggestion. I would like to do something like your nnlf_fit method that allows you to fix some parameters and only solve for others, but I haven't thought through all the issues yet.
list a long time ago, but since I'm not sure about the API and I'm
expanding to several new estimators, I kept this under
work-in-progress.
Similar _fitstart might need extra options, for estimation when some
parameters are fixed, e.g. there are good moment estimators that work
when some of the parameters (e.g. loc or scale) are fixed. Also
_fitstart is currently used only by my fit_frozen.
I was hoping to get this done this year, maybe together with the
enhancements that Per Brodtkorb proposed two years ago, e.g. Method of
Maximum Spacings.
I also have a Generalized Method of Moments estimator based on
matching quantiles and moments in the works.
So, I don't want yet to be pinned down with any API for the estimation
enhancements.
When using any of the distributions of scipy.stats there does not seem to be
the ability (or at least I cannot figure out how) to have the function return
the log of the pdf, cdf, sf, etc. For statistical analysis this is essential.
For instance suppose we are interested in an exponential distribution for a
random variable x with a hyperparameter lambda there needs to be an option
that returns -log(lambda)-x/lambda. It is not sufficient (numerically) to
calculate log(scipy.stats.expon.pdf(x,lambda)).
Is there a way to do this using the distributions in scipy.stats?
If there is not is it possible for me to suggest that this feature is added.
There is such an excellent range of distributions, each with such an
impressive range of options, it seems ashame to have to mostly manually code
up the log of pdfs and often call the log of CDFs from R.
Thanks,
Chris.