Version .4


Jeff Alstott

Mar 22, 2012, 8:52:58 PM
to powerlaw...@googlegroups.com, powerlaw...@googlegroups.com, Neil Gong
Lots of changes in this version:

First and foremost, bug fixes:

The continuous power law, continuous truncated power law, and discrete lognormal all had typos in their equations. Some of these should have been immaterial, but if you have used those distributions, I suggest checking whether your results have changed. Sorry these flew under the radar; I don't focus on those distributions in my research and hadn't implemented tests for them. I have not yet written full tests for all versions of all distributions; those will come in the next version. I wanted to get these revisions out now, though, so that people don't continue to use buggy code. There may be more bugs hiding, which better unit testing in version .5 should catch.

New features:
find_xmin now has the keyword return_all. If return_all=True, then the results of find_xmin are:
xmin, D, alpha, loglikelihood, n_tail, noise_flag, xmins, Ds, alphas, sigmas

Here xmins is the list of all xmin values tested, and Ds, alphas, and sigmas are the corresponding values from the fit at each of those xmins.
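To show where the per-xmin lists come from, here is a rough sketch of an xmin scan for the continuous power law case. It is not the package's code: fit_tail and scan_xmins are hypothetical names, the estimator and Kolmogorov-Smirnov distance are the standard ones for a continuous power law, and the returned tuple leaves out loglikelihood, n_tail, and noise_flag.

```python
import math
import random


def fit_tail(data, xmin):
    """Fit a continuous power law to the values >= xmin (illustrative only)."""
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    # Maximum likelihood estimate of the exponent and its standard error.
    alpha = 1.0 + n / sum(math.log(x / xmin) for x in tail)
    sigma = (alpha - 1.0) / math.sqrt(n)
    # Kolmogorov-Smirnov distance between the empirical and fitted CDFs.
    D = 0.0
    for i, x in enumerate(tail):
        cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        D = max(D, abs(cdf - i / n), abs(cdf - (i + 1) / n))
    return alpha, sigma, D


def scan_xmins(data):
    """Try every unique value (except the largest) as xmin; keep the smallest D."""
    xmins = sorted(set(data))[:-1]
    fits = [fit_tail(data, xm) for xm in xmins]
    alphas = [f[0] for f in fits]
    sigmas = [f[1] for f in fits]
    Ds = [f[2] for f in fits]
    best = min(range(len(xmins)), key=lambda i: Ds[i])
    return xmins[best], Ds[best], alphas[best], xmins, Ds, alphas, sigmas


# Synthetic data: inverse-transform samples from a power law
# with alpha = 2.5 and xmin = 1 (assumed values for the demo).
random.seed(0)
data = [(1.0 - random.random()) ** (-1.0 / 1.5) for _ in range(300)]

xmin, D, alpha, xmins, Ds, alphas, sigmas = scan_xmins(data)
print(xmin, D, alpha)
```

Keeping the full lists makes it possible to plot D or alpha against xmin and see how sensitive the fit is to the chosen cutoff.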

distribution_fit, Fit, etc. now include the keyword estimate_discrete, which makes the power law calculation for the discrete case use equation B.17 from Clauset et al. This approximation is not perfect, but it does an ok job and runs much faster than a numerical search for the best fit. This particularly matters when searching for the optimal xmin on a large dataset (one with many unique values).
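A minimal sketch of that approximation, assuming equation B.17 is the closed form alpha ~ 1 + n / sum(ln(x_i / (xmin - 1/2))). The function name approx_discrete_alpha is hypothetical and is not part of the package.

```python
import math


def approx_discrete_alpha(data, xmin):
    """Approximate discrete power law exponent (hypothetical helper).

    Treats the integers as rounded continuous values, which shifts xmin
    down by 1/2 in the continuous maximum likelihood formula.
    """
    tail = [x for x in data if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / (xmin - 0.5)) for x in tail)


print(approx_discrete_alpha([1, 2, 3], xmin=1))
```

Because this is a single pass over the tail rather than an optimization, repeating it for every candidate xmin stays cheap. Clauset et al. describe the approximation as most accurate when xmin is not too small (roughly 6 or larger).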

Added stretched exponential, gamma, and negative binomial distributions (untested).

To Do for version .5:
Rock solid testing of distributions for all cases (should have been done in version .1!)
Prettier code, with comments
Pretty documentation


Jeff Alstott

Mar 23, 2012, 5:02:29 PM
to powerlaw...@googlegroups.com, powerlaw...@googlegroups.com, Neil Gong
Turns out it was a false alarm on the lognormal and truncated power law errors. I fixed the "fixes" to these and incremented to version .4.1. The error in the continuous power law case, however, was real.

The gamma and truncated power law distributions now rely on the mpmath package, as scipy's gamma functions are numerically inaccurate for typical use cases. 

All equations have now been audited several times, so hopefully there will be no more fundamental miscalculation bugs.

Added the option to search for xmin only within a range. Just set xmin=(lower bound, upper bound) or xmin=[lower bound, upper bound].
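As a rough illustration of what a range does to the search, the sketch below filters the candidate xmin values. The helper candidate_xmins is hypothetical, and treating both bounds as inclusive is an assumption, not something confirmed by the package.

```python
def candidate_xmins(data, xmin=None):
    """Return the xmin values a search would try (hypothetical helper)."""
    values = sorted(set(data))
    if xmin is None:
        return values                      # search over every unique value
    if isinstance(xmin, (tuple, list)):
        lower, upper = xmin                # assumed inclusive bounds
        return [v for v in values if lower <= v <= upper]
    return [xmin]                          # a single number fixes xmin


print(candidate_xmins([1, 2, 3, 4, 5, 8], xmin=(2, 4)))   # [2, 3, 4]
```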

Jeff Alstott

Sep 12, 2012, 7:16:25 AM
to powerlaw...@googlegroups.com, powerlaw...@googlegroups.com
"All equations have now been audited several times, so hopefully there will be no more fundamental miscalculation bugs."
Nope.

The lognormal distribution in the continuous case was missing a set of parentheses, which resulted in lognormal fits typically outcompeting power law fits in likelihood ratio tests. My own research typically looks at the discrete case, so I didn't notice. Thanks to Davide Cittaro for finding this bug.
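To illustrate the kind of slip involved, the sketch below compares a fully parenthesized lognormal density with a version that drops one pair of parentheses. The specific slip shown is hypothetical: it is not taken from the package, and the actual missing parentheses may have been elsewhere.

```python
import math


def lognormal_pdf(x, mu, sigma):
    """Continuous lognormal density, with the exponent fully parenthesized."""
    exponent = -((math.log(x) - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (x * sigma * math.sqrt(2 * math.pi))


def lognormal_pdf_slip(x, mu, sigma):
    """Hypothetical slip: '/ 2 * sigma ** 2' divides by 2, then multiplies by sigma ** 2."""
    exponent = -((math.log(x) - mu) ** 2) / 2 * sigma ** 2
    return math.exp(exponent) / (x * sigma * math.sqrt(2 * math.pi))


x, mu, sigma = math.e, 0.0, 2.0
print(lognormal_pdf(x, mu, sigma), lognormal_pdf_slip(x, mu, sigma))
```

The two versions agree when sigma is 1, so a spot check at default-looking parameters would not reveal the difference.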

This is the sort of situation where it's impossible to write unit tests. When the expected behavior is "match this formula", the only question is whether the coder is capable of converting the formula to code accurately, which one then compares to ... another conversion of the formula.

I thank everyone who has helped improve this package by using, testing, and breaking it, and apologize that, since this is beta software, fixes like this one are still sometimes needed.

Now on version .4.2. 