Obtain the loglikelihood to select models using Akaike criteria

ary...@gmail.com

Oct 24, 2017, 8:16:22 AM

to powerlaw-general

Thank you for developing this very good package. I am new to Python, so I may be missing things that are very easy. I would like to use the Akaike information criterion [1] to compare and select among different distributions:

AICc = (2*k-2*LL)+2*k*(k+1)/(n-k-1)

where k is the number of parameters, LL is the log-likelihood of the fitted model, and n is the number of data points used in the fit. I believe all of these are already calculated during the fitting process, but I can't figure out how to get them.
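A minimal sketch of the formula above as a plain Python function. The function name `aicc` and its argument names are made up for illustration and are not part of the powerlaw package; the guard on `n - k - 1` is added because the correction term is undefined when that denominator is zero or negative.

```python
def aicc(log_likelihood, k, n):
    """Small-sample corrected AIC: (2*k - 2*LL) + 2*k*(k+1)/(n-k-1).

    log_likelihood: total log-likelihood LL of the fitted model
    k: number of fitted parameters
    n: number of data points used in the fit
    """
    if n - k - 1 <= 0:
        # The correction term divides by (n - k - 1).
        raise ValueError("AICc requires n > k + 1")
    aic = 2 * k - 2 * log_likelihood
    correction = 2 * k * (k + 1) / (n - k - 1)
    return aic + correction


# Hypothetical numbers, only to show the call: LL = -100, one parameter, 50 points.
print(aicc(-100.0, k=1, n=50))
```

The model with the lower AICc is the preferred one, provided every candidate is scored on the same data points.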

refs:

[1] Burnham, K., Anderson, D., & Huyvaert, K. (2011). AIC model selection and multimodel inference in behavioral ecology: some background, observations, and comparisons. Behavioral Ecology and Sociobiology, 65(1), 23–35. Retrieved from http://dx.doi.org/10.1007/s00265-010-1029-6

Jeff Alstott

Oct 24, 2017, 8:36:22 AM

to powerlaw...@googlegroups.com

data = [1, 77, ...]

import powerlaw

fit = powerlaw.Fit(data)

n = len(data)

LL_power_law = sum(fit.power_law.loglikelihoods())

LL_exponential = sum(fit.exponential.loglikelihoods())

k_for_power_law = len(fit.power_law.parameters) + 1 #alpha and xmin = 2

k_for_exponential = len(fit.exponential.parameters) + 1 #lambda and xmin = 2

I just did that from memory, so if anything isn't actually right just let me know.

--
You received this message because you are subscribed to the Google Groups "powerlaw-general" group.
To unsubscribe from this group and stop receiving emails from it, send an email to powerlaw-general+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

ary...@gmail.com

Oct 24, 2017, 9:18:02 AM

to powerlaw-general

Thanks Jeff,

You need to pass the data as an argument so that the log-likelihoods are computed:

LL_power_law = sum(fit.power_law.loglikelihoods(data))

LL_exponential = sum(fit.exponential.loglikelihoods(data))

Also, the number of data points used in the fit is not

n = len(data)

because only the points at or above xmin are fitted:

n = len(data[data >= fit.power_law.xmin])  # data must be a NumPy array for this indexing

xmin is also not a true parameter, because it is a restriction on the data set that is chosen before the maximum likelihood estimation, and the same data set has to be used to fit all the distributions.
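To make the point about n and k concrete, here is a self-contained sketch that does not use the powerlaw package at all. It fits a continuous power law and a shifted exponential to the same tail with the standard closed-form maximum likelihood estimates, counts n as the number of tail points, and counts only the shape parameter (not xmin) in k. The data, the xmin value, the function names, and the choice to treat the tail as x >= xmin are all assumptions made for illustration.

```python
import math


def fit_power_law(tail_data, xmin):
    """MLE for p(x) = ((alpha-1)/xmin) * (x/xmin)**(-alpha), x >= xmin."""
    n = len(tail_data)
    log_ratio_sum = sum(math.log(x / xmin) for x in tail_data)
    alpha = 1.0 + n / log_ratio_sum
    loglik = n * math.log((alpha - 1.0) / xmin) - alpha * log_ratio_sum
    return alpha, loglik


def fit_exponential(tail_data, xmin):
    """MLE for p(x) = lam * exp(-lam * (x - xmin)), x >= xmin."""
    n = len(tail_data)
    excess_sum = sum(x - xmin for x in tail_data)
    lam = n / excess_sum
    loglik = n * math.log(lam) - lam * excess_sum
    return lam, loglik


def aicc(log_likelihood, k, n):
    """(2*k - 2*LL) + 2*k*(k+1)/(n-k-1), as in the first post."""
    return (2 * k - 2 * log_likelihood) + 2 * k * (k + 1) / (n - k - 1)


# Hypothetical data and a hypothetical, already-chosen xmin.
data = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
xmin = 3

# Both models are fitted to the same tail, so n is the tail size, not len(data).
tail_data = [x for x in data if x >= xmin]
n = len(tail_data)

alpha, ll_pl = fit_power_law(tail_data, xmin)
lam, ll_exp = fit_exponential(tail_data, xmin)

# k = 1 for each model: xmin is treated as fixed in advance, not as a parameter.
print("power law   AICc:", aicc(ll_pl, k=1, n=n))
print("exponential AICc:", aicc(ll_exp, k=1, n=n))
```

Because both models here have the same k and the same n, their AICc difference reduces to twice the difference in log-likelihood; the treatment of xmin only starts to matter when the candidates have different numbers of parameters.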

What I wonder is whether these values are already calculated inside the class, and whether there is a simple way to add a method that uses the internal objects to calculate AICc.

Anyway, with the clues you gave me I can calculate what I need. Thanks!

Jeff Alstott

Oct 24, 2017, 8:56:22 PM

to powerlaw...@googlegroups.com

Good eye!

What you've described are the available methods. Any new internal method would just call these methods. You're very welcome to make a pull request that implements AIC, using these method calls.

BUT!

xmin is weird. It's unclear to me how AIC ought to handle it. It's not a parameter in the conventional sense, but it definitely is a free element that is manipulated in order to maximize the likelihood. But it only maximizes the likelihood of the power law, and not any of the other distributions. If you are interested in AIC, I would look into its derivation in hopes of figuring out what would be the right interpretation.

I would not be surprised if there isn't a right interpretation, because the selection of xmin is rather hack-y. One way of telling the story of the development of the original Clauset et al. methods was creating a way to give the power law the maximum possible edge (by selecting xmin to look at the best possible tail) and still not finding power laws. I doubt AIC was built with this kind of situation in mind.

ary...@gmail.com

Oct 26, 2017, 3:42:09 PM

to powerlaw-general

Great insight about xmin and AIC!

The xmin issue is controversial. It is clear that you are assuming beforehand that the data come from a power law when you select xmin by minimizing the distance to a power law. The method for choosing xmin should be independent of the power-law distribution, because with the Clauset approach you are biased towards finding a power law. A possible simple approach would be to select xmin using each of the candidate distributions and then use the average; this way you cut off some noisy points without being biased towards a particular model. This would also solve the AICc problem.

For the moment, I need to learn more Python before I can add methods to your package...
