binned data

elhamva...@gmail.com

May 20, 2014, 2:26:27 AM
to powerlaw...@googlegroups.com
Hi,

Great work, thanks!

I have a question for you.

Does powerlaw work with binned data?
If so, how can I import data that is binned in such a way that both the number of observations per bin and the bin widths vary across the data?

My data is the playtime of gamers. For example:
number of gamers = X = 2 ---> p(X) = time they played = 10
X = 5 ---> p(X) = 2
X = 1 ---> p(X) = 35
X = 1 ---> p(X) = 43
...

Do you think I should convert this distribution back to the raw data? I mean:
playtime of player #1 = 10
playtime of player #2 = 10
playtime of player #3 = 35
playtime of player #4 = 43
playtime of player #5 = 2
playtime of player #6 = 2
playtime of player #7 = 2
playtime of player #8 = 2
playtime of player #9 = 2

and then import data = [2,2,2,2,2,10,10,35,43]?




Jeff Alstott

May 20, 2014, 4:57:33 AM
to powerlaw...@googlegroups.com
Thanks!

powerlaw does not currently support binned data. If you can convert your data back into its raw form (as you have shown you can), I recommend doing that, since the fit will be more accurate and conceptually simpler. However, I do not follow the particulars of your dataset. This part:
X = 1 --->p(X) = 35 
X= 1----> p(X) = 43
confuses me. But if you understand the data and are confident that you can convert it into a simple array of observations, then you should be good to go!
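
The conversion discussed above could be sketched roughly as follows. This assumes each entry is a (playtime, number of players) pair, which is only one possible reading of the example, and `expand_counts` is a hypothetical helper, not part of powerlaw.

```python
def expand_counts(bins):
    """Expand (value, count) pairs into a flat list of raw observations."""
    data = []
    for value, count in bins:
        if count < 0:
            raise ValueError("counts must be non-negative")
        # Repeat the observed value once per player who had it.
        data.extend([value] * count)
    return data


# Assumed reading of the example: (playtime, number of players).
bins = [(10, 2), (2, 5), (35, 1), (43, 1)]
data = sorted(expand_counts(bins))
print(data)
```

This only recovers the raw data when each entry holds a single exact value. If the bins are true intervals, the individual observations inside each bin are lost and cannot be reconstructed this way.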

Note that Virkar and Clauset have now developed some methods for fitting binned data, but I have not looked at them. If you look into them and implement them in Python, let me know and we could include the material in powerlaw.

Good luck!
Jeff


--
You received this message because you are subscribed to the Google Groups "powerlaw-general" group.
To unsubscribe from this group and stop receiving emails from it, send an email to powerlaw-gener...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

elhamva...@gmail.com

May 20, 2014, 3:33:02 PM
to powerlaw...@googlegroups.com
Thanks for the prompt response.
I will let you know if I implement their methods.

I tried the following on the raw data:
import powerlaw
results = powerlaw.Fit(data)
print(results.power_law.alpha)
print(results.power_law.xmin)
R, p = results.distribution_compare('power_law', 'lognormal')


but I got the following errors.
Do you have any idea why?

Calculating best minimal value for power law fit
/usr/local/lib/python2.7/dist-packages/powerlaw.py:686: RuntimeWarning: invalid value encountered in divide
  (Theoretical_CDF * (1 - Theoretical_CDF))
2.13560411943
35.0
/usr/local/lib/python2.7/dist-packages/powerlaw.py:803: RuntimeWarning: invalid value encountered in multiply
  likelihoods = f*C
/usr/lib/python2.7/dist-packages/scipy/optimize/optimize.py:447: RuntimeWarning: invalid value encountered in subtract
  and max(abs(fsim[0] - fsim[1:])) <= ftol):
/usr/local/lib/python2.7/dist-packages/powerlaw.py:686: RuntimeWarning: divide by zero encountered in divide
  (Theoretical_CDF * (1 - Theoretical_CDF))
/usr/local/lib/python2.7/dist-packages/powerlaw.py:1611: RuntimeWarning: invalid value encountered in subtract
  ( (loglikelihoods1-loglikelihoods2) - mean_diff)**2




Jeff Alstott

May 20, 2014, 5:28:57 PM
to powerlaw...@googlegroups.com
You likely have a really bad fit to a lognormal. What are the values of R and p?

elhamva...@gmail.com

May 20, 2014, 7:51:30 PM
to powerlaw...@googlegroups.com
As you can see, it stopped and never got to the point of calculating R and p.
How did you conclude that it doesn't fit a lognormal?

elhamva...@gmail.com

May 20, 2014, 8:02:20 PM
to powerlaw...@googlegroups.com
Oh, sorry.
My data is discrete.
I have now set discrete to True, and I get
R = 589.047451769
p = 2.42216154736e-81
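
For anyone following along, the corrected call might look roughly like this. It is a minimal sketch that assumes `data` is a flat list of integer playtimes; the wrapper function name is hypothetical, and only `powerlaw.Fit(..., discrete=True)` and `distribution_compare` come from the library.

```python
def compare_power_law_to_lognormal(data):
    """Fit integer-valued data and compare power law against lognormal."""
    # Third-party package; imported here so the sketch loads without it.
    import powerlaw

    # discrete=True tells powerlaw to use the discrete forms of the
    # distributions instead of the default continuous ones.
    fit = powerlaw.Fit(data, discrete=True)
    R, p = fit.distribution_compare('power_law', 'lognormal')
    return fit.power_law.alpha, fit.power_law.xmin, R, p


# Usage, with `data` being the flat list of integer playtimes:
# alpha, xmin, R, p = compare_power_law_to_lognormal(data)
```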

elhamva...@gmail.com

May 20, 2014, 8:09:26 PM
to powerlaw...@googlegroups.com
So what does p show?
Since R is positive, my data fits a power law.
I am not sure how to interpret R, though.

Jeff Alstott

May 21, 2014, 12:23:10 PM
to powerlaw...@googlegroups.com
Glad it now works for you!

Check out the paper, specifically the section starting "R is the loglikelihood ratio between the two candidate distributions...". The p-value shows the statistical significance of the sign of R.
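
As a rough illustration of how R and p relate, here is a sketch of the standard likelihood-ratio test for two non-nested distributions. It assumes you already have the per-observation log-likelihoods under each candidate; the function name and the numbers are hypothetical, and this is not necessarily line for line what powerlaw does internally.

```python
import math


def loglikelihood_ratio(loglik1, loglik2):
    """Return (R, p) from per-observation log-likelihoods of two candidates.

    R > 0 favors the first distribution, R < 0 the second.
    p is the significance of the sign of R (small p: the sign is trustworthy).
    """
    if len(loglik1) != len(loglik2) or len(loglik1) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(loglik1)
    diffs = [a - b for a, b in zip(loglik1, loglik2)]
    R = sum(diffs)
    mean_diff = R / n
    variance = sum((d - mean_diff) ** 2 for d in diffs) / n
    if variance == 0:
        raise ValueError("zero variance: the test is undefined")
    # Two-sided p-value under a normal approximation for R.
    p = math.erfc(abs(R) / math.sqrt(2 * n * variance))
    return R, p


# Hypothetical log-likelihoods, for illustration only.
ll_first = [-1.0, -1.2, -0.8, -1.1]
ll_second = [-1.5, -1.4, -1.6, -1.3]
R, p = loglikelihood_ratio(ll_first, ll_second)
print(R, p)
```

A positive R only says the first candidate is favored over the second; it does not by itself show that the first candidate is a good fit to the data.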

elhamva...@gmail.com

May 21, 2014, 12:36:33 PM
to powerlaw...@googlegroups.com

Thank you so much!
