Can I directly supply histogram as opposed to list of data to powerlaw?


phili...@aol.com

Sep 30, 2016, 8:34:50 AM
to powerlaw-general
Hi,

I have a network for which I have computed the shortest path lengths between all pairs of vertices, and I have saved them in a histogram. I now want to analyse the distribution using powerlaw. Is it possible to supply the histogram directly to the package and do the analysis based on that? So far I have only found ways of analysing it by first converting the histogram into a long list of values (loaded using genfromtxt), from which powerlaw then generates a histogram again. Avoiding the list generation step would probably make things a fair bit more efficient for me: creating a list with billions of values isn't great for my RAM, and writing it directly to the drive isn't very efficient.

Jeff Alstott

Sep 30, 2016, 11:18:03 AM
to powerlaw...@googlegroups.com
That is a whoooole other thing, as discussed previously here:

The statistics of fitting binned data are different from those of fitting raw data. 

"there are now some methods developed by Virkar and Clauset on the subject of fitting on binned data, but I have not looked at them. If you look into them and implement them in Python, let me know and we could include the material in powerlaw."


phili...@aol.com

Sep 30, 2016, 11:30:56 AM
to powerlaw-general
OK. I have integer data binned into bins of width one, so conversion back into raw data is straightforward and requires no particular effort on reconstruction. I shall just stick with that. Thank you for the response!
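
For integer data in unit-width bins, the conversion back into raw values mentioned above can be sketched with NumPy's `np.repeat`. The toy counts below are made up for illustration, and the commented-out `powerlaw.Fit(raw, discrete=True)` call is only an assumption about the follow-up step. The approach works only while the total number of observations fits in memory.

```python
import numpy as np

# Hypothetical toy histogram: counts[i] is the number of observations
# whose integer value equals values[i] (unit-width bins starting at 0).
counts = np.array([0, 3, 5, 2, 1], dtype=np.int64)
values = np.arange(len(counts))

# Expand the histogram back into one entry per observation.
raw = np.repeat(values, counts)

# Assumed next step, not run here:
# import powerlaw
# fit = powerlaw.Fit(raw, discrete=True)
```

Because each bin holds exactly one integer value, no information is lost in the round trip; with wider bins the original values could not be recovered this way.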

phili...@aol.com

Sep 30, 2016, 1:03:03 PM
to powerlaw-general, phili...@aol.com
Looking at my data a bit more closely the histogram looks something like this:
[array([  0.00000000e+00,   1.83413630e+07,   1.74493106e+09,
          7.91390628e+10,   4.54474023e+11,   5.38810039e+11,
          3.01718080e+11,   1.38440761e+11,   6.17865624e+10,
          2.77457730e+10,   1.32412328e+10,   6.71579967e+09,
          3.35556066e+09,   2.00513046e+09,   1.18435261e+09,
          7.34440685e+08,   5.13846805e+08,   3.97894623e+08,
          1.97770421e+08,   1.11546165e+08,   6.63624300e+07,
          3.93196820e+07,   2.81038760e+07,   1.87733930e+07,
          1.57307950e+07,   1.55162030e+07,   1.38710060e+07,
          3.52969100e+06,   2.32881000e+05,   5.32210000e+04,
          1.59100000e+04,   4.89700000e+03,   1.61300000e+03,
          6.54000000e+02,   2.63000000e+02,   1.08000000e+02,
          3.10000000e+01,   8.00000000e+00,   4.00000000e+00,
          2.00000000e+00]),
 array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
        17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
        34, 35, 36, 37, 38, 39, 40], dtype=uint64)]

The first array gives the counts and the second one the bin edges. In histogram form it is not too bad to deal with, but as a list it gets incredibly unwieldy very quickly.
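
A rough way to see how unwieldy the list form gets is to add up the counts quoted above and estimate the storage for one value per observation. This is a back-of-the-envelope sketch: the 8 bytes per value is an assumption (64-bit storage), and the counts are the rounded figures from the printout, so the total is approximate. It comes out on the order of 10^12 observations.

```python
import numpy as np

# Counts copied from the histogram printout above (rounded floats).
counts = np.array([
    0.00000000e+00, 1.83413630e+07, 1.74493106e+09, 7.91390628e+10,
    4.54474023e+11, 5.38810039e+11, 3.01718080e+11, 1.38440761e+11,
    6.17865624e+10, 2.77457730e+10, 1.32412328e+10, 6.71579967e+09,
    3.35556066e+09, 2.00513046e+09, 1.18435261e+09, 7.34440685e+08,
    5.13846805e+08, 3.97894623e+08, 1.97770421e+08, 1.11546165e+08,
    6.63624300e+07, 3.93196820e+07, 2.81038760e+07, 1.87733930e+07,
    1.57307950e+07, 1.55162030e+07, 1.38710060e+07, 3.52969100e+06,
    2.32881000e+05, 5.32210000e+04, 1.59100000e+04, 4.89700000e+03,
    1.61300000e+03, 6.54000000e+02, 2.63000000e+02, 1.08000000e+02,
    3.10000000e+01, 8.00000000e+00, 4.00000000e+00, 2.00000000e+00,
])

# Total number of observations the expanded list would contain.
total = int(counts.sum())

# Assumption: each raw value is stored as a 64-bit number (8 bytes).
bytes_needed = total * 8

print(f"{total:.3e} values, about {bytes_needed / 1e12:.1f} TB at 8 bytes each")
```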

phili...@aol.com

Oct 1, 2016, 12:01:12 PM
to powerlaw-general, phili...@aol.com
Having played around with this for a while, I can't come up with a good way of feeding this data to powerlaw in anything other than binned form, as the physical size of the file would simply get too big. Is there any way I could trick powerlaw into accepting it as a histogram?

Jeff Alstott

Oct 1, 2016, 12:34:54 PM
to powerlaw...@googlegroups.com, phili...@aol.com
Nope. They are totally different methods. Look at the other paper referenced for methods for fitting a power law using a histogram. It would be great if someone implemented those methods for powerlaw, but for now it doesn't have them. 


