Hello,
Thanks for developing the package; it seems to have great functionality.
I have been comparing the results of my own code, which does something very similar, to the results of powerlaw.
My fit concerns a small (<50 points) continuous sample, with xmin and xmax set to the minimum and maximum values present in the data, but the issue appears in generic cases as well.
When I plot my results against those from powerlaw, there are clear discrepancies, which I traced to different values of the empirical distribution of the data.
I tried reading the source code and looked at past discussions, but could not answer my question, so here it goes:
From my understanding of the empirical cdf, I calculate it as data=(x_1, x_2, x_3, ..., x_n), ecdf=(1/n, 2/n, 3/n, ..., n/n=1), where n is the size of the sample.
But to match the empirical cdf returned by powerlaw, I need to start at zero, that is, ecdf=(0, 1/n, 2/n, ..., (n-1)/n).
Running
x, y = fitparam.cdf()
print(x, y)
confirms that this is the case, as the first point in y seems to always be zero.
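To make the two conventions concrete, here is a minimal sketch using only NumPy. The helper names `ecdf_from_one` and `ecdf_from_zero` and the sample values are made up for illustration; they are not part of powerlaw.

```python
import numpy as np

def ecdf_from_one(data):
    """ECDF values (1/n, 2/n, ..., n/n) at the sorted data points."""
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    return x, np.arange(1, n + 1) / n

def ecdf_from_zero(data):
    """ECDF values (0, 1/n, ..., (n-1)/n) at the sorted data points."""
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    return x, np.arange(n) / n

# Made-up sample, for illustration only.
sample = [3.0, 1.0, 4.0, 2.0]
x, y_one = ecdf_from_one(sample)
_, y_zero = ecdf_from_zero(sample)

print(x)
print(y_one)   # the convention I expected
print(y_zero)  # the convention that matches what I see from powerlaw
```

The two arrays differ by exactly 1/n at every data point, which is a large offset for a sample of fewer than 50 points.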
Am I understanding something wrong in how the powerlaw code works (or how the ecdf should be defined in general)?
If not, then what is the reason for defining the ecdf like that?
This obviously also has consequences when xmin is left free to be determined by the code, since the ecdf is used to calculate the KS statistic.
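As a rough illustration of the effect on the KS statistic, the sketch below compares the maximum distance between each ecdf convention and a continuous power-law CDF with no upper cutoff, F(x) = 1 - (x/xmin)^(1-alpha). The function `ks_distance`, the value alpha = 3, and the sample are my own assumptions for illustration; I have not checked that powerlaw computes the KS distance in exactly this way.

```python
import numpy as np

def powerlaw_cdf(x, alpha, xmin):
    """CDF of a continuous power law on [xmin, inf), for alpha > 1."""
    return 1.0 - (x / xmin) ** (1.0 - alpha)

def ks_distance(data, alpha, xmin, start_at_zero):
    """Max |ecdf - model cdf| at the data points, for either ecdf convention."""
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    if start_at_zero:
        ecdf = np.arange(n) / n          # (0, 1/n, ..., (n-1)/n)
    else:
        ecdf = np.arange(1, n + 1) / n   # (1/n, 2/n, ..., 1)
    return np.max(np.abs(ecdf - powerlaw_cdf(x, alpha, xmin)))

# Made-up sample and exponent, for illustration only.
sample = [1.0, 2.0, 4.0, 8.0]
d_one = ks_distance(sample, alpha=3.0, xmin=1.0, start_at_zero=False)
d_zero = ks_distance(sample, alpha=3.0, xmin=1.0, start_at_zero=True)
print(d_one, d_zero)
```

For the same data and the same model, the two conventions give different distances, so a scan over candidate xmin values that minimizes this distance could select a different xmin depending on which convention is used.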
Many thanks in advance,
Danai