Selecting xmin


sanjasc...@gmail.com

Mar 30, 2017, 10:15:56 AM
to powerlaw-general
Hi,

Thanks again for this very useful and fully open-source contribution to us scientists.

I would like help understanding the optimal xmin value found by powerlaw.Fit() versus a value I supply myself (xmin=1). The figure below shows the output for my data (it is very similar to Fig. 1 in the powerlaw PLoS ONE paper).

For both datasets, blue and red, the continuous line shows the pdf of a power-law fit with the optimal xmin (7 and 10, respectively). The dashed lines show the pdfs of power-law fits with the supplied xmin=1. The second ones obviously *look* better, but if I understand the paper correctly, they still do not fit the data better.

Is there anything wrong with reporting only the fit with xmin=1? With it, the power law is still found to be a better fit than the other distributions.

Kind regards,
Sanja

Jeff Alstott

Mar 30, 2017, 11:32:13 AM
to powerlaw...@googlegroups.com
Note that most of your data lies between x=1 and x=7. As such, if that section bends even slightly away from a power law, it will be a very loud signal during the step where we select the xmin that yields the closest fit to a theoretical power law. What the algorithm is finding is that using only the data above x=7 yields a smaller KS distance (D) to the theoretical power law than using the data above x=1. How big is the difference between the fit at xmin=7 and the fit at xmin=1? Take a look at fit.Ds, which is discussed in the paper around Figure 5.

The difference might be very small. On the other hand, given that there's a lot of data between x=1 and x=7, it could be very large. The right way to interpret your data depends on what you're trying to achieve. It appears that you could say something like "The data was better described by a power law than by any of the other candidate distributions. The superiority of the power law held using all data points, though the best-fit power law was obtained by using only the 20% of data points larger than 7 (all data: D=X, data>7: D=Y)."
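For anyone who wants to see where the two D values in that suggested wording come from, here is a minimal sketch. It is not the powerlaw implementation: it assumes continuous data, uses only the standard library, and runs on a made-up synthetic sample in which 20% of the points follow a power law above 7. The helper name fit_power_law and the sample are hypothetical.

```python
import math
import random


def fit_power_law(data, xmin):
    """Fit a continuous power law to the values >= xmin.

    Returns (alpha, D, n): the maximum-likelihood exponent, the KS
    distance between empirical and fitted CDFs, and the tail size.
    """
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    # Continuous MLE: alpha = 1 + n / sum(ln(x / xmin)).
    alpha = 1.0 + n / sum(math.log(x / xmin) for x in tail)
    # KS distance: largest gap between the fitted CDF and the empirical
    # CDF, checked just below and just above each empirical step.
    d = 0.0
    for i, x in enumerate(tail):
        cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        d = max(d, abs(cdf - i / n), abs(cdf - (i + 1) / n))
    return alpha, d, n


# Hypothetical synthetic sample: a non-power-law body on [1, 7) plus a
# power-law tail (alpha = 2.5) starting at 7.
random.seed(0)
body = [random.uniform(1, 7) for _ in range(8000)]
tail = [7 * (1 - random.random()) ** (-1 / 1.5) for _ in range(2000)]
data = body + tail

for xmin in (1, 7):
    alpha, d, n = fit_power_law(data, xmin)
    print(f"xmin={xmin}: alpha={alpha:.2f}, D={d:.3f}, n={n}")
```

In this toy sample the bent section below 7 dominates the KS distance at xmin=1, which is the "loud signal" effect described above. Real data, especially discrete data, may behave differently.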

People tend to care about xmin because they say they're only interested in the tail of the distribution, and they "expect" there to be other factors on the left-hand side of the distribution that make it deviate from a power law. As such, they're fishing around for where exactly the "tail" starts, and we find the best possible tail for describing a power law. Note that this somewhat puts the cart before the horse, but the severity of the sin depends on what your research question is and how you describe what you did. For your case, if the entirety of the data set (xmin=1) is better described by a power law than anything else (including a lognormal!), then I don't think there's any reason to use anything other than xmin=1.




--
You received this message because you are subscribed to the Google Groups "powerlaw-general" group.
To unsubscribe from this group and stop receiving emails from it, send an email to powerlaw-general+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

sanjasc...@gmail.com

Mar 30, 2017, 12:24:45 PM
to powerlaw-general
Great, thanks! Your answer is more than helpful. 

Indeed, when plotting fit.Ds, sigma and sigma/alpha, as in the Fig. 5 you pointed out, I can see that all those values are below 0.05 over the whole range up to xmin = 20 (they stay low for a while beyond that, too).
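As a rough illustration of the sigma and sigma/alpha curves mentioned above, here is a small sketch that scans xmin from 1 to 20. It is not taken from powerlaw: it assumes continuous data and the usual large-sample standard error sigma = (alpha - 1)/sqrt(n), and it runs on a hypothetical synthetic sample rather than on my data. The helper name alpha_and_sigma is made up.

```python
import math
import random


def alpha_and_sigma(data, xmin):
    """Continuous MLE exponent and its standard error for the tail >= xmin."""
    tail = [x for x in data if x >= xmin]
    n = len(tail)
    alpha = 1.0 + n / sum(math.log(x / xmin) for x in tail)
    sigma = (alpha - 1.0) / math.sqrt(n)  # large-sample standard error of alpha
    return alpha, sigma


# Hypothetical synthetic sample: a pure power law with alpha = 2.5 for x >= 1.
random.seed(1)
data = [(1 - random.random()) ** (-1 / 1.5) for _ in range(5000)]

rows = []
for xmin in range(1, 21):
    alpha, sigma = alpha_and_sigma(data, xmin)
    rows.append((xmin, alpha, sigma, sigma / alpha))
    print(f"xmin={xmin:2d}  alpha={alpha:.2f}  "
          f"sigma={sigma:.3f}  sigma/alpha={sigma / alpha:.3f}")
```

Because each step up in xmin discards data, sigma tends to grow as the tail shrinks, so how long these curves stay low depends on the sample size as well as on how good the fit is.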

I am posting the results for one of the distributions, in case they are helpful to someone else. Basically, thanks to you, I am now confident that I can use xmin=1, or some other less-than-optimal value (the optimum is a global minimum, but there are many local ones, and the KS distances for other xmin values are low enough, too).


I very much appreciate your help and time, Jeff. We will cite your paper; hopefully the one we are submitting will be accepted soon.

Cheers!
Sanja