Note that in the two examples you gave R is the same size, *but the sign is flipped* (in the second case R is negative). The p value is not the odds that R differs from zero; it is the probability that a value of R this far from zero could arise purely from statistical fluctuation, i.e., it tells you whether the observed sign of R can be trusted. From Clauset et al., p. 19:
"The basic idea behind the likelihood ratio test is to compute the likelihood of
the data under two competing distributions. The one with the higher likelihood is
then the better fit. Alternatively one can calculate the ratio of the two likelihoods, or
equivalently the logarithm R of the ratio, which is positive or negative depending on
which distribution is better, or zero in the event of a tie.
The sign of the log likelihood ratio alone, however, will not definitively indicate
which model is the better fit because, like other quantities, it is subject to statistical
fluctuation. If its true value, meaning its expected value over many independent data
sets drawn from the same distribution, is close to zero, then the fluctuations could
change the sign of the ratio and hence the results of the test cannot be trusted. In
order to make a firm choice between distributions we need a log likelihood ratio that
is sufficiently positive or negative that it could not plausibly be the result of a chance
fluctuation from a true result that is close to zero.
To make a quantitative judgment about whether the observed value of R is sufficiently far from zero, we need to know the size of the expected fluctuations, i.e., we
need to know the standard deviation σ on R. This we can estimate from our data
using a method proposed by Vuong [63]. This method gives a p-value that tells us
whether the observed sign of R is statistically significant. If this p-value is small (say
p < 0.1) then it is unlikely that the observed sign is a chance result of fluctuations
and the sign is a reliable indicator of which model is the better fit to the data. If p
is large on the other hand, the sign is not reliable and the test does not favor either
model over the other. It is one of the advantages of this approach that it can tell us
not only which of two hypotheses is favored, but also when the data are insufficient to
favor either of them"