Error in lilliefors test?


Thomas Haslwanter

May 11, 2017, 6:42:30 AM
to pystatsmodels
When I run

import numpy as np
from statsmodels.stats.diagnostic import lilliefors
lilliefors(np.arange(5))

I get the output

(0.13645537156723098, 2.1686565649378555)

The second value is supposed to be the p-value, and p-values above 1 don't make any sense.
Am I missing something? Or is there a limitation of the test?

In MATLAB I get for
[H, p] = lillietest([0:4])
the output with the warning

Warning: P is greater than the largest tabulated value, returning 0.5.
> In lillietest (line 203)
H =
     0
p =
    0.5000

I think a similar warning should be implemented here, to avoid confusing users.

josef...@gmail.com

May 11, 2017, 8:19:15 AM
to pystatsmodels
Thanks for reporting. Can you open an issue?

Based on the docstring I wouldn't expect that it extrapolates, so I need to check what's going on.

For pvalmethod='table', we just return the boundary value but don't explicitly warn.

>>> import numpy as np
>>> import statsmodels.stats.api as sms
>>> sms.lilliefors(np.arange(5), pvalmethod='table')
(0.13645537156723098, 0.20000000000000001)

Josef

josef...@gmail.com

May 11, 2017, 8:36:17 AM
to pystatsmodels
Yes, it's a bug.
The comments and docstring say that the interpolation/approximation formula is only valid for p-values <= 0.1, but the code never checks this and doesn't switch to the table when the p-value is > 0.1.
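
To make the missing check concrete, here is a minimal sketch of the kind of guard described above. It is not the statsmodels code: `pick_pvalue` and `APPROX_UPPER` are hypothetical names, and the sketch assumes both an approximate and a table-based p-value are already available.

```python
APPROX_UPPER = 0.1  # validity limit of the approximation, as quoted above


def pick_pvalue(p_approx, p_table):
    """Hypothetical helper, not part of statsmodels.

    Trust the approximation only inside its documented range;
    otherwise fall back to the table-based value.
    """
    if p_approx <= APPROX_UPPER:
        return p_approx
    return p_table


# Values taken from the outputs reported in this thread.
print(pick_pvalue(2.1686565649378555, 0.2))  # prints 0.2
```

With a guard like this, the out-of-range value 2.17 from the original report would never reach the user.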

Josef

Thomas Haslwanter

May 17, 2017, 2:44:44 PM
to pystatsmodels
Done - I have opened a new issue on GitHub.

thomas

josef...@gmail.com

May 17, 2017, 3:05:56 PM
to pystatsmodels



Thanks

I wanted to put it on my todo list for this week. It just requires rearranging some if/else branches.

Question for users: Should we also issue a warning if the p-value is at the boundary of the table, i.e. 0.2 in the current table?
This is described in the docstring, and p_value == 0.2 doesn't have any extra decimals. Up to now I thought that was enough.
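
For discussion, a boundary warning could look roughly like the sketch below. This is only an illustration: `report_table_pvalue` and `TABLE_MAX_P` are hypothetical names, and the 0.2 limit is the table boundary mentioned in this thread.

```python
import warnings

TABLE_MAX_P = 0.2  # largest tabulated p-value mentioned in this thread


def report_table_pvalue(p_table, warn=True):
    """Hypothetical helper, not part of statsmodels.

    Caps the p-value at the table boundary and optionally warns
    that the true p-value may be larger than the value returned.
    """
    if p_table >= TABLE_MAX_P:
        if warn:
            warnings.warn(
                "p-value is at the largest tabulated value (%g); "
                "the true p-value may be larger." % TABLE_MAX_P,
                UserWarning,
            )
        return TABLE_MAX_P
    return p_table


# The warning can be captured or silenced by the caller.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    p = report_table_pvalue(0.2)
print(p, len(caught))
```

Users who find the warning noisy could filter it with the standard `warnings` machinery, which is one argument for making it a warning and not an exception.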


Josef
"It's more fun writing new code than work on existing code." also applies to maintainers.
"Reinvent the wheel, we don't have all polygon shapes for wheels yet." (*)

(*)
http://www.sciencedirect.com/science/article/pii/S0012365X97002380
(first link when searching for "limit of convex polygon")
