using Wilcoxon can be dangerous (damage your conclusions)


josef...@gmail.com

Mar 26, 2017, 4:32:01 PM
to pystatsmodels
"
All the cases considered so far exemplify inappropriate applications
of the two-sample Wilcoxon test; what the researchers intend to test
(testing equality of medians, means, or distributions) is incongruous
with what the Wilcoxon test is actually testing (H0: P(X ≤ Y) = 1/2).
However, as will be argued in a more general setting in Section 3,
even when testing H0: P(X ≤ Y) = 1/2, the standard Wilcoxon test
is invalid unless it is appropriately studentized.
"
Chung, EunYi, and Joseph P. Romano. 2016. “Asymptotically Valid and
Exact Permutation Tests Based on Two-Sample U-Statistics.” Journal of
Statistical Planning and Inference 168 (January): 97–105.
doi:10.1016/j.jspi.2015.07.004.

The section on "Misapplication of the Wilcoxon test" is a good read;
essentially the same applies to permutation tests.
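As I read the quote, the problem is that under H0: θ = 1/2 the two distributions can still differ (e.g. in spread), so the permutation distribution of the raw Wilcoxon/Mann-Whitney statistic need not match its sampling distribution. A minimal sketch of what "appropriately studentized" could look like: estimate θ = P(X < Y) + ½P(X = Y) (equal to P(X ≤ Y) for continuous data), divide θ̂ − 1/2 by a standard error built from the usual U-statistic variance estimate (row and column means of the kernel), and take the p-value from the permutation distribution of that studentized statistic. This follows the general recipe of the paper but is not the authors' code; the function names and the variance estimator are my assumptions.

```python
import numpy as np

def _kernel(x, y):
    # Mann-Whitney kernel: 1 if x < y, 1/2 for ties, 0 otherwise; shape (m, n)
    return (x[:, None] < y[None, :]) + 0.5 * (x[:, None] == y[None, :])

def studentized_wmw(x, y):
    """Return (t, theta_hat) for H0: theta = P(X < Y) + P(X = Y)/2 = 1/2."""
    k = _kernel(x, y)
    m, n = k.shape
    theta = k.mean()
    # U-statistic variance estimate from the row and column means of the kernel
    var_x = k.mean(axis=1).var(ddof=1)
    var_y = k.mean(axis=0).var(ddof=1)
    se = np.sqrt(var_x / m + var_y / n)
    if se == 0:
        # complete separation or all ties: no usable variance estimate
        t = 0.0 if theta == 0.5 else np.sign(theta - 0.5) * np.inf
        return t, theta
    return (theta - 0.5) / se, theta

def perm_test_studentized_wmw(x, y, n_perm=2000, seed=None):
    """Two-sided Monte Carlo permutation p-value of the studentized statistic."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    m = len(x)
    pooled = np.concatenate([x, y])
    t_obs, theta = studentized_wmw(x, y)
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_perm):
        z = rng.permutation(pooled)
        t_perm, _ = studentized_wmw(z[:m], z[m:])
        count += int(abs(t_perm) >= abs(t_obs))
    return t_obs, theta, (count + 1) / (n_perm + 1)

# usage: same center, very different spread (H0 on theta is true here)
rng = np.random.default_rng(42)
x = rng.normal(0.0, 1.0, size=20)
y = rng.normal(0.0, 4.0, size=30)
print(perm_test_studentized_wmw(x, y, seed=0))
```

The only change relative to an ordinary permutation Wilcoxon test is that every permuted sample is studentized with its own standard error, which is what makes the permutation distribution track the right null asymptotically.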


I ran into the applicability of permutation-based tests, and the
distinction of what tests actually test under different auxiliary
assumptions, when I was initially looking at small-sample inference in
binomial models and contingency tables.

https://github.com/statsmodels/statsmodels/issues/3575
https://github.com/statsmodels/statsmodels/issues/3571



Studentized nonparametric hypothesis tests sound like a good (newish)
category of tests.


Josef

josef...@gmail.com

Mar 26, 2017, 9:14:17 PM
to pystatsmodels


On Sun, Mar 26, 2017 at 4:31 PM,  <josef...@gmail.com> wrote:
> I ran into the applicability of permutation-based tests, and the
> distinction of what tests actually test under different auxiliary
> assumptions, when I was initially looking at small-sample inference in
> binomial models and contingency tables.

(*)



(*)
"
It should be noted that asymptotic procedures can also be quite conservative. For instance, Koehler and Larntz (1980) noted that the likelihood-ratio test tends to be highly conservative when most expected frequencies are smaller than 0.5. To illustrate, consider the 3 x 9 table (0, 7, 0, 0, 0, 0, 0, 1, 1 / 1, 1, 1, 1, 1, 1, 1, 0, 0 / 0, 8, 0, 0, 0, 0, 0, 0, 0), discussed in the StatXact manual. For the likelihood-ratio statistic, the asymptotic p-value is 0.0837 and the exact p-value is 0.0015; for the Pearson statistic, the values are 0.1342 and 0.0013.
"
Agresti, Alan. 1992. “A Survey of Exact Inference for Contingency Tables.” Statistical Science 7 (1): 131–53.
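The asymptotic side of the quote can be checked with scipy's `chi2_contingency`, which computes the Pearson statistic by default and the likelihood-ratio (G²) statistic with `lambda_="log-likelihood"`. This is a sketch assuming a reasonably recent scipy; it only covers the chi-square approximation, not the exact p-values.

```python
import numpy as np
from scipy.stats import chi2_contingency

# the 3 x 9 table from Agresti (1992) / the StatXact manual
table = np.array([[0, 7, 0, 0, 0, 0, 0, 1, 1],
                  [1, 1, 1, 1, 1, 1, 1, 0, 0],
                  [0, 8, 0, 0, 0, 0, 0, 0, 0]])

# Pearson chi-square; the continuity correction only matters for 1 dof anyway
x2, p_pearson, dof, expected = chi2_contingency(table, correction=False)
# likelihood-ratio (G^2) statistic via the Cressie-Read power-divergence option
g2, p_lr, _, _ = chi2_contingency(table, correction=False, lambda_="log-likelihood")

# share of cells with expected count below 0.5 (the Koehler-Larntz regime)
frac_small = (expected < 0.5).mean()
print(f"Pearson X2 = {x2:.4f}, dof = {dof}, asymptotic p = {p_pearson:.4f}")
print(f"LR G2 = {g2:.4f}, asymptotic p = {p_lr:.4f}")
print(f"cells with expected count < 0.5: {frac_small:.2f}")
```

If it agrees with the quote, the Pearson p-value comes out near 0.134 and the LR p-value near 0.084, and 24 of the 27 expected counts are below 0.5.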

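>>> import time
>>> # chisquare_contingency: the current implementation discussed here (import not shown)
>>> t = [[0, 7, 0, 0, 0, 0, 0, 1, 1], [1, 1, 1, 1, 1, 1, 1, 0, 0], [0, 8, 0, 0, 0, 0, 0, 0, 0]]
>>> t0 = time.time(); rr = chisquare_contingency(t); t1 = time.time()
>>> t1 - t0
0.5123629570007324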

>>> print(rr)   # formatted by hand
{'statistic': 22.285714285714281,
'p_value': 0.0011000000000000001,
'nobs': 24,
'table': array([[0, 7, 0, 0, 0, 0, 0, 1, 1],
                [1, 1, 1, 1, 1, 1, 1, 0, 0],
                [0, 8, 0, 0, 0, 0, 0, 0, 0]]),
'n_repl': 10000,
'text': 'Chisquare test for independence with permutation pvalue',
'confint_p_value': (0.00054923988469170811, 0.0019673494805434455),
'p_value_midp': 0.001,
'p_value_chi2': 0.13420348271205493}
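The internals of `chisquare_contingency` are not shown in the thread. As an independent sketch of the same idea, assuming numpy and scipy: expand the table into one (row, column) label pair per observation, shuffle the column labels (which keeps both margins fixed, as in the conditional exact test), and count how often the permuted Pearson statistic reaches the observed one. The Clopper-Pearson interval for the Monte Carlo p-value is my guess at what something like `confint_p_value` reports, not a confirmed detail of the implementation; `perm_chisquare` and `pearson_x2` are hypothetical names.

```python
import numpy as np
from scipy import stats

def pearson_x2(table, expected):
    # Pearson statistic: sum over cells of (O - E)^2 / E
    return ((table - expected) ** 2 / expected).sum()

def perm_chisquare(table, n_repl=10000, alpha=0.05, seed=None):
    """Monte Carlo permutation p-value for the Pearson test of independence."""
    table = np.asarray(table)
    nrow, ncol = table.shape
    # margins are fixed under permutation, so E is the same for every table
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
    x2_obs = pearson_x2(table, expected)
    # one (row, col) label pair per observation
    cell = np.repeat(np.arange(table.size), table.ravel())
    rows, cols = np.divmod(cell, ncol)
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_repl):
        shuffled = rows * ncol + rng.permutation(cols)
        t = np.bincount(shuffled, minlength=table.size).reshape(nrow, ncol)
        # small tolerance so tables with the same statistic count as ">="
        count += int(pearson_x2(t, expected) >= x2_obs - 1e-9)
    p = count / n_repl
    # Clopper-Pearson interval for the true permutation p-value
    lo = stats.beta.ppf(alpha / 2, count, n_repl - count + 1) if count > 0 else 0.0
    hi = stats.beta.ppf(1 - alpha / 2, count + 1, n_repl - count) if count < n_repl else 1.0
    return x2_obs, p, (lo, hi)

table = [[0, 7, 0, 0, 0, 0, 0, 1, 1],
         [1, 1, 1, 1, 1, 1, 1, 0, 0],
         [0, 8, 0, 0, 0, 0, 0, 0, 0]]
x2, p, ci = perm_chisquare(table, n_repl=10000, seed=12345)
print(x2, p, ci)
```

The observed statistic should be 156/7 ≈ 22.2857, the same as the 'statistic' field above; the p-value and its interval change with `seed` and `n_repl`.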

10,000 permutations are not enough for good **relative** precision.
Because of the randomness of the permutations, the simulation confidence interval for the p-value is (0.00055, 0.00197), compared with the StatXact exact p-value of 0.0013.
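A rough check of the relative-precision point, treating the Monte Carlo p-value as a binomial proportion over B permutations and plugging in the StatXact value p ≈ 0.0013 as the true p-value (an assumption for illustration):

```latex
\begin{aligned}
&\operatorname{SE}(\hat p) = \sqrt{p(1-p)/B}, \qquad p \approx 0.0013\\
&B = 10^4:\quad \operatorname{SE} \approx \sqrt{0.0013 \cdot 0.9987 / 10^4} \approx 3.6\times 10^{-4} \approx 0.28\,p\\
&B = 10^5:\quad \operatorname{SE} \approx 1.14\times 10^{-4} \approx 0.09\,p\\
&1.96\,\operatorname{SE} \le 0.1\,p \iff B \ge (1.96/0.1)^2\,(1-p)/p \approx 384 \times 768 \approx 3\times 10^{5}
\end{aligned}
```

The ±1.96 SE half-widths (about 7.1×10⁻⁴ and 2.2×10⁻⁴) roughly match the half-widths of the two confint_p_value intervals in this message.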


100,000 permutations take more than 5 seconds in the current implementation and give
'p_value': 0.00128
'confint_p_value': (0.0010679813034679948, 0.0015217400465570297)

but with 24 observations, the randomness in the data should matter much more than the precision of the p-value approximation in the 3rd decimal.

However, the difference between the asymptotic p-value of 0.13 and the exact p-value of around 0.001 is large.


Josef

josef...@gmail.com

Apr 1, 2017, 9:51:03 AM
to pystatsmodels
On Sun, Mar 26, 2017 at 9:14 PM, <josef...@gmail.com> wrote:
> studentized nonparametric hypothesis tests sound like a good (newish)
> category of tests.

Finding 10 to 30 papers from the last year or two that robustify (*) permutation tests for various hypotheses sounds pretty good.

The main disadvantage, as far as I can see having read or skimmed only a few, is that they are all, or mostly, about obtaining "no-effect" p-values.
However, I guess that in many cases this should also apply to score confidence intervals using reparameterization.

(*) with respect to distributional assumptions
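One way to read the confidence-interval remark is test inversion: reparameterize H0: Δ = δ as a "no-effect" hypothesis on shifted data, run the robustified permutation test, and keep every δ that is not rejected. A minimal sketch for a difference in means, assuming a studentized (Welch t) permutation test, a fixed grid of δ values, and hypothetical function names; it is not a score interval in the likelihood sense, only the same inversion idea applied to a permutation test.

```python
import numpy as np

def welch_t(x, y):
    """Welch t statistic along the last axis (works for 1-D or batched 2-D input)."""
    num = x.mean(axis=-1) - y.mean(axis=-1)
    se = np.sqrt(x.var(axis=-1, ddof=1) / x.shape[-1]
                 + y.var(axis=-1, ddof=1) / y.shape[-1])
    return num / se

def shift_pvalue(x, y, delta, perms):
    """Permutation p-value of H0: mean(X) - mean(Y) = delta (studentized)."""
    m = len(x)
    x0 = x - delta                       # reparameterize: null becomes "no difference"
    t_obs = welch_t(x0, y)
    z = np.concatenate([x0, y])[perms]   # one permuted pooled sample per row
    t_perm = welch_t(z[:, :m], z[:, m:])
    return (1 + np.sum(np.abs(t_perm) >= np.abs(t_obs))) / (1 + len(perms))

def inverted_ci(x, y, alpha=0.05, n_perm=999, n_grid=201, seed=None):
    """All grid values of delta that the permutation test does not reject."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    # reuse one set of permutations for every delta (common random numbers)
    perms = np.argsort(rng.random((n_perm, len(x) + len(y))), axis=1)
    est = x.mean() - y.mean()
    se = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
    grid = np.linspace(est - 5 * se, est + 5 * se, n_grid)
    kept = [d for d in grid if shift_pvalue(x, y, d, perms) > alpha]
    # hull of the accepted set, which is not guaranteed to be an interval
    return min(kept), max(kept)

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=20)
y = rng.normal(0.5, 3.0, size=30)
print(inverted_ci(x, y, seed=2))
```

Studentizing matters here for the same reason as in the Wilcoxon case: after shifting, the two samples have equal means but unequal variances, so they are not exchangeable under H0.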


(category small samples, no big data)

Josef
(always use enough qualifiers in a sentence so it is a correct statement)