P-value: why Mann-Whitney test and not Student's t-test?

MartinH

Aug 8, 2019, 2:14:39 PM

to estimationstats

Hello group,

I am not very versed in statistics, so my questions (this one here and another one in a separate topic) are probably naive, but anyways ...

In the Nature Methods article "Moving beyond P values: data analysis with estimation graphics" (https://www.nature.com/articles/s41592-019-0470-3) it is suggested that Gardner-Altman plots should replace null-hypothesis significance testing (NHST). It is also noted that Student's t-test is the traditional way of analyzing the data in question.

In addition to the Gardner-Altman plot, the web tool also calculates a P-value – as indicated on the website “to satisfy a common requirement of scientific journals”.

However, this is not the Student's t-test P-value but the Mann-Whitney test P-value. Why not the Student's t-test P-value if the plot is intended to replace this test?

Martin

Joses Ho

Aug 10, 2019, 9:34:32 PM

to estimationstats

Hi Martin,

Thanks for this question. The webapp performs the Mann-Whitney test as it is non-parametric, and thus makes fewer assumptions about the data. (See https://en.m.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test)

Performing a null-hypothesis significance test requires the analyst to make several assumptions about the data (e.g., are they sampled from a normally distributed population?). Often when a P value is reported, it is unclear which test was performed (unless one digs through the methods section of the study).

We do posit the Gardner-Altman two-group plot as an alternative not just to Student's t-test, but to null-hypothesis testing in general.
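As a rough illustration of how the choice of test changes the reported P value, here is a minimal sketch using SciPy. The simulated log-normal data and the variable names are made up for this example, and this is not the web app's own code; it only shows that the same two samples can yield different P values depending on which assumptions the test makes.

```python
import numpy as np
from scipy import stats

# Hypothetical data: two skewed (log-normal) samples, so the normality
# assumption behind the t-test is deliberately violated.
rng = np.random.default_rng(0)
control = rng.lognormal(mean=0.0, sigma=1.0, size=20)
test = rng.lognormal(mean=0.5, sigma=1.0, size=20)

# Student's t-test: assumes normal populations with equal variances.
student = stats.ttest_ind(control, test, equal_var=True)

# Welch's t-test: assumes normality but not equal variances.
welch = stats.ttest_ind(control, test, equal_var=False)

# Mann-Whitney U test: rank-based, no normality assumption.
mann_whitney = stats.mannwhitneyu(control, test, alternative="two-sided")

print(f"Student's t-test: P = {student.pvalue:.3f}")
print(f"Welch's t-test:   P = {welch.pvalue:.3f}")
print(f"Mann-Whitney U:   P = {mann_whitney.pvalue:.3f}")
```

Reporting which of these tests produced a given P value removes the ambiguity described above.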

Hope this helps!
Joses

Adam Claridge-Chang

Aug 11, 2019, 2:07:59 AM

to estimationstats

Hi Martin,

Just to add a few notes to Joses' response.

There is another test, the permutation test, that is more closely compatible with the bootstrap curve and CIs that we use in the plots. However, we haven't had time to implement that yet, so the Mann-Whitney is the most conservative substitute. It is underpowered compared to permutation, so it will typically show slightly larger P values. Since we don't see any value in P values (except to satisfy reviewers), this isn't a priority.
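To make the connection between the two resampling ideas concrete, here is a minimal standard-library sketch of a two-sided permutation test and a bootstrap confidence interval for the mean difference. The function names, the example data, and the plain percentile interval are assumptions for illustration; the interval computed by the web app may use a different bootstrap variant.

```python
import random
from statistics import mean


def mean_diff(control, test):
    """Effect size: mean of the test group minus mean of the control group."""
    return mean(test) - mean(control)


def permutation_p_value(control, test, n_resamples=5000, seed=12345):
    """Two-sided permutation P value for the mean difference."""
    rng = random.Random(seed)
    observed = abs(mean_diff(control, test))
    pooled = list(control) + list(test)
    n_control = len(control)
    extreme = 0
    for _ in range(n_resamples):
        # Shuffle group labels, which is valid under the null of no difference.
        rng.shuffle(pooled)
        diff = mean_diff(pooled[:n_control], pooled[n_control:])
        if abs(diff) >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_resamples + 1)


def bootstrap_ci(control, test, n_resamples=5000, alpha=0.05, seed=12345):
    """Percentile bootstrap CI for the mean difference."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping group sizes fixed.
        boot_control = rng.choices(control, k=len(control))
        boot_test = rng.choices(test, k=len(test))
        diffs.append(mean_diff(boot_control, boot_test))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical measurements for two groups.
control = [1.2, 0.8, 1.5, 1.1, 0.9, 1.3, 1.0, 1.4]
test = [2.1, 1.9, 2.4, 2.0, 1.8, 2.3, 2.2, 2.5]

p_value = permutation_p_value(control, test)
ci_low, ci_high = bootstrap_ci(control, test)
print(f"mean difference = {mean_diff(control, test):.2f}")
print(f"permutation P = {p_value:.4f}")
print(f"95% bootstrap CI = [{ci_low:.2f}, {ci_high:.2f}]")
```

Both procedures resample the observed data rather than assume a parametric distribution, which is why they pair naturally.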

The Gardner-Altman plot is a conceptual replacement for the t-test; you are right that the bootstrap version is not a direct counterpart. For this, someone would need to code an effect-size distribution based on the t-distribution.
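One possible reading of a t-distribution-based counterpart is a parametric confidence interval for the mean difference. The sketch below uses Welch's formulation with the Welch-Satterthwaite degrees of freedom; the function name and data are hypothetical, and this is only one interpretation, not code from the estimationstats project.

```python
import numpy as np
from scipy import stats


def welch_mean_diff_ci(control, test, alpha=0.05):
    """Mean difference (test - control) with a t-based confidence interval."""
    control = np.asarray(control, dtype=float)
    test = np.asarray(test, dtype=float)
    diff = test.mean() - control.mean()

    # Squared standard error contributed by each group.
    var_c = control.var(ddof=1) / control.size
    var_t = test.var(ddof=1) / test.size
    se = np.sqrt(var_c + var_t)

    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (var_c + var_t) ** 2 / (
        var_c ** 2 / (control.size - 1) + var_t ** 2 / (test.size - 1)
    )

    # Critical value of the t-distribution for a two-sided interval.
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return diff, diff - t_crit * se, diff + t_crit * se


# Hypothetical measurements for two groups.
control = [1.2, 0.8, 1.5, 1.1, 0.9, 1.3, 1.0, 1.4]
test = [2.1, 1.9, 2.4, 2.0, 1.8, 2.3, 2.2, 2.5]

diff, ci_low, ci_high = welch_mean_diff_ci(control, test)
print(f"mean difference = {diff:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```

Unlike the bootstrap interval, this one relies on the normality assumption that the t-test itself makes.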

best,

Adam

Alex M

Dec 11, 2019, 6:03:06 AM

to estimationstats

Hello and thanks for all the information.

Since this method offers a P value based on the Mann-Whitney test, would it make sense to add to the figure legend a P value based on the t-test, while still showing the plot produced alongside the Mann-Whitney test? I think this could give complementary information for those who still want to see the P value, which is often required by reviewers.

Thank you!

Joses Ho

Dec 11, 2019, 10:31:28 PM

to estimationstats

Hi Alex,

Thanks for your suggestion. Adding a P value legend or label would, in our opinion, clutter the figure. Estimation graphics are designed to emphasise the effect sizes in group comparisons.

We understand that reviewers sometimes want to see the P-value; in the text summary produced by estimationstats.com, this is included alongside the 95% CI and effect size. More importantly, the summary ends with the boilerplate:

The P value(s) reported are the likelihood(s) of observing the effect size(s), if the null hypothesis of zero difference is true; they are included here to satisfy a common requirement of scientific journals.

This text is deliberately worded to achieve two things: to ensure that both readers and reviewers understand what a P value actually is (and subtly point out its limitations), and to highlight an almost dogmatic adherence to its usage without actually knowing why we rely on it. This is how we currently report our results, and we've faced relatively mild or no pushback.

If reviewers insist on displaying the P value on the figure itself, we would advise using vector graphics software to display the actual P value (e.g., P = 0.045) rather than a dichotomised one (e.g., P < 0.05) above the relevant effect size curve. If there are several figures, one should use the Python or R packages to programmatically generate them and annotate the P values appropriately.
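For the programmatic route, a minimal sketch of annotating an exact P value might look like this. It uses plain matplotlib rather than the estimation-graphics packages, whose plotting API is not shown in this thread; the helper name `format_p_value`, the data, and the placeholder P value are all assumptions for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt


def format_p_value(p, digits=3):
    """Format the exact P value (e.g. 'P = 0.045') instead of a threshold."""
    if p < 10 ** -digits:
        # Very small values would round to zero, so use scientific notation.
        return f"P = {p:.1e}"
    return f"P = {p:.{digits}f}"


# Hypothetical measurements and a placeholder P value; substitute the
# value reported by your own analysis.
control = [1.2, 0.8, 1.5, 1.1, 0.9, 1.3, 1.0, 1.4]
test = [1.6, 1.1, 1.9, 1.4, 1.0, 1.7, 1.5, 1.8]
p_value = 0.045

fig, ax = plt.subplots()
ax.scatter([0] * len(control), control)
ax.scatter([1] * len(test), test)
ax.set_xticks([0, 1])
ax.set_xticklabels(["Control", "Test"])
ax.set_xlim(-0.5, 1.5)

# Place the exact P value just above the axes, centred horizontally.
label = ax.text(
    0.5, 1.02, format_p_value(p_value),
    transform=ax.transAxes, ha="center", va="bottom",
)

# fig.savefig("annotated_figure.svg")  # uncomment to write a vector file
```

In a real figure the label position would be adjusted so that it sits above the relevant effect size curve rather than above the whole axes.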