Importance Of Statistics In Evaluation


Karoline Oum

Aug 3, 2024, 5:06:00 PM

I'm a layman, so to speak. I'm trained in biology but have no formal education in statistics or mathematics. I enjoy R and often make an effort to read (and understand...) some of the theoretical foundations of the methods I apply when doing research. It wouldn't surprise me if the majority of people doing analyses today are not formally trained. I've published around 20 original papers, some of which have been accepted by recognized journals, and statisticians have frequently been involved in the review process. My analyses commonly include survival analysis, linear regression, logistic regression, and mixed models. Never once has a reviewer asked about model assumptions, fit, or evaluation.

Thus, I never bothered too much about model assumptions, fit, and evaluation. I start with a hypothesis, run the regression, and present the results. In some instances I made an effort to evaluate these things, but I always ended up with "well, it didn't fulfill all the assumptions, but I trust the results ('subject-matter knowledge') and they are plausible, so it's fine", and when I consulted a statistician, they always seemed to agree.

Now, I've spoken to other statisticians and non-statisticians (chemists, physicians, and biologists) who perform analyses themselves; it seems that people don't really bother too much about all these assumptions and formal evaluations. But here on CV there is an abundance of people asking about residuals, model fit, ways to evaluate it, eigenvalues, vectors, and the list goes on. Let me put it this way: when lme4 warns about large eigenvalues, I really doubt that many of its users care to address it...

Is it worth the extra effort? Isn't it likely that the majority of published results do not respect these assumptions, and perhaps have not even assessed them? This is probably a growing issue, since databases grow larger every day and there is a notion that the bigger the data, the less important the assumptions and evaluations become.

I am trained as a statistician, not as a biologist or medical doctor, but I do quite a bit of medical research (working with biologists and medical doctors), and as part of that research I have learned quite a bit about the treatment of several different diseases. Does this mean that if a friend asks me about a disease I have researched, I can just write them a prescription for a medication that I know is commonly used for that particular disease? If I were to do this (I don't), then in many cases it would probably work out OK (since a medical doctor would just have prescribed the same medication), but there is always the possibility of an allergy, drug interaction, or something else that a doctor would know to ask about and I would not, and I could end up causing much more harm than good.

If you are doing statistics without understanding what you are assuming and what could go wrong (or without consulting a statistician along the way who will look for these things), then you are committing statistical malpractice. Most of the time it will probably be OK, but what about the occasion where an important assumption does not hold and you just ignore it?

I work with some doctors who are reasonably statistically competent and can do much of their own analysis, but they will still run it past me. Often I confirm that they did the correct thing and can do the analysis themselves (they are generally grateful for the confirmation), but occasionally they will be doing something more complex, and when I mention a better approach they will usually turn the analysis over to me or my team, or at least bring me in for a more active role.

So my answer to your title question is "no": we are not exaggerating; rather, we should be stressing some things more, so that laymen will be more likely to at least double-check their procedures and results with a statistician.

Adam, thanks for your comment. The short answer is "I don't know." I think progress is being made in improving the statistical quality of articles, but things have moved so quickly in so many different ways that it will take a while to catch up and guarantee quality. Part of the solution is focusing on the assumptions, and the consequences of violating them, in intro stats courses. This is more likely to happen when the classes are taught by statisticians, but it needs to happen in all classes.

Some journals are doing better, but I would like to see a dedicated statistician reviewer become the standard. There was an article a few years back (sorry, I don't have the reference handy, but it was in either JAMA or the New England Journal of Medicine) showing a higher probability of being published in JAMA or NEJM (though not as big a difference as it should be) when a biostatistician or epidemiologist was one of the co-authors.

The question is how much they matter -- this varies across procedures and assumptions and what you want to claim about your results (and also how tolerant your audience is of approximation -- even inaccuracy -- in such claims).

So for an example of a situation where an assumption is critical, consider the normality assumption in an F-test of variances; even fairly modest changes in distribution may have fairly dramatic effects on the properties (actual significance level and power) of the procedure. If you claim you're carrying out a test at the 5% level when it's really at the 28% level, you're in some sense doing the same kind of thing as lying about how you conducted your experiments. If you don't think such statistical issues are important, make arguments that don't rely on them. On the other hand, if you want to use the statistical information as support, you can't go about misrepresenting that support.
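A quick simulation makes this concrete. The sketch below is in Python rather than the R the question mentions, and the particulars (samples of 25 per group, t-distributed errors with 5 degrees of freedom as the departure from normality) are assumptions chosen for illustration. It estimates the actual rejection rate of the classical F-test of equal variances when the two groups really do have equal variances:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps, alpha = 25, 10000, 0.05

def f_test_reject(a, b):
    # Two-sided classical F-test for equality of variances.
    f = np.var(a, ddof=1) / np.var(b, ddof=1)
    p = 2 * min(stats.f.cdf(f, n - 1, n - 1), stats.f.sf(f, n - 1, n - 1))
    return p < alpha

# Normal data: the test is exact, so rejections should sit near the nominal 5%.
normal_rate = np.mean([f_test_reject(rng.standard_normal(n), rng.standard_normal(n))
                       for _ in range(reps)])

# Heavy-tailed data (t with 5 df), still with equal variances in both groups:
t_rate = np.mean([f_test_reject(rng.standard_t(5, n), rng.standard_t(5, n))
                  for _ in range(reps)])

print(normal_rate)  # near the nominal 0.05
print(t_rate)       # several times the nominal level
```

Both null hypotheses are true, yet under the heavy-tailed distribution the "5% level" test rejects far more often than 5%, which is exactly the kind of misstatement of the actual level described above.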

In other cases, particular assumptions may be much less critical. If you're estimating the coefficient in a linear regression and you don't care if it's statistically significant and you don't care about efficiency, well, it doesn't necessarily matter if the homoskedasticity assumption holds. But if you want to say it's statistically significant, or show a confidence interval, yes, it certainly can matter.
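As a sketch of that point (again Python, with an arbitrary variance function chosen simply to make the violation obvious), the simulation below fits OLS when the error spread grows with x. The slope estimate stays essentially unbiased, but the usual constant-variance confidence interval covers the true slope less often than its nominal 95%:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, beta = 100, 5000, 2.0
slopes, covered = [], 0

for _ in range(reps):
    x = rng.uniform(0, 2, n)
    # Error SD grows with x, so homoskedasticity clearly fails.
    y = beta * x + rng.normal(0, np.exp(x))
    X = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (n - 2)
    # Classical (constant-variance) standard error of the slope:
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    slopes.append(coef[1])
    covered += abs(coef[1] - beta) < 1.96 * se

print(np.mean(slopes))   # close to the true slope of 2.0
print(covered / reps)    # noticeably below the nominal 0.95
```

So if all you want is the point estimate, the violated assumption costs relatively little here; the moment you make a claim about significance or report the interval, it matters.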

Attention is paid to characteristics that might not matter at all for the properties of the inference, or that matter little while other things matter much more. Here I don't just mean the specific assumptions (which is definitely a thing) but also the manner in which they're considered.

It's important to understand the behaviour of the tools you use under population situations that may be somewhat like the circumstances you're in, as well as under counterfactuals you might be considering (under the null, perhaps, or under effect sizes different from the one that actually applies, such as the one you used to compute a sample size). Tests of assumptions are commonly used, but they do not address these issues.

As one example, if you're focused on assumptions made to guarantee that the significance level is correct (or very nearly correct), checking them on data that doesn't arise under H0 is not necessarily very telling. It may easily happen that two distributions would be identical if H0 were true (if the treatment had literally zero effect), but that the spread or shape gradually changes as the effect increases, yielding noticeably different distributions at some plausible effect size (e.g. consider an effective treatment increasing test scores, squeezing them up against the highest possible score, which would change both spread and shape). Such an observation would say nothing about the behavior under H0. There are many other situations where the focus is on the data rather than on the population properties of the procedure, which is the thing you're actually worried about.
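The ceiling-effect example can be sketched directly (Python; the Normal(70, 10) latent scores and the cap of 100 are made-up numbers for illustration). At zero effect the two groups are identically distributed; as the effect grows, scores pile up against the cap, shrinking the spread and producing left skew, none of which tells you anything about behavior under H0:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
CAP = 100.0

def scores(effect, n=100_000):
    # Latent ability ~ Normal(70, 10) plus the treatment effect,
    # censored at the highest achievable test score.
    return np.minimum(rng.normal(70.0 + effect, 10.0, n), CAP)

# Spread and skewness of the score distribution at increasing effect sizes:
for effect in (0.0, 10.0, 20.0):
    s = scores(effect)
    print(effect, round(float(s.std()), 2), round(float(stats.skew(s)), 2))
```

At effect 0 the cap is three standard deviations away and essentially irrelevant (SD near 10, skew near 0); at larger effects the SD shrinks and the skew turns clearly negative, so treated and control data look distributionally different even though they would be identical under the null.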

However, what concerns me even more is that I very regularly see people testing assumptions that the procedures they employ do not even make, either under H0 (when computing the significance level from null distributions given some assumptions) or anywhere under H1 (when computing power, given some assumptions). This is astonishingly common. One very common example is checking variables (whether DVs or IVs) for marginal normality.
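To illustrate why marginal normality is not the assumption (a Python sketch with a made-up binary predictor): a strong group effect makes the DV plainly bimodal, so a normality test on y itself "fails" loudly, while the residuals, which are what the normal-theory machinery actually refers to, are exactly normal by construction:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 500

# A binary predictor with a large effect makes the DV strongly bimodal...
x = rng.integers(0, 2, n).astype(float)
y = 10 * x + rng.normal(0, 1, n)   # ...even though the errors are exactly normal

# A marginal normality test on y rejects emphatically:
print(stats.shapiro(y).pvalue)      # tiny p-value: y is bimodal

# But the model's normality assumption is about the errors, i.e. the residuals:
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef
print(stats.shapiro(resid).pvalue)  # typically unremarkable: residuals are near-normal
```

Rejecting this model because y "fails a normality test" would be testing an assumption the procedure never made.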

So to an extent there may be some over-focus in some circumstances (which may largely be avoided if "consider the impact on your inference of likely or potential violations" is the starting point), but the greater danger is focusing very strongly on things that weren't even assumptions.

One consideration is whether you really want to get at the scientific truth, which would require polishing your results and working out in detail whether your approach is defensible, versus publishing in the "ah well, nobody checks these eigenvalues in my discipline anyway" mode. In other words, you have to ask your inner professional conscience whether you are doing the best job you could. Pointing to the low statistical literacy and lax statistical practices in your discipline does not make a convincing argument. Reviewers are often at best half-helpful if they come from the same discipline with the same lax standards, although some top outlets have explicit initiatives to bring statistical expertise into the review process.

But even if you are a cynical "publish or perish" salami slicer, the other consideration is the safety of your research reputation. If your model fails and you don't know it, you are exposing yourself to the risk of rebuttal by those who can come and drive an ax into the cracks of your model with more refined instruments. Granted, the probability of that appears to be low, as the scientific community, despite the nominal philosophical requirements of repeatability and reproducibility, rarely attempts to reproduce somebody else's research. (I was involved in writing a couple of papers that basically started with "oh my God, did they really write that?" and offered a critique and refinement of a peer-reviewed, published semi-statistical approach.) However, when the failures of statistical analyses are exposed, they often make big and unpleasant splashes.
