Thanks David, this seems to be an excellent resource to cite in the guidelines.
I suppose it could be useful at some point to discuss what these guidelines should be trying to achieve. For now I see three possible goals.
1. Moving towards reviews that are more rigorous and less forgiving of errors in statistical analyses/interpretations. This would be desirable but as we all keep pointing out, we're lacking statistical expertise in the reviewing pool. So I don't think we should be too obsessed with this goal. Educational material and resources that discuss common statistical errors are plentiful and we should encourage reviewers to read them, but I don't think the CHI guidelines necessarily need to repeat these.
2. Moving towards reviews that recognize and reward good practices that are not widely recognized as such at CHI. Things like clarity and completeness, nuanced conclusions, shared material, etc. are all easy to assess even by non-expert reviewers, and if reviewers are properly educated about their importance, this could improve the overall quality and transparency of reports and reduce practices like p-hacking.
3. Moving towards reviews that do not reject statistical reports for the wrong reasons. I'm not sure why this one is so often overlooked. For a non-expert and/or hurried reviewer, it is tempting to use simple heuristics to assess the validity of a statistical report (e.g., does it report ANOVAs / p-values? Are the results significant? Is the sample size more than X?) rather than looking at the subtleties of the analysis or at the big picture. As long as reviewers believe in such heuristics, other recommendations will have little influence.
My hope is that we can encourage reviewers to replace their old, misguided heuristics (3) with other, better heuristics (2) for evaluating studies. Covering (3) is difficult and we may not all agree, but it seems fairly easy to list the different ways we address concrete statistical problems, and to state that they're all valid. Such a statement may seem vague as a recommendation to authors, but as a recommendation to reviewers it is quite specific, because it implies that using method X rather than Y shouldn't be a reason for rejection.