| Data Science 26100: Statistical Pitfalls and Misinterpretation of Data asks students to think critically about what data and models constitute evidence. Which means there’s a lot of looking at other peoples’ mistakes, immortalized in poorly thought-out studies and articles. |
| | |
|
| The course, taught by David Biron, assistant senior instructional professor of statistics and director of undergraduate data science, is part of the new data science major launched this academic year. |
| | |
|
| In one November class, the lesson is all about regression to the mean—the statistical phenomenon that can make natural variation in data look like real change. To take an example just a few miles from the quads, Chicago White Sox catcher Yermín Mercedes began the 2021 season by getting eight hits in eight at-bats, posting an impossibly high 1.000 batting average. Baseball fans recognized this was an outlier; sure enough, over the season, Mercedes’s batting average regressed to the mean, dropping to a pedestrian .271. Outliers in a data set like this tend to revert to a value closer to the mean over time, regardless of whatever effect is being studied; a simplistic analysis misinterprets that as cause and effect. Regression to the mean trips up many researchers. |
| | |
|
| Biron cues up an example from a 1987 New York Times article touting the ability of a beta-blocker called propranolol to help overanxious students on their SATs. Twenty-five test-takers who scored lower than expected (based on their grades) took the SAT again—only this time with propranolol. They performed an average of 120 points higher the second time. Sounds good? Not to Biron: “So, what might be wrong with the design of the study?” |
| | |
|
| A student points out it’s not a random sample and there’s no control group. She’s right on the money: “This is the worst thing you can do” in designing an experiment, says Biron. It’s easy to think of reasons other than anxiety that depressed the students’ original SAT scores. Or it might be another example of regression to the mean. The point is, he explains, “with the original study design, you can’t prove anything.” |
| | |
|
|