We discussed the 5th chapter of Model Thinker: normal distributions and lognormal distributions.
The book approaches these distributions more as a modeler would than as a statistician would. An example to clarify this:
Many statistics books take it as given that people's heights are normally distributed; it is treated almost as a matter of fact (normal/natural). A few books then add that incomes are not normally distributed (long-tailed distributions). These conclusions rest on empirical analysis: statisticians presumably saw many height datasets that were normally distributed and many income datasets that were not.
In Model Thinker, by contrast, the author uses the Central Limit Theorem (CLT) to explain why these datasets are distributed the way they are. He states the CLT slightly differently from what I found in other stats books:
"The sum of independent random variables will be normally distributed."
He then explains a person's height as a sum of 180 random variables, where each random variable is the contribution to height from one gene (one gene affects the height of the neck, another the legs, etc.). Assuming these 180 genetic contributions are independent, the CLT implies that heights will be normally distributed.
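To make the height argument concrete, here is a minimal simulation sketch. The 0 to 2 cm uniform contribution per gene and the function names are my own assumptions for illustration, not numbers from the book. It sums 180 independent contributions per person and checks how much of the simulated population falls within one and two standard deviations of the mean, which for a normal distribution should be roughly 68% and 95%.

```python
import random
import statistics

def simulate_heights(n_people=5_000, n_genes=180, seed=0):
    """Each simulated height is the sum of n_genes independent contributions."""
    rng = random.Random(seed)
    # Hypothetical assumption: each gene adds a uniform amount between 0 and 2 cm.
    return [
        sum(rng.uniform(0.0, 2.0) for _ in range(n_genes))
        for _ in range(n_people)
    ]

def share_within(values, k):
    """Fraction of values lying within k standard deviations of the mean."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sum(abs(v - mu) <= k * sigma for v in values) / len(values)

heights = simulate_heights()
within_1sd = share_within(heights, 1)
within_2sd = share_within(heights, 2)
print(f"within 1 sd: {within_1sd:.3f}, within 2 sd: {within_2sd:.3f}")
```

Note that the individual contributions here are uniform, not bell-shaped; the bell shape only appears in the sum.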
If we cannot describe a quantity as a sum of independent random variables, we cannot assume it is normally distributed. The book has good examples with the income and farm-size datasets.
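One standard way a quantity ends up long-tailed instead of normal is when it is a product, not a sum, of independent factors: the log of the product is a sum, so the CLT applies to the log and the quantity itself is approximately lognormal. The sketch below illustrates this under assumptions of my own (50 multiplicative factors, each uniform between 0.9 and 1.2); it is not the book's income or farm-size data.

```python
import math
import random
import statistics

def simulate_products(n_units=5_000, n_factors=50, seed=1):
    """Each value is the product of n_factors independent positive factors."""
    rng = random.Random(seed)
    # Hypothetical assumption: each factor is a multiplier between 0.9 and 1.2.
    return [
        math.prod(rng.uniform(0.9, 1.2) for _ in range(n_factors))
        for _ in range(n_units)
    ]

def skewness(values):
    """Population skewness: mean of cubed standardized deviations."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return statistics.fmean([((v - mu) / sigma) ** 3 for v in values])

products = simulate_products()
logs = [math.log(p) for p in products]
skew_raw = skewness(products)   # clearly positive: long right tail
skew_log = skewness(logs)       # close to zero: logs look roughly normal
print(f"skew of products: {skew_raw:.2f}, skew of logs: {skew_log:.2f}")
```

The products have a long right tail (mean above median), while their logs are roughly symmetric.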
Rumi questioned the N>20 condition. We do not have an answer to that yet.
Another discussion was about comparing items with different sample/population sizes (schools, cities, districts, departments, etc.). There is a risk of over-interpreting the variation between such items, because averages over small samples naturally vary more. This is why the formula for the standard deviation of a sample mean (σ/√n, de Moivre's equation) has been called "the most dangerous equation" (Howard Wainer).
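A small simulation sketch of this point. The score scale (mean 500, standard deviation 100) and the group sizes 25 and 400 are assumptions I picked for illustration. Every group is drawn from the same distribution, so any difference in how much the group averages vary comes from group size alone; σ/√n predicts a spread of about 20 for groups of 25 and about 5 for groups of 400.

```python
import random
import statistics

def spread_of_group_means(group_size, n_groups=2_000, seed=2):
    """Standard deviation of the averages of many same-sized groups."""
    rng = random.Random(seed)
    # Hypothetical assumption: individual scores ~ Normal(mean=500, sd=100).
    means = [
        statistics.fmean([rng.gauss(500.0, 100.0) for _ in range(group_size)])
        for _ in range(n_groups)
    ]
    return statistics.pstdev(means)

small = spread_of_group_means(25)    # theory: 100 / sqrt(25)  = 20
large = spread_of_group_means(400)   # theory: 100 / sqrt(400) = 5
print(f"spread of means, n=25: {small:.1f}; n=400: {large:.1f}")
```

So the best and worst averages will tend to come from the small groups even when nothing real distinguishes them.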
One key takeaway: we can still do statistics on population data by treating it like sample data. In my recent work, I took the AQI data for Delhi, treated it as sample data, and constructed confidence intervals to show that the recent reduction in July's average AQI is not statistically significant.
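For anyone who wants to try this kind of check, here is one possible sketch, not necessarily the exact method used in that analysis. The function name mean_diff_ci is hypothetical, the interval uses a normal approximation, and the daily values are synthetic placeholders, not real Delhi AQI readings. It also treats daily values as independent, which real AQI data probably is not, so the interval is likely too narrow for actual use.

```python
import random
import statistics
from statistics import NormalDist

def mean_diff_ci(sample_a, sample_b, confidence=0.95):
    """Normal-approximation CI for mean(sample_b) - mean(sample_a)."""
    diff = statistics.fmean(sample_b) - statistics.fmean(sample_a)
    # Standard error of the difference, assuming independent observations.
    se = (
        statistics.variance(sample_a) / len(sample_a)
        + statistics.variance(sample_b) / len(sample_b)
    ) ** 0.5
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff - z * se, diff + z * se

rng = random.Random(3)
# SYNTHETIC placeholder values, NOT real Delhi AQI data.
july_last_year = [rng.gauss(100.0, 30.0) for _ in range(31)]
july_this_year = [rng.gauss(95.0, 30.0) for _ in range(31)]

low, high = mean_diff_ci(july_last_year, july_this_year)
# If the interval contains 0, the change is not statistically significant.
significant = not (low <= 0.0 <= high)
print(f"95% CI for change in mean: ({low:.1f}, {high:.1f}); significant: {significant}")
```

With only about 31 days per month, a t-based interval would be slightly wider than this normal approximation.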

For next week, we plan to read through the 6th chapter (Power Law Distribution) of Model Thinker.
The call will tentatively be on Sunday, August 24th, at 10 AM. Please RSVP to the GMeet invite when you receive it.
Please add anything I missed or misrepresented.
Best,