First of all, thank you for this software. It is great, exactly what I need to show everyone at my think tank how to run simulations. It has the essential core functionality and can be installed without admin privileges.
However, the Hypothesis Test function is flawed and basically unusable. The problem is that it treats each additional recalculation as an actual new sample. So if you run a small number of simulations, it reports large p-values, but if you run a large number of simulations, it will always report absurdly tiny p-values, even for output variables that are extremely close together. But clearly, the question of whether two output variables are significantly different should not depend on the number of simulations you ran.
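To illustrate the effect (this is a minimal sketch, not the tool's actual code: `welch_p_value` is a hypothetical helper using a normal approximation to a two-sided Welch t-test, and the simulated "outputs" are just Gaussian draws whose true means differ by a practically negligible 0.01):

```python
import math
import random

def welch_p_value(xs, ys):
    """Two-sided Welch t-test p-value via the normal approximation
    (adequate for the large sample sizes used below)."""
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((x - mx) ** 2 for x in xs) / (nx - 1)
    vy = sum((y - my) ** 2 for y in ys) / (ny - 1)
    t = (mx - my) / math.sqrt(vx / nx + vy / ny)
    # two-sided p from the standard normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

random.seed(1)
pvals = {}
# Two outputs whose true means differ by a negligible 0.01 (sd = 1):
# the same tiny difference flips from "not significant" to
# "overwhelmingly significant" purely by raising the run count.
for n in (100, 10_000, 1_000_000):
    a = [random.gauss(0.00, 1) for _ in range(n)]
    b = [random.gauss(0.01, 1) for _ in range(n)]
    pvals[n] = welch_p_value(a, b)
    print(f"n = {n:>9}: p = {pvals[n]:.3g}")
```

With enough runs, every fixed nonzero difference eventually becomes "significant", which is exactly the behaviour I am seeing: the test answers "did I run enough iterations?" rather than "are these outputs practically different?".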
In the attached sheet, Output 1 is 100 samples and Output 2 is 1000 samples drawn from the exact same input distributions. The output means are (by design) so close that anyone can see there is no meaningful difference, and the confidence intervals lie almost exactly on top of each other. And yet the tool reports a highly significant p-value for the 1000-sample run.
For future releases, please remove this misleading feature or replace it with something that compares the means and variances of the simulation outputs and gives the same result regardless of the chosen number of simulation runs.