Dear Kush,
You shouldn't rely on KMO and Bartlett's test alone. They are necessary checks, but they are not sufficient.
The KMO statistic tells us the proportion of variance that might be common variance. A high KMO indicates that the variables share enough covariance to factor. But it does not tell us whether that common variance forms a stable, replicable structure; it is blind to sampling error. Bartlett's test of sphericity tests the hypothesis that the correlation matrix is an identity matrix. In practice, with N ≈ 120 and even modest correlations, it is almost always significant. It also tells us nothing about the number of factors, loading strength, or cross-loading contamination. A significant result is therefore close to uninformative.
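To give a feel for how easily Bartlett's test reaches significance, here is a rough sketch. It is an illustration only and rests on assumptions that are mine, not part of the attached file: 10 items that all correlate equally at r (so the determinant has a closed form), the population correlations plugged in where a sample matrix would normally go (so sampling noise is ignored), and a Wilson-Hilferty normal approximation for the chi-square tail instead of an exact p-value. The function names are hypothetical.

```python
import math


def bartlett_chi2(n, p, r):
    """Bartlett's sphericity statistic for a p x p matrix whose
    off-diagonal correlations all equal r (illustrative assumption)."""
    # Closed-form determinant of an equicorrelation matrix.
    det = (1 - r) ** (p - 1) * (1 + (p - 1) * r)
    chi2 = -(n - 1 - (2 * p + 5) / 6) * math.log(det)
    df = p * (p - 1) // 2
    return chi2, df


def chi2_upper_tail(x, df):
    """Approximate P(X > x) for a chi-square variable
    (Wilson-Hilferty normal approximation, not an exact p-value)."""
    z = ((x / df) ** (1 / 3) - (1 - 2 / (9 * df))) / math.sqrt(2 / (9 * df))
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    for r in (0.2, 0.3):
        chi2, df = bartlett_chi2(n=120, p=10, r=r)
        print(f"r = {r}: chi2 = {chi2:.1f}, df = {df}, "
              f"approx p = {chi2_upper_tail(chi2, df):.2e}")
```

Even with correlations this modest, the statistic sits far out in the tail at N = 120, which is why a significant Bartlett result says so little about the quality of the factor solution.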
A small sample can produce a “clean” loading matrix by chance alone, but that pattern may vanish in the next sample. To demonstrate this, I have attached the simulation file (efa_demo.txt), which you can copy and paste into Google Colab to run.
The simulation generates data from a realistic, moderately strong two‑factor model (primary loadings ≈ 0.60, cross‑loadings ≈ 0.35) and then performs an exploratory factor analysis on 1000 independent samples at each of six sample sizes: N = 50, 75, 100, 150, 200, and 300. For each sample, it also computes the KMO and Bartlett tests.
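For orientation, the core loop of such a simulation can be sketched as follows. This is not the attached efa_demo.txt but a stripped-down stand-in, and several choices are assumptions made only for illustration: five items per factor, uncorrelated factors, principal-component extraction in place of a true common-factor method, varimax rotation, and "recovery" defined as every item loading highest (in absolute value) on its true factor. All function names are hypothetical, and the KMO and Bartlett calculations are left out to keep it short.

```python
import numpy as np


def true_loadings(primary=0.60, cross=0.35, items_per_factor=5):
    """Population loading matrix: 10 items, 2 factors (assumed layout)."""
    block_a = np.tile([primary, cross], (items_per_factor, 1))
    block_b = np.tile([cross, primary], (items_per_factor, 1))
    return np.vstack([block_a, block_b])


def simulate_sample(n, lam, rng):
    """Draw n observations from an orthogonal common-factor model."""
    factors = rng.standard_normal((n, lam.shape[1]))
    unique_sd = np.sqrt(1.0 - (lam ** 2).sum(axis=1))
    noise = rng.standard_normal((n, lam.shape[0])) * unique_sd
    return factors @ lam.T + noise


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation via the usual SVD iteration."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        if s.sum() < criterion * (1 + tol):
            break
        criterion = s.sum()
    return loadings @ rotation


def perfectly_recovered(data, truth, n_factors=2):
    """True if every item loads highest on its true factor."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    top = np.argsort(eigvals)[::-1][:n_factors]
    # Principal-component loadings (a simplification of EFA extraction).
    unrotated = eigvecs[:, top] * np.sqrt(eigvals[top])
    assigned = np.abs(varimax(unrotated)).argmax(axis=1)
    # Factor labels are arbitrary, so accept either labelling.
    return bool(np.array_equal(assigned, truth)
                or np.array_equal(assigned, 1 - truth))


def recovery_rate(n, reps=1000, seed=0):
    """Share of simulated samples of size n with perfect item assignment."""
    rng = np.random.default_rng(seed)
    lam = true_loadings()
    truth = lam.argmax(axis=1)
    hits = sum(perfectly_recovered(simulate_sample(n, lam, rng), truth)
               for _ in range(reps))
    return hits / reps


if __name__ == "__main__":
    # reps is reduced here only so the sketch runs quickly.
    for n in (50, 75, 100, 150, 200, 300):
        print(n, recovery_rate(n, reps=200))
```

Because the extraction method and other details differ from the attached file, the percentages this sketch prints will not match the figures I quote below; what should carry over is the pattern of recovery rising with N.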
Now, the question we ask is: Given that KMO and Bartlett are excellent (KMO > 0.80, Bartlett's p < .00001), what is the probability that the factor solution correctly assigns all 10 items to their true factors? That probability is what I referred to as recovery power in my previous email.
As the simulation will show:
(1) KMO and Bartlett are always excellent. At every sample size, KMO is above 0.80 (roughly 0.84 to 0.92), and Bartlett’s test is significant. By the usual guidelines, all these datasets “pass” the preliminary checks and are deemed suitable for EFA.
(2) Yet recovery power is terrible at small N. At N = 50, even though KMO ≈ 0.84 and Bartlett's p < 10⁻⁵, all items land on the correct factor only 14.5% of the time. That means more than 4 out of 5 studies would misclassify at least one item, often leading to mislabelled constructs. At N = 100 (a commonly cited rule of thumb), perfect recovery happens only 35% of the time, which is still a gamble. At N = 150, it’s slightly better than a coin flip. See attached recovery_vs_sample.png.
(3) Adequate recovery only appears with much larger N. To have a ≥ 90% chance of a perfect solution, we need around N = 300 (and our simulation at N = 300 came very close, at 89.8%).
(4) The diagnostic tests do not improve much with N, but recovery does. KMO goes from 0.84 (N = 50) to 0.92 (N = 300), a very small change, and Bartlett’s p‑value is always virtually zero. The dramatic improvement in recovery is driven almost entirely by sample size, not by “stronger” KMO or Bartlett results.
To summarise this already lengthy message, KMO and Bartlett’s test are necessary checks. They tell you whether there is enough shared variance in your data to attempt factoring. But they are not sufficient to guarantee that the solution you obtain will be accurate or replicable. That accuracy depends on the amount of information you have relative to your model's complexity; in other words, on the sample size. This is exactly the concept of power in EFA: not classical hypothesis‑testing power, but the probability that your analysis recovers the true structure.
Best,
MRS