Minimum sample for EFA


Kush

May 28, 2026, 2:42:03 AM
to dataanalys...@googlegroups.com
Dear Sir,

I am doing my research on agricultural cooperatives. For this purpose, I have developed two scales with 5 items each. My questionnaire also includes other scales adapted from previous literature. The population of my study is around 250, and my sample is around 120.
Sir, can I run EFA in my case, given that my sample size is very small and does not meet the multiplier (respondents-per-item) criteria?
Also, sir, could you please suggest how to approach the statistical analysis?


Regards

Muhammad R Siregar

May 28, 2026, 7:47:48 AM
to dataanalys...@googlegroups.com
Dear Kush,

I would say it is doubtful.

If our only concern were the sampling variance of the sample correlation coefficient (r), your sample size of N ≈ 120 would probably be adequate. The stronger the population correlations (ρ) between items, the smaller the sampling variance of r, and the fewer observations are needed for a stable correlation matrix.
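To make the point about ρ concrete, here is a small sketch using the standard large-sample approximation for the standard error of a Pearson correlation, SE(r) ≈ (1 − ρ²)/√N. The values of ρ below are illustrative choices, not taken from Kush's data.

```python
import math


def approx_se_r(rho: float, n: int) -> float:
    """Large-sample standard error of a Pearson r: (1 - rho^2) / sqrt(n)."""
    return (1.0 - rho ** 2) / math.sqrt(n)


if __name__ == "__main__":
    # Illustrative population correlations at N = 120
    for rho in (0.2, 0.4, 0.6):
        print(f"rho = {rho:.1f}, N = 120: SE(r) ~ {approx_se_r(rho, 120):.3f}")
```

At N = 120 the standard error shrinks from roughly 0.088 at ρ = 0.2 to roughly 0.058 at ρ = 0.6, which is why a stable correlation matrix alone is not the bottleneck here.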

However, we also care about power in EFA, albeit not in the classical hypothesis-testing sense. EFA is exploratory: there are no null hypotheses about exact loading values or the number of factors. Instead, the analogous concern is the probability of accurate factor recovery, i.e., the chance that the sample solution (number of factors, pattern of loadings, and communalities) closely reproduces the population structure despite sampling variability and possible minor model error. In the methodological literature, this recovery power is evaluated using Monte Carlo simulations.

For your case (N ≈ 120, short 5-item scales), the concern is whether this N gives a sufficiently high probability of recovering a clean, replicable structure for your specific items, assuming the items are psychometrically strong.

Best,
MRS

--
The members of this group are expected to follow the following Protocols:
1. Please search previous posts in the group before posting the question.
2. Don't write the query in someone's post. Always use the option of New topic for the new question. You can do this by writing to dataanaly...@googlegroups.com
3. It’s better to give a proper subject to your post/query. It'll help others while searching.
4. Never write Open-ended queries. This group intends to help research scholars, NOT TO WORK FOR THEM.
5. Never write words like URGENT in your posts. People will help when they are free.
6. Never upload any information about National Seminars/Conferences. Send such information
in personal emails and feel free to share any RESEARCH-related information.
7. No Happy New Year, Happy Diwali, Happy Holi, Happy Birthday, Happy Anniversary, etc. allowed in this group.
8. Asking or sharing Research Papers is NOT ALLOWED.
9. You can share your questionnaire only once.
---
You received this message because you are subscribed to the Google Groups "DataAnalysis" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataanalysistrai...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/dataanalysistraining/CA%2BdRmrYD0GkEbC5UVaU_rEqsK%2B0-MUCHcwzn_AOO3fuZjTgJXA%40mail.gmail.com.

Dr. Mohammad Faisal

May 28, 2026, 12:26:49 PM
to dataanalys...@googlegroups.com
Dear Kush

To assess sampling adequacy and the strength of the relationships among the variables, you need to check the Kaiser-Meyer-Olkin (KMO) measure and Bartlett's test of sphericity. If both meet the usual criteria, you can use EFA even if your sample does not meet the multiplier criteria. But keep in mind that you need to justify the small sample size. Statistically, these tests are enough.
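For readers who want to see what these two checks compute, below is a minimal NumPy/SciPy sketch of the textbook formulas: Bartlett's statistic χ² = −(N − 1 − (2p + 5)/6)·ln|R| with p(p − 1)/2 degrees of freedom, and the overall KMO as the ratio of summed squared correlations to summed squared correlations plus summed squared partial correlations. It is not the implementation of any particular package, and the example data are simulated (a hypothetical one-factor structure), not survey responses.

```python
import numpy as np
from scipy import stats


def bartlett_sphericity(X):
    """Bartlett's test that the population correlation matrix is an identity matrix."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)  # ln|R|, numerically stable
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) / 2.0
    return chi2, df, stats.chi2.sf(chi2, df)


def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(X, rowvar=False)
    P = np.linalg.inv(R)
    d = np.sqrt(np.diag(P))
    partial = -P / np.outer(d, d)  # partial correlations from the inverse matrix
    off = ~np.eye(R.shape[0], dtype=bool)  # off-diagonal elements only
    r2 = np.sum(R[off] ** 2)
    q2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + q2)


if __name__ == "__main__":
    # Hypothetical data: 120 respondents, 10 items sharing one common factor
    rng = np.random.default_rng(0)
    f = rng.standard_normal((120, 1))
    X = 0.7 * f + 0.7 * rng.standard_normal((120, 10))
    chi2, df, p = bartlett_sphericity(X)
    print(f"Bartlett chi2 = {chi2:.1f}, df = {df:.0f}, p = {p:.2e}")
    print(f"KMO = {kmo(X):.3f}")
```

In practice, packages such as R's `psych` or Python's `factor_analyzer` report these values directly; the sketch only shows what is being summarised.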


Dr. Mohammad Faisal

Assistant Professor, Guest Faculty
School of Banking, Financial Services, and Insurance
Delhi Skill and Entrepreneurship University
(Government of NCT of Delhi)
New Delhi, India


Kush

May 28, 2026, 7:22:06 PM
to dataanalys...@googlegroups.com

Muhammad R Siregar

May 29, 2026, 7:13:27 AM
to dataanalys...@googlegroups.com
Dear Kush,

You shouldn't rely on the KMO and Bartlett's tests alone. They are necessary, but not sufficient.

The KMO statistic estimates the proportion of variance among the items that might be common variance. A high KMO indicates that the variables share enough covariance to be factored, but it does not tell us whether that common variance forms a stable, replicable structure, and it is blind to sampling error.

Bartlett's test of sphericity tests the null hypothesis that the correlation matrix is an identity matrix. In practice, with N ≈ 120 and even modest correlations, it is almost always significant. It also says nothing about the number of factors, the strength of the loadings, or cross-loading contamination. Therefore, a significant Bartlett's test is close to trivial.
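A quick way to see why Bartlett's significance is nearly guaranteed: for a p × p matrix in which every pair of items correlates at ρ, the determinant is (1 − ρ)^(p−1)·(1 + (p − 1)ρ), so the test statistic can be computed in closed form. The sketch below assumes 10 items and a uniform ρ = 0.2 purely for illustration; the statistic grows linearly with N while ρ stays the same.

```python
import math

from scipy import stats


def bartlett_equicorrelation(rho, p, n):
    """Bartlett's chi-square when all p items correlate at rho, sample size n."""
    # ln|R| for an equicorrelation matrix: (p-1) ln(1-rho) + ln(1 + (p-1) rho)
    log_det = (p - 1) * math.log(1 - rho) + math.log(1 + (p - 1) * rho)
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * log_det
    df = p * (p - 1) // 2
    return chi2, df, stats.chi2.sf(chi2, df)


if __name__ == "__main__":
    for n in (50, 120, 300):
        chi2, df, pval = bartlett_equicorrelation(0.2, 10, n)
        print(f"N = {n}: chi2 = {chi2:.1f}, df = {df}, p = {pval:.1e}")
```

With correlations of only 0.2, N = 120 already gives χ² ≈ 112 on 45 df, far beyond any conventional cut-off, and nothing in that number speaks to how many factors there are or which items belong to them.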

A small sample can produce a “clean” loading matrix by chance alone, but that pattern may vanish in the next sample. To demonstrate this, I have attached the simulation file (efa_demo.txt), which you can copy and paste into Google Colab to run.

The simulation generates data from a realistic, moderately strong two‑factor model (primary loadings ≈ 0.60, cross‑loadings ≈ 0.35) and then performs an exploratory factor analysis on 1000 independent samples at each of six sample sizes: N = 50, 75, 100, 150, 200, and 300. For each sample, it also computes the KMO and Bartlett tests.
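The attached script is not reproduced here. As a rough sketch of the data-generating step only, a two-factor population model can be written as Σ = ΛΛᵀ + Ψ, where Λ holds the loadings and Ψ is the diagonal matrix of unique variances chosen so that Σ has a unit diagonal. The factors are assumed orthogonal, and which items carry the 0.35 cross-loading is not stated in the message, so the placement below (one cross-loading item per factor) is a hypothetical choice.

```python
import numpy as np


def population_correlation(primary=0.60, cross=0.35):
    """Hypothetical 10-item, 2-factor model with orthogonal factors."""
    L = np.zeros((10, 2))
    L[:5, 0] = primary  # items 1-5 load on factor 1
    L[5:, 1] = primary  # items 6-10 load on factor 2
    # Assumption: one cross-loading item per factor (placement is hypothetical)
    L[4, 1] = cross
    L[9, 0] = cross
    communality = np.sum(L ** 2, axis=1)
    Psi = np.diag(1.0 - communality)  # unique variances give a unit diagonal
    return L, L @ L.T + Psi


def draw_sample(Sigma, n, rng):
    """Draw n multivariate-normal observations with correlation matrix Sigma."""
    return rng.multivariate_normal(np.zeros(Sigma.shape[0]), Sigma, size=n)


if __name__ == "__main__":
    rng = np.random.default_rng(2026)
    L, Sigma = population_correlation()
    for n in (50, 75, 100, 150, 200, 300):
        X = draw_sample(Sigma, n, rng)
        print(n, X.shape)
```

Each sample X would then be passed to an EFA routine (extraction plus rotation) and to the KMO and Bartlett checks; the choice of extraction and rotation method in the original demo is not described, so it is left out here.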

Now, the question we ask is: given that the KMO and Bartlett results are excellent (KMO > 0.80, Bartlett's p < .00001), what is the probability that the factor solution correctly assigns all 10 items to their true factors? That probability is what I referred to as recovery power in my previous email.
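One way to score "all 10 items on their true factors" is sketched below: assign each item to the factor with its largest absolute loading, and accept the solution if some relabelling of the estimated factors matches the true assignment (EFA does not fix the order or sign of factors). The conditional rate then averages this over only the samples that pass the KMO and Bartlett cut-offs quoted above. The function names are hypothetical, and the exact scoring rule in the attached demo may differ.

```python
from itertools import permutations

import numpy as np


def all_items_recovered(loadings, true_factor):
    """True if every item's largest |loading| lands on its true factor,
    allowing the estimated factors to come out in any order or sign."""
    loadings = np.asarray(loadings, dtype=float)
    true_factor = np.asarray(true_factor)
    assigned = np.argmax(np.abs(loadings), axis=1)
    k = loadings.shape[1]
    for perm in permutations(range(k)):
        # perm[j] is the true-factor label tried for estimated column j
        if np.array_equal(np.asarray(perm)[assigned], true_factor):
            return True
    return False


def conditional_recovery(recovered, kmo_values, bartlett_p, kmo_min=0.80, p_max=1e-5):
    """Share of perfect recoveries among samples that pass both preliminary checks."""
    recovered = np.asarray(recovered, dtype=bool)
    mask = (np.asarray(kmo_values) > kmo_min) & (np.asarray(bartlett_p) < p_max)
    if not mask.any():
        return float("nan")
    return float(recovered[mask].mean())
```

For example, a loading matrix whose two columns come out swapped still counts as a perfect recovery, whereas a single item whose largest loading lands on the other factor makes the whole replication a failure.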

As the simulation will show:
(1) KMO and Bartlett are always excellent. Across every sample size, KMO is well above 0.80, and Bartlett’s test is significant. By the usual guidelines, all these datasets “pass” the preliminary checks and are deemed suitable for EFA.
(2) Yet recovery power is terrible at small N. At N = 50, even though KMO ≈ 0.84 and Bartlett's p < 10⁻⁵, only 14.5% of the time do we get all items on the correct factor. That means more than 4 out of 5 studies would misclassify at least one item, often leading to mislabelled constructs. At N = 100 (a commonly cited rule of thumb), perfect recovery happens only 35% of the time; still a gamble. At N = 150, it’s slightly better than a coin flip. See attached recovery_vs_sample.png.
(3) Adequate recovery only appears with much larger N. To have a ≥ 90% chance of a perfect solution, we need around N = 300 (and our simulation at N = 300 came very close, at 89.8%).
(4) The diagnostic tests barely change with N, but recovery does. KMO goes from 0.84 (N = 50) to 0.92 (N = 300), a very small change. Bartlett's p-value is always virtually zero. The dramatic improvement in recovery is driven almost entirely by sample size, not by "stronger" KMO or Bartlett results.
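Because each recovery rate above is itself estimated from 1000 simulated samples, it carries Monte Carlo error. A minimal sketch using the normal (Wald) approximation for a proportion, applied to the rates reported above, shows how precise those figures are; the 1.96 multiplier for a 95% interval is the usual assumption.

```python
import math


def mc_interval(p_hat, reps=1000, z=1.96):
    """Monte Carlo standard error and Wald interval for a simulated proportion."""
    se = math.sqrt(p_hat * (1 - p_hat) / reps)
    return se, p_hat - z * se, p_hat + z * se


if __name__ == "__main__":
    for label, p_hat in [("N = 50", 0.145), ("N = 100", 0.35), ("N = 300", 0.898)]:
        se, lo, hi = mc_interval(p_hat)
        print(f"{label}: {p_hat:.1%} (MC SE {se:.1%}, 95% CI {lo:.1%} to {hi:.1%})")
```

The Monte Carlo standard error is about one percentage point or less, so the gap between N = 50 and N = 300 is not simulation noise, while 89.8% at N = 300 is statistically indistinguishable from the 90% target.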

To summarise this already lengthy message: the KMO and Bartlett's tests are necessary checks. They tell you whether there is enough shared variance in your data to attempt factoring. But they are not sufficient to guarantee that the solution you obtain will be accurate or replicable. That accuracy depends on the amount of information you have relative to your model's complexity, in other words, on the sample size. This is exactly the concept of power in EFA: not classical hypothesis-testing power, but the probability that your analysis recovers the true structure.

Best,
MRS


efa_demo.txt
recovery_vs_sample.png