About MCA and PCA

Mahmood Naderan

Feb 7, 2021, 5:34:55 PM
to factomin...@googlegroups.com
As I read the MCA [1] and PCA [2] examples, I came across some questions
whose answers would help me better understand the concepts. I would
appreciate any further explanation.

- In the MCA example, it is stated that "Except for the age, all the
variables are categorical". Is categorizing a manual process, or is it
determined by some algorithm? Why is age excluded, and what exactly
does "continuous variable" mean?

- What information does MCA show that PCA doesn't? In other words, what
is the shortcoming of PCA? Is it categorical variables?

- How can PCA results help to better categorize variables for MCA? In
other words, which of these workflows is correct?
1) Use raw variables -> do PCA -> figure out the variable categories
-> do MCA -> analyze results
2) Use raw variables -> do PCA -> analyze results
3) Preprocess variables by definitions and categorize them -> do MCA
-> analyze results

[1] http://factominer.free.fr/factomethods/multiple-correspondence-analysis.html
[2] http://factominer.free.fr/factomethods/principal-components-analysis.html



Feb 8, 2021, 6:02:16 AM
to FactoMineR users
Hi. You really need to consult one of the many standard textbooks on multivariate statistical analysis. These questions are not related to FactoMineR.

The short answer is: PCA is mainly concerned with analysing continuous variables such as length, width and weight.

Multiple Correspondence Analysis is concerned with multiple categorical variables, e.g.:

object 1 is green, oval and has a flower on it
object 2 is yellow, oval and has a cat on it
object 3 is yellow, square and has a flower on it

and so on.
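
To make that concrete, here is a minimal Python sketch (not FactoMineR code) of the indicator table, also called the complete disjunctive table, that MCA works on: one 0/1 column per category of each variable. The variable names "colour", "shape" and "motif" are labels I made up for the three objects above.

```python
# Toy objects taken from the example above; the variable names
# ("colour", "shape", "motif") are labels invented for this sketch.
objects = [
    {"colour": "green", "shape": "oval", "motif": "flower"},
    {"colour": "yellow", "shape": "oval", "motif": "cat"},
    {"colour": "yellow", "shape": "square", "motif": "flower"},
]


def indicator_matrix(rows):
    """Return (column labels, 0/1 rows), one column per variable/category pair."""
    # Collect every (variable, category) pair that occurs, in a stable order.
    columns = sorted({(var, cat) for row in rows for var, cat in row.items()})
    # A cell is 1 when the object takes that category for that variable.
    matrix = [
        [1 if row.get(var) == cat else 0 for var, cat in columns]
        for row in rows
    ]
    labels = [f"{var}={cat}" for var, cat in columns]
    return labels, matrix


labels, matrix = indicator_matrix(objects)
print(labels)
for row in matrix:
    print(row)
```

Each row sums to the number of variables (3 here), because every object falls into exactly one category per variable. A continuous variable such as age has no such finite set of categories unless you cut it into classes yourself.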

You can have supplementary qualitative variables in a PCA. So, for example, if you have objects described by length, width and weight, you could have colour as a supplementary qualitative variable. The colour is not used to build the analysis, but the results can be compared against it.

For beginners in multivariate analysis I always suggest sticking to:

PCA: continuous data.
CA: cross-tabulated count data.
MCA: multistate descriptive categorical data.
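
In case "cross-tabulated count data" is unclear, here is a small Python sketch of the kind of table CA takes as input: two categorical variables crossed, with each cell holding a count. The observations are invented purely for illustration.

```python
from collections import Counter

# Hypothetical (colour, shape) observations, invented for this sketch.
observations = [
    ("green", "oval"), ("yellow", "oval"), ("yellow", "square"),
    ("yellow", "oval"), ("green", "square"), ("yellow", "square"),
]


def cross_tabulate(pairs):
    """Return (row levels, column levels, table of counts) for two variables."""
    counts = Counter(pairs)
    row_levels = sorted({r for r, _ in pairs})
    col_levels = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occur.
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table


row_levels, col_levels, table = cross_tabulate(observations)
print("rows:", row_levels)
print("cols:", col_levels)
for row in table:
    print(row)
```

The cells add up to the number of observations, which is what makes this count data rather than measurements.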

Compositional data (i.e., data which sums to 100%) are a whole other can of worms.

Hope this helps.