Number of dimensions to retain from MCA analysis.


Edd Collett

Feb 19, 2015, 4:38:29 PM
to factomin...@googlegroups.com

Dear group


This may seem a rather simple question, but I hope someone can help.


Whilst conducting a free sorting analysis with the MCA function, there is the option to choose the number of dimensions retained by the analysis (the ncp argument). My question is: is there something that can be used to determine how many dimensions should be retained for the best analysis? Perhaps something based on the amount of variance explained by each dimension?
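To make the setting concrete, here is a minimal sketch of the call I mean (the data frame don of factor columns is just a placeholder):

library(FactoMineR)

# don: hypothetical data frame whose columns are all factors (categorical variables)
res.mca <- MCA(don, ncp = 5, graph = FALSE)    # ncp = number of dimensions retained

# Variance per dimension: eigenvalue, percentage and cumulative percentage
res.mca$eig
barplot(res.mca$eig[, "eigenvalue"], main = "MCA eigenvalues")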


Please help, as this could be essential to a lot of my work. My thanks in advance.

 

Edward Collett

Gianmarco Alberti

Feb 20, 2015, 1:11:53 AM
to factomin...@googlegroups.com
Hello,
I have summarized the issue, and some approaches to it, on a page of my website about Correspondence Analysis.
The address is:

Hope this helps a bit
Gianmarco


Sent from iPad

josse

Feb 20, 2015, 5:12:29 AM
to factomin...@googlegroups.com
Hello,

Selecting the number of components in Multiple Correspondence Analysis is not a trivial task.

Often, there is no clear cut-off in the barplot of the eigenvalues.

The percentages of variability are related to the size of your data (the number of rows, of variables, and of categories). These percentages are often very small, but that is usual, due to the nature of the data. It is very common to get, for instance, only 15% on the first two dimensions; it does not mean the analysis is bad, only that you have summarized 15% of the information (which was large) with these two dimensions.

So the first thing you can do is keep only the dimensions associated with eigenvalues greater than 1/(nbvar), where nbvar is the number of variables. It is the equivalent of the rule "keep the eigenvalues greater than 1" in PCA, which corresponds to keeping more than "independence": with random data you expect all the eigenvalues to be equal to 1/nbvar, so when there is information in your data you may expect some of them to be larger.
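A minimal sketch of this rule, assuming the MCA result res.mca and the data frame don from before:

J <- ncol(don)                          # number of active variables
eig <- res.mca$eig[, "eigenvalue"]

# Keep the dimensions whose eigenvalue exceeds 1/J, the average value
# expected under independence (random data)
ncp.keep <- sum(eig > 1/J)
ncp.keep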

Then, in the package missMDA we implemented a method based on cross-validation in the function estim_ncpMCA; it can give a rough idea. The rationale is to remove data and select the number of components that gives the best prediction of the removed values. (Recommendation: use it with the GCV or Kfold argument when the data set is big.)
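A rough usage sketch (argument names may differ slightly between missMDA versions; don is the hypothetical data frame from above):

library(missMDA)

# Cross-validation: remove a small fraction of the cells and choose the ncp
# that predicts them back best. Kfold is much faster than leave-one-out.
res.ncp <- estim_ncpMCA(don, ncp.min = 0, ncp.max = 5,
                        method.cv = "Kfold", nbsim = 100, pNA = 0.05)
res.ncp$ncp                                        # suggested number of components
plot(0:5, res.ncp$criterion, type = "b",
     xlab = "number of components", ylab = "CV error")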

Some methods based on permutation are also available.
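As an illustration only (a generic sketch, not a specific package function), a simple permutation scheme shuffles each variable independently to break the associations, recomputes the MCA eigenvalues, and keeps the dimensions whose observed eigenvalues exceed most of the permuted ones:

set.seed(1)
obs.eig <- res.mca$eig[, "eigenvalue"]
perm.eig <- replicate(200, {
  don.perm <- as.data.frame(lapply(don, sample))   # break associations between variables
  MCA(don.perm, graph = FALSE)$eig[, "eigenvalue"]
})
pval <- rowMeans(perm.eig >= obs.eig)              # one permutation p-value per dimension
sum(pval < 0.05)                                   # dimensions carrying "real" structure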

Finally, I would say that it also depends on your aim and what you do with the MCA.

All the best,
HTP,
Julie.



On 20/02/15 07:11, Gianmarco Alberti wrote:
--
Conference MissDATA to deal with missing values
Rennes, France 17-19 June 2015
http://missdata2015.agrocampus-ouest.fr