FAMD: Singular Value Decomposition


Aquinas Mind

Jul 22, 2024, 10:06:56 PM
to FactoMineR users

Good evening,


First and foremost, I want to thank you for the useful tool you have developed.

I've been using the FAMD function from the FactoMineR package, and I have a minor question. My understanding is that FAMD uses Singular Value Decomposition (SVD) as its decomposition method. Specifically, for a matrix x, it computes the SVD of the matrix

t(t(x)*sqrt(col.w))*sqrt(row.w)

 where

row.w = rep(1/nrow(x), nrow(x))

and

col.w = rep(1,ncol(x))

If the number of left singular vectors and right singular vectors to compute is set to ncp (the parameter in the FAMD function), this decomposition corresponds to the svd.triplet function in FactoMineR.
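For readers following along, the weighted SVD described above can be sketched with base R's svd() for the all-continuous case. This is only an illustration under stated assumptions: the toy data, the variable names raw, centre, sdev and weighted, and the choice ncp = 2 are hypothetical, and the columns are standardized with the 1/n weights rather than with sd().

```r
# Minimal sketch (assumed toy data): weighted SVD with base svd()
set.seed(1)
raw <- matrix(rnorm(40), nrow = 10)          # 10 individuals, 4 continuous variables
ncp <- 2                                     # number of singular vectors to keep

row.w <- rep(1 / nrow(raw), nrow(raw))       # uniform row weights
col.w <- rep(1, ncol(raw))                   # unit column weights

# Centre and standardize each column using the row weights
centre <- colSums(raw * row.w)
x <- sweep(raw, 2, centre, "-")
sdev <- sqrt(colSums(x^2 * row.w))
x <- sweep(x, 2, sdev, "/")

# Matrix on which the SVD is computed
weighted <- t(t(x) * sqrt(col.w)) * sqrt(row.w)
dec <- svd(weighted, nu = ncp, nv = ncp)

eig <- dec$d^2                               # eigenvalues (squared singular values)
```

With standardized columns, the eigenvalues sum to the number of variables (4 here), which gives a quick sanity check before comparing against PCA output.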

I confirmed this using the base R function svd(), and the results match when all variables are continuous (that is, when applying PCA in FactoMineR). However, they do not match for mixed data when I transform the categorical variables into continuous ones via one-hot encoding, which I know is not appropriate since FAMD treats categorical variables as such.

Therefore, I wanted to ask if you could provide some guidance on how to correctly apply SVD in the case of mixed data or what the structure of the correlation matrix should be.


Best regards,

A.M. 

Pontificia Universidad Javeriana

François Husson

Jul 23, 2024, 4:15:30 AM
to factomin...@googlegroups.com
Dear Aquinas,

When FAMD is performed, the continuous variables are treated as they are in PCA (so centered and standardized), and the categorical variables are treated as they are in multiple correspondence analysis: a disjunctive data table is calculated, and each dummy variable is divided by sqrt(I_k/I) (that is, multiplied by sqrt(I/I_k)) and then centered. I is the total number of individuals (the number of rows), and I_k is the number of individuals that take category k.
Then an SVD is performed on the global matrix (see below the matrix in purple).
Remark: the maximum number of dimensions is equal to the sum of the number of continuous variables and the total number of categories.
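The construction described in this reply can be sketched in base R as follows. This is a rough illustration, not FactoMineR's own code: the data frame df and all helper names are hypothetical, uniform row weights 1/I are assumed, and each indicator column is divided by sqrt(I_k/I), the usual FAMD scaling.

```r
# Minimal sketch (assumed toy data): building the FAMD matrix by hand
df <- data.frame(
  height = c(1.60, 1.75, 1.82, 1.68, 1.90, 1.55),
  weight = c(55, 72, 80, 63, 95, 50),
  colour = factor(c("red", "blue", "red", "green", "blue", "green"))
)
I <- nrow(df)

# Continuous part: centre, then standardize with the 1/I variance
num <- as.matrix(df[sapply(df, is.numeric)])
num <- sweep(num, 2, colMeans(num), "-")
num <- sweep(num, 2, sqrt(colMeans(num^2)), "/")

# Categorical part: full disjunctive (indicator) table, one column per category
cat_vars <- df[sapply(df, is.factor)]
dummies <- do.call(cbind, lapply(cat_vars, function(f) {
  m <- outer(as.integer(f), seq_len(nlevels(f)), "==") * 1
  colnames(m) <- levels(f)
  m
}))

# Scale each indicator by sqrt(I_k / I), then centre it
prop <- colMeans(dummies)                    # I_k / I for each category
dummies <- sweep(dummies, 2, sqrt(prop), "/")
dummies <- sweep(dummies, 2, colMeans(dummies), "-")

# Global matrix, row weights 1/I, then a plain SVD
z <- cbind(num, dummies)
dec <- svd(z * sqrt(1 / I))
eig <- dec$d^2                               # eigenvalues
```

In this toy example the eigenvalues sum to 4: one unit of inertia per continuous variable plus (number of categories - 1) for the categorical variable. If FactoMineR is installed, eig can be compared with FAMD(df, graph = FALSE)$eig[, 1].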



Hope it helps
FH
--
François Husson
Department Statistics & Computer Science
L'Institut Agro
65 rue de St-Brieuc - 35042 Rennes
Tel: +33 2 23 48 58 86
https://husson.github.io/
https://www.youtube.com/@HussonFrancois/videos

Aquinas Mind

Jul 26, 2024, 8:37:57 AM
to factomin...@googlegroups.com
Thank you so much for your prompt response, Professor Husson. It was perfectly clear and very helpful. 

Best regards.
