implementing col.w in PCA (and HCPC) for weighting active variables

Camille Van Eupen

Jun 9, 2021, 5:29:17 AM
to FactoMineR users
Hello,

I am trying to find the correct way to use the col.w argument of the PCA function together with HCPC clustering, but I cannot find much information or many examples online.

My main question: can col.w be used to give a higher weight to some of the active variables when you want the clustering to be driven more by those variables?

Take the decathlon dataset, where the scores for each discipline are the active variables. Let's say you want to cluster the athletes not only on their performance, but also (and more importantly) on some other physical traits (e.g. weight, height and fat%).
Are there limitations or implications for interpreting the results of the HCPC clustering (mostly res.hcpc$desc.var) when col.w is used?

Reproducible example:

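library(FactoMineR)
library(tidyverse)

data("decathlon")

# Fix the seed (arbitrary value) so the simulated traits are the same on every run
set.seed(1)

# Keep the 28 Olympic Games athletes, five events and Points (column 12),
# then add three simulated physical traits
d.PCA <- decathlon %>%
  filter(Competition == 'OlympicG') %>%
  select(c(1:5,12)) %>%
  mutate(weight = sample(seq(50,100,0.5),28, replace = TRUE),
         height = sample(seq(150,210),28, replace = TRUE),
         fat = sample(seq(10,30),28, replace = TRUE))

# Points (column 6) is supplementary; col.w holds one weight per active
# variable: 0.5 for the five events, 1 for the three physical traits
res.pca <- PCA(d.PCA, ncp = Inf, quanti.sup = 6,
               col.w = c(0.5,0.5,0.5,0.5,0.5,1,1,1))

res.hcpc <- HCPC(res.pca)

res.hcpc$desc.var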

Note that all dimensions are used for clustering here. Are there implications when you take, for example, only the first five dimensions? 

Kind regards,
Camille

Francois Husson

Jun 16, 2021, 4:48:50 AM
to factomin...@googlegroups.com
Hi,

If you put weights on the variables, they affect the PCA and therefore the clusters obtained from it. But once the clusters are built, the characterization of each cluster by the variables does not depend on the variable weights, because each variable is simply linked to the cluster partition one at a time.
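
To illustrate, here is a minimal sketch rather than a definitive recipe. It assumes the ten decathlon events as active variables, made-up weights in w, and a cut fixed at three clusters with nb.clust = 3 so that nothing is interactive; none of these choices come from the example above. It compares the partitions obtained with and without weights, then re-describes the weighted partition with catdes(), which only receives the data and the cluster labels, never the weights.

```r
library(FactoMineR)

data(decathlon)

# Active variables: the ten events (columns 1 to 10)
X <- decathlon[, 1:10]

# Hypothetical weights: the first five events count half as much as the last five
w <- c(rep(0.5, 5), rep(1, 5))

# Weighted and unweighted PCA, each followed by a clustering cut at 3 clusters
res.pca.w  <- PCA(X, ncp = Inf, col.w = w, graph = FALSE)
res.pca.u  <- PCA(X, ncp = Inf, graph = FALSE)
res.hcpc.w <- HCPC(res.pca.w, nb.clust = 3, graph = FALSE)
res.hcpc.u <- HCPC(res.pca.u, nb.clust = 3, graph = FALSE)

# The weights can move individuals between clusters: cross-tabulate the two partitions
table(weighted   = res.hcpc.w$data.clust$clust,
      unweighted = res.hcpc.u$data.clust$clust)

# Describe the weighted partition from the data and the labels only.
# The cluster labels are stored in the last column of data.clust.
clust.col <- ncol(res.hcpc.w$data.clust)
res.cat   <- catdes(res.hcpc.w$data.clust, num.var = clust.col)

# Compare with the description returned by HCPC itself
res.hcpc.w$desc.var$quanti
res.cat$quanti
```

The two descriptions should tell the same story, since the weights only enter through the PCA that produced the partition.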

FH
--
Francois Husson
Department Statistics & Computer science
L'Institut Agro - AGROCAMPUS OUEST
65 rue de St-Brieuc - 35042 RENNES
Tel: +33 2 23 48 58 86
https://husson.github.io

Camille Van Eupen

Jun 16, 2021, 5:27:19 AM
to factomin...@googlegroups.com
Thank you, Francois. I can work with that :)
Kind regards,
Camille

