I am trying to work out the correct way to use the col.w argument of the PCA function (FactoMineR) together with HCPC clustering, but I cannot find much documentation or many examples online.
My main question: can col.w be used to give a higher weight to some of the active variables, so that the clustering is driven more by those variables?
Take the decathlon dataset, where you have the scores for each discipline as active variables. Let's say that you want to cluster the athletes not only based on their performance, but also (and more importantly) on some other physical traits (e.g. weight, height and fat%).
Are there limitations or implications for interpreting the results of the HCPC clustering (mostly res.hcpc$desc.var) when col.w is used?
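> library(FactoMineR)  # PCA(), HCPC() and the decathlon data
> library(dplyr)       # %>%, filter(), select(), mutate()
> data(decathlon)
> set.seed(1)  # arbitrary seed, only so that the simulated traits are reproducible
> d.PCA <- decathlon %>%
>   filter(Competition == 'OlympicG') %>%
>   select(c(1:5, 12)) %>%  # first five events plus Points
>   mutate(weight = sample(seq(50, 100, 0.5), 28, replace = TRUE),  # simulated traits
>          height = sample(seq(150, 210), 28, replace = TRUE),
>          fat = sample(seq(10, 30), 28, replace = TRUE))
> res.pca <- PCA(d.PCA, ncp = Inf, quanti.sup = 6,  # Points (column 6) is supplementary
>                col.w = c(0.5, 0.5, 0.5, 0.5, 0.5, 1, 1, 1))  # one weight per active variable
> res.hcpc <- HCPC(res.pca)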
Note that all dimensions are used for clustering here (ncp = Inf). Are there implications if only the first five dimensions are kept, for example?
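To make that last question concrete, here is a minimal sketch of the comparison I have in mind. The seed, the graph = FALSE arguments and nb.clust = -1 (which, as far as I understand the HCPC documentation, cuts the tree at the suggested level instead of waiting for a click) are my own additions to get a reproducible, non-interactive run. The object names w, res.all, res.five, hc.all and hc.five are only illustrative.

```r
library(FactoMineR)
library(dplyr)

data(decathlon)
set.seed(1)  # arbitrary seed so the simulated traits are reproducible

# Olympic Games athletes: first five events, Points, three simulated traits
d.PCA <- decathlon %>%
  filter(Competition == "OlympicG") %>%
  select(c(1:5, 12)) %>%
  mutate(weight = sample(seq(50, 100, 0.5), 28, replace = TRUE),
         height = sample(seq(150, 210), 28, replace = TRUE),
         fat    = sample(seq(10, 30), 28, replace = TRUE))

# One weight per active variable (Points is supplementary, so it gets none)
w <- c(rep(0.5, 5), rep(1, 3))

# Same weighted PCA, keeping all components vs. only the first five
res.all  <- PCA(d.PCA, ncp = Inf, quanti.sup = 6, col.w = w, graph = FALSE)
res.five <- PCA(d.PCA, ncp = 5,   quanti.sup = 6, col.w = w, graph = FALSE)

# Assumption: nb.clust = -1 cuts each tree at the level HCPC suggests
hc.all  <- HCPC(res.all,  nb.clust = -1, graph = FALSE)
hc.five <- HCPC(res.five, nb.clust = -1, graph = FALSE)

# Cross-tabulate the two partitions, matching athletes by row name
ids <- rownames(hc.all$data.clust)
table(all  = hc.all$data.clust[ids, "clust"],
      five = hc.five$data.clust[ids, "clust"])
```

If the two partitions differ a lot, I would like to understand whether that is expected, and whether the discarded dimensions should be regarded as noise or as part of the weighting I asked for.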