Sparse weighted K-means for groups of mixed-type variables
(Joint work with M. Chavent, M. Cottrell, J. Lacaille and A. Mourer)
The underlying structure of a dataset is often assessed by running a clustering procedure on the features that describe the data. In practice, although the data may be described by a large number of features, only a minority of them may actually be informative about that structure. Furthermore, redundant features may bias the clustering, whether the redundancy lies among the informative or the uninformative features. This presentation illustrates two sparse clustering algorithms designed for mixed data (made up of numerical and categorical features). The proposed methods summarise redundant features into groups and retain only the most relevant groups of features in the clustering procedure. The performance and interpretability of the methods are illustrated on a real-life dataset.
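To make the idea of weighting and selecting groups of features more concrete, here is a minimal Python sketch. It is not the algorithm presented in the talk: it only illustrates one common way of obtaining sparse group weights, by soft-thresholding a per-group dispersion score and rescaling the result to unit norm, and then using those weights in a group-wise squared distance. The function names, the fixed threshold `lam`, the dispersion values and the assumption that categorical features have already been one-hot encoded are all hypothetical choices made for the illustration.

```python
import math


def soft_threshold(x, lam):
    """Shrink a non-negative score x by lam, clipping at zero."""
    return max(x - lam, 0.0)


def group_weights(between_ss, lam):
    """Map per-group dispersion scores to sparse, unit-norm group weights.

    between_ss: one non-negative score per group of features (assumed to be
                a between-cluster sum of squares computed elsewhere).
    lam:        hypothetical sparsity threshold; groups scoring below it
                receive a weight of exactly zero.
    """
    shrunk = [soft_threshold(b, lam) for b in between_ss]
    norm = math.sqrt(sum(s * s for s in shrunk))
    if norm == 0.0:
        # Every group was thresholded away: fall back to equal weights.
        n_groups = len(between_ss)
        return [1.0 / math.sqrt(n_groups)] * n_groups
    return [s / norm for s in shrunk]


def weighted_sq_distance(x, center, groups, weights):
    """Squared Euclidean distance with one weight per group of columns."""
    total = 0.0
    for cols, w in zip(groups, weights):
        total += w * sum((x[j] - center[j]) ** 2 for j in cols)
    return total


# Hypothetical layout: columns 0-1 are two numerical features, columns 2-4
# are the one-hot encoding of a single categorical feature, column 5 is a
# numerical feature assumed to be uninformative.
groups = [[0, 1], [2, 3, 4], [5]]
weights = group_weights([4.0, 5.0, 0.5], lam=1.0)

x = [1.0, 0.0, 1.0, 0.0, 0.0, 2.0]
center = [0.0, 0.0, 0.0, 1.0, 0.0, 5.0]
distance = weighted_sq_distance(x, center, groups, weights)
```

In this toy setting the third group falls below the threshold, so its weight is zero and the large gap on column 5 no longer affects the distance. Treating all dummy columns of a categorical feature as one group means the feature is kept or discarded as a whole.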