Kmeans clustering with too many categorical variables


sojeong park

May 14, 2021, 3:36:27 AM
to Keras-users
Hello there,

Is there anyone who can help me with this?

I'm currently working on K-means clustering,
but I have a few questions about it.

There are a lot of categorical variables in my data,
and I used one-hot encoding for them.

However, the clustering results are not very good.
I don't know how to get meaningful results from my data.

Can I use K-means clustering with categorical variables?

If so, what is the best way to deal with this?
It would be really helpful if I could get some advice from you :)

Thank you so much!

Sai Durga Kamesh Kota

May 14, 2021, 4:16:00 AM
to Keras-users
Hi, sojeong park

K-means will run on data with categorical variables, but it usually will not give good results. After one-hot encoding, each categorical variable becomes a set of 0/1 columns, and the Euclidean distance to a cluster mean is not a very meaningful measure of similarity for such columns.

To solve this problem we can use k-modes clustering, which represents each cluster by the mode of its members instead of the mean. It starts from k initial cluster centroids (for example, picked at random) and compares points using a dissimilarity score, the number of mismatched categories, rather than the Euclidean distance used in k-means. If you have both categorical and continuous variables, you can use k-prototypes clustering.

To implement either one you can use the kmodes Python package ( https://pypi.org/project/kmodes/ ). If you want to build more intuition for k-modes clustering, you can refer to this video ( https://www.youtube.com/watch?v=b39_vipRkUo ). Hope this answer helps.
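
To make the idea concrete, here is a minimal pure-Python sketch of k-modes, written only for intuition. It is not the code of the kmodes package; the function names (`matching_dissimilarity`, `column_modes`, `k_modes`) and the toy records are made up for illustration, and the initial modes are simply sampled at random from the distinct records.

```python
from collections import Counter
import random


def matching_dissimilarity(a, b):
    """Count the positions where two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Return the most frequent category in each column."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(records, k, n_iter=10, seed=0):
    """Toy k-modes: assign each record to its nearest mode, then update modes."""
    rng = random.Random(seed)
    # Assumption: start from k distinct records chosen at random.
    modes = rng.sample(sorted(set(records)), k)
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest mode by mismatch count.
        labels = [
            min(range(k), key=lambda j: matching_dissimilarity(r, modes[j]))
            for r in records
        ]
        # Update step: each cluster's new mode is its column-wise mode.
        new_modes = []
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            # Keep the old mode if a cluster ends up empty.
            new_modes.append(column_modes(members) if members else modes[j])
        if new_modes == modes:
            break
        modes = new_modes
    return labels, modes


# Hypothetical toy data: (colour, size, shape)
records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
    ("blue", "medium", "square"),
]
labels, modes = k_modes(records, k=2)
print(labels)
print(modes)
```

Note that no one-hot encoding is needed here: the records stay as raw categories, and the result depends on the random starting modes, so in practice it is common to run several initialisations and keep the best one.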

With Regards,
Sai Durga Kamesh Kota
Data Science Intern
Sony Research India

sojeong park

May 14, 2021, 5:39:03 AM
to Sai Durga Kamesh Kota, Keras-users
Thank you for your reply and detailed explanation!

It helps me a lot.

Have a nice weekend :D

