K-means and EM comparison using the silhouette index or other possible measures

Nour Chetouane

Jun 7, 2021, 6:42:52 AM
to python-weka-wrapper
Hello everyone,
I am trying to compare K-means clustering and EM clustering, so I am wondering whether it is possible to compute the silhouette index for the EM algorithm using the Weka Python wrapper.
I found a Weka package called kValid that can be added to the Weka tool; however, it only works with k-means clustering.

Also, I am wondering whether there are other ways to evaluate the clustering results of EM and k-means in PWW or in the WEKA GUI, given that classes-to-clusters evaluation is not an option for the dataset that I am using.

Thank you in advance!

Peter Reutemann

Jun 8, 2021, 4:26:13 AM
to python-we...@googlegroups.com
I'm not aware of a Weka package for that. However, Eibe posted some Groovy code a while ago which might do what you need:

https://weka.8497.n7.nabble.com/Silhouette-Measures-and-Dunn-Index-DI-in-Weka-td44072.html

Cheers, Peter
--
Peter Reutemann
Dept. of Computer Science
University of Waikato, NZ
+64 (7) 577-5304
http://www.cms.waikato.ac.nz/~fracpete/
http://www.data-mining.co.nz/

Nour Chetouane

Jun 8, 2021, 5:24:36 AM
to python-we...@googlegroups.com
Thank you for your reply, Peter. I have actually tried this Groovy code, but unfortunately it did not work with clusterer = new weka.clusterers.EM().
I believe this is because EM does not use distances for clustering, so I always get a NullPointerException.
Do you have any idea how to fix this?

Kind regards,
Nour 

Peter Reutemann

Jun 8, 2021, 5:13:46 PM
to python-weka-wrapper
> Thank you Peter for your reply, I have actually tried this Groovy code but unfortunately it did not work with clusterer = new weka.clusterers.EM().
> I believe it is related to the fact that EM does not use distances in clustering so I always get NullPointerException.
> Do you have any idea how to fix this?

The distance function needs to be initialized with the data it is
supposed to calculate the distances for, which happens implicitly
within SimpleKMeans.
For EM, you have to do this explicitly:

weka.core.EuclideanDistance distance = new weka.core.EuclideanDistance();
// Turn normalisation off because we have standardised the data
distance.setDontNormalize(true);
// Initialize the distance function with the data
distance.setInstances(data);
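
To illustrate why an uninitialized distance function fails, here is a minimal pure-Python sketch. It is not Weka's implementation: the class RangeAwareEuclidean and its method names are hypothetical stand-ins, and it assumes that the distance function derives per-attribute ranges from the data it is given, which is why it cannot compute anything before it has seen the data.

```python
import math


class RangeAwareEuclidean:
    """Hypothetical toy distance that must see the data before use."""

    def __init__(self, dont_normalize=False):
        self.dont_normalize = dont_normalize
        self.ranges = None  # filled in by set_instances()

    def set_instances(self, data):
        # Record (min, max) for every attribute (column) of the dataset.
        self.ranges = [(min(col), max(col)) for col in zip(*data)]

    def _norm(self, value, index):
        # Scale a value into [0, 1] using the stored attribute range.
        lo, hi = self.ranges[index]
        return 0.0 if hi == lo else (value - lo) / (hi - lo)

    def distance(self, a, b):
        if self.ranges is None:
            # The analogue of the NullPointerException in the thread.
            raise RuntimeError("call set_instances(data) before distance()")
        if self.dont_normalize:
            pairs = list(zip(a, b))
        else:
            pairs = [(self._norm(x, i), self._norm(y, i))
                     for i, (x, y) in enumerate(zip(a, b))]
        return math.sqrt(sum((x - y) ** 2 for x, y in pairs))


data = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
dist = RangeAwareEuclidean(dont_normalize=True)
dist.set_instances(data)  # without this call, distance() raises
print(dist.distance(data[0], data[1]))
```

With normalisation switched off, the sketch returns the plain Euclidean distance, which is the sensible choice when the data has already been standardised.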

Cheers, Peter

Peter Reutemann

Jun 8, 2021, 8:17:14 PM
to python-weka-wrapper
I just pushed out a new release of pww3 (0.2.3), which added the
following method to the weka.clusterers module:
avg_silhouette_coefficient

This method is based on Eibe's Groovy code.

Example code is here:
https://github.com/fracpete/python-weka-wrapper3-examples/blob/master/src/wekaexamples/clusterers/silhouette_coefficient.py
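
For readers who want to see what the measure computes independently of Weka, below is a minimal pure-Python sketch of the average silhouette coefficient. It is not the pww3 implementation or Eibe's Groovy code; the function names euclidean and avg_silhouette are made up for illustration, and the sketch assumes hard cluster assignments (one label per point) and plain Euclidean distance. Because it only needs points and labels, it applies equally to assignments from EM and from k-means.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def avg_silhouette(points, labels):
    """Mean silhouette value over all points, given one label per point."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # common convention: s(i) = 0 for singleton clusters
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j])
                for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != label
        )
        denom = max(a, b)
        total += 0.0 if denom == 0 else (b - a) / denom
    return total / len(points)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(avg_silhouette(points, [0, 0, 1, 1]))  # well-separated clusters
print(avg_silhouette(points, [0, 1, 0, 1]))  # poorly matched clusters
```

Values close to 1 indicate compact, well-separated clusters, while values near 0 or below suggest overlapping or misassigned points, so the same score can be compared across the two clusterers on the same data.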

Nour Chetouane

Jun 9, 2021, 3:23:22 AM
to python-we...@googlegroups.com
Thanks a lot, Peter, this is very helpful!

Best regards,
Nour
