WEKA : Apply kmeans clustering on multiple files


Arash Haydaryan

Jun 23, 2014, 1:21:28 AM
to wekamooc...@googlegroups.com

I am new to WEKA and have never worked with the command line. Please help me do the following from the command line: I have 100 .csv files and would like to apply the k-means algorithm to all of them. Using the Explorer (the user-friendly interface), I can process the files one by one, and as a result one column containing the cluster number is added to each .csv file. Now I need to do this from the command line for several files.

How can I apply k-means clustering (unsupervised, add attribute, 20 seeds, 40 clusters, Euclidean distance, ignoring the first two attributes, i.e. the first two columns in the .csv files) to those 100 .csv files located in the d:\csvFiles folder?


Thank you,
Arash 

Sujit Pal

Jun 23, 2014, 2:00:27 PM
to wekamooc...@googlegroups.com
Hi Arash, you could just concatenate all the files together into one large file. If on Unix, you can do something like this:

cd /directory/for/csv/files
cat *csv > /tmp/all_files.csv

On Windows you could use Cygwin (it's a long download but worth it). If you don't want to use Cygwin, you could write a simple script in Perl or Python to do this.
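
A minimal sketch of such a Python script follows. It assumes every file has the same header row and keeps only the first copy of it, since a plain concatenation would repeat the header in the middle of the data. The function name and the output location are illustrative, not anything WEKA requires.

```python
import csv
from pathlib import Path


def concatenate_csv(src_dir, dest_file):
    """Merge every .csv in src_dir into dest_file, keeping one header row.

    Assumption: all input files share the same header line.
    Returns the number of data rows written.
    """
    files = sorted(Path(src_dir).glob("*.csv"))
    dest = Path(dest_file).resolve()
    rows_written = 0
    header_written = False
    with open(dest, "w", newline="") as out:
        writer = csv.writer(out)
        for path in files:
            if path.resolve() == dest:
                continue  # never read the output file back in
            with open(path, newline="") as src:
                reader = csv.reader(src)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

It could be called as concatenate_csv(r"d:\csvFiles", r"d:\all_files.csv"), with the output kept outside the input folder.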

-sujit

Arash Haydaryan

Jun 23, 2014, 5:59:02 PM
to wekamooc...@googlegroups.com
Hi Sujit,

Thanks for your answer. I have to apply the algorithm to the files one by one, so I should not concatenate them. By the way, what is the command line to apply the algorithm to one file with the following settings: unsupervised, add attribute, 20 seeds, 40 clusters, Euclidean distance, ignore the first two attributes (the first two columns in the .csv files)?

Thank you,
Arash

Sujit Pal

Jun 25, 2014, 9:55:08 PM
to wekamooc...@googlegroups.com
Hi Arash,

You can take the command that the Explorer builds from your customizations, run it in the CLI, and just add the CSV filename (with the -t parameter). Alternatively, you can run it directly from the command line (by prefixing the CLI command with java -classpath weka.jar).

-sujit

Arash Haydaryan

Jun 26, 2014, 6:17:07 AM
to wekamooc...@googlegroups.com
Thanks Sujit, 
There are two main issues:

1. My main purpose is to apply the algorithm to several files, let's say in d:\files. I don't want to apply it to them one by one; I am looking for a command that can go through the folder and apply it to all of them.

2. The command line that I get from the Explorer does not show how to specify the location of the .csv files to read or the destination for saving the clustered .csv files. The Explorer gives me only this:

weka.filters.unsupervised.attribute.AddCluster -W "weka.clusterers.SimpleKMeans -N 5 -A \"weka.core.EuclideanDistance -R first-last\" -I 500 -S 10" 
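
One possible way to cover both points is a small Python wrapper that calls the filter once per file. This is a sketch under several assumptions: weka.jar is in the working directory, "20 seeds" means a random seed of 20 (-S 20), the 40 clusters go in -N 40, and the first two columns are excluded through AddCluster's -I range. Filters take the input file with -i and the output file with -o. As far as I know the filter writes its result as ARFF text, so the sketch adds a second step with weka.core.converters.CSVSaver to get a .csv back; please check this against your WEKA version. The function names and the _clustered suffix are made up for illustration.

```python
import subprocess
from pathlib import Path

# Clusterer spec based on the Explorer string above, with the requested
# settings swapped in. Assumption: "20 seeds" is the random seed (-S 20).
CLUSTERER = (
    "weka.clusterers.SimpleKMeans -N 40 "
    '-A "weka.core.EuclideanDistance -R first-last" -I 500 -S 20'
)


def build_filter_command(csv_in, arff_out, weka_jar="weka.jar"):
    """Argument list that runs AddCluster on a single input file."""
    return [
        "java", "-classpath", str(weka_jar),
        "weka.filters.unsupervised.attribute.AddCluster",
        "-W", CLUSTERER,       # clusterer and its options, as one argument
        "-I", "1-2",           # attributes the clusterer should ignore
        "-i", str(csv_in),     # input file
        "-o", str(arff_out),   # output file (assumed to be ARFF text)
    ]


def build_convert_command(arff_in, csv_out, weka_jar="weka.jar"):
    """Argument list that converts the filter output back to CSV."""
    return [
        "java", "-classpath", str(weka_jar),
        "weka.core.converters.CSVSaver",
        "-i", str(arff_in),
        "-o", str(csv_out),
    ]


def cluster_folder(src_dir, dest_dir, weka_jar="weka.jar", run=subprocess.run):
    """Run both steps for every .csv in src_dir; return the CSV outputs."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    results = []
    for csv_in in sorted(Path(src_dir).glob("*.csv")):
        arff_out = dest / (csv_in.stem + "_clustered.arff")
        csv_out = dest / (csv_in.stem + "_clustered.csv")
        run(build_filter_command(csv_in, arff_out, weka_jar), check=True)
        run(build_convert_command(arff_out, csv_out, weka_jar), check=True)
        results.append(csv_out)
    return results
```

A call such as cluster_folder(r"d:\files", r"d:\files\clustered") would then process the whole folder. Passing the commands as argument lists avoids the nested quote escaping that the Explorer string needs when typed into a shell.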


--
Arash Heidarian
Software Engineer and Software Architect 