Completing weka-ml-weka


Majid Laali

Mar 22, 2015, 9:04:29 PM
to cleart...@googlegroups.com
Hi

(sorry if such questions should not be asked here!)
I am going to complete the cleartk-ml-weka submodule. I have a question regarding the WekaFeaturesEncoder:
In the current implementation, if a feature is not an instance of Number, Boolean or Enum, it is encoded as a Weka String attribute. String attributes are very problematic in Weka, and most classifiers cannot be trained on this type of attribute, so in my view it is better to avoid them. My suggestion is to encode every feature other than Number and Boolean as a Nominal attribute (as the current implementation already does for enums and outcomes), unless the user explicitly specifies that the feature should be encoded as a String attribute. Is this approach fine?
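The rule proposed above can be sketched roughly as follows. This is a minimal illustration only: the `AttributeKind` enum and `FeatureKindChooser` helper are hypothetical (neither is part of ClearTK or Weka), and Boolean features are simply passed through to whatever the existing encoder already does with them. The sketch also collects the distinct values of a nominal feature, since a Weka nominal attribute has to declare its full set of values up front.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical attribute kinds, loosely mirroring Weka's numeric, nominal and string attributes. */
enum AttributeKind { NUMERIC, BOOLEAN, NOMINAL, STRING }

/** Hypothetical helper illustrating the proposed encoding rule (not the ClearTK API). */
final class FeatureKindChooser {
  private FeatureKindChooser() {}

  /**
   * Number -> numeric; Boolean -> left to the encoder's existing Boolean handling;
   * everything else (enums, strings, other objects) -> nominal, unless the user
   * explicitly asked for a String attribute for this feature.
   */
  static AttributeKind choose(Object value, boolean userWantsString) {
    if (value instanceof Number) {
      return AttributeKind.NUMERIC;
    }
    if (value instanceof Boolean) {
      return AttributeKind.BOOLEAN;
    }
    return userWantsString ? AttributeKind.STRING : AttributeKind.NOMINAL;
  }

  /** Collects the distinct values observed for a nominal feature, in first-seen order. */
  static List<String> nominalValues(List<?> observed) {
    Set<String> values = new LinkedHashSet<>();
    for (Object v : observed) {
      values.add(String.valueOf(v));
    }
    return new ArrayList<>(values);
  }
}
```

Keeping String attributes strictly opt-in means a feature only ends up as a String attribute when the user has deliberately chosen that, rather than by falling through a type check.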

Thanks, 
Majid

/*******************************************
 *   Majid Laali, Ph.D. Candidate, 
 *   Computer Science & Software Engineering Department
 *   Concordia University
*******************************************/

Steven Bethard

Mar 24, 2015, 11:34:56 AM
to cleart...@googlegroups.com
I don't have a strong preference here, but I also have the impression
that Weka does reasonable things with Nominals, and not so much for
Strings.

In general, I'm happy to trust your judgement on what makes most sense
for the Weka bindings, so yes, go ahead with your approach.

Steve

Philip Ogren

Mar 29, 2015, 8:03:39 PM
to cleart...@googlegroups.com
Hi Majid,

This is great news. I started the cleartk-ml-weka module some years ago and didn't have time to get it up and running. I think at one point I had all the pieces of code I needed to make it all work, but I just didn't make time to stitch it all together. One thing that would be really nice to have is a tutorial that shows how to create an ARFF file using a ClearTK pipeline, load it into the Weka visualization tool (I can't remember the terminology), and perform feature selection in Weka (which I seem to recall has nice tooling for that). Any documentation along these lines will substantially increase the value of any code you might contribute.
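For the tutorial idea above, the following hand-rolled sketch shows roughly what an ARFF file with one nominal feature, one numeric feature and a nominal outcome looks like. It does not use the ClearTK or Weka writers; `ArffSketch`, `ArffRow` and the feature names `pos` and `length` are hypothetical, and it assumes values contain no commas, spaces or quotes (real ARFF values containing those characters must be quoted).

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical training instance: a nominal feature, a numeric feature and an outcome. */
final class ArffRow {
  final String pos;
  final double length;
  final String outcome;

  ArffRow(String pos, double length, String outcome) {
    this.pos = pos;
    this.length = length;
    this.outcome = outcome;
  }
}

/** Hypothetical writer that renders rows as ARFF text (not a ClearTK or Weka class). */
final class ArffSketch {
  private ArffSketch() {}

  static String toArff(String relation, List<ArffRow> rows) {
    // Nominal attributes must list every possible value in the header,
    // so gather the distinct values before writing anything.
    Set<String> posValues = new LinkedHashSet<>();
    Set<String> outcomes = new LinkedHashSet<>();
    for (ArffRow r : rows) {
      posValues.add(r.pos);
      outcomes.add(r.outcome);
    }

    StringBuilder sb = new StringBuilder();
    sb.append("@RELATION ").append(relation).append("\n\n");
    sb.append("@ATTRIBUTE pos {").append(String.join(",", posValues)).append("}\n");
    sb.append("@ATTRIBUTE length NUMERIC\n");
    sb.append("@ATTRIBUTE outcome {").append(String.join(",", outcomes)).append("}\n\n");

    // One comma-separated line per instance, in attribute order.
    sb.append("@DATA\n");
    for (ArffRow r : rows) {
      sb.append(r.pos).append(',').append(r.length).append(',').append(r.outcome).append('\n');
    }
    return sb.toString();
  }
}
```

A file like this can be opened in the Weka Explorer, whose Select attributes tab provides the kind of feature-selection tooling described above.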

Thanks,
Philip

Majid

Mar 29, 2015, 8:13:55 PM
to cleart...@googlegroups.com
Hi Philip, 

Thank you for your interest. 
You are right. Most of the code was already there, and I just filled in the remaining parts. Regarding a Weka tutorial, I am busy with a task for the next few months; however, after that I will definitely work on it.

Thanks, 
Majid