Hi
(sorry if such questions should not be asked here!)
I am going to complete the weka-ml-weka submodule. I have a question regarding the WekaFeaturesEncoder:
In the current implementation, if a feature value is not an instance of Number, Boolean or Enum, the feature is encoded as a String Weka attribute. String attributes are very problematic in Weka, and most classifiers cannot be trained on this type of attribute, so in my view it is better to avoid them. My suggestion is that every feature whose value is not a Number or a Boolean be encoded as a Nominal attribute (as the current implementation already does for enums and outcomes), unless the user explicitly states that the feature should be encoded as a String attribute. Is this approach fine?
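To make the proposal concrete, here is a minimal sketch of the decision rule only. It does not use the Weka API or the actual WekaFeaturesEncoder code: the class AttributeKindSketch, the Kind enum, the chooseKind method and the stringFeatures set are all hypothetical names invented for illustration, and BOOLEAN simply stands for "whatever the encoder already does for Boolean values".

```java
import java.util.Set;

public class AttributeKindSketch {

    // Hypothetical labels for the encoding decision; these are not Weka types.
    enum Kind { NUMERIC, BOOLEAN, NOMINAL, STRING }

    // Hypothetical enum, used only to show that enums stay nominal.
    enum Color { RED, GREEN }

    /**
     * Proposed rule: Number and Boolean values keep their current handling;
     * everything else becomes NOMINAL unless the user explicitly listed the
     * feature name in stringFeatures (an assumed opt-in mechanism).
     */
    static Kind chooseKind(String featureName, Object value, Set<String> stringFeatures) {
        if (value instanceof Number) {
            return Kind.NUMERIC;
        }
        if (value instanceof Boolean) {
            return Kind.BOOLEAN; // unchanged from the current behaviour
        }
        if (stringFeatures.contains(featureName)) {
            return Kind.STRING; // only on explicit request
        }
        return Kind.NOMINAL; // enums, plain strings and any other object
    }

    public static void main(String[] args) {
        Set<String> stringFeatures = Set.of("rawText");
        System.out.println(chooseKind("length", 3, stringFeatures));             // NUMERIC
        System.out.println(chooseKind("isCapitalized", true, stringFeatures));   // BOOLEAN
        System.out.println(chooseKind("pos", "NN", stringFeatures));             // NOMINAL
        System.out.println(chooseKind("color", Color.RED, stringFeatures));      // NOMINAL
        System.out.println(chooseKind("rawText", "some text", stringFeatures));  // STRING
    }
}
```

With this rule, a plain string value such as a part-of-speech tag would end up as a Nominal attribute by default, and a String attribute would be created only for features the user opted in.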
Thanks,
Majid
/*******************************************
* Majid Laali, Ph.D. Candidate,
* Computer Science & Software Engineering Department
* Concordia University
*******************************************/