We run a service that receives a large number of queries and searches the web for content relevant to each one.
We classify that content into a set of domains that we defined ourselves.
Now that we have accumulated a large amount of classified content, we would like to use it to classify the incoming queries themselves. The data is laid out as one folder per class (around 1000 classes): the folder name is the class name, and each folder contains many files holding the content as text.
What we are trying to figure out is how to go about building such a classifier.
How should we create the attributes for the instances?
Do we have to use FilteredClassifier?
What approaches should we try in order to build a classifier that we can query at run time to get the most probable domain for a query?
Do we have to create an ARFF file manually before we can use a classifier?
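
To make the last question concrete, here is a minimal sketch of what writing the ARFF file by hand might look like for our folder-per-class layout, using plain Java only. The names ArffSketch, writeArff and quote, as well as the relation name "domains", are hypothetical and chosen just for illustration. We also assume the files are UTF-8 text and that collapsing whitespace and escaping backslashes and single quotes is enough quoting for the ARFF reader; we have not verified that against Weka's exact rules. As far as we understand, Weka's TextDirectoryLoader converter reads the same kind of directory layout, so this manual step may not be needed at all.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: turns "one folder per class" into a two-attribute ARFF file.
final class ArffSketch {

    // Quote a value for ARFF: collapse whitespace (including line breaks),
    // then escape backslashes and single quotes. Assumed to be sufficient.
    static String quote(String s) {
        String flat = s.replaceAll("\\s+", " ").trim();
        return "'" + flat.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // List either the sub-directories or the regular files of dir, sorted by path.
    static List<Path> listSorted(Path dir, boolean wantDirs) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(p -> wantDirs ? Files.isDirectory(p) : Files.isRegularFile(p))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Write one instance per file: the raw text plus the folder name as class label.
    static void writeArff(Path root, Path out) throws IOException {
        List<Path> classDirs = listSorted(root, true);
        String labels = classDirs.stream()
                .map(d -> quote(d.getFileName().toString()))
                .collect(Collectors.joining(","));

        try (PrintWriter w = new PrintWriter(
                Files.newBufferedWriter(out, StandardCharsets.UTF_8))) {
            w.println("@relation domains");
            w.println();
            w.println("@attribute text string");
            w.println("@attribute class {" + labels + "}");
            w.println();
            w.println("@data");
            for (Path dir : classDirs) {
                String label = quote(dir.getFileName().toString());
                for (Path file : listSorted(dir, false)) {
                    String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                    w.println(quote(text) + "," + label);
                }
            }
        }
    }

    // Usage: java ArffSketch <root folder with class sub-folders> <output .arff>
    public static void main(String[] args) throws IOException {
        writeArff(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

The resulting file has a single string attribute holding the text, so it would still need to be turned into word-based attributes (for example with a filter such as StringToWordVector) before most classifiers can use it, which is where our questions about attributes and FilteredClassifier come from.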