On Mon, Mar 16, 2015 at 6:08 AM, <pravin...@gmail.com> wrote:
> I had the following questions:
> - What classifier should be used for this type of data? Is Maxent
> appropriate for this data?
Yes, Maxent (a.k.a. logistic regression) would be fine for this. I'd
recommend the LIBLINEAR implementation.
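
In case it helps to see what the model does, here is a minimal sketch of binary logistic regression over bag-of-words features in plain Java. It is only an illustration: the class name, the two-outcome setup, and the plain stochastic gradient updates are my assumptions for the example, and this is not how LIBLINEAR is implemented (LIBLINEAR uses its own solvers and handles multiple outcomes).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class; not part of ClearTK, UIMA, or LIBLINEAR.
final class TinyLogisticRegression {
    // One weight per token; tokens not seen in training implicitly have weight 0.
    private final Map<String, Double> weights = new HashMap<>();
    private double bias = 0.0;

    // Returns P(outcome = 1 | tokens) under the current weights.
    double probability(String[] tokens) {
        double score = bias;
        for (String token : tokens) {
            score += weights.getOrDefault(token, 0.0);
        }
        return 1.0 / (1.0 + Math.exp(-score));
    }

    // Stochastic gradient ascent on the log-likelihood; labels must be 0 or 1.
    void train(String[][] docs, int[] labels, int epochs, double learningRate) {
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (int i = 0; i < docs.length; i++) {
                double error = labels[i] - probability(docs[i]);
                bias += learningRate * error;
                for (String token : docs[i]) {
                    weights.merge(token, learningRate * error, Double::sum);
                }
            }
        }
    }
}
```

After calling `train` on tokenized descriptions with 0/1 labels, `probability` gives the model's confidence that a new description belongs to outcome 1.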
> - How can I create an Annotator class to train a model using a CSV file and
> classify an input sentence into the appropriate outcome?
You'll have to write some UIMA code that loads the CSV file into the
UIMA CAS. I would recommend asking on the UIMA Users mailing list for
this part:
https://uima.apache.org/mail-lists.html
You probably want to aim for having only the "description" part as text
in the CAS, and the "outcome" part stored somehow in your type system.
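
Before any UIMA code, the CSV rows have to be split into their two parts. Here is a rough sketch of that step in plain Java. I'm assuming each line looks like "description,outcome", with the outcome in the last column and no commas inside it; the class and field names are made up for the example, and a real CSV library would be safer if your file uses quoting.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for one CSV row; the names are illustrative only.
final class LabeledDescription {
    final String description;
    final String outcome;

    LabeledDescription(String description, String outcome) {
        this.description = description;
        this.outcome = outcome;
    }
}

// Hypothetical helper; assumes the outcome is the last comma-separated field.
final class CsvRows {
    static List<LabeledDescription> parse(List<String> lines) {
        List<LabeledDescription> rows = new ArrayList<>();
        for (String line : lines) {
            int lastComma = line.lastIndexOf(',');
            if (line.trim().isEmpty() || lastComma < 0) {
                continue; // skip blank lines and lines without an outcome column
            }
            // Everything before the last comma is the description,
            // so commas inside the description are preserved.
            String description = line.substring(0, lastComma).trim();
            String outcome = line.substring(lastComma + 1).trim();
            rows.add(new LabeledDescription(description, outcome));
        }
        return rows;
    }
}
```

Each `description` would then become the document text of one CAS (for example via `JCas.setDocumentText`), and the `outcome` would go into whatever type you define for it.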
Once your outcomes and descriptions are stored in the CAS, you can
train a model similarly to what is shown in the chunking example:
https://code.google.com/p/cleartk/wiki/TutorialNamedEntityChunkingClassifier
Though in your case, you probably want just a CleartkAnnotator instead
of a CleartkSequenceAnnotator (since your outcomes are determined only
by the description, and not by their order in the CSV file).
Hope that helps,
Steve