Yeah, it's a bit overwhelming and we don't have enough documentation.
Sorry about that!
For part of speech tagging, look at the package
org.cleartk.examples.pos. Basically, the ExamplePOSAnnotator is the
same as the tutorial, BuildTestExamplePosModel shows you how to train
the model, and RunExamplePOSAnnotator shows you how to apply your
trained model to new data.
For named entity chunking, I don't think we have any example code for
training a model. But if you don't mind looking at a slightly
different task, you can see how the chunking code is used in
org.cleartk.timeml.event.EventAnnotator, which is a chunk-based event
annotator. Basically, you define a FeatureExtractor, and then you
create an AnalysisEngineDescription that configures an
org.cleartk.chunker.Chunker to use your FeatureExtractor. EventTrain
shows you how to train the model (it trains a few other models as well
at the same time, but you can ignore those), and EventAnnotate shows
you how to apply the model to new data.
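As a rough illustration of that pattern only (this is not ClearTK's API: TokenFeatureExtractor, SimpleChunker and the toy rule below are hypothetical stand-ins made up for this sketch, and a real setup would go through the Chunker and AnalysisEngineDescription mentioned above), the idea is that the chunker is handed a feature extractor and asks a model for one label per token:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class ChunkerSketch {

    /** Hypothetical stand-in for a feature extractor: one token in, feature strings out. */
    interface TokenFeatureExtractor {
        List<String> extract(List<String> tokens, int index);
    }

    /** Hypothetical stand-in for a chunker configured with an extractor and a model. */
    static class SimpleChunker {
        private final TokenFeatureExtractor extractor;
        private final Function<List<String>, String> classifier;

        SimpleChunker(TokenFeatureExtractor extractor,
                      Function<List<String>, String> classifier) {
            this.extractor = extractor;
            this.classifier = classifier;
        }

        /** Extracts features for each token and asks the classifier for a label. */
        List<String> label(List<String> tokens) {
            List<String> labels = new ArrayList<>();
            for (int i = 0; i < tokens.size(); i++) {
                labels.add(classifier.apply(extractor.extract(tokens, i)));
            }
            return labels;
        }
    }

    /** Example extractor: lowercased word plus its last two characters. */
    static final TokenFeatureExtractor WORD_AND_SUFFIX = (tokens, i) -> {
        String word = tokens.get(i);
        List<String> features = new ArrayList<>();
        features.add("word=" + word.toLowerCase());
        features.add("suffix=" + word.substring(Math.max(0, word.length() - 2)));
        return features;
    };

    public static void main(String[] args) {
        // Toy rule standing in for a trained model (assumption for the demo).
        Function<List<String>, String> toyModel =
            features -> features.contains("suffix=ed") ? "B-EVENT" : "O";
        SimpleChunker chunker = new SimpleChunker(WORD_AND_SUFFIX, toyModel);
        // Prints [O, O, B-EVENT, O, O]
        System.out.println(
            chunker.label(Arrays.asList("The", "company", "announced", "a", "merger")));
    }
}
```

Swapping in a different extractor changes the features without touching the labeling loop, which is the point of configuring the chunker with the extractor rather than hard-coding it.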
For syntactic parsing, we don't really have any code in ClearTK other
than wrappers to various syntactic parsers provided by others (e.g.
OpenNLP, Berkeley, Stanford). So if you want to train a new syntactic
parser, you'll probably have to work through their APIs. If you do end
up going this route, we'd of course welcome any contributions that
make this easier.
Steve
--
Where did you get that preposterous hypothesis?
Did Steve tell you that?
--- The Hiphopopotamus
Sorry to bother you again...
I recently had another look at the code, and it seems the pointers (I
mean the class names) you gave me for the chunking (performed in the
event package) are no longer up to date. I cannot find any class named
EventTrain or EventAnnotate.
Thank you again for helping me understand how to develop a chunker
(train and annotate) with ClearTK.
/Nicolas
--
Dr. Nicolas Hernandez
Associate Professor (Maître de Conférences)
Université de Nantes - LINA CNRS
http://enicolashernandez.blogspot.com
http://www.univ-nantes.fr/hernandez-n
+33 (0)2 51 12 53 94
+33 (0)2 40 30 60 67
Sorry about that. I'm not quite sure how we lost EventTrain, though
the "correct" way to train an EventAnnotator is now via the TempEval
data, using TempEval2010TaskBExtents. However, that's pretty far from
a simple example now, so I wouldn't look at that. I also discovered
that it doesn't actually help event identification to train it as a
chunking task. You get better accuracy just training it as a word
classification task, so I've since converted EventAnnotator to a
simple CleartkAnnotator instead of a Chunker, which means it's not
a great chunking example anymore.
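To make the difference between the two setups concrete, here is a minimal sketch of the label sequences each one asks the classifier to predict. It assumes a B/I/O encoding for the chunking view, which is a common choice but not necessarily what the Chunker uses internally; the class name, the EVENT label names and the example sentence are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChunkVsWordLabels {

    /** Chunking view: B- marks the first token of a span, I- the rest, O everything else. */
    static List<String> toBioLabels(int tokenCount, List<int[]> spans) {
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < tokenCount; i++) {
            labels.add("O");
        }
        for (int[] span : spans) {            // span = {begin, endExclusive}
            labels.set(span[0], "B-EVENT");
            for (int i = span[0] + 1; i < span[1]; i++) {
                labels.set(i, "I-EVENT");
            }
        }
        return labels;
    }

    /** Word-classification view: every token inside a span gets the same label. */
    static List<String> toWordLabels(int tokenCount, List<int[]> spans) {
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < tokenCount; i++) {
            labels.add("O");
        }
        for (int[] span : spans) {
            for (int i = span[0]; i < span[1]; i++) {
                labels.set(i, "EVENT");
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "company", "announced", "a", "merger"};
        // Two single-token spans: "announced" and "merger".
        List<int[]> spans = Arrays.asList(new int[] {2, 3}, new int[] {4, 5});
        // Prints [O, O, B-EVENT, O, B-EVENT]
        System.out.println(toBioLabels(tokens.length, spans));
        // Prints [O, O, EVENT, O, EVENT]
        System.out.println(toWordLabels(tokens.length, spans));
    }
}
```

When every span is a single token, as in this toy sentence, the two encodings carry the same information, and the chunking view only adds an I- label that is never used.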
That said, if you just want to see an example, look at revision 2843,
back when EventAnnotator was still a Chunker. Here is
EventAnnotator:
Here's EventTrain:
And here's EventAnnotate:
Hope that helps,
Steve