Using CRFSuite, I have trained and evaluated a CRF for a named entity recognition task. The evaluation with ExperimentTrainTest showed a performance close to 90%, so most of the CRF's predictions should be in line with the gold standard from the test set.
However, when I use the trained model with TcAnnotator to annotate the training set with the CRF's predictions, all annotations are terribly off. Here's an example sentence:
Gold standard:
LOC[Japan] began the defence of their MISC[Asian Cup] title with a lucky 2-1 win against LOC[Syria] in a Group C championship match on Friday .
Predictions:
Japan began the defence PER[of their Asian Cup title] with LOC[a lucky 2-1] win LOC[against] Syria LOC[in a Group] C LOC[championship] match on Friday .
At first I would assume that the CRF is just not working well, but the results of the experiments indicate that it should perform much better.
Any help/ideas are very much appreciated.
Dominic
> Which version of DKPro TC are you using?
I am using DKPro TC 0.8.0-SNAPSHOT.
> Did you make sure that the preprocessing is performed the same way on the unlabeled data as has been done while training the model?
Yes, I double-checked that the preprocessing applied when annotating with TcAnnotator is the same as the one configured via ExperimentSaveModel.setPreprocessing().
Best,
Dominic
On Tuesday, March 15, 2016 at 1:53:48 PM UTC+1, Johannes Daxenberger wrote:
> Hi Dominic,
>
> Which version of DKPro TC are you using?
> Did you make sure that the preprocessing is performed the same way on the unlabeled data as has been done while training the model?
>
> Best,
> Johannes
>
> On 15.03.16 at 13:36, dkpro-t...@googlegroups.com wrote:
The problem was that my source data already contained TextClassificationSequence and TextClassificationUnit annotations. The TcAnnotator added these annotations as well, thereby doubling the text that was classified. After removing the existing TextClassificationSequence and TextClassificationUnit annotations from the jCas, everything worked fine :)
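
To illustrate why the duplicates matter, here is a toy sketch in plain Java. It deliberately does not use the UIMA or DKPro TC API: `Span`, `addUnits`, `removeType`, and `countType` are hypothetical stand-ins, and the "index" is just a list. The only assumption it models is the one described above, namely that the annotator adds its own units without checking for ones that are already there.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DuplicateUnitDemo {

    /** Toy stand-in for a span annotation (hypothetical, not a UIMA type). */
    public static final class Span {
        public final String type;
        public final int begin;
        public final int end;

        public Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    static final String UNIT = "TextClassificationUnit";

    /** Models the annotator: adds one unit per token, without checking for existing ones. */
    public static void addUnits(List<Span> index, List<int[]> tokenOffsets) {
        for (int[] t : tokenOffsets) {
            index.add(new Span(UNIT, t[0], t[1]));
        }
    }

    /** Removes every span of the given type from the toy index. */
    public static void removeType(List<Span> index, String type) {
        index.removeIf(s -> s.type.equals(type));
    }

    /** Counts the spans of the given type in the toy index. */
    public static long countType(List<Span> index, String type) {
        return index.stream().filter(s -> s.type.equals(type)).count();
    }

    public static void main(String[] args) {
        // Character offsets of the tokens in "Japan began the defence"
        List<int[]> tokens = Arrays.asList(
                new int[] {0, 5}, new int[] {6, 11},
                new int[] {12, 15}, new int[] {16, 23});

        // Case 1: the source data already carries units, then the annotator adds its own.
        List<Span> index = new ArrayList<>();
        addUnits(index, tokens); // units that came with the source data
        addUnits(index, tokens); // units added again by the annotator
        System.out.println("units without cleanup: " + countType(index, UNIT));

        // Case 2: remove the pre-existing units first, then let the annotator add its own.
        List<Span> cleaned = new ArrayList<>();
        addUnits(cleaned, tokens);
        removeType(cleaned, UNIT);
        addUnits(cleaned, tokens);
        System.out.println("units with cleanup: " + countType(cleaned, UNIT));
    }
}
```

With four tokens, the first case ends up with eight units and the second with four, so in the first case the sequence handed to the classifier is twice as long as the sentence. In a real pipeline the cleanup step would presumably mean removing the existing annotations from the CAS indexes before TcAnnotator runs, rather than filtering a list.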
Thought I should share this.
Dominic