TcAnnotator annotations seem to be wrong


dom...@ambiverse.com

Mar 15, 2016, 8:36:57 AM
to dkpro-tc-users
Hi,

Using CRFSuite, I trained and evaluated a CRF for a named entity recognition task. The evaluation with ExperimentTrainTest showed performance close to 90%, so most of the CRF's predictions should be in line with the gold standard of the test set.
However, when I use the trained model with TcAnnotator to annotate the training set with the CRF's predictions, all annotations are terribly off. Here's an example sentence:

Gold standard:
LOC[Japan] began the defence of their MISC[Asian Cup] title with a lucky 2-1 win against LOC[Syria] in a Group C championship match on Friday .

Predictions:
Japan began the defence PER[of their Asian Cup title] with LOC[a lucky 2-1] win LOC[against] Syria LOC[in a Group] C LOC[championship] match on Friday .

At first I would assume that the CRF is just not working well, but the results of the experiments indicate that it should perform much better.


Any help/ideas are very much appreciated.


Dominic

Johannes Daxenberger

Mar 15, 2016, 8:53:48 AM
to dom...@ambiverse.com, dkpro-tc-users
Hi Dominic,

Which version of DKPro TC are you using?
Did you make sure that the preprocessing is performed the same way on the unlabeled data as has been done while training the model?

Best,
Johannes




dom...@ambiverse.com

Mar 15, 2016, 9:06:03 AM
to dkpro-tc-users, dom...@ambiverse.com
Hi Johannes,

> Which version of DKPro TC are you using?

I am using DKPro TC 0.8.0-SNAPSHOT.


> Did you make sure that the preprocessing is performed the same way on the unlabeled data as has been done while training the model?

Yes, I double-checked that the preprocessing run before TcAnnotator is identical to the one set via ExperimentSaveModel.setPreprocessing().


Best,
Dominic



dom...@ambiverse.com

Mar 15, 2016, 11:20:02 AM
to dkpro-tc-users, dom...@ambiverse.com
I found the solution:

The problem was that my source data already contained TextClassificationSequence and TextClassificationUnit annotations. TcAnnotator added these annotations as well, thereby doubling the text that was classified. After removing the TextClassificationSequence and TextClassificationUnit annotations from the JCas, everything worked fine :)
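
To illustrate the effect, here is a minimal toy sketch in plain Java. It does not use the UIMA or DKPro TC API: the `Span` class, the `addUnits` stand-in for the annotator, and the `removeTypes` helper are all hypothetical names made up for this example. It only shows that annotating on top of existing unit annotations doubles the units, and that removing the stale types first avoids this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only: NOT the UIMA or DKPro TC API. All names are hypothetical. */
public class StaleUnitCleanup {

    /** Stand-in for a typed annotation over a character span. */
    public static final class Span {
        public final String type;
        public final int begin;
        public final int end;

        public Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Removes every span whose type is listed; returns how many were dropped. */
    public static int removeTypes(List<Span> index, String... types) {
        List<String> stale = Arrays.asList(types);
        int before = index.size();
        index.removeIf(span -> stale.contains(span.type));
        return before - index.size();
    }

    /** Stand-in for the annotator: adds one unit per space-separated token. */
    public static void addUnits(List<Span> index, String text) {
        int pos = 0;
        for (String token : text.split(" ")) {
            index.add(new Span("TextClassificationUnit", pos, pos + token.length()));
            pos += token.length() + 1; // skip the single space after the token
        }
    }

    public static void main(String[] args) {
        String text = "Japan began the defence";

        // Case 1: the source data already carries units, the annotator adds them again.
        List<Span> dirty = new ArrayList<>();
        addUnits(dirty, text); // annotations that came with the source data
        addUnits(dirty, text); // annotations added by the annotator
        System.out.println("without cleanup: " + dirty.size() + " units");

        // Case 2: drop the stale types first, then let the annotator add its own.
        List<Span> clean = new ArrayList<>();
        addUnits(clean, text);
        removeTypes(clean, "TextClassificationSequence", "TextClassificationUnit");
        addUnits(clean, text);
        System.out.println("with cleanup: " + clean.size() + " units");
    }
}
```

In a real UIMA pipeline the same cleanup would be done on the JCas itself, for example by collecting the annotations of each type with uimaFIT's `JCasUtil.select` into a separate list and calling `removeFromIndexes()` on each one. The package of the two DKPro TC type classes depends on the DKPro TC version, so it is left out here.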

Thought I should share this.

Dominic

Johannes Daxenberger

Mar 17, 2016, 6:19:11 AM
to dom...@ambiverse.com, dkpro-tc-users
Great :)
Thanks for sharing.

Best,
Johannes
