CRFsuite confidence values


pa...@ucw.cz

May 14, 2015, 12:38:45 PM
to cleart...@googlegroups.com
Hi! I'd like to ask if there is any straightforward way I could use the cleartk's CRFsuite wrapper to not just get chunks but also confidence values generated by the crfsuite when passed -p.
It seems to me that I'd have to create my own fork of cleartk and modify *a lot* of pieces (the wrapper, the generic sequence tagging interface to carry something more complicated than a string as an outcome, the chunker, plus all the abstractions) - so at that point, it's probably more straightforward to use some more down-to-earth crfsuite wrapper than the whole cleartk. Or did I miss anything?

Thanks!
  -- Petr Baudis

Steven Bethard

May 18, 2015, 11:18:20 AM
to cleart...@googlegroups.com
On Wed, May 13, 2015 at 7:21 PM, <pa...@ucw.cz> wrote:
> Hi! I'd like to ask if there is any straightforward way I could use the
> cleartk's CRFsuite wrapper to not just get chunks but also confidence values
> generated by the crfsuite when passed -p.
> It seems to me that I'd have to create my own fork of cleartk and modify *a
> lot* of pieces (the wrapper, the generic sequence tagging interface to carry
> something more complicated than a string as an outcome, the chunker, plus
> all the abstractions)

I don't think you need to modify any interfaces. Basically what's
missing is that `CrfSuiteStringOutcomeClassifier` only implements
`classify` from the `SequenceClassifier` interface, and does not
implement the `score` method:

https://github.com/ClearTK/cleartk/blob/master/cleartk-ml/src/main/java/org/cleartk/ml/SequenceClassifier.java#L56

So "all" you need to do is implement the `score` method on
`CrfSuiteStringOutcomeClassifier`. Looking at the implementation, I'd
guess you probably want to add a `scoreFeatures` method to
`CrfSuiteWrapper` and then delegate to that (as is done for the
`classify` method).

I'm not so familiar with the `CrfSuiteWrapper` class, but if you can
see how to add a `scoreFeatures` method to that class, you should be
95% of the way to your solution, I'd think.

Steve
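
[Editor's sketch] The delegation Steve describes could look roughly like the Java below. It is a minimal, self-contained illustration, not ClearTK code: `TaggerWrapper` and `StringOutcomeClassifier` are stand-ins for `CrfSuiteWrapper` and `CrfSuiteStringOutcomeClassifier`, features are simplified to plain strings, and the return type of `score` is an assumption (check the `SequenceClassifier` source linked above for the real signature). The `parseScoredLines` helper is hypothetical and assumes the tagger prints one `label:probability` pair per line; what crfsuite actually emits with `-p` should be verified against its documentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ScoreDelegationSketch {

  /** Stand-in for CrfSuiteWrapper; the real class has different signatures. */
  interface TaggerWrapper {
    List<String> classifyFeatures(List<List<String>> features);

    // Hypothetical method proposed in the thread: one label->score map per item.
    List<Map<String, Double>> scoreFeatures(List<List<String>> features);
  }

  /** Stand-in for CrfSuiteStringOutcomeClassifier: both methods just delegate. */
  static class StringOutcomeClassifier {
    private final TaggerWrapper wrapper;

    StringOutcomeClassifier(TaggerWrapper wrapper) {
      this.wrapper = wrapper;
    }

    List<String> classify(List<List<String>> features) {
      return wrapper.classifyFeatures(features);
    }

    List<Map<String, Double>> score(List<List<String>> features) {
      return wrapper.scoreFeatures(features);
    }
  }

  /**
   * Parses tagger output under the ASSUMED format of one "label:probability"
   * pair per line. Splits on the last colon so labels may contain colons.
   */
  static List<Map<String, Double>> parseScoredLines(List<String> lines) {
    List<Map<String, Double>> result = new ArrayList<>();
    for (String line : lines) {
      String trimmed = line.trim();
      if (trimmed.isEmpty()) {
        continue; // skip blank separator lines
      }
      int colon = trimmed.lastIndexOf(':');
      if (colon < 0) {
        throw new IllegalArgumentException("No score on line: " + trimmed);
      }
      Map<String, Double> scores = new LinkedHashMap<>();
      scores.put(trimmed.substring(0, colon),
          Double.parseDouble(trimmed.substring(colon + 1)));
      result.add(scores);
    }
    return result;
  }

  public static void main(String[] args) {
    // Fake wrapper that returns canned output instead of running crfsuite.
    TaggerWrapper fake = new TaggerWrapper() {
      @Override
      public List<String> classifyFeatures(List<List<String>> features) {
        return Arrays.asList("B-NP", "I-NP");
      }

      @Override
      public List<Map<String, Double>> scoreFeatures(List<List<String>> features) {
        return parseScoredLines(Arrays.asList("B-NP:0.91", "I-NP:0.87"));
      }
    };
    StringOutcomeClassifier classifier = new StringOutcomeClassifier(fake);
    List<List<String>> features =
        Arrays.asList(Arrays.asList("w=the"), Arrays.asList("w=dog"));
    System.out.println(classifier.classify(features));
    System.out.println(classifier.score(features));
  }
}
```

In a real implementation the wrapper would run crfsuite and feed its output to the parser; the fake wrapper here only shows that `score` needs no more plumbing than `classify` already has.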

pa...@ucw.cz

May 21, 2015, 12:39:43 PM
to cleart...@googlegroups.com
I see, thank you for that explanation - I wasn't aware of the score() method.
One would still need to sub-class the BioChunker, but that's no big deal
(and I needed to do it anyway as I needed to pass yet another attribute).

(In the end, though, I switched the classifier to jcrfsuite instead: I need to
call classify() very often rather than in large batches, so the per-execution
overhead becomes significant, and the CrfSuiteWrapper in ClearTK is GPL2.
If anyone wants to use jcrfsuite for classification (not training) within ClearTK,
this ugly quick hack of mine might be a starting point:
  https://github.com/brmson/yodaqa/blob/f57bb467cc9ceec32d3f7f0bfb476796c0128a9c/src/main/java/cz/brmlab/yodaqa/provider/crf/CRFSuite.java
I might find the time to make a proper ClearTK classifier out of this and
contribute it back sometime in the future, if I continue to use it.)