> This is a question from a very content user of TT4J :-) I'm using the wrapper to tag transcriptions of spoken language inside a Java application. My question is: is it possible to get the tag probabilities out of TT4J? I.e. can I run TreeTagger with the option "-prob" and then somehow use the token handler (or whatever other class) to read out the tag probability assigned by tree tagger for a specific token?
I tried it on the command line using
./tree-tagger -quiet -no-unknown -sgml -token -lemma -prob -threshold 0.1 ../lib/english-par-linux-3.2.bin test.txt
This DT this 1.000000
is VBZ be 1.000000
a DT a 1.000000
simple NN simple 0.733482 JJ simple 0.266518
can MD can 0.968350
test VV test 0.966410
. SENT . 1.000000
You could add the "-prob", "-threshold" and "0.1" (or whatever) arguments using the setArguments() method. If you do that, there are two possibilities:
a) TT4J crashes
b) TT4J will report "NN" as the POS and a complex lemma like "simple 0.733482 JJ simple 0.266518" to the token handler.
I didn't test it, but it's quite likely that b) is going to happen. Then you could parse the purported lemma reported by TT4J as you need it in your TokenHandler.
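To illustrate case b), here is a minimal, untested sketch of how such a purported lemma could be split back into (tag, lemma, probability) triples. It assumes the fields arrive whitespace-separated exactly as in the command-line output above, and that tags and lemmas contain no whitespace themselves. The class and method names (TagReadings, Reading, parse) are made up for this example and are not part of TT4J; you would call parse() with the pos and lemma values your TokenHandler receives.

```java
import java.util.ArrayList;
import java.util.List;

public class TagReadings {

    /** One alternative reading of a token: tag, lemma and its probability. */
    public static final class Reading {
        public final String pos;
        public final String lemma;
        public final double probability;

        Reading(String pos, String lemma, double probability) {
            this.pos = pos;
            this.lemma = lemma;
            this.probability = probability;
        }
    }

    /**
     * Re-joins the reported POS with the purported lemma and reads the
     * result as consecutive "tag lemma probability" triples.
     * Assumption: fields are separated by whitespace and never contain it.
     */
    public static List<Reading> parse(String pos, String lemmaField) {
        String[] fields = (pos + " " + lemmaField).trim().split("\\s+");
        if (fields.length % 3 != 0) {
            throw new IllegalArgumentException(
                    "Expected tag/lemma/probability triples but got: "
                            + pos + " " + lemmaField);
        }
        List<Reading> readings = new ArrayList<>();
        for (int i = 0; i < fields.length; i += 3) {
            readings.add(new Reading(
                    fields[i],                           // tag
                    fields[i + 1],                       // lemma
                    Double.parseDouble(fields[i + 2]))); // probability
        }
        return readings;
    }

    public static void main(String[] args) {
        // Values as they might be handed to the token handler for "simple".
        for (Reading r : parse("NN", "simple 0.733482 JJ simple 0.266518")) {
            System.out.println(r.pos + " " + r.lemma + " " + r.probability);
        }
    }
}
```

If the lemma turns out to be reported in some other shape, the parse fails with an exception instead of silently returning wrong readings, which should make it easy to see what TT4J really hands over.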
I think it would also be interesting to extend TT4J to properly support probabilities. It should not be too much effort.
Best,
-- Richard
I tried running tree-tagger with the -prob and -threshold flags, but there seems to be a problem here with TreeTagger itself. Normally, TT starts outputting results after a couple of tokens have been passed to it, so TT4J can run TreeTagger in a streaming mode. However, this does not work when the probabilities are enabled. Not running TT in this streaming mode would terribly slow down processing, and it would mean that input would need to be provided sentence by sentence.
-- Richard
> Thanks a lot. I tried passing the appropriate parameters via TT4J and, as you say, the process seems to hang or at least takes ages to process the very first tag. I'll try to find another way of achieving this then...
I am trying to contact Helmut Schmid to see if that is a problem that can be fixed. If so, I'd be happy to add the functionality to TT4J.
-- Richard
Helmut Schmid has released a new version of TreeTagger which resolves the problem. I have added support for the probabilities to the latest SVN version of TT4J now. Can you please test and comment if that works for you? You can find an illustration of how it works here:
As far as I know, only the TreeTagger binary for Linux has been fixed so far. Is that enough for you to test?
If you have comments or problems, please report them to http://code.google.com/p/tt4j/issues/detail?id=13
Best,
-- Richard
On 18.04.2012 at 09:02, Bernd Moos wrote:
-- Richard