tt4j chunk

laleh

Nov 8, 2011, 10:19:44 PM
to tt4j-users
Hello,

I have a question regarding TT4J. I saw in the change log that TT4J
can be used with chunker models, so that a chunker can be built on top
of it.

Could you please explain how?

I passed english-chunker.par as the parameter to the setModel method
of TreeTaggerWrapper, but when I run the code on a simple example,
"check my homework", it produces "NN/B-NC" for all words. However,
when I pass the same example to the TreeTagger installed on my machine,
it gives me this output:
<VC>
check VV check
</VC>
<NC>
my PP$ my
homework NN homework
</NC>

So I guess I am not using TT4J correctly for phrase chunking.

Could you explain how to use tt4j for phrase chunking?

Richard Eckart de Castilho

Nov 9, 2011, 12:42:33 PM
to tt4j-...@googlegroups.com
Hi,

I should put up an example for chunking on the project page. However, I am very busy at the moment and may not get around to doing that for a few days.

In short: to do chunking, you have to run TreeTagger twice:

1) first use it as a pos-tagger with the POS model
2) then use it as a chunker with the chunker model

For 2), you have to send <word>-<postag> as tokens to the TreeTagger (so the token + "-" + the pos tag returned in 1).

Also for 2) you need to call

setEpsilon(0.00000001);
setHyphenHeuristics(true);

on the TreeTagger instance you use for chunking.
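As a rough illustration of the token format needed for step 2, here is a minimal sketch in plain Java. It only builds the <word>-<postag> strings and does not call TT4J itself; the class name ChunkerInput and the method toChunkerTokens are hypothetical names chosen for this example, and the POS tags are the ones shown in the TreeTagger output quoted in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of TT4J: it joins each word with the POS tag
// obtained in the first (tagging) pass, giving the "<word>-<postag>" tokens
// that the second (chunking) pass expects.
public class ChunkerInput {

    static List<String> toChunkerTokens(List<String> words, List<String> posTags) {
        if (words.size() != posTags.size()) {
            throw new IllegalArgumentException("need exactly one POS tag per word");
        }
        List<String> tokens = new ArrayList<String>();
        for (int i = 0; i < words.size(); i++) {
            // token + "-" + the POS tag returned in pass 1
            tokens.add(words.get(i) + "-" + posTags.get(i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("check", "my", "homework");
        List<String> tags = Arrays.asList("VV", "PP$", "NN");
        // These strings would then be sent as tokens to the TreeTagger instance
        // that was configured with the chunker model, setEpsilon(0.00000001)
        // and setHyphenHeuristics(true).
        System.out.println(toChunkerTokens(words, tags));
    }
}
```

Running main prints [check-VV, my-PP$, homework-NN]. Sending the bare words to the chunker model instead of these combined tokens would be consistent with the identical "NN/B-NC" result reported in the question.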

I have no simple example at hand, but the code of the UIMA component we use might give you an idea how to get set up.

http://code.google.com/p/dkpro-core-asl/source/browse/de.tudarmstadt.ukp.dkpro.core-asl/trunk/de.tudarmstadt.ukp.dkpro.core.treetagger/src/main/java/de/tudarmstadt/ukp/dkpro/core/treetagger/TreeTaggerChunkerTT4J.java

I'll try to provide a simple example... when I get around to doing that.

Cheers,

-- Richard
