supporting possible tags


eg...@soas.ac.uk

Feb 6, 2013, 8:05:23 AM
to tt4j-...@googlegroups.com
hi richard,

i've got tt4j running and integrated into a java application, which is
quite handy. i have a question for you, though. i want to send POS
hints to the tagger as per the following remark in the TreeTagger
readme file:

 * <input file>: Name of the file which is to be tagged. Each token in this
  file has to be on a separate line. Tokens may contain blanks. It is possible
  to override the lexical information contained in the parameter file of the
  tagger by specifying a list of possible tags after a token. This list has
  to be preceded by a tab character and the elements are separated by tab
  characters. This pretagging feature could be used e.g. to ensure that
  certain text-specific expressions are tagged properly.

however, as i discovered when i did setPerformanceMode(true) and
setStrictMode(true), including tabs in the tokens i send is not
allowed. for example, i can't send

cook    V

as my token, to force this instance of cook to be a verb (and not a noun, say).
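
For illustration, here is a minimal sketch of how one such input line could be assembled, following only the readme text quoted above (token, then a tab before each possible tag). PreTagLine and toLine are hypothetical names, not part of TT4J or TreeTagger, and the tag "V" is just the example from this message; the tags that are actually valid depend on the parameter file in use.

```java
import java.util.List;

/** Hypothetical helper (not part of TT4J): builds one TreeTagger input line. */
final class PreTagLine {
    private PreTagLine() {}

    /**
     * Returns the token followed by its possible tags, each preceded by a tab,
     * as described in the TreeTagger readme. Blanks inside the token are kept.
     */
    static String toLine(String token, List<String> possibleTags) {
        requireNoSeparators(token, "token");
        StringBuilder line = new StringBuilder(token);
        for (String tag : possibleTags) {
            if (tag.isEmpty()) {
                throw new IllegalArgumentException("empty tag for token: " + token);
            }
            requireNoSeparators(tag, "tag");
            line.append('\t').append(tag);
        }
        return line.toString();
    }

    /** Tabs and line breaks are structural in this format, so reject them in the data. */
    private static void requireNoSeparators(String value, String what) {
        if (value.indexOf('\t') >= 0 || value.indexOf('\n') >= 0 || value.indexOf('\r') >= 0) {
            throw new IllegalArgumentException(what + " must not contain tabs or line breaks");
        }
    }

    public static void main(String[] args) {
        // Prints "cook", a tab character, then "V".
        System.out.println(toLine("cook", List.of("V")));
    }
}
```

The tab is only ever added by the helper and never accepted inside the token text itself, which is the distinction a strict check on tokens would need to preserve.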

i am a java programmer so am going to start digging into the code, but
i just wanted to check with you to see if you'd thought about the
issue before or had any thoughts.

best,
edward garrett

Richard Eckart de Castilho

Feb 6, 2013, 8:38:39 AM
to tt4j-...@googlegroups.com
Hello Edward,

thank you for using TT4J. Unfortunately, TT4J does not come with built-in support for pre-tagging. The best way to support this would be to add a subclass of the TokenAdapter that allows TT4J to fetch pre-tags and forward them to the TreeTagger. That way, strictMode and performanceMode can be made aware of pre-tagging.
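
A rough sketch of what such an extended adapter might look like, purely as an illustration. Everything here is an assumption: the TokenAdapter interface below is a local stand-in with a single getText method so the example compiles on its own, and PreTagTokenAdapter, getPreTags, HintedToken and PreTagWriter are hypothetical names that do not exist in TT4J.

```java
import java.util.List;

/** Local stand-in for an adapter that extracts the token text (assumed shape). */
interface TokenAdapter<O> {
    String getText(O token);
}

/** Hypothetical extension: also exposes the possible tags for a token. */
interface PreTagTokenAdapter<O> extends TokenAdapter<O> {
    List<String> getPreTags(O token);
}

/** Hypothetical token type carrying its own POS hints. */
final class HintedToken {
    final String text;
    final List<String> hints;

    HintedToken(String text, List<String> hints) {
        this.text = text;
        this.hints = hints;
    }
}

final class HintedTokenAdapter implements PreTagTokenAdapter<HintedToken> {
    @Override
    public String getText(HintedToken token) {
        return token.text;
    }

    @Override
    public List<String> getPreTags(HintedToken token) {
        return token.hints;
    }
}

/** Hypothetical writer: renders the line that would be sent to the tagger. */
final class PreTagWriter {
    private PreTagWriter() {}

    static <O> String render(TokenAdapter<O> adapter, O token) {
        String text = adapter.getText(token);
        // The token text is still checked strictly: no tabs or line breaks allowed.
        requireClean(text);
        StringBuilder line = new StringBuilder(text);
        if (adapter instanceof PreTagTokenAdapter) {
            // Only the writer adds tabs, so pre-tags never pass through the text check.
            for (String tag : ((PreTagTokenAdapter<O>) adapter).getPreTags(token)) {
                requireClean(tag);
                line.append('\t').append(tag);
            }
        }
        return line.toString();
    }

    private static void requireClean(String value) {
        if (value.indexOf('\t') >= 0 || value.indexOf('\n') >= 0 || value.indexOf('\r') >= 0) {
            throw new IllegalArgumentException("tabs and line breaks are not allowed: " + value);
        }
    }
}
```

With this split, a plain adapter keeps working unchanged, while an adapter that implements the extended interface gets its tags appended by the writer rather than smuggled into the token text.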

The two modes were added to quickly catch communication problems between TT4J and the TreeTagger process due to problematic data.

You should be able to hack TT4J into allowing pre-tags if you turn off strictMode and performanceMode. It will make the setup a bit more fragile, so you'd have to be careful not to feed data to TreeTagger that it does not like.

Feel free to open a feature request for pre-tagging. You are also welcome to contribute a patch with an extended TokenAdapter and pre-tagging awareness for strictMode and performanceMode. I've been a bit short on time recently, so it may take a while until I get to implementing that feature myself.

Cheers,

-- Richard