2009/12/14 Andrew S. <andrews...@gmail.com>:
> ... I would like to have certain default POS
> tags for certain words overridden; e.g., I would like to assign the
> word "said" to be a determiner (/DT) type.
Your suggested solution seems ok to me, though it risks creating tag
sequences that aren't attested in the corpus on which the tagger was
originally trained, which might be a problem.
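One simple way to override tags is to rewrite the tagger's output after tagging has finished. This is an assumption, since the original suggestion isn't quoted in full. The sketch below takes tagged output as a list of (word, tag) pairs, which is the shape NLTK taggers return. It also shows the risk mentioned above by listing adjacent tag pairs that never occur in a set of attested bigrams. The helpers `override_tags` and `unattested_pairs`, the sample output, and the `attested` set are hypothetical.

```python
def override_tags(tagged, overrides):
    """Return a copy of `tagged` with tags replaced for words in `overrides`.

    `tagged` is a list of (word, tag) pairs; `overrides` maps a lowercased
    word to the tag it should always receive.
    """
    return [(word, overrides.get(word.lower(), tag)) for word, tag in tagged]


def unattested_pairs(tagged, attested):
    """Return adjacent tag pairs in `tagged` that never occur in `attested`."""
    tags = [t for _, t in tagged]
    return [pair for pair in zip(tags, tags[1:]) if pair not in attested]


# Hypothetical tagger output, override table, and tag bigrams seen in training.
tagged = [("He", "PRP"), ("said", "VBD"), ("it", "PRP"), ("rained", "VBD")]
overrides = {"said": "DT"}
attested = {("PRP", "VBD"), ("VBD", "PRP"), ("DT", "NN"), ("IN", "DT")}

fixed = override_tags(tagged, overrides)
print(fixed)
print(unattested_pairs(fixed, attested))  # PRP-DT and DT-PRP were never attested
```

The override itself is trivial. The cost shows up in the second call, where the sentence now contains tag sequences the original training data never contained.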
Any solution that overrode the tag during the tagging process, as you
suggest, risks causing more serious problems. For instance, a simple
bigram tagger chooses each word's tag based on the tag it has just
assigned to the previous word, so changing a tag during processing
would create a context it hadn't seen during training, and the error
could carry over to the words that follow.
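The bigram point can be made concrete with a toy left-to-right bigram tagger in plain Python. The helpers `train_bigram` and `tag` and the training sentences are hypothetical, not NLTK code. Like NLTK's `BigramTagger` without a backoff tagger, it looks up each word's tag by its (previous tag, word) context and returns None for contexts it never saw. Forcing "said" to DT mid-run leaves the following words with no known context.

```python
from collections import Counter, defaultdict

START = "<s>"  # pseudo-tag marking the start of a sentence


def train_bigram(tagged_sents):
    """Map each (previous tag, word) context to its most frequent tag."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        prev = START
        for word, t in sent:
            counts[(prev, word)][t] += 1
            prev = t
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}


def tag(model, words, forced=None):
    """Tag left to right; `forced` maps a word to a tag imposed mid-run.

    A word whose (previous tag, word) context was never seen gets None,
    and that None becomes the previous tag for the next word.
    """
    forced = forced or {}
    prev, out = START, []
    for word in words:
        t = forced[word] if word in forced else model.get((prev, word))
        out.append((word, t))
        prev = t
    return out


training_sents = [
    [("He", "PRP"), ("said", "VBD"), ("it", "PRP"), ("rained", "VBD")],
    [("She", "PRP"), ("said", "VBD"), ("it", "PRP"), ("snowed", "VBD")],
]
model = train_bigram(training_sents)
words = ["He", "said", "it", "rained"]
print(tag(model, words))
print(tag(model, words, forced={"said": "DT"}))
```

In the second call both "it" and "rained" come back as None. The unseen (DT, "it") context produces no tag, and that missing tag in turn leaves "rained" without a usable context.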
A better approach would be to create a tagged corpus using your
favourite tagger, correct it in whatever way you like, then train a
new tagger on that corpus.
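The retraining workflow can be sketched in plain Python, assuming hand corrections are recorded as (sentence index, token index) positions. The helpers `correct`, `train`, and `tag` and the sample corpus are hypothetical, and a real corpus would need far more than two sentences. With NLTK itself, the training step would typically use something like `nltk.UnigramTagger(corrected)` followed by `nltk.BigramTagger(corrected, backoff=unigram)`. The sketch mimics that bigram-with-unigram-backoff arrangement.

```python
from collections import Counter, defaultdict

START = "<s>"  # pseudo-tag marking the start of a sentence


def most_frequent(counts):
    """Keep only the most frequent tag for each key."""
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def train(tagged_sents):
    """Learn bigram (previous tag, word) -> tag and unigram word -> tag tables."""
    uni, bi = defaultdict(Counter), defaultdict(Counter)
    for sent in tagged_sents:
        prev = START
        for word, t in sent:
            uni[word][t] += 1
            bi[(prev, word)][t] += 1
            prev = t
    return most_frequent(uni), most_frequent(bi)


def tag(model, words):
    """Tag left to right, backing off to the unigram table for unseen contexts."""
    uni, bi = model
    prev, out = START, []
    for word in words:
        t = bi.get((prev, word)) or uni.get(word)
        out.append((word, t))
        prev = t
    return out


def correct(tagged_sents, fixes):
    """Apply hand corrections given as {(sentence index, token index): new tag}."""
    return [
        [(w, fixes.get((i, j), t)) for j, (w, t) in enumerate(sent)]
        for i, sent in enumerate(tagged_sents)
    ]


# 1. Output of your favourite tagger (hypothetical).
auto_tagged = [
    [("Under", "IN"), ("said", "VBD"), ("agreement", "NN")],
    [("He", "PRP"), ("said", "VBD"), ("no", "DT")],
]
# 2. Correct it by hand: the first "said" is really a determiner.
corrected = correct(auto_tagged, {(0, 1): "DT"})
# 3. Train a new tagger on the corrected corpus.
model = train(corrected)
print(tag(model, ["Under", "said", "agreement"]))
print(tag(model, ["He", "said", "no"]))
```

The new tagger learns from the corrected corpus. As a result, "said" comes out as DT after a preposition and as VBD after a pronoun, and every tag sequence it produces here is attested in its own training data.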
-Steven Bird