Part of Speech Tagging
jf  
From: jf <jfis...@gmail.com>
Date: Fri, 17 Jun 2011 17:55:48 -0400
Local: Fri, Jun 17 2011 5:55 pm
Subject: Part of Speech Tagging

Raphael,
If you have a moment, this question could only be answered by you.  It looks
like the tagging is done within the AtD code.  I was wondering why you chose
to do this, rather than use an available third-party PoS tagger?  Was it for
performance reasons?  (I know your design document mentioned that
performance was considered critical for the service).  For my needs,
accuracy is going to trump performance and I'm considering swapping out the
tagger with a more accurate one.  As I develop rules, I'm finding that I
need to get very creative in order to work around improperly tagged
sentences.  I end up having to considerably narrow the scope of my rules, or
abandon the rule completely due to so many false positives.  If you have a
minute, I'd be interested in hearing your thoughts on the PoS tagging and
the idea of swapping that portion out.  Any pitfalls?  Suggestions?
Warnings?

Also, if anyone else has modified, or attempted to modify, the AtD PoS
tagging, I would love to hear your experiences.

Thanks,
Jay


 
Raphael Mudge  
From: Raphael Mudge <rsmu...@gmail.com>
Date: Fri, 17 Jun 2011 23:43:44 -0500
Local: Sat, Jun 18 2011 12:43 am
Subject: Re: [atd-developers] Part of Speech Tagging
Hi Jay,
Next to myself, you're the most involved in AtD rule development of
anyone I have interacted with to this point. There may be someone else
doing it in secret, but I don't know who they are. :)

AtD's tagger is trigram-based and it uses a few rules to correct some
of the trigram tagger output. I don't write rules assuming an accurate
tagging of a sentence. I write rules based on how the tagger
interprets an incorrect sentence. The trigram tagger is not as
accurate as an HMM tagger, but in a situation where a lot of the text
may be wrong, it doesn't make sense to have a super-accurate tagger
either. An error in the sentence (what we're checking for!) may throw
the tagger off. LanguageTool uses an even simpler dictionary-based
tagger for similar reasons, see:
http://languagetool.wikidot.com/developing-a-tagger-dictionary
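
To make the idea of a trigram tagger with a rule-based correction pass concrete, here is a minimal Python sketch. It is not AtD's code. It assumes a context made of the two previous tags plus the current word, a fallback to the word's most frequent tag, and one made-up correction rule. The corpus, the function names, and the rule are all hypothetical.

```python
from collections import Counter, defaultdict

# Toy hand-tagged corpus (hypothetical data, Penn Treebank style tags).
TRAIN = [
    [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
    [("the", "DT"), ("run", "NN"), ("ends", "VBZ")],
    [("dogs", "NNS"), ("run", "VBP")],
    [("they", "PRP"), ("run", "VBP")],
]


def train(sentences):
    """Count tags per word and per (prev-prev tag, prev tag, word) context."""
    word_tags = defaultdict(Counter)
    context_tags = defaultdict(Counter)
    for sentence in sentences:
        prev2, prev1 = "<s>", "<s>"  # sentence-start padding
        for word, tag_ in sentence:
            word_tags[word][tag_] += 1
            context_tags[(prev2, prev1, word)][tag_] += 1
            prev2, prev1 = prev1, tag_
    return word_tags, context_tags


def tag(words, word_tags, context_tags, default="NN"):
    """Tag left to right: trigram context first, then per-word fallback."""
    prev2, prev1 = "<s>", "<s>"
    tagged = []
    for word in words:
        word = word.lower()
        seen = context_tags.get((prev2, prev1, word))
        if seen:
            chosen = seen.most_common(1)[0][0]
        elif word in word_tags:
            chosen = word_tags[word].most_common(1)[0][0]
        else:
            chosen = default  # unknown word: guess a noun
        tagged.append((word, chosen))
        prev2, prev1 = prev1, chosen
    return tagged


def apply_corrections(tagged):
    """Hypothetical fix-up rule: a VB/VBP tag right after a determiner becomes NN."""
    fixed = list(tagged)
    for i in range(1, len(fixed)):
        if fixed[i - 1][1] == "DT" and fixed[i][1] in ("VB", "VBP"):
            fixed[i] = (fixed[i][0], "NN")
    return fixed


if __name__ == "__main__":
    word_tags, context_tags = train(TRAIN)
    print(apply_corrections(tag(["the", "run", "ends"], word_tags, context_tags)))
    # An unseen word changes the tag history, so later words fall back
    # to their per-word tag instead of the trigram context.
    print(apply_corrections(tag(["a", "run"], word_tags, context_tags)))
```

The second call shows the effect described above: one unexpected word changes the tag history, so the tags that follow can shift. Rules written against a tagger like this end up depending on those shifts.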

If you change out AtD's tagger, beware that it may break a lot of
rules as they rely on the output of the existing tagger. When I
developed my tagger training and evaluation sets, I used the following
two taggers:

http://www-tsujii.is.s.u-tokyo.ac.jp/~tsuruoka/postagger/
http://nlp.stanford.edu/software/tagger.shtml

Fun trivia: these taggers arrive at the same result except for a few
cases. The speed difference between the two is incredible though. The
.jp tagger went through and helped me build my data sets in minutes.
The Stanford tagger had to run for a weekend to do the same thing.
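
One way to check a claim like "the same result except for a few cases" is to measure token-level agreement between the two taggers' outputs. The sketch below assumes each tagger's output has already been reduced to a plain list of tags for the same tokens. The function name and the sample tags are hypothetical, and it does not call either tagger.

```python
def compare_taggings(tags_a, tags_b):
    """Return (agreement ratio, list of (index, tag_a, tag_b) disagreements)."""
    if len(tags_a) != len(tags_b):
        raise ValueError("both taggers must tag the same token sequence")
    disagreements = [
        (i, a, b) for i, (a, b) in enumerate(zip(tags_a, tags_b)) if a != b
    ]
    if not tags_a:
        return 1.0, disagreements  # nothing to disagree about
    return 1 - len(disagreements) / len(tags_a), disagreements


if __name__ == "__main__":
    # Hypothetical outputs from two taggers for the same four tokens.
    first = ["DT", "NN", "VBZ", "RB"]
    second = ["DT", "NN", "VBZ", "JJ"]
    ratio, diffs = compare_taggings(first, second)
    print(ratio, diffs)  # 0.75 [(3, 'RB', 'JJ')]
```

The disagreement list is the useful part when building a data set: those are the tokens worth inspecting by hand or leaving out.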

-- Raphael


 
jf  
From: jf <jfis...@gmail.com>
Date: Sat, 18 Jun 2011 08:26:22 -0400
Local: Sat, Jun 18 2011 8:26 am
Subject: Re: [atd-developers] Part of Speech Tagging

Raphael,
Thanks for the background on the tagging.  I was considering the Stanford
parser, which you mentioned.  Sounds like it would not be worth the extra
effort.

Thanks again,
Jay


 