On 15 November 2011 04:51, Russell Weber
Even if there is a way, it's not necessarily a good idea. How would you
determine what that low probability should be? Steven Bird's solution is
simple. Take the least common words in your training set (say, those
which appear only once or twice), and replace them with "UNK". The PCFG
can then learn something about the behavior of rare words from your
training set. At parse time, any word the grammar has never seen gets
mapped to "UNK" as well, so it is covered by those learned rules.
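
Here is a rough sketch of that preprocessing step in plain Python. It is
only an illustration, not code from NLTK: the function names
replace_rare_words and map_unknown, the max_count threshold, and the toy
sentences are all made up for the example. With real treebank data you
would apply the same mapping to the leaves of your training trees before
inducing the grammar.

```python
from collections import Counter


def replace_rare_words(sentences, max_count=1, unk="UNK"):
    """Replace words seen at most max_count times with the unk token.

    Hypothetical helper: returns the rewritten sentences and the set of
    words that were frequent enough to keep.
    """
    counts = Counter(word for sent in sentences for word in sent)
    vocab = {word for word, n in counts.items() if n > max_count}
    replaced = [[w if w in vocab else unk for w in sent] for sent in sentences]
    return replaced, vocab


def map_unknown(tokens, vocab, unk="UNK"):
    """Map words outside the kept vocabulary to unk before parsing."""
    return [w if w in vocab else unk for w in tokens]


# Toy training data (an assumption, just to show the effect).
training = [
    ["the", "dog", "barks"],
    ["the", "cat", "sleeps"],
    ["the", "dog", "sleeps"],
]

train_unk, vocab = replace_rare_words(training, max_count=1)
test_unk = map_unknown(["the", "aardvark", "sleeps"], vocab)
```

The threshold is a judgment call: a higher max_count gives the grammar
more "UNK" examples to learn from, but throws away more real words.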
John
==
http://homepages.inf.ed.ac.uk/s0930006/