On Fri, Sep 19, 2014 at 6:43 AM, Rodger Kibble <rki...@gmail.com> wrote:
> Thanks Alex!
>
> This is interesting: I get a different result from the example in the book.
> This is what they get:
>
> >>> import nltk
> >>> from nltk.corpus import brown
> >>> brown_news_tagged = brown.tagged_words(categories='news', tagset='universal')
> >>> tag_fd = nltk.FreqDist(tag for (word, tag) in brown_news_tagged)
> >>> tag_fd.most_common()
> [('NOUN', 30640), ('VERB', 14399), ('ADP', 12355), ('.', 11928), ('DET', 11389),
> ('ADJ', 6706), ('ADV', 3349), ('CONJ', 2717), ('PRON', 2535), ('PRT', 2264),
> ('NUM', 2166), ('X', 106)]
>
>
> I get 92 'X' and 14 'UNK', which together add up to the 106 'X' in the
> book's output. The 'X' tokens are indeed all foreign words. The UNKs are
> 'West', 'East' and 'North'. Digging a bit more, they seem to be the ones
> that are originally tagged NR-TL in Brown.
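
For anyone who wants to repeat this kind of digging, here is a rough sketch of one way to line up the universal tags with the original Brown tags. The helper names (words_by_tag, brown_news_report) are made up for illustration and are not part of NLTK. It also assumes that the two tagged_words() views yield the same tokens in the same order, and that the 'brown' and 'universal_tagset' resources have been downloaded.

```python
from collections import Counter, defaultdict


def words_by_tag(universal_tagged, original_tagged, tags_of_interest):
    """Group (word, original_tag) pairs under each universal tag of interest.

    Hypothetical helper: both arguments are iterables of (word, tag) pairs
    over the same tokens, in the same order.
    """
    found = defaultdict(Counter)
    for (word, utag), (_, otag) in zip(universal_tagged, original_tagged):
        if utag in tags_of_interest:
            found[utag][(word, otag)] += 1
    return found


def brown_news_report(tags_of_interest=("X", "UNK")):
    """Print counts for the Brown news category (hypothetical helper).

    Needs NLTK plus nltk.download('brown') and
    nltk.download('universal_tagset').
    """
    from nltk.corpus import brown

    universal = brown.tagged_words(categories="news", tagset="universal")
    original = brown.tagged_words(categories="news")  # original Brown tags
    report = words_by_tag(universal, original, set(tags_of_interest))
    for tag, counts in report.items():
        print(tag, sum(counts.values()), counts.most_common(10))
```

Calling brown_news_report() should show which words, and which original Brown tags, end up under 'X' and 'UNK' in your installation. The exact counts will depend on the NLTK and tagset-mapping versions installed.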