How is pos_tag() implemented?


Oli

Apr 10, 2010, 1:49:05 PM
to nltk-users
Hi,

I am really interested in how the standard tagging function pos_tag()
works.

Normally I would check the source code, but since the tagger is loaded
from a .pickle file, that is not possible.

It must be something with:

- ClassifierBasedPOSTagger()
- MaxentClassifier()

It would be great to have the source code of that tagger, so that I can
train it on a corpus other than treebank.

I want to describe different tagging methods in my master's
thesis, so I need to understand how it works :)

all the best
Oli

Richard Careaga

Apr 10, 2010, 2:34:23 PM
to nltk-...@googlegroups.com
browse http://tinyurl.com/yyruzbv

Oli wrote:
> standard-tagging-function pos_tag()
>

Oli

Apr 10, 2010, 2:53:05 PM
to nltk-users
Thanks, but that did not help me...I am looking for the "unpickled"
code of the standard pos tagger.

On 10 Apr., 20:34, Richard Careaga <leuc...@gmail.com> wrote:
> browse http://tinyurl.com/yyruzbv
>
> Oli wrote:
> >   standard-tagging-function pos_tag()

Richard Careaga

Apr 10, 2010, 5:01:55 PM
to nltk-...@googlegroups.com
object =  pickle.load(fileobject)
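
A pickle stores a reference to the object's class plus its instance state, not source code, so loading one mainly tells you which class to go and read. A minimal sketch of that inspection step, using only the standard library: `describe_pickled` is a hypothetical helper, and a `types.SimpleNamespace` stands in for the trained tagger. The real model would additionally need NLTK importable at load time.

```python
import io
import pickle
import types


def describe_pickled(fileobject):
    """Load a pickle and report the class path and attribute names it carries."""
    obj = pickle.load(fileobject)
    cls = type(obj)
    class_path = f"{cls.__module__}.{cls.__qualname__}"
    # Some objects have no __dict__, so fall back to an empty mapping.
    attributes = sorted(getattr(obj, "__dict__", {}))
    return obj, class_path, attributes


# Stand-in for a trained model: any picklable object with attributes will do.
stand_in = types.SimpleNamespace(default_tag="NN", weights=[0.1, 0.2])

# Round-trip through an in-memory file instead of a real .pickle on disk.
buffer = io.BytesIO(pickle.dumps(stand_in))
loaded, class_path, attributes = describe_pickled(buffer)
print(class_path, attributes)
```

The class path printed for the real tagger is the place to look in the NLTK source tree. The training script that produced the stored state is a separate matter and is not recoverable from the pickle.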

Steven Bird

Apr 10, 2010, 5:15:41 PM
to nltk-...@googlegroups.com

I've asked Edward Loper to commit the code he used for building this and other trained models.

-Steven Bird


--
You received this message because you are subscribed to the Google Groups "nltk-users" group.
To post to this group, send email to nltk-...@googlegroups.com.
To unsubscribe from this group, send email to nltk-users+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/nltk-users?hl=en.

James Smith

Apr 10, 2010, 7:29:26 PM
to nltk-users
I am _very_ interested in knowing whether this includes the NER stuff?

On Apr 10, 10:15 pm, Steven Bird <stevenbi...@gmail.com> wrote:
> I've asked Edward Loper to commit the code he used for building this and
> other trained models.
>
> -Steven Bird
>


Oli

Apr 11, 2010, 9:19:06 AM
to nltk-users
>object = pickle.load(fileobject)

I tried this first, but it didn't work...

>I've asked Edward Loper to commit the code he used for building this and other trained models.

That would be fantastic. Could you please post a link in this thread?

I think it would be very helpful to see how it works. I want to
evaluate the standard tagger on other corpora and also try some
combinations with other taggers.

all the best
Oli

Oli

Apr 17, 2010, 6:07:10 AM
to nltk-users
Hi,

I need to bump this thread again, because I am still very interested
in the implementation of pos_tag().

With the help of Jacob, I tried this one:

cpos = ClassifierBasedPOSTagger(train=train_sents)

That works pretty well for a first try. I got 92.9% accuracy on the
treebank corpus vs. 96.1% from pos_tag().

As Jacob mentioned in his blog
(http://streamhacker.com/2010/04/12/pos-tag-nltk-brill-classifier/),
there must be another "secret" in the classifier or the feature set
that was used.

I hope somebody can shed some light on this topic.

Thanks in advance!

Oli
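
For anyone wanting to repeat this kind of comparison, here is a rough sketch of the experiment described above. The 90/10 positional split and the helper names `split_sents` and `train_and_score` are assumptions for illustration, not taken from the thread, so the score it produces need not match the 92.9% reported here. It requires NLTK and the `treebank` corpus data; the imports are kept inside the function so the helper can be read and tested without them.

```python
def split_sents(sents, train_fraction=0.9):
    """Split tagged sentences into (train, test) by position."""
    sents = list(sents)
    cut = int(len(sents) * train_fraction)
    return sents[:cut], sents[cut:]


def train_and_score(train_fraction=0.9):
    """Train ClassifierBasedPOSTagger on the treebank sample and score it.

    Needs NLTK plus the 'treebank' corpus data installed locally.
    """
    from nltk.corpus import treebank
    from nltk.tag.sequential import ClassifierBasedPOSTagger

    train_sents, test_sents = split_sents(treebank.tagged_sents(), train_fraction)
    tagger = ClassifierBasedPOSTagger(train=train_sents)
    # Newer NLTK releases call the scoring method accuracy(); older ones evaluate().
    score = getattr(tagger, "accuracy", None) or tagger.evaluate
    return score(test_sents)


if __name__ == "__main__":
    print(train_and_score())
```

Scoring pos_tag() on the same held-out sentences is only a fair comparison if the shipped model was not trained on them, which cannot be checked without the original training code.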


The Dixie Flatline

Apr 30, 2010, 11:51:14 AM
to nltk-...@googlegroups.com, steve...@gmail.com
On Sat, Apr 10, 2010 at 9:15 PM, Steven Bird <steve...@gmail.com> wrote:
> I've asked Edward Loper to commit the code he used for building this and
> other trained models.

Can we push this request again? Jacob Perkins has been doing some
great work comparing the efficacy of different taggers in different
situations, and having nltk.pos_tag's source to retrain would be very
helpful in this. In particular, I'm very eager to see the results of
tagger agility for the different POS tag implementations he has tested
so far.

I suspect that some types of taggers may behave differently when
trained on all of Brown and tested against specific categories. I'm
also wondering if some types of taggers might behave better than
others when trained and tested on totally different domains of text
(say Brown's romance and tested against Brown's science_fiction).

I'm sure many nltk users would benefit from this information when
writing applications for dealing with text in specific domains, or
multiple domains.
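
The cross-domain experiment proposed above could be sketched roughly as follows. The unigram tagger with a default-tag backoff is only a placeholder model chosen for brevity, and `token_accuracy` and `cross_domain_accuracy` are hypothetical helper names. The second function requires NLTK and the `brown` corpus data, so its imports are local.

```python
def token_accuracy(gold_sents, predicted_sents):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for gold, predicted in zip(gold_sents, predicted_sents):
        for (_, gold_tag), (_, predicted_tag) in zip(gold, predicted):
            correct += gold_tag == predicted_tag
            total += 1
    return correct / total if total else 0.0


def cross_domain_accuracy(train_category="romance", test_category="science_fiction"):
    """Train on one Brown category and test on another."""
    from nltk.corpus import brown
    from nltk.tag import DefaultTagger, UnigramTagger

    train_sents = brown.tagged_sents(categories=train_category)
    test_sents = brown.tagged_sents(categories=test_category)
    # Placeholder model: per-word tag counts, backing off to "NN" for unseen words.
    tagger = UnigramTagger(train_sents, backoff=DefaultTagger("NN"))
    predicted = [tagger.tag([word for word, _ in sent]) for sent in test_sents]
    return token_accuracy(test_sents, predicted)


if __name__ == "__main__":
    print(cross_domain_accuracy())
```

Swapping in a different tagger class at the marked line and rerunning over several category pairs would give the kind of comparison table asked for here.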


P.S. Why is it always so hard to get source code out of CS academics?
Is it that they don't understand the importance of reproducibility of
results to scientific progress? Or is it that they understand it all
too well...

Steven Bird

Apr 30, 2010, 7:46:26 PM
to The Dixie Flatline, nltk-users
2010/5/1 The Dixie Flatline <td.fl...@gmail.com>:
> On Sat, Apr 10, 2010 at 9:15 PM, Steven Bird <steve...@gmail.com> wrote:
>> I've asked Edward Loper to commit the code he used for building this and
>> other trained models.
>
> Can we push this request again?

Done.

> Jacob Perkins has been doing some
> great work comparing the efficacy of different taggers in different
> situations, and having nltk.pos_tag's source to retrain would be very
> helpful in this. In particular, I'm very eager to see the results of
> tagger agility for the different POS tag implementations he has tested
> so far.

I couldn't agree more.

> P.S. Why is it always so hard to get source code out of CS academics?
> Is it that they don't understand the importance of reproducibility of
> results to scientific progress? Or is it that they understand it all
> too well...

Well, what can I say? Please encourage your local CS NLP academics to
contribute to NLTK!

-Steven Bird

Oli

Jun 1, 2010, 11:21:51 AM
to nltk-users
Has there been any news on this topic in the last 4 weeks?

I want to use the information in my master's thesis and there is not
much time left :)

thanks
Oli

Steven Bird

Jun 1, 2010, 11:27:55 AM
to nltk-...@googlegroups.com
Unfortunately it looks like the code for training these models was
never committed to the NLTK repository, and I have been unsuccessful
in obtaining it. The code shouldn't be hard to write, since it just
counts word and tag ngrams...

-Steven Bird
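
As a toy illustration of what "counting word and tag ngrams" can look like, here is a standard-library sketch. It is not the missing training code and not the model behind pos_tag(); the backoff order, the `<s>` start marker, and the function names `train_counts` and `tag_sentence` are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def train_counts(tagged_sents):
    """Count tags per word and per (previous tag, word) context."""
    by_word = defaultdict(Counter)
    by_context = defaultdict(Counter)
    all_tags = Counter()
    for sent in tagged_sents:
        prev_tag = "<s>"  # marks the start of a sentence
        for word, tag in sent:
            by_word[word][tag] += 1
            by_context[(prev_tag, word)][tag] += 1
            all_tags[tag] += 1
            prev_tag = tag
    return by_word, by_context, all_tags


def tag_sentence(tokens, model):
    """Tag left to right: context counts, then word counts, then the commonest tag."""
    by_word, by_context, all_tags = model
    fallback = all_tags.most_common(1)[0][0] if all_tags else "NN"
    tagged, prev_tag = [], "<s>"
    for word in tokens:
        if (prev_tag, word) in by_context:
            tag = by_context[(prev_tag, word)].most_common(1)[0][0]
        elif word in by_word:
            tag = by_word[word].most_common(1)[0][0]
        else:
            tag = fallback
        tagged.append((word, tag))
        prev_tag = tag
    return tagged


toy_corpus = [
    [("the", "DT"), ("big", "JJ"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("dog", "NN"), ("food", "NN")],
]
model = train_counts(toy_corpus)
print(tag_sentence(["the", "dog", "sleeps"], model))
```

In the example sentence, "dog" directly after a determiner never occurs in the toy corpus, so the tagger falls back to the per-word counts for that token.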

hari kumar

May 21, 2015, 9:01:45 AM
to nltk-...@googlegroups.com

Does anyone know how to fix this error? I have been trying to unpickle the file and to re-encode it in ISO encodings, but nothing seems to work.

UnpicklingError: invalid load key, 'ÿ'.


CODE:

>>> sent = """here we are going to die"""
>>> sent
'here we are going to die'
>>> import nltk
>>> from nltk import *
>>> tok = word_tokenize(sent)
>>> tok
['here', 'we', 'are', 'going', 'to', 'die']
>>> ptag = pos_tag(tok)

ERROR

Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    ptag = pos_tag(tok)
  File "E:\Python27\lib\site-packages\nltk\tag\__init__.py", line 103, in pos_tag
    tagger = load(_POS_TAGGER)
  File "E:\Python27\lib\site-packages\nltk\data.py", line 786, in load
    resource_val = pickle.load(opened_resource)

UnpicklingError: invalid load key, 'ÿ'.


I am using Python 2.7.

Please help me!
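
The character 'ÿ' is the byte 0xFF in Latin-1, and no pickle opcode has that value, so the message suggests that the model file on disk is not a valid pickle in the first place; re-encoding it is unlikely to help. A plausible cause, though only a guess from the traceback, is a damaged or incomplete download, in which case deleting the file and fetching the tagger model again through `nltk.download()` would be the thing to try. The sketch below shows one way to check a file's leading bytes; `first_bytes` and `can_unpickle` are hypothetical helpers, and the temporary files stand in for the real model path.

```python
import os
import pickle
import tempfile


def first_bytes(path, count=4):
    """Return the first few raw bytes of a file."""
    with open(path, "rb") as handle:
        return handle.read(count)


def can_unpickle(path):
    """Return (loaded_ok, leading_bytes, error_name) for a file.

    Only use this on files you trust: unpickling can run arbitrary code.
    """
    head = first_bytes(path)
    try:
        with open(path, "rb") as handle:
            pickle.load(handle)
    except Exception as error:
        return False, head, type(error).__name__
    return True, head, None


# Demonstration with two throwaway files instead of the real model file.
with tempfile.TemporaryDirectory() as folder:
    good = os.path.join(folder, "good.pickle")
    bad = os.path.join(folder, "bad.pickle")
    with open(good, "wb") as handle:
        pickle.dump({"tag": "NN"}, handle)
    with open(bad, "wb") as handle:
        handle.write(b"\xff\xfe not a pickle")
    good_report = can_unpickle(good)
    bad_report = can_unpickle(bad)

print(good_report)
print(bad_report)
```

If the real file starts with 0xFF, comparing its size against a fresh copy of the model is a quick way to confirm that it was damaged.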
