StanfordTagger not accessing environment variables

dgraham

Nov 6, 2012, 12:27:48 AM
to nltk-...@googlegroups.com
Hi!

I have just installed both NLTK and the Stanford POS Tagger on Windows 7 Ultimate. I have set

CLASSPATH to include: C:\stanford-postagger-full-2012-07-09\stanford-postagger.jar
STANFORD_MODELS: C:\stanford-postagger-full-2012-07-09\models\

But when I try to use nltk.tag.stanford.StanfordTagger I get errors about unset environment variables. (See below)
Does anyone know what I've done wrong, or how to fix this?

Thank you very much,

Dougal Graham


ERRORS:
If I don't set the jar file location:

Traceback (most recent call last):
  File "tagging.py", line 3, in <module>
    tg = nltk.tag.stanford.StanfordTagger('bidirectional-distsim-wsj-0-18.tagger')
  File "C:\Python27\lib\site-packages\nltk\tag\stanford.py", line 42, in __init__
    verbose=verbose)
  File "C:\Python27\lib\site-packages\nltk\internals.py", line 597, in find_jar
    raise LookupError('\n\n%s\n%s\n%s' % (div, msg, div))
LookupError:

===========================================================================
  NLTK was unable to find ! Set the CLASSPATH environment variable.

If I explicitly set the jar file location:

Traceback (most recent call last):
  File "tagging.py", line 3, in <module>
    tg = nltk.tag.StanfordTagger('bidirectional-distsim-wsj-0-18.tagger', 'C:\\stanford-postagger-full-2012-07-09\\stanford-postagger.jar')
  File "C:\Python27\lib\site-packages\nltk\tag\stanford.py", line 45, in __init__
    env_vars=('STANFORD_MODELS'), verbose=verbose)
  File "C:\Python27\lib\site-packages\nltk\internals.py", line 512, in find_file
    raise LookupError('\n\n%s\n%s\n%s' % (div, msg, div))
LookupError:

===========================================================================
NLTK was unable to find the bidirectional-distsim-wsj-0-18.tagger file!
Use software specific configuration paramaters or set the STANFORD_MODELS environment variable.
===========================================================================

Alexis Dimitriadis

Nov 6, 2012, 5:08:23 AM
to nltk-...@googlegroups.com
The second version of your code SHOULD have worked, but there's a bug in stanford.py that makes STANFORD_MODELS useless (more below). To get around it, give the full path to your model.

That's enough to get your second version to work. But also, CLASSPATH should be a list of folders, not files. Set

    CLASSPATH=<other stuff>;C:\stanford-postagger-full-2012-07-09

and you'll be able to call it like this:

    tg = nltk.tag.stanford.StanfordTagger(r'C:\stanford-postagger-full-2012-07-09\models\bidirectional-distsim-wsj-0-18.tagger')


The bug: In StanfordTagger.__init__, the following code is intended to pass a tuple to env_vars, but passes a string instead.

        self._stanford_model = find_file(path_to_model,
                env_vars=('STANFORD_MODELS'), verbose=verbose)

The fix is to add a comma just after the literal string, like this:

        self._stanford_model = find_file(path_to_model,
                env_vars=('STANFORD_MODELS',), verbose=verbose)
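
To see why the missing comma matters, here is a small check that runs in any Python interpreter. It is only a sketch: it assumes that find_file loops over whatever it receives in env_vars, and it does not reproduce any NLTK code. The variable names buggy and fixed are made up for illustration.

```python
# Parentheses alone do not make a tuple; the trailing comma does.
buggy = ('STANFORD_MODELS')    # just the string 'STANFORD_MODELS'
fixed = ('STANFORD_MODELS',)   # a one-element tuple

print(type(buggy).__name__)    # str
print(type(fixed).__name__)    # tuple

# Anything that loops over env_vars sees single characters in the
# buggy case, so the real variable name is never looked up.
print(list(buggy)[:3])         # ['S', 'T', 'A']
print(list(fixed))             # ['STANFORD_MODELS']
```

If that assumption holds, the buggy call ends up looking for environment variables named S, T, A and so on, which would explain why STANFORD_MODELS appears to be ignored.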

Regards,

Alexis

dgraham

Nov 6, 2012, 9:25:15 PM
to nltk-...@googlegroups.com
Hi Alexis,

Thanks for the advice. I made both of those changes (it turns out I also had the file name of the tagger file wrong, so I fixed that too), and now it works, but only if I give the path to the jar file:

This is OK: tg = nltk.tag.stanford.StanfordTagger(r'wsj-0-18-bidirectional-distsim.tagger', r'C:\stanford-postagger-full-2012-07-09\stanford-postagger.jar')

This is not working: tg = nltk.tag.stanford.StanfordTagger(r'wsj-0-18-bidirectional-distsim.tagger')

So, it is still not recognizing the CLASSPATH. Are there any other possible reasons for it? I find it strange that the error is reporting "Unable to find !" as if there is a problem with finding an empty string?

Thanks for your help,

Dougal
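
When a variable seems to be ignored, one thing worth ruling out is that the Python process never received it in the first place. The sketch below uses only the standard library; describe_path_var is a hypothetical helper written for this check and is not part of NLTK. It prints each entry of CLASSPATH and STANFORD_MODELS as seen by the running interpreter and says whether that entry exists on disk.

```python
import os

def describe_path_var(name, environ=None):
    """Return (entry, exists_on_disk) pairs for a path-list variable.

    Returns None when the variable is not set at all.
    """
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if value is None:
        return None
    # Path-list variables use ';' on Windows and ':' elsewhere.
    entries = [e for e in value.split(os.pathsep) if e]
    return [(e, os.path.exists(e)) for e in entries]

for var in ("CLASSPATH", "STANFORD_MODELS"):
    result = describe_path_var(var)
    if result is None:
        print("%s is not set in this process" % var)
    else:
        for entry, exists in result:
            status = "found" if exists else "missing"
            print("%s: %s (%s)" % (var, entry, status))
```

On Windows, a variable changed through the system settings dialog is only picked up by command prompts opened afterwards, so "not set in this process" can simply mean the script was started from an older window.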

Morten Minde Neergaard

Nov 24, 2012, 8:38:27 AM
to dgraham, nltk-...@googlegroups.com
At 18:25, Tue 2012-11-06, dgraham wrote:
[…]
> This is not working: tg =
> nltk.tag.stanford.StanfordTagger(r'wsj-0-18-bidirectional-distsim.tagger')
>
> So, it is still not recognizing the CLASSPATH. Are there any other possible
> reasons for it? I find it strange that the error is reporting

Sorry for the belated answer. I think the StanfordTagger is intended to
be a semi-abstract interface for the rest of the stanford tagger stuff.
Try using stanford.POSTagger.
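
A guess at how a semi-abstract base class could produce the odd "unable to find !" message: if the base class leaves its jar name empty and only the subclasses fill it in, the error text is formatted with an empty string. The classes below are hypothetical stand-ins written for illustration, not NLTK's actual code.

```python
class BaseTagger(object):
    """Hypothetical stand-in for a semi-abstract tagger interface."""
    _JAR = ''  # the base class names no jar; subclasses fill this in

    def missing_jar_message(self):
        # Same shape as the message in the traceback above.
        return 'NLTK was unable to find %s!' % self._JAR


class PosTagger(BaseTagger):
    """Hypothetical concrete subclass that names its jar."""
    _JAR = 'stanford-postagger.jar'


print(BaseTagger().missing_jar_message())  # NLTK was unable to find !
print(PosTagger().missing_jar_message())   # NLTK was unable to find stanford-postagger.jar!
```
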

We should maybe raise an exception if the main StanfordTagger class is
constructed. As a matter of fact, let me get right on that =)


Smiles,
--
Morten Minde Neergaard