Question about page 51 of NLTK with Python Book

max77

3 Jul 2017, 07:25:40

to nltk-users

Hello all,

I am on page 51 of the NLTK with Python Book but I am having trouble with some commands...

I am working on this on my Raspberry Pi 3 running Raspbian Jessie and don't know how to make the commands match my Linux file system.

This is what I have so far:

>>> from nltk.corpus import BracketParseCorpusReader
>>> corpus_root = r"/home/pi/nltk_data/corpora/penntreebank/parsed/mrg/wsj"
>>> file_pattern = r".*/wsj_.*\.mrg"
>>> ptb = BracketParseCorpusReader(corpus_root, file_pattern)
Traceback (most recent call last):
  File "<pyshell#16>", line 1, in <module>
    ptb = BracketParseCorpusReader(corpus_root, file_pattern)
  File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/reader/bracket_parse.py", line 49, in __init__
    CorpusReader.__init__(self, root, fileids, encoding)
  File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/reader/api.py", line 84, in __init__
    root = FileSystemPathPointer(root)
  File "/usr/local/lib/python2.7/dist-packages/nltk/compat.py", line 221, in _decorator
    return init_func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/nltk/data.py", line 303, in __init__
    raise IOError('No such file or directory: %r' % _path)
IOError: No such file or directory: '/home/pi/nltk_data/corpora/penntreebank/parsed/mrg/wsj'

Since I am not doing this on Windows and don't have a C: drive, I changed the `corpus_root` line to use a Linux path.

Any thoughts, tips, or suggestions as to how I can fix this?

-Thanks!

Dimitriadis, A. (Alexis)

3 Jul 2017, 08:32:08

to nltk-...@googlegroups.com

The format of the path you wrote is correct, so the message must be correct too: You don’t have a folder at the specified path. I assume you actually downloaded the Penn Treebank files? Use a non-Python method (a bash terminal or a GUI navigator, if your environment provides it) to inspect the folder structure and find out where your files actually are. For example, after downloading the “Penn treebank sample” the `.mrg` files are in .../nltk_data/corpora/treebank/combined.
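If you would rather check from inside Python after all, a minimal sketch along these lines walks a folder and reports which directories actually contain `.mrg` files. The helper name `find_mrg_dirs` and the `~/nltk_data` starting point are assumptions made for illustration; neither is part of NLTK, and your data may live elsewhere.

```python
import os


def find_mrg_dirs(root):
    """Return sorted (directory, count) pairs for directories under root that hold .mrg files."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        count = sum(1 for name in filenames if name.endswith(".mrg"))
        if count:
            hits.append((dirpath, count))
    return sorted(hits)


if __name__ == "__main__":
    # Assumed location, based on the paths quoted in this thread; adjust as needed.
    for dirpath, count in find_mrg_dirs(os.path.expanduser("~/nltk_data")):
        print("%s: %d .mrg files" % (dirpath, count))
```

Whatever directory this prints is the one to use as `corpus_root`.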

Alexis


max77

8 Jul 2017, 15:14:51

to nltk-users

Hello again,

I've been experimenting with different approaches and still am not making progress. The wsj files with the .mrg extension are in fact located in the folder `combined`, like you said, so that was a good clue. Except now the reader gives me back an empty list of file IDs, and asking for the sentences fails. Here's the code:

>>> from nltk.corpus import BracketParseCorpusReader
>>> corpus_root = r"/home/pi/nltk_data/corpora/treebank/combined"
>>> file_pattern = r".*/wsj_.*\.mrg"
>>> ptb = BracketParseCorpusReader(corpus_root, file_pattern)
>>> ptb.fileids()
[]
>>> len(ptb.sents())
Traceback (most recent call last):
  File "<pyshell#77>", line 1, in <module>
    len(ptb.sents())
  File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/reader/api.py", line 414, in sents
    for fileid, enc in self.abspaths(fileids, True)])
  File "/usr/local/lib/python2.7/dist-packages/nltk/corpus/reader/util.py", line 422, in concat
    raise ValueError('concat() expects at least one object!')
ValueError: concat() expects at least one object!

Dimitriadis, A. (Alexis)

8 Jul 2017, 15:43:52

to nltk-...@googlegroups.com

That’s pretty obvious, if you’ll forgive me for saying so: Your `file_pattern` includes a slash, which effectively requires the `mrg` files to be in a subdirectory of `corpus_root`, but they are not; they sit directly in it. Just write `file_pattern = r"wsj_.*\.mrg"` and you’re in business.
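To see the effect of the slash in isolation, here is a rough sketch using only the `re` module. It assumes the reader tests the whole pattern against each file's path relative to `corpus_root`, which is my understanding of what NLTK does; `matching_fileids` is a hypothetical stand-in for illustration, not an NLTK function, and the file names are made up.

```python
import re


def matching_fileids(relative_paths, file_pattern):
    """Hypothetical helper: keep the paths that the whole pattern matches."""
    regex = re.compile(file_pattern + r"$")
    return [path for path in relative_paths if regex.match(path)]


# Paths are relative to corpus_root; here the files sit directly in that folder.
flat_layout = ["README", "wsj_0001.mrg", "wsj_0002.mrg"]

print(matching_fileids(flat_layout, r".*/wsj_.*\.mrg"))  # []
print(matching_fileids(flat_layout, r"wsj_.*\.mrg"))     # ['wsj_0001.mrg', 'wsj_0002.mrg']

# The pattern with the slash only matches when the files are one level down.
nested_layout = ["00/wsj_0001.mrg", "01/wsj_0101.mrg"]
print(matching_fileids(nested_layout, r".*/wsj_.*\.mrg"))  # ['00/wsj_0001.mrg', '01/wsj_0101.mrg']
```

The pattern with the slash suits a layout where the `.mrg` files are split into numbered subfolders; the sample you downloaded keeps them all in one folder.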

Incidentally, defining your own reader instance is good practice, but this dataset can be accessed with `from nltk.corpus import treebank`.

Alexis

max77

9 Jul 2017, 08:46:27

to nltk-users

Terrific! It finally worked. Thanks, Alexis, for your help. :)
