Issue 745 in nltk: problem in Sinica Treebank Sample Corpus

nl...@googlecode.com

Apr 22, 2013, 4:20:40 AM
to nltk-...@googlegroups.com
Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 745 by zhenzhen...@gmail.com: problem in Sinica Treebank Sample
Corpus
http://code.google.com/p/nltk/issues/detail?id=745

I'm using nltk 2.0b8 that comes with Python 2.6.

---------------------
What steps will reproduce the problem? (e.g. include Python source code)
The problem occurs when looping over the parsed trees from the
corpus:

import nltk
from nltk.tree import ParentedTree as PT
from nltk.corpus import sinica_treebank as sinica

for t in sinica.parsed_sents():
    for s in PT.convert(t).subtrees():
        # Match a VF subtree with a leaf child whose right sibling is a VP.
        if (s.node.startswith('VF') and
                s.right_sibling and
                s.right_sibling.node.startswith('VP') and
                isinstance(s[0], str)):
            print t

----------------------
What is the expected output? What do you see instead?
Expected: the matching trees are printed. Instead, the loop fails with an
error from File "/usr/lib/pymodules/python2.6/nltk/tree.py", line 568,
in _parse_error:
raise ValueError(m
ValueError: Tree.parse(): expected '(' but got 'end-of-string'
at index 1.
" "
^

--------------------------
Please use labels and text to provide additional information.
The error is caused by a problem in the data file for the Sinica Treebank,
nltk_data/corpora/sinica_treebank/parsed.
At line 5349, "#963:00963..[0]VP(evaluation:Dbb:仍然...", a space " " is
missing between "[0]" and "VP".
The error disappears once a space is inserted at that position.
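
To check whether other records in the file have the same defect, something like the sketch below could be used. It does not need NLTK. It assumes, based only on the single line quoted above, that each record starts with "#", followed by an id and a bracketed number such as "[0]", and that the tree text must be separated from that number by a space. The function names are made up for illustration.

```python
import re

# Assumption (from the one line quoted in this report): a record looks like
# "#<id>..[<n>] <tree>", and the space after "[<n>]" is required.
_MISSING_SPACE = re.compile(r'^(#[^\[\]]*\[\d+\])(?=\S)')


def find_missing_space(lines):
    """Return (line_number, line) pairs where "[n]" runs straight into the tree."""
    return [(i, line) for i, line in enumerate(lines, 1)
            if _MISSING_SPACE.match(line)]


def insert_missing_space(line):
    """Insert one space after the leading "[n]" if it is missing."""
    return _MISSING_SPACE.sub(r'\1 ', line, count=1)
```

Passing the lines of the parsed file to find_missing_space() should list line 5349 and any other record with the same problem; insert_missing_space() leaves lines that already have the space unchanged.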

