Status: New
Owner: ----
Labels: Type-Defect Priority-Medium
New issue 745 by zhenzhen...@gmail.com: problem in Sinica Treebank Sample Corpus
http://code.google.com/p/nltk/issues/detail?id=745
I'm using the NLTK 2.0b8 package for Python 2.6.
---------------------
What steps will reproduce the problem? (e.g. include Python source code)
The problem occurs when looping through the parsed trees of the corpus:
import nltk
from nltk.tree import ParentedTree as PT
from nltk.corpus import sinica_treebank as sinica

# Print every tree containing a VF* node whose right sibling is a VP*
# node and whose first child is a leaf (a plain string).
for t in sinica.parsed_sents():
    for s in PT.convert(t).subtrees():
        if (s.node.startswith('VF') and
                s.right_sibling and
                s.right_sibling.node.startswith('VP') and
                isinstance(s[0], str)):
            print t
----------------------
What is the expected output? What do you see instead?
Instead of iterating over all the trees, reading the corpus fails with:

  File "/usr/lib/pymodules/python2.6/nltk/tree.py", line 568, in _parse_error
    raise ValueError(m
ValueError: Tree.parse(): expected '(' but got 'end-of-string'
            at index 1.
                " "
                  ^
--------------------------
Please use labels and text to provide additional information.
The error is caused by a defect in the Sinica Treebank data file
nltk_data/corpora/sinica_treebank/parsed.
Line 5349, "#963:00963..[0]VP(evaluation:Dbb:仍然...", is missing a space
between "[0]" and "VP".
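One plausible reason a single missing space breaks parsing: if the corpus reader strips each line's leading identifier with a regular expression shaped like `^#\S+\s` (an assumption about NLTK's Sinica Treebank reader, not something confirmed in this report), then without the space the `\S+` run continues past "[0]" into the tree itself. The sketch below uses hypothetical, shortened sample lines, not text copied from the corpus file.

```python
import re

# Assumption: the reader removes a leading identifier such as
# "#963:00963..[0] " with a pattern of this shape: "#", a run of
# non-space characters, then one whitespace character.
IDENTIFIER = re.compile(r'^#\S+\s')

# Hypothetical, shortened sample lines (not copied from the corpus file).
good = "#963:00963..[0] VP(Head:VH11:ok)\n"
bad = "#963:00963..[0]VP(Head:VH11:ok)\n"

# With the space, only the identifier is removed and the tree survives.
good_rest = IDENTIFIER.sub('', good)

# Without the space, \S+ runs on through the tree up to the first
# whitespace, so the "identifier" match swallows the tree as well.
bad_rest = IDENTIFIER.sub('', bad)

print(repr(good_rest))  # 'VP(Head:VH11:ok)\n'
print(repr(bad_rest))   # ''
```

Under this assumption the tree parser is left with an empty or whitespace-only string, which would be consistent with the "expected '(' but got 'end-of-string'" message.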
Inserting a space between "[0]" and "VP" on that line makes the error disappear.
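Other lines may share the same defect. A minimal sketch for finding and patching them, assuming every line starts with an identifier of the form "#<id>..[<n>]" as in the line quoted above; `fix_lines` and `MISSING_SPACE` are hypothetical names, and reading and writing the actual file (including its encoding) is left out.

```python
import re

# Hypothetical pattern: the "#<id>..[<n>]" prefix immediately followed by
# a non-space character. The lazy \S*? cannot cross a space, so lines that
# already have the space after "[n]" do not match.
MISSING_SPACE = re.compile(r'^(#\S*?\[\d+\])(?=\S)')


def fix_lines(lines):
    """Return (fixed_lines, numbers), where numbers lists the 1-based line
    numbers that had a space inserted after their "[n]" prefix."""
    fixed, numbers = [], []
    for number, line in enumerate(lines, start=1):
        new_line, count = MISSING_SPACE.subn(r'\1 ', line, count=1)
        if count:
            numbers.append(number)
        fixed.append(new_line)
    return fixed, numbers


# Hypothetical sample input: the second line lacks the space.
sample = ["#1:00001..[0] S(Head:Nab:x)\n", "#963:00963..[0]VP(Head:VH11:y)\n"]
fixed, numbers = fix_lines(sample)
print(numbers)  # [2]
```

Run over the real file's lines, the returned numbers should include 5349 if the diagnosis above is right; listing the affected lines before rewriting the file keeps the patch easy to review.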