Hi all,
I'm trying to use rdflib to parse some Turtle files but I'm getting an
exception like this:
rdflib.plugins.parsers.notation3.BadSyntax: at line 76894 of <>:
Bad syntax (expected '.' or '}' or ']' at end of statement) at ^ in:
This is happening with Turtle versions of the NY Times Locations dataset,
which I downloaded from http://data.nytimes.com. I think there's some
literal (or URI) in the original data that triggers an exception in the
rdflib N3/Turtle parser. The original published data file is in RDF/XML
which can be parsed just fine by rdflib, but when I convert the data into
Turtle using any of three different tools (rdflib, Jena or rapper) the
resulting Turtle files cannot be parsed by rdflib.
I've put up the original data as well as the different Turtle versions
here:
http://www.seco.tkk.fi/u/oisuomin/rdflib-syntax/
The full tracebacks I get parsing the different Turtle versions are at the
end of this message, as well as in the rdflib-script.txt file in the above
directory.
Unfortunately the data is pretty big (170k triples, about 10 MB as Turtle)
and the exceptions didn't help me locate the problematic part of the data.
The data file contains Unicode literals in various non-Western scripts
which may or may not be related to the problem.
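For reference, one way to look at the data around a reported line is sketched below. This is only a rough helper, not part of rdflib: the name show_context and its parameters are made up for illustration, and it assumes that the line number in the BadSyntax message is a 1-based line number in the UTF-8 encoded Turtle file. Non-ASCII characters are printed as backslash escapes so that they stand out.

```python
import io


def show_context(path, lineno, radius=3, encoding="utf-8"):
    """Return the lines around a 1-based line number as a list of strings.

    Hypothetical helper for illustration only. Each line is escaped so that
    non-ASCII characters appear as visible backslash escapes.
    """
    first = max(1, lineno - radius)
    last = lineno + radius
    out = []
    with io.open(path, encoding=encoding) as f:
        for n, line in enumerate(f, 1):
            if n > last:
                break  # past the window, no need to read the rest
            if n >= first:
                # Mark the line the parser complained about.
                marker = ">>" if n == lineno else "  "
                escaped = line.rstrip("\n").encode("unicode_escape").decode("ascii")
                out.append("%s %6d: %s" % (marker, n, escaped))
    return out
```

Calling show_context('locations-rdflib.ttl', 76894) and printing the returned strings one per line should show the statement the parser stopped at, provided the reported line number can be trusted.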
Any ideas how to fix this?
Best regards,
Osma Suominen
$ python
Python 2.7.3 (default, Aug 1 2012, 05:14:39)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from rdflib import *
>>> g = Graph()
>>> g.parse('locations-rdflib.ttl', format='n3')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/rdflib-3.2.3-py2.7.egg/rdflib/graph.py", line 918, in parse
    parser.parse(source, self, **args)
  File "/usr/local/lib/python2.7/dist-packages/rdflib-3.2.3-py2.7.egg/rdflib/plugins/parsers/notation3.py", line 2393, in parse
    TurtleParser.parse(self,source,conj_graph,encoding)
  File "/usr/local/lib/python2.7/dist-packages/rdflib-3.2.3-py2.7.egg/rdflib/plugins/parsers/notation3.py", line 2373, in parse
    p.loadStream(source.getByteStream())
  File "/usr/local/lib/python2.7/dist-packages/rdflib-3.2.3-py2.7.egg/rdflib/plugins/parsers/notation3.py", line 937, in loadStream
    return self.loadBuf(stream.read()) # Not ideal
  File "/usr/local/lib/python2.7/dist-packages/rdflib-3.2.3-py2.7.egg/rdflib/plugins/parsers/notation3.py", line 943, in loadBuf
    self.feed(buf)
  File "/usr/local/lib/python2.7/dist-packages/rdflib-3.2.3-py2.7.egg/rdflib/plugins/parsers/notation3.py", line 969, in feed
    i = self.directiveOrStatement(s, j)
  File "/usr/local/lib/python2.7/dist-packages/rdflib-3.2.3-py2.7.egg/rdflib/plugins/parsers/notation3.py", line 987, in directiveOrStatement
    return self.checkDot(argstr, j)
  File "/usr/local/lib/python2.7/dist-packages/rdflib-3.2.3-py2.7.egg/rdflib/plugins/parsers/notation3.py", line 1558, in checkDot
    argstr, j, "expected '.' or '}' or ']' at end of statement")
rdflib.plugins.parsers.notation3.BadSyntax: at line 76894 of <>:
Bad syntax (expected '.' or '}' or ']' at end of statement) at ^ in:
"...lat>,
<http://hu.wikipedia.org/wiki/Eilat>,
^<http://id.wikipedia.org/wiki/Eilat>,
<http://it.wik..."
>>> g = Graph()
>>> g.parse('locations-jena.ttl', format='n3')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/rdflib-3.2.3-py2.7.egg/rdflib/graph.py", line 918, in parse
    parser.parse(source, self, **args)
  File "/usr/local/lib/python2.7/dist-packages/rdflib-3.2.3-py2.7.egg/rdflib/plugins/parsers/notation3.py", line 2393, in parse
    TurtleParser.parse(self,source,conj_graph,encoding)
  File "/usr/local/lib/python2.7/dist-packages/rdflib-3.2.3-py2.7.egg/rdflib/plugins/parsers/notation3.py", line 2373, in parse
    p.loadStream(source.getByteStream())
  File "/usr/local/lib/python2.7/dist-packages/rdflib-3.2.3-py2.7.egg/rdflib/plugins/parsers/notation3.py", line 937, in loadStream
    return self.loadBuf(stream.read()) # Not ideal
  File "/usr/local/lib/python2.7/dist-packages/rdflib-3.2.3-py2.7.egg/rdflib/plugins/parsers/notation3.py", line 943, in loadBuf
    self.feed(buf)
  File "/usr/local/lib/python2.7/dist-packages/rdflib-3.2.3-py2.7.egg/rdflib/plugins/parsers/notation3.py", line 969, in feed
    i = self.directiveOrStatement(s, j)
  File "/usr/local/lib/python2.7/dist-packages/rdflib-3.2.3-py2.7.egg/rdflib/plugins/parsers/notation3.py", line 987, in directiveOrStatement
    return self.checkDot(argstr, j)
  File "/usr/local/lib/python2.7/dist-packages/rdflib-3.2.3-py2.7.egg/rdflib/plugins/parsers/notation3.py", line 1558, in checkDot
    argstr, j, "expected '.' or '}' or ']' at end of statement")
rdflib.plugins.parsers.notation3.BadSyntax: at line 3211 of <>:
Bad syntax (expected '.' or '}' or ']' at end of statement) at ^ in:
"...622290083051> ;
cc:license <http://creativecommons.org^/licenses/by/3.0/us/> ;
nyt:mapping_strategy
..."
>>> g = Graph()
>>> g.parse('locations-rapper.ttl', format='n3')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/rdflib-3.2.3-py2.7.egg/rdflib/graph.py", line 918, in parse
    parser.parse(source, self, **args)
  File "/usr/local/lib/python2.7/dist-packages/rdflib-3.2.3-py2.7.egg/rdflib/plugins/parsers/notation3.py", line 2393, in parse
    TurtleParser.parse(self,source,conj_graph,encoding)
  File "/usr/local/lib/python2.7/dist-packages/rdflib-3.2.3-py2.7.egg/rdflib/plugins/parsers/notation3.py", line 2373, in parse
    p.loadStream(source.getByteStream())
  File "/usr/local/lib/python2.7/dist-packages/rdflib-3.2.3-py2.7.egg/rdflib/plugins/parsers/notation3.py", line 937, in loadStream
    return self.loadBuf(stream.read()) # Not ideal
  File "/usr/local/lib/python2.7/dist-packages/rdflib-3.2.3-py2.7.egg/rdflib/plugins/parsers/notation3.py", line 943, in loadBuf
    self.feed(buf)
  File "/usr/local/lib/python2.7/dist-packages/rdflib-3.2.3-py2.7.egg/rdflib/plugins/parsers/notation3.py", line 969, in feed
    i = self.directiveOrStatement(s, j)
  File "/usr/local/lib/python2.7/dist-packages/rdflib-3.2.3-py2.7.egg/rdflib/plugins/parsers/notation3.py", line 987, in directiveOrStatement
    return self.checkDot(argstr, j)
  File "/usr/local/lib/python2.7/dist-packages/rdflib-3.2.3-py2.7.egg/rdflib/plugins/parsers/notation3.py", line 1558, in checkDot
    argstr, j, "expected '.' or '}' or ']' at end of statement")
rdflib.plugins.parsers.notation3.BadSyntax: at line 52803 of <>:
Bad syntax (expected '.' or '}' or ']' at end of statement) at ^ in:
"...ms.wikipedia.org/wiki/Berlin>, <http://pt.wikipedia.org/wiki^/Berlim>,
<http://qu.wikipedia.org/wiki/Berlin>, <http://ro...."
>>>
--
Osma Suominen |
Osma.S...@aalto.fi | +358 40 5255 882
Aalto University, Department of Media Technology, Semantic Computing Research Group
Room 2541, Otaniementie 17, Espoo, Finland; P.O. Box 15500, FI-00076 Aalto, Finland