rdflib.Graph().load() and UTF-8 -- is that possible?


pici...@gmail.com

Jan 11, 2019, 1:48:45 PM
to rdflib-dev
Hi!

I am trying to parse and check some sentences in the DBpedia graph. When I call rdflib.Graph().load(resource) with resource='http://dbpedia.org/resource/Adam_Małysz', I get an error: 'ascii' codec can't encode character '\u0142' in position 21: ordinal not in range(128).
My question is: am I missing something, or is there actually no way to change that ASCII encoding to UTF-8? I even tried to find the failing part of the library's code in a debugger, but without success (the number of steps is overwhelming for me, as I don't know the library well).

Kind regards
Matt

Richard Dijkstra

Mar 3, 2019, 4:23:44 PM
to rdflib-dev
Matt,

What is the query code? The following works fine for me.

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
#sparql = SPARQLWrapper("http://dbpedia.org/resource/Adam_Małysz")

sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?label
    WHERE { <http://dbpedia.org/resource/Adam_Małysz> rdfs:label ?label }
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for result in results["results"]["bindings"]:
    print(result["label"]["value"])

Richard

pici...@gmail.com

Mar 5, 2019, 6:34:26 PM
to rdflib-dev
It was not exactly that (I omitted the unnecessary mess), but the flow is the same:

import rdflib


graph = rdflib.Graph()

person_ns = rdflib.URIRef('http://dbpedia.org/ontology/Person')
object_ns = rdflib.URIRef('http://dbpedia.org/ontology/Place')
organisation_ns = rdflib.URIRef('http://dbpedia.org/ontology/Organisation')
resource = rdflib.URIRef("http://dbpedia.org/resource/Adam Małysz")

graph.value(resource, rdflib.RDFS.label)
graph.load(resource) # << error here

for subject, predicate, object_ in graph:
    if subject == resource and object_ in [person_ns, object_ns, organisation_ns]:
        response += [
            {
                'serie': serie,
                'subject': u''.join(subject).encode('utf-8'),
                'predicate': u''.join(predicate).encode('utf-8'),
                'type': u''.join(object_).encode('utf-8')
            }
        ]
        break

return response

Later, the sparql.setQuery(..) way worked for my friend.

Jörn Hees

Mar 6, 2019, 7:49:01 AM
to rdfli...@googlegroups.com
Hi,

> On 6 Mar 2019, at 00:34, pici...@gmail.com wrote:
>
> resource = rdflib.URIRef("http://dbpedia.org/resource/Adam Małysz")

2 problems here:
- space isn't allowed in URI
- ł isn't allowed in URI


> graph.value(resource, rdflib.RDFS.label)
> graph.load(resource) # << error here

this mostly looks like an inconsistency between py2 and py3 to me. My guess is that you use py3 where this errs with a UnicodeEncodeError.
In other words: it's possible that you have to "quote" the IRI into a URI yourself here, even though it happened implicitly in py2.
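
To make the quoting suggestion concrete, here is a minimal sketch using only the Python 3 standard library. The helper name iri_to_uri is hypothetical (it is not part of rdflib), and the sketch assumes the host part is already ASCII, so it only percent-encodes the path, query, and fragment as UTF-8.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Percent-encode the non-ASCII and otherwise disallowed characters of an IRI.

    Hypothetical helper for illustration; the host is left untouched
    (a non-ASCII host would need IDNA handling, which is out of scope here).
    """
    parts = urlsplit(iri)
    # "%" is kept safe so already-encoded sequences are not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


raw = "http://dbpedia.org/resource/Adam_Małysz"

# The raw IRI cannot be represented in ASCII, which is what the error reports.
try:
    raw.encode("ascii")
except UnicodeEncodeError as exc:
    print("not ASCII:", exc.reason)

encoded = iri_to_uri(raw)
print(encoded)
```

The encoded string could then be passed to graph.load(...) (or graph.parse(...) in rdflib versions that no longer provide load) in place of the raw IRI; whether DBpedia then returns the expected triples was not tested in this thread. Position 21 in the reported error is consistent with the offset of "ł" in a request line such as "GET /resource/Adam_Małysz", which suggests the failure happens when the HTTP request is encoded, but that is an inference and not something confirmed here.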


Best,
Jörn

pici...@gmail.com

Mar 6, 2019, 9:55:20 AM
to rdflib-dev
> 2 problems here:
> - space isn't allowed in URI
There should be an underscore ("_"), as in the original post; I made a mistake while retyping it. As for the "ł" character, I should probably percent-encode it as %C5%82.

> this mostly looks like an inconsistency between py2 and py3 to me. My guess is that you use py3 where this errs with a UnicodeEncodeError.
> In other words: it's possible that you have to "quote" the IRI into a URI yourself here, even though it happened implicitly in py2.
Yes, py3. ;)

Thank you. Let's leave it at that. I already got a 5.0 grade for that project and probably will not use that part of the code anymore (especially since querying via sparql.query() seems better) ;)
