Encoding issues loading N-Quads with Unicode


Ric Roberts

Jul 10, 2015, 11:31:24 AM
to sta...@clarkparsia.com
If you load an N-Quads file containing a string literal with Unicode characters (e.g. curly quotes), like this:

stardog-admin db create -n testquads -t D \
   -o strict.parsing=false \
   -o query.all.graphs=true \
   -o reasoning.type=none \
   ~/Desktop/quad.nq


where quad.nq contains:


...then query it back over the SPARQL HTTP interface like this:

curl -H "Accept:text/turtle" "http://localhost:5820/testquads/query?query=prefix%20rdf%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E%0Aprefix%20owl%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E%0Aprefix%20xsd%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23%3E%0Aprefix%20rdfs%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0A%0ACONSTRUCT%20%7B%3Fs%20%3Fp%20%3Fo%7D%20WHERE%20%7B%3Fs%20%3Fp%20%3Fo%7D"


(that's the URL-encoded form of the standard rdf, owl, xsd and rdfs prefix declarations followed by "CONSTRUCT {?s ?p ?o} WHERE {?s ?p ?o}")
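The long query URL is just the SPARQL text percent-encoded. Below is a minimal POSIX sh sketch of that encoding, assuming RFC 3986 rules: unreserved characters (letters, digits, '-', '.', '_', '~') are kept and every other byte becomes %XX. The function name urlencode is hypothetical (it is not part of curl or Stardog); for the CONSTRUCT query it reproduces the tail of the URL in the curl command.

```shell
#!/bin/sh
# Hypothetical helper: percent-encode a string byte by byte (RFC 3986 style).
urlencode() {
  # od -tu1 prints each input byte as an unsigned decimal number;
  # -v stops od from collapsing repeated lines into '*'
  for b in $(printf '%s' "$1" | od -An -v -tu1); do
    # Unreserved characters: 0-9, A-Z, a-z, '-' (45), '.' (46), '_' (95), '~' (126)
    if { [ "$b" -ge 48 ] && [ "$b" -le 57 ]; } ||
       { [ "$b" -ge 65 ] && [ "$b" -le 90 ]; } ||
       { [ "$b" -ge 97 ] && [ "$b" -le 122 ]; } ||
       [ "$b" -eq 45 ] || [ "$b" -eq 46 ] || [ "$b" -eq 95 ] || [ "$b" -eq 126 ]
    then
      printf "\\$(printf '%03o' "$b")"   # emit the byte unchanged (octal escape)
    else
      printf '%%%02X' "$b"               # percent-encode everything else
    fi
  done
  printf '\n'
}

db=testquads    # database name used in the db create command
query='CONSTRUCT {?s ?p ?o} WHERE {?s ?p ?o}'
printf '%s\n' "http://localhost:5820/$db/query?query=$(urlencode "$query")"
```

The printed URL can be passed to curl -H "Accept: text/turtle" as before. curl can also do the encoding itself: with -G, each --data-urlencode 'query=...' argument is percent-encoded and appended to the URL as a query string, which avoids hand-encoding altogether.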


...then you get back U+FFFD replacement characters instead of the curly quotes in the response (I'm having trouble pasting them; see the attached screenshot):


<http://example.com/subject> <http://example.com/predicate> "Experimental." .



...Or as N-Triples (with Accept:text/plain):


<http://example.com/subject> <http://example.com/predicate> "\uFFFD\uFFFD\uFFFDExperimental\uFFFD\uFFFD\uFFFD." .



I believe Turtle output should be UTF-8. And while N-Triples (at least as the text/plain MIME type) could escape the characters with \u, why does \uFFFD appear three times for each quote?


What are we doing wrong? Is this a bug in Stardog?


Thanks!

Ric


Screen Shot 2015-07-10 at 16.20.16.png

Ric Roberts

Jul 10, 2015, 11:35:54 AM
to sta...@clarkparsia.com
Also, is there a reason Stardog doesn't support application/n-triples to request n-triples? (Stardog returns a 406 when you use that as an Accept header).

Michael Grove

Jul 10, 2015, 3:45:53 PM
to stardog
On Fri, Jul 10, 2015 at 11:35 AM, Ric Roberts <r...@swirrl.com> wrote:
> Also, is there a reason Stardog doesn't support application/n-triples to request n-triples? (Stardog returns a 406 when you use that as an Accept header).

Yes: application/n-triples wasn't an accepted MIME type for N-Triples (or N-Quads) until RDF 1.1, which we don't support yet. We could add it as an alias, but we're using Rio under the hood, and in Sesame 2.7.15 its N-Triples support still expects US-ASCII.

Probably easier to just use Turtle.

As for the encoding issue, it looks like the character encoding was getting munged somewhere between the HTTP server and the database while the file contents were being parsed for loading. The ticket for this is #2382 and we'll fix it for the next release.
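The three-fold repetition Ric asked about has a likely byte-level explanation, though the thread does not confirm the exact code path: each curly quote (U+201C, U+201D) is three bytes in UTF-8, and a decoder that reads those bytes as US-ASCII and substitutes U+FFFD for each invalid byte produces three replacement characters per quote, which an ASCII-only N-Triples writer then prints as \uFFFD. The POSIX sh sketch below simulates that assumption; ascii_replace is a hypothetical helper, not Stardog or Sesame code. Run on the sample literal, it prints the same string as the N-Triples output in the original report.

```shell
#!/bin/sh
# Hypothetical simulation only: treat the input as US-ASCII, replace every
# byte >= 0x80 (never valid ASCII) with U+FFFD, and write each replacement
# the way an ASCII-only N-Triples writer would: as the escape \uFFFD.
ascii_replace() {
  for b in $(printf '%s' "$1" | od -An -v -tu1); do
    if [ "$b" -ge 128 ]; then
      printf '%s' '\uFFFD'               # invalid byte: one replacement each
    else
      printf "\\$(printf '%03o' "$b")"   # ASCII byte: emit unchanged
    fi
  done
  printf '\n'
}

# The curly quotes are three bytes each in UTF-8 (E2 80 9C and E2 80 9D),
# written here as octal printf escapes.
literal=$(printf '\342\200\234Experimental\342\200\235.')
printf '%s' "$literal" | od -An -tx1   # shows the bytes, starting e2 80 9c
ascii_replace "$literal"               # prints \uFFFD\uFFFD\uFFFDExperimental\uFFFD\uFFFD\uFFFD.
```

Reading the same bytes as UTF-8 instead would yield just one character per quote, which is why a consistent UTF-8 path from upload to parser avoids the problem.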

Cheers,

Mike
 

--
-- --
You received this message because you are subscribed to the C&P "Stardog" group.
To post to this group, send email to sta...@clarkparsia.com
To unsubscribe from this group, send email to
stardog+u...@clarkparsia.com
For more options, visit this group at
http://groups.google.com/a/clarkparsia.com/group/stardog?hl=en

Ric Roberts

Jul 14, 2015, 10:35:59 AM
to sta...@clarkparsia.com
Thanks - will watch out for the fix. :)