There was some discussion on the Pedantic Web list about annotating
documents with content types used.
A few alternatives were discussed including:
... dct:format "application/rdf+xml" .
... dct:format [
a dct:IMT;
rdf:value "application/rdf+xml"
].
However, using URIs instead of blank-nodes or literals would seem to be
much more beneficial here since these resources are likely to reappear
very often across different datasets.
Keith Alexander pointed out this page:
This looks (to me) like the perfect solution. One could do something like:
... dct:format <http://www.w3.org/ns/formats/RDF_XML> .
Unfortunately, some of the dereferenced documents for the URIs contained
within have syntax errors in RDF/XML.
For example, when checking the URI:
http://www.w3.org/ns/formats/data/RDF_XML
The RDF/XML validator gives:
Would it be possible to fix these documents, or maybe forward as
appropriate?
Cheers,
Aidan
I have no idea what is going on. If you take any of those RDF files and copy the text into the text box of the same validator, it checks out all right. Tabulator in Firefox reads the file. When I use Firefox to display the XML file directly, there is no problem either (though Firefox has a built-in XML parser).
I will have to ask the maintainers of the service for some help here.
Ivan
----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
FOAF: http://www.ivan-herman.net/foaf.rdf
An option might be to add a step to the makefile that strips the BOM. This can probably be done with a line of perl or awk or whatever.
But yeah the Right Thing to do would be to get the validator fixed.
Richard
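The BOM-stripping step suggested above could also be sketched in Python instead of perl or awk. This is only a minimal sketch, not the Makefile step actually used for these files: the function names `strip_utf8_bom` and `strip_bom_in_place` are hypothetical, and it assumes the generated files are UTF-8.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):  # the three bytes EF BB BF
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_bom_in_place(path: str) -> bool:
    """Rewrite the file without its BOM; return True if a BOM was removed."""
    p = Path(path)
    original = p.read_bytes()
    cleaned = strip_utf8_bom(original)
    if cleaned != original:
        p.write_bytes(cleaned)
        return True
    return False
```

Because `strip_bom_in_place` reports whether it changed anything, running it over the generated .rdf and .ttl files would also show whether any of them carried a BOM in the first place.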
Reminds me of the Douglas Adams quote:
"""The major difference between a thing that might go wrong and a thing
that cannot possibly go wrong is that when a thing that cannot possibly
go wrong goes wrong it usually turns out to be impossible to get at or
repair."""
Cheers,
Aidan
On 25/04/2012 13:07, Richard Cyganiak wrote:
> On 25 Apr 2012, at 13:05, Ivan Herman wrote:
>> Ah. I was wondering about something like that. Thanks.
>>
>> The problem is that the RDF/XML files are generated and not edited by hand; the 'real' meat is in the HTML files which are RDFa, and I generate the turtle and RDF/XML files through a Makefile before uploading them. Ie, it is probably done by the RDF/XML serializer of RDFLib, and that is pretty difficult to dig into...
>
> An option might be to add a step to the makefile that strips the BOM. This can probably be done with a line of perl or awk or whatever.
>
> But yeah the Right Thing to do would be to get the validator fixed.
>
> Richard
>
>>
>> Ivan
>>
>> On Apr 25, 2012, at 13:57 , Richard Cyganiak wrote:
>>
>>> Aidan,
>>>
>>> Ivan's file is (pedantically speaking) fine. The error reported by the RDF Validator is a symptom of a Jena bug. The issue is triggered by the presence of a Byte Order Mark at the beginning of the file:
>>>
>>> http://en.wikipedia.org/wiki/Byte_order_mark
>>>
>>> See here for a nice explanation from Rob Vesse, and his related bug report:
>>>
>>> http://www.dotnetrdf.org/blogitem.asp?blogID=37
>>> https://issues.apache.org/jira/browse/JENA-12
>>>
>>> In fairness, the simplest fix would be for Ivan to edit the RDF files and remove the initial byte order mark. Googling for "remove byte order mark" shows various ways of doing that.
I don't think any of these files (either .rdf or .ttl) have a BOM at
the beginning of the file.
http://people.w3.org/rishida/utils/bomtester/index.php?filename=http%3A%2F%2Fwww.w3.org%2Fns%2Fformats%2Fdata%2FRDF_XML.rdf
The W3C RDF Validator also has no bug in dealing with RDF/XML files that
have a BOM.
Any other ideas what's going on with
http://www.w3.org/ns/formats/data/RDF_XML.rdf ?
Best,
Andreas
On 4/25/12 1:57 PM, Richard Cyganiak wrote:
> Aidan,
>
> Ivan's file is (pedantically speaking) fine. The error reported by the RDF Validator is a symptom of a Jena bug. The issue is triggered by the presence of a Byte Order Mark at the beginning of the file:
>
> http://en.wikipedia.org/wiki/Byte_order_mark
>
> See here for a nice explanation from Rob Vesse, and his related bug report:
>
> http://www.dotnetrdf.org/blogitem.asp?blogID=37
> https://issues.apache.org/jira/browse/JENA-12
>
> In fairness, the simplest fix would be for Ivan to edit the RDF files and remove the initial byte order mark. Googling for "remove byte order mark" shows various ways of doing that.
It's not a Jena issue (the JIRA issue concerns newer Turtle and related
parsers). Having poked around a bit I think it's an issue with the
validator servlet, which does its own character decoding, but I find the
code a bit impenetrable. [1]
Damian
[1]
<http://dev.w3.org/cvsweb/2006/RDFValidator/WEB-INF/src/org/w3c/rdfvalidator/ARPServlet.java?rev=1.6>
+1.
I tried another file under ns/:
<http://www.w3.org/ns/ma-ont.rdf>
=> "Undecodable data when reading URI at byte 24574 using encoding 'UTF-8'."
And then the rdf namespace:
=> "... byte 0 ..."
But <http://people.w3.org/simon/foaf.rdf> was fine.
Hypothesis: validating RDF under the www.w3.org domain is broken.
It may be unrelated to encoding. The error is triggered by any
IOException reading characters from an input stream reader.
Damian
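One way to test whether a file really contains bytes that are invalid in UTF-8, as the "Undecodable data ... at byte N" messages suggest, is to decode it strictly and report the offset of the first failure. The following is a sketch under assumptions: `first_undecodable_byte` is a hypothetical helper, it works on bytes that have already been fetched, and it says nothing about how the validator servlet does its own decoding.

```python
from typing import Optional


def first_undecodable_byte(data: bytes, encoding: str = "utf-8") -> Optional[int]:
    """Return the offset of the first byte that cannot be decoded, or None."""
    try:
        # Strict decoding raises UnicodeDecodeError on the first invalid sequence.
        data.decode(encoding)
    except UnicodeDecodeError as err:
        return err.start  # index into data where decoding failed
    return None
```

If this returns None for the exact bytes served at a URL, the data is valid UTF-8, which would support the idea that the reported error is not an encoding problem in the files themselves.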