I tried to query some live DBpedia documents but did not succeed, because the retrieved content is not valid RDF/XML.
Validating the DBpedia URI for the Brazilian national soccer team [1] with the W3C validator [2] returns the following error:
Fatal Error Messages
FatalError: Element or attribute do not match QName production: QName::=(NCName':')?NCName. [Line = 500, Column = 8]
[1]http://dbpedia.org/resource/Brazil_national_football_team
[2]http://www.w3.org/RDF/Validator/ARPServlet?URI=http%3A%2F%2Fdbpedia.org%2Fresource%2FBrazil_national_football_team&PARSE=Parse+URI%3A+&TRIPLES_AND_GRAPH=PRINT_TRIPLES&FORMAT=PNG_EMBED
On 02/24/12 15:14, Juergen Umbrich wrote:
> i tried to query some live DBPedia documents but do not succeed since
> the retrieved content is not valid RDF/XML.
I can confirm that:
$ rapper -c "http://dbpedia.org/resource/French_Guiana"
rapper: Parsing URI http://dbpedia.org/resource/French_Guiana with
parser rdfxml
rapper: Error - URI /data/French_Guiana.xml:158 - Using property element
'Description' without a namespace is forbidden.
rapper: Error - URI http://dbpedia.org/resource/French_Guiana -
Resolving URI failed: Failed writing body (0 != 1188)
rapper: Failed to parse URI http://dbpedia.org/resource/French_Guiana
rdfxml content
I've also seen unescaped &'s in DBpedia.
I've cc'ed dbpedia-d...@lists.sourceforge.net to notify the
DBpedia guys.
Best regards,
Andreas.
PS. there have been several issues mentioned lately, e.g. Axel's mail
from 2012-01-04, or my mail from 2011-08-09 on publi...@w3.org.
FWIW, I posted a similar bug report (also about invalid XML on DBpedia) on the pedantic-web list a while ago:
https://groups.google.com/group/pedantic-web/browse_thread/thread/651ed89bd18e189a#
best,
Axel
All of these errors stem from the fact that not all
RDF triples can be represented in RDF/XML. [1]
(IMHO, a shortcoming in the RDF/XML spec that could
easily have been fixed by introducing something like
<rdf:Property rdf:URI="http://some/uri_(can't_be_xml)">,
similar to <rdf:Description rdf:about="http://some/uri">.)
As Jeen Broekstra wrote on dbpedia-discussion in August 2011 [2]:
"The only reliable way around the problem is to use a serialization
format that does cope with all legal RDF properly, such as N-Triples or
Turtle."
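For anyone who wants to try the Turtle route from a client, here is a minimal sketch using Python's standard library. It only builds the request; the helper name rdf_request is made up for illustration, and it is an assumption that the server honours an "Accept: text/turtle" header for this URI.

```python
import urllib.request


def rdf_request(uri, media_type="text/turtle"):
    """Build (but do not send) a GET request that asks for a given RDF format.

    rdf_request is a hypothetical helper; "text/turtle" is the registered
    media type for Turtle. Whether a given server honours it is an assumption.
    """
    return urllib.request.Request(uri, headers={"Accept": media_type})


req = rdf_request("http://dbpedia.org/resource/Brazil_national_football_team")
print(req.get_header("Accept"))
# urllib.request.urlopen(req) would actually send it; deliberately not called here.
```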
But still, when someone really wants RDF/XML, what
should Virtuoso do with triples that can't be serialized?
In some cases, there actually is a possible representation.
For example, the property URI
http://dbpedia.org/property/2ndregionalCupApps
could be represented as
<p:ndregionalCupApps xmlns:p="http://dbpedia.org/property/2">
Weird and confusing for humans, no problem for computers.
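A rough sketch of how such a split could be found automatically: take the longest suffix of the property URI that is a valid XML local name and use the rest as the namespace. The function name split_property_uri is made up, and the pattern is an ASCII-only approximation of the NCName production (the real one also allows many non-ASCII characters).

```python
import re

# ASCII-only approximation of an NCName anchored at the end of the string
# (assumption: real NCNames may also contain many non-ASCII characters).
_LOCAL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def split_property_uri(uri):
    """Return (namespace, local_name) for a property URI, or None if no
    suffix of the URI can serve as an XML element name."""
    match = _LOCAL_NAME.search(uri)
    if match is None or match.start() == 0:
        return None  # nothing usable as a local name, or no namespace left
    return uri[:match.start()], match.group()


print(split_property_uri("http://dbpedia.org/property/2ndregionalCupApps"))
print(split_property_uri("http://dbpedia.org/property/1234"))
```

The first call yields the namespace ending in "2" and the local name "ndregionalCupApps"; the second returns None because no suffix of "1234" starts with a letter or underscore.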
In those cases that can't be represented in RDF/XML,
the spec says 'throw a "this graph cannot be serialized
in RDF/XML" exception or error' [1]. Probably not a good
solution for us. I think Virtuoso should omit such triples
from RDF/XML, but include something like a comment
in their place that they were omitted and are available in
other formats (like N-Triples).
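To make the "omit, but leave a comment" idea concrete, here is a small sketch. It is not how Virtuoso works; property_element is a hypothetical helper, the local-name check is an ASCII-only approximation of NCName, only resource-valued objects are handled, and the fragment assumes an enclosing rdf:RDF element that declares the rdf prefix.

```python
import re
from xml.sax.saxutils import quoteattr

# ASCII-only approximation of an NCName at the end of the URI (assumption).
_LOCAL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def property_element(subject, predicate, obj):
    """Return an rdf:Description fragment for one resource-valued triple,
    or an XML comment if the predicate cannot be written as an element name."""
    match = _LOCAL_NAME.search(predicate)
    if match is None or match.start() == 0:
        # XML comments must not contain "--", so break up any such run.
        safe = f"{subject} {predicate} {obj}".replace("--", "- -")
        return f"<!-- omitted, not serializable in RDF/XML; see N-Triples: {safe} -->"
    namespace, local = predicate[:match.start()], match.group()
    # quoteattr escapes &, < and > and adds the surrounding quotes.
    return (
        f"<rdf:Description rdf:about={quoteattr(subject)}>"
        f"<p:{local} xmlns:p={quoteattr(namespace)} rdf:resource={quoteattr(obj)}/>"
        f"</rdf:Description>"
    )


print(property_element("http://dbpedia.org/resource/a",
                       "http://dbpedia.org/property/1234",
                       "http://dbpedia.org/resource/b"))
```

Using quoteattr for every attribute value also avoids the unescaped ampersands mentioned earlier in the thread.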
Regards,
Christopher
[1] http://www.w3.org/TR/REC-rdf-syntax/#section-Serialising
[2] http://sourceforge.net/mailarchive/forum.php?thread_name=4E443EE5.9020309%40gmail.com&forum_name=dbpedia-discussion
Hi,
Another alternative is to output these triples in reified form, like:
<rdf:Statement>
  <rdf:subject rdf:resource="http://dbpedia.org/resource/a"/>
  <rdf:predicate rdf:resource="http://dbpedia.org/property/1234"/>
  <rdf:object rdf:resource="http://dbpedia.org/resource/b"/>
</rdf:Statement>
Best regards,
Niklas
> I think we should omit such triples from RDF/XML, but
> include something like a comment in their place that
> they were omitted and are available in other formats
> (like NT).
:-)
looks great, but alas, the reified form of a statement is not the same
thing as that statement. From the RDF semantics spec [1]:
A reification of a triple does not entail the triple, and is not
entailed by it. The reification only says that the triple token exists
and what it is about, not that it is true.
For example, see test002 and test005 referenced in the RDF/XML
specification [2]. They only differ in that test005 also includes the
reified form of the statement, but they represent different graphs.
RDF and RDF/XML do not un-reify statements. (Maybe some tools do, I
don't know.)
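A toy illustration of that point, modelling a graph as a plain Python set of (subject, predicate, object) tuples. The reify helper and the blank node label "_:stmt1" are made up for the example, and set membership is only a stand-in for entailment, not an implementation of the RDF semantics.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(triple, statement_node):
    """Return the four triples that describe `triple` as an rdf:Statement."""
    s, p, o = triple
    return {
        (statement_node, RDF + "type", RDF + "Statement"),
        (statement_node, RDF + "subject", s),
        (statement_node, RDF + "predicate", p),
        (statement_node, RDF + "object", o),
    }


triple = (
    "http://dbpedia.org/resource/a",
    "http://dbpedia.org/property/1234",
    "http://dbpedia.org/resource/b",
)
reified = reify(triple, "_:stmt1")

print(len(reified))       # 4
print(triple in reified)  # False
```

The reified graph has four triples about the statement node, and the original triple is not among them, so a consumer that wants the original has to reconstruct it on its own.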
Summary from the RDF/XML spec:
There are some RDF Graphs [...] that cannot be serialized in RDF/XML. [3]
All we can do is omit such triples, or make sure that we do not use
property names that are affected by this problem.
Regards,
Christopher
[1] http://www.w3.org/TR/rdf-mt/#Reif
[2] http://www.w3.org/TR/rdf-syntax-grammar/#emptyPropertyElt
[3] http://www.w3.org/TR/rdf-syntax-grammar/#section-Serialising
2012/4/9 Niklas Lindström <linds...@gmail.com>:
<rdf:Statement>
  <rdf:subject rdf:resource="http://dbpedia.org/resource/a"/>
  <rdf:predicate rdf:resource="http://dbpedia.org/property/1234"/>
  <rdf:object rdf:resource="http://dbpedia.org/resource/b"/>
  <my:isTrue rdf:datatype="...#boolean">false</my:isTrue>
</rdf:Statement>
Un-reifying triples would be problematic in the general case.
Cheers,
Aidan
On 09/04/2012 21:38, Jona Christopher Sahnwaldt wrote:
> Hi,
>
> looks great, but alas, the reified form of a statement is not the same
> thing as that statement. From the RDF semantics spec [1]:
>
> A reification of a triple does not entail the triple, and is not
> entailed by it. The reification only says that the triple token exists
> and what it is about, not that it is true.
>
> For example, see test002 and test005 referenced in the RDF/XML
> specification [2]. They only differ in that test005 also includes the
> reified form of the statement, but they represent different graphs.
> RDF and RDF/XML do not un-reify statements. (Maybe some tools do, I
> don't know.)
>
> Summary from the RDF/XML spec:
>
> There are some RDF Graphs [...] that cannot be serialized in RDF/XML. [3]
>
> All we can do is omit such triples, or make sure that we do not use
> property names that are affected by this problem.
>
> Regards,
> Christopher
>
> [1] http://www.w3.org/TR/rdf-mt/#Reif
> [2] http://www.w3.org/TR/rdf-syntax-grammar/#emptyPropertyElt
> [3] http://www.w3.org/TR/rdf-syntax-grammar/#section-Serialising
>
> 2012/4/9 Niklas Lindström <linds...@gmail.com>:
Yes, you're absolutely right. I knew there was a difference between a
reified form and the triple it represents, but I wasn't sure if it was
entailed. Thanks for the clarification. Either way I hadn't expected
tools to interpret them directly (and it seems they mustn't).
I mostly thought of reification as being a bit simpler than using OWL,
but with no such entailment it doesn't fully work. Still, keeping the
triples in reified form (with some provenance attached), as a
structural comment if you will, may be better than omitting them?
Otherwise, Simon's suggestion to generate owl:sameAs statements for
stand-in properties seems like a good idea.
(Though admittedly I haven't had to deal with this specific problem,
so I can't say much about what's most valuable in practice.)
Best regards,
Niklas
>> 2012/4/9 Niklas Lindström<linds...@gmail.com>:
HTTP status code "406 Not Acceptable" could be used for that.
Anything that accounts for the unrepresentability by changing the
data requires the client to be aware of a possible alternative
representation. That's a bit painful without some indication.
Maybe the server should just choose to provide a format that is possible
- the client will know from the "Accept:"
Andy
PS "303 See Other" is another possibility ... oh wait ... that's already
been used elsewhere.
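
The two options above (answer 406, or fall back to a format that can carry the graph) could be sketched on the server side roughly like this. This is only an illustration: choose_response and the list of supported formats are assumptions, and q-values in the Accept header are ignored for brevity.

```python
# Formats this hypothetical server can produce, in order of preference.
SUPPORTED = ["application/rdf+xml", "text/turtle"]


def choose_response(accept_header, xml_serializable):
    """Return (status, media_type) for a request.

    xml_serializable says whether the graph can be written as RDF/XML.
    Simplification: q-values are ignored and the client's listed order is
    treated as its preference order.
    """
    usable = [m for m in SUPPORTED
              if xml_serializable or m != "application/rdf+xml"]
    requested = [part.split(";")[0].strip()
                 for part in accept_header.split(",") if part.strip()]
    for media_type in requested:
        if media_type in usable:
            return 200, media_type
        if media_type == "*/*" and usable:
            return 200, usable[0]
    return 406, None  # nothing the client accepts can represent the graph


print(choose_response("application/rdf+xml", False))
print(choose_response("application/rdf+xml, text/turtle", False))
```

A client that only accepts RDF/XML gets 406 for an unserializable graph, while a client that also lists Turtle gets Turtle and can tell from the response's Content-Type.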