DBpedia content not valid RDF/XML


Juergen Umbrich

Feb 24, 2012, 9:14:55 AM
to PedanticWeb Mailinglist
Hi all,

I tried to query some live DBpedia documents but did not succeed, since the retrieved content is not valid RDF/XML.

Validating the DBpedia URI for the Brazilian national soccer team [1] with the W3C validator [2] returns the following error:

Fatal Error Messages
FatalError: Element or attribute do not match QName production: QName::=(NCName':')?NCName. [Line = 500, Column = 8]

[1] http://dbpedia.org/resource/Brazil_national_football_team
[2] http://www.w3.org/RDF/Validator/ARPServlet?URI=http%3A%2F%2Fdbpedia.org%2Fresource%2FBrazil_national_football_team&PARSE=Parse+URI%3A+&TRIPLES_AND_GRAPH=PRINT_TRIPLES&FORMAT=PNG_EMBED

Andreas Harth

Feb 26, 2012, 6:19:41 AM
to pedant...@googlegroups.com, dbpedia-d...@lists.sourceforge.net
Hi Juergen,

On 02/24/12 15:14, Juergen Umbrich wrote:
> i tried to query some live DBPedia documents but do not succeed since
> the retrieved content is not valid RDF/XML.

I can confirm that:

$ rapper -c "http://dbpedia.org/resource/French_Guiana"
rapper: Parsing URI http://dbpedia.org/resource/French_Guiana with
parser rdfxml
rapper: Error - URI /data/French_Guiana.xml:158 - Using property element
'Description' without a namespace is forbidden.
rapper: Error - URI http://dbpedia.org/resource/French_Guiana -
Resolving URI failed: Failed writing body (0 != 1188)
rapper: Failed to parse URI http://dbpedia.org/resource/French_Guiana
rdfxml content

I've also seen unescaped &'s in DBpedia.
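
A minimal Python sketch of the escaping that would avoid unescaped ampersands like these. It uses only the standard library's `xml.sax.saxutils`; the helper names `literal_element` and `resource_element` are hypothetical and say nothing about how DBpedia or Virtuoso actually build their output.

```python
from xml.sax.saxutils import escape, quoteattr


def literal_element(qname: str, text: str) -> str:
    """Build a property element with a literal value, escaping &, < and >."""
    return f"<{qname}>{escape(text)}</{qname}>"


def resource_element(qname: str, uri: str) -> str:
    """Build an empty property element; quoteattr escapes and quotes the URI."""
    return f"<{qname} rdf:resource={quoteattr(uri)}/>"


print(literal_element("rdfs:label", "Simon & Garfunkel"))
print(resource_element("foaf:page", "http://example.org/?a=1&b=2"))
```

Both helpers assume the element name passed in is already a legal QName; they only deal with the content and attribute value.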

I've cc'ed dbpedia-d...@lists.sourceforge.net to notify the
DBpedia guys.

Best regards,
Andreas.

PS. There have been several issues mentioned lately, e.g. Axel's mail
from 2012-01-04, or my mail from 2011-08-09 on publi...@w3.org.

Axel Polleres

Feb 26, 2012, 3:02:16 PM
to pedant...@googlegroups.com, dbpedia-d...@lists.sourceforge.net
Hi all,

FWIW, I posted a similar bug report (also about invalid XML on DBpedia) on the pedantic-web list a while ago:
https://groups.google.com/group/pedantic-web/browse_thread/thread/651ed89bd18e189a#

best,
Axel

Christopher Sahnwaldt

Feb 28, 2012, 12:15:04 PM
to Pedantic Web Group
Hi,

all of these errors stem from the problem that not all
RDF triples can be represented in RDF/XML. [1]
(IMHO, a shortcoming in the RDF/XML spec that could
easily have been fixed by introducing something like
<rdf:Property rdf:URI="http://some/uri_(can't_be_xml)">,
similar to rdf:Description.)

As Jeen Broekstra wrote on this list in August 2011 [2]:

"The only reliable way around the problem is to use a serialization
format that does cope with all legal RDF properly, such as N-Triples or
Turtle."

But still, when someone really wants RDF/XML, what
should poor Virtuoso do with triples that can't be serialized?

In some cases, there actually is a possible representation.
For example, the property URI
http://dbpedia.org/property/2ndregionalCupApps
could be represented as
<p:ndregionalCupApps xmlns:p="http://dbpedia.org/property/2">
Weird and confusing for humans, no problem for computers.
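
The split described above can be sketched in Python: take the longest suffix of the property URI that is a legal XML local name and use everything before it as the namespace. The function name `split_property_uri` is hypothetical, and the pattern is an ASCII-only approximation of the NCName production (the real production also allows many non-ASCII characters).

```python
import re

# ASCII-only approximation of NCName: a letter or underscore, followed by
# letters, digits, '.', '-' or '_'. Anchored to the end of the string.
_NCNAME_SUFFIX = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def split_property_uri(uri):
    """Return (namespace, local_name), or None if no legal local name exists.

    The leftmost match that reaches the end of the string is the longest
    suffix that is a legal local name.
    """
    match = _NCNAME_SUFFIX.search(uri)
    if match is None or match.start() == 0:
        # No usable suffix, or nothing would be left for the namespace.
        return None
    return uri[:match.start()], match.group()


print(split_property_uri("http://dbpedia.org/property/2ndregionalCupApps"))
print(split_property_uri("http://dbpedia.org/property/1234"))
```

For the first URI this yields the namespace `http://dbpedia.org/property/2` and the local name `ndregionalCupApps`; for the second it yields `None`, because the URI ends in a digit.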

In those cases that can't be represented in RDF/XML,
the spec says 'If the URI ends in a non-NCName character
then throw a "this graph cannot be serialized in RDF/XML"
exception or error' [1]. Probably not a good solution for us.
I think we should omit such triples from RDF/XML, but
include something like a comment in their place that
they were omitted and are available in other formats
(like NT).
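
A rough Python sketch of the "omit and leave a comment" idea. Everything here is an assumption made for illustration: the function name `describe`, the throwaway prefix `p`, the wording of the comment, and the ASCII-only NCName check. It also assumes the `rdf` prefix is declared on an enclosing `rdf:RDF` element and that all objects are resources.

```python
import re
from xml.sax.saxutils import quoteattr

# ASCII-only approximation of an NCName suffix at the end of a URI.
_NCNAME_SUFFIX = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")

OMITTED = ("  <!-- triple omitted: predicate cannot be written in RDF/XML; "
           "see the N-Triples version -->")


def describe(subject, statements):
    """Serialize (predicate_uri, object_uri) pairs for one subject.

    Predicates that cannot be split into namespace + local name are
    replaced by an XML comment instead of producing invalid output.
    """
    lines = [f"<rdf:Description rdf:about={quoteattr(subject)}>"]
    for predicate, obj in statements:
        match = _NCNAME_SUFFIX.search(predicate)
        if match is None or match.start() == 0:
            lines.append(OMITTED)
            continue
        namespace, local = predicate[:match.start()], match.group()
        lines.append(
            f"  <p:{local} xmlns:p={quoteattr(namespace)} "
            f"rdf:resource={quoteattr(obj)}/>"
        )
    lines.append("</rdf:Description>")
    return "\n".join(lines)


print(describe(
    "http://dbpedia.org/resource/b",
    [
        ("http://dbpedia.org/property/2ndregionalCupApps", "http://dbpedia.org/resource/a"),
        ("http://dbpedia.org/property/1234", "http://dbpedia.org/resource/a"),
    ],
))
```

The comment deliberately does not repeat the predicate URI, since XML comments must not contain `--` and an arbitrary URI might.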

Regards,
Christopher

[1] http://www.w3.org/TR/REC-rdf-syntax/#section-Serialising
[2] http://sourceforge.net/mailarchive/forum.php?thread_name=4E443EE5.9020309%40gmail.com&forum_name=dbpedia-discussion


Jona Christopher Sahnwaldt

Feb 28, 2012, 12:20:53 PM
to pedant...@googlegroups.com, dbpedia-d...@lists.sourceforge.net
Hi,

all of these errors stem from the problem that not all
RDF triples can be represented in RDF/XML. [1]
(IMHO, a shortcoming in the RDF/XML spec that could
easily have been fixed by introducing something like
<rdf:Property rdf:URI="http://some/uri_(can't_be_xml)">,
similar to <rdf:Description rdf:about="http://some/uri">.)

As Jeen Broekstra wrote on dbpedia-discussion in August 2011 [2]:

"The only reliable way around the problem is to use a serialization
format that does cope with all legal RDF properly, such as N-Triples or
Turtle."

But still, when someone really wants RDF/XML, what
should Virtuoso do with triples that can't be serialized?

In some cases, there actually is a possible representation.
For example, the property URI
http://dbpedia.org/property/2ndregionalCupApps
could be represented as
<p:ndregionalCupApps xmlns:p="http://dbpedia.org/property/2">
Weird and confusing for humans, no problem for computers.

In those cases that can't be represented in RDF/XML,
the spec says 'throw a "this graph cannot be serialized
in RDF/XML" exception or error' [1]. Probably not a good
solution for us. I think Virtuoso should omit such triples
from RDF/XML, but include something like a comment
in their place that they were omitted and are available in
other formats (like NTriples).

Regards,
Christopher

Simon Spero

Feb 28, 2012, 5:38:57 PM
to pedant...@googlegroups.com
On Tue, Feb 28, 2012 at 12:20 PM, Jona Christopher Sahnwaldt <j...@sahnwaldt.de> wrote:
> Hi,
>
> all of these errors stem from the problem that not all RDF triples can be represented in RDF/XML. [1]
> (IMHO, a shortcoming in the RDF/XML spec that could easily have been fixed by introducing something like
> <rdf:Property rdf:URI="http://some/uri_(can't_be_xml)">, similar to <rdf:Description rdf:about="http://some/uri">.)

There's a kludge that will work where some OWL is available; either owl:sameAs or owl:equivalentProperty should do the trick.
1: Generate a legal property name <foo> that will not collide with other dbprop names (e.g. by prefixing with '_' and pseudo-Unicode-escaping invalid chars, including '_').
2: Generate a label for the property using the original property name.
3: Generate an owl:equivalentProperty axiom relating this property name to the original URI.
4: Generate an owl:sameAs for the two property URIs.
5: Use the <foo> property to make the value assertion.

Virtuoso supports these constructs in its SPARQL engine - I'm not sure how much this costs at run-time - Kingsley?

e.g.

for invalid predicate name http://dbpedia.org/property/1234

Declare un-rdf-serializable property:

    <owl:ObjectProperty rdf:about="&dbprop;1234">
        <owl:equivalentProperty rdf:resource="&dbprop;_1234"/>
        <owl:sameAs rdf:resource="&dbprop;_1234"/>
    </owl:ObjectProperty>

Declare rdf-serializable property:

    <owl:ObjectProperty rdf:about="&dbprop;_1234">
        <owl:equivalentProperty rdf:resource="&dbprop;1234"/>
        <owl:sameAs rdf:resource="&dbprop;1234"/>
    </owl:ObjectProperty>

Use Faux Property:

    <owl:Thing rdf:about="&dbresource;b">
        <dbprop:_1234 rdf:resource="&dbresource;a"/>
    </owl:Thing>
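
Step 1 of the kludge could be sketched in Python roughly as follows. The concrete scheme is an assumption, not something specified in this thread: prefix the local name with '_' and replace every character other than an ASCII letter or digit (including '_' itself) with `_u` plus six hex digits. The function names are hypothetical. Note that a stand-in such as `_1234` could still collide with an existing property that is literally named `_1234`, so a real deployment would need to rule that out.

```python
import re


def stand_in_local_name(local: str) -> str:
    """Map an arbitrary local name to a legal XML local name."""
    escaped = "".join(
        ch if ch.isascii() and ch.isalnum() else f"_u{ord(ch):06X}"
        for ch in local
    )
    return "_" + escaped


def original_local_name(stand_in: str) -> str:
    """Invert stand_in_local_name."""
    if not stand_in.startswith("_"):
        raise ValueError("not a stand-in name")
    # Every '_' in the body starts an escape, because literal
    # underscores were escaped on the way in.
    return re.sub(
        r"_u([0-9A-F]{6})",
        lambda m: chr(int(m.group(1), 16)),
        stand_in[1:],
    )


print(stand_in_local_name("1234"))
print(stand_in_local_name("a_b(c)"))
```

With this scheme `1234` becomes `_1234`, matching the example above, and the mapping can be reversed to recover the original property name for the label in step 2.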


Axel Polleres

Apr 9, 2012, 3:28:31 AM
to Pedantic Web Group
> But still, when someone really wants RDF/XML, what
> should poor Virtuoso do with triples that can't be serialized?

How about just suppressing those triples in the output?
That'd help already...

Axel


Niklas Lindström

Apr 9, 2012, 6:13:02 AM
to pedant...@googlegroups.com
Hi,

Another alternative is to output these triples in reified form, like:

<rdf:Statement>
<rdf:subject rdf:resource="http://dbpedia.org/resource/a"/>
<rdf:predicate rdf:resource="http://dbpedia.org/property/1234"/>
<rdf:object rdf:resource="http://dbpedia.org/resource/b"/>
</rdf:Statement>

Best regards,
Niklas

Jona Christopher Sahnwaldt

Apr 9, 2012, 3:16:13 PM
to pedant...@googlegroups.com
Yes, that's what I suggested:

> I think we should omit such triples from RDF/XML, but
> include something like a comment in their place that
> they were omitted and are available in other formats
> (like NT).

:-)

Jona Christopher Sahnwaldt

Apr 9, 2012, 4:38:11 PM
to pedant...@googlegroups.com
Hi,

looks great, but alas, the reified form of a statement is not the same
thing as that statement. From the RDF semantics spec [1]:

A reification of a triple does not entail the triple, and is not
entailed by it. The reification only says that the triple token exists
and what it is about, not that it is true.

For example, see test002 and test005 referenced in the RDF/XML
specification [2]. They only differ in that test005 also includes the
reified form of the statement, but they represent different graphs.
RDF and RDF/XML do not un-reify statements. (Maybe some tools do, I
don't know.)
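
To make the difference concrete, here is a small Python sketch that models graphs as plain sets of tuples (no RDF library involved). The helper `reify` and the blank node label `_:s1` are hypothetical. It only shows that the asserted triple is not among the four reification triples; it does not implement RDF entailment.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(triple, statement_node):
    """Return the four triples that describe `triple` as an rdf:Statement."""
    subject, predicate, obj = triple
    return {
        (statement_node, RDF + "type", RDF + "Statement"),
        (statement_node, RDF + "subject", subject),
        (statement_node, RDF + "predicate", predicate),
        (statement_node, RDF + "object", obj),
    }


asserted = (
    "http://dbpedia.org/resource/a",
    "http://dbpedia.org/property/1234",
    "http://dbpedia.org/resource/b",
)
reified_graph = reify(asserted, "_:s1")

# The reified graph talks about the statement but does not contain it.
print(asserted in reified_graph)
```

A consumer that only sees `reified_graph` therefore has no triple with the predicate `http://dbpedia.org/property/1234` to query.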

Summary from the RDF/XML spec:

There are some RDF Graphs [...] that cannot be serialized in RDF/XML. [3]

All we can do is omit such triples, or make sure that we do not use
property names that are affected by this problem.

Regards,
Christopher

[1] http://www.w3.org/TR/rdf-mt/#Reif
[2] http://www.w3.org/TR/rdf-syntax-grammar/#emptyPropertyElt
[3] http://www.w3.org/TR/rdf-syntax-grammar/#section-Serialising


Aidan Hogan

Apr 9, 2012, 4:51:44 PM
to pedant...@googlegroups.com
To build on Jona's comment, consider an example:

<rdf:Statement>
<rdf:subject rdf:resource="http://dbpedia.org/resource/a"/>
<rdf:predicate rdf:resource="http://dbpedia.org/property/1234"/>
<rdf:object rdf:resource="http://dbpedia.org/resource/b"/>

<my:isTrue rdf:datatype="...#boolean">false</my:isTrue>
</rdf:Statement>

Un-reifying triples would be problematic in the general case.

Cheers,
Aidan


Niklas Lindström

Apr 9, 2012, 5:43:36 PM
to pedant...@googlegroups.com
Jona, Aidan,

Yes, you're absolutely right. I knew there was a difference between a
reified form and the triple it represents, but I wasn't sure if it was
entailed. Thanks for the clarification. Either way I hadn't expected
tools to interpret them directly (and it seems they mustn't).

I mostly thought of reification as being a bit simpler than using OWL,
but with no such entailment it doesn't fully work. Still, keeping the
triples in reified form (with some provenance attached), as a
structural comment if you will, may be better than omitting them?

Otherwise, Simon's suggestion to generate owl:sameAs statements for
stand-in properties seems like a good idea.

(Though admittedly I haven't had to deal with this specific problem,
so I can't say much about what's most valuable in practice.)

Best regards,
Niklas


Andy Seaborne

Apr 10, 2012, 5:57:28 AM
to pedant...@googlegroups.com
An alternative route is to let the client know that it isn't possible to
answer the request as defined by the content negotiation.

HTTP status code "406 Not Acceptable" could be used for that.

Anything that takes account of the unrepresentability and changes the
data requires the client to be aware of possible alternative
representations. That's a bit painful without some indication.

Maybe the server should just choose to provide a format that is possible
- the client will know from the "Accept:".
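
A simplified Python sketch of both options, under several assumptions: the function names, the list of offered media types, and the reduced Accept parsing (q-values and `*/*` only, no `type/*` ranges) are all illustrative and not taken from Virtuoso or any DBpedia deployment. When a fallback format is served, the response would announce it in its `Content-Type` header.

```python
# Formats the hypothetical server can produce, in its own order of preference.
AVAILABLE = ["application/rdf+xml", "text/turtle", "application/n-triples"]


def parse_accept(header):
    """Return acceptable media types, most preferred first."""
    ranked = []
    for index, item in enumerate(header.split(",")):
        parts = [part.strip() for part in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        quality = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by position in the header.
            ranked.append((-quality, index, media_type))
    return [media_type for _, _, media_type in sorted(ranked)]


def negotiate(accept_header, rdfxml_possible):
    """Return (status, content_type); 406 if nothing acceptable can be served."""
    offered = [
        t for t in AVAILABLE
        if rdfxml_possible or t != "application/rdf+xml"
    ]
    for wanted in parse_accept(accept_header):
        if wanted == "*/*":
            return 200, offered[0]
        if wanted in offered:
            return 200, wanted
    return 406, None


print(negotiate("application/rdf+xml", False))
print(negotiate("application/rdf+xml, text/turtle;q=0.5", False))
```

A client that accepts only RDF/XML gets 406 when the graph cannot be serialized, while a client that also lists Turtle is served Turtle instead of silently altered RDF/XML.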

Andy

PS "303 See Other" is another possibility ... oh wait ... that's already
been used elsewhere.
