On 02/12/2009 16:07, Chris Wallace wrote:
> Today I attempted to load an RDF dataset
>
> http://www.cems.uwe.ac.uk/xmlwiki/FOLD/all2008.xml
>
> This validates OK and has been previously loaded into our local Joseki
> server. However it threw up a lot of
>
> Invalid URIs are not permitted in RDF/XML documents. Please replace
> the uri <http://www.cems.uwe.ac.uk/exist/studentperson.xql?name=Carl Ross>
> with a valid one.
>
> in the RDF some spaces are % encoded:
> <p:Internal_Moderator
> rdf:resource="http://www.cems.uwe.ac.uk/rdffold/person/Paul%20Raynor"/>
>
> others are not:
>
> <p:intranetPage
> rdf:resource="http://fold.cems.uwe.ac.uk:8080/exist/servlet/db/fold1/prod/person.xql?name=Paul Raynor"/>
>
> I guess these spaces are bad practice anyway but I'm puzzled that they
> validated and loaded into Joseki.
>
> Chris Wallace
> UWE, Bristol
>

My understanding is that the RDF Core WG, in consultation with the
internationalization WG, defined "RDF URI Reference" (its own term, not
the URI defined by RFC 2396 / 3986) so as to anticipate where the IRI
spec was going at the time, and so allowed spaces. In the end IRIs did
not allow spaces, but that was settled after RDF Core had ended.

Result: "RDF URI Reference" allows spaces even though URIs and IRIs do
not, which is a bit of a nuisance for backwards compatibility. The IRI
library Jena uses (written by Jeremy because he couldn't find a robust
one already around) has to have a specific "RDF mode" to allow this.
Personally, I think it is time to change this in Jena, but it would
break backwards compatibility, which we have always been loath to do.

There is a difference between encoding and escaping: %20 is not a way
to put a space into a URI. The characters %-2-0 really are in the URI
(cf. file URLs). Browsers obscure this by being helpful.

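To illustrate, here is a minimal Python sketch (standard library only;
the variable names are mine, and the URI is the one from the error
message above). It shows that the space form and the %20 form are
different strings, and that percent-encoding the name when the URI is
generated yields one with no space in it at all.

```python
from urllib.parse import quote

base = "http://www.cems.uwe.ac.uk/exist/studentperson.xql?name="

with_space = base + "Carl Ross"        # raw space: not a legal URI or IRI
with_pct = base + quote("Carl Ross")   # quote() writes the space as %20

# RDF compares URI references character by character, so these two
# strings name two different resources.
same = (with_space == with_pct)

# The characters '%', '2', '0' really are part of the second string.
has_literal_pct20 = "%20" in with_pct
```

So the practical fix on the data side is to percent-encode when the
RDF is produced, and to use the same form everywhere the resource is
mentioned.
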
Normalization (RFC 3986) allows the replacement of %-encoded chars, but
only if the encoded character is legal at that point, and a space isn't.
(Normalization is also not the job of a library, only of the end
application using it.)

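As a rough sketch of that rule, assuming my reading of RFC 3986
section 6.2.2 (decode a %-encoded octet only when it stands for an
"unreserved" character, otherwise keep it and uppercase the hex
digits), something like the following Python would do. The function
name and the example.org URI are made up for illustration; this is not
what Jena does.

```python
import re

# Unreserved characters per RFC 3986 section 2.3.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_pct(uri):
    """Decode %XX only when it encodes an unreserved character."""
    def repl(match):
        ch = chr(int(match.group(1), 16))
        if ch in UNRESERVED:
            return ch                          # e.g. %7E becomes ~
        return "%" + match.group(1).upper()    # kept, hex digits uppercased
    return PCT.sub(repl, uri)


result = normalize_pct("http://example.org/%7euser/Paul%20Raynor")
# result == "http://example.org/~user/Paul%20Raynor"
```

The %7e is decoded because ~ is legal there; the %20 is left alone
because a space is not.
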
Andy