spaces in URIs


Chris Wallace

Dec 2, 2009, 11:07:44 AM
to n2-dev
Today I attempted to load an RDF dataset:

http://www.cems.uwe.ac.uk/xmlwiki/FOLD/all2008.xml

This validates OK and has been previously loaded into our local Joseki
server. However, this load threw up a lot of errors like:

Invalid URIs are not permitted in RDF/XML documents. Please replace
the uri <http://www.cems.uwe.ac.uk/exist/studentperson.xql?name=Carl Ross> with a valid one.

In the RDF, some spaces are %-encoded:

<p:Internal_Moderator rdf:resource="http://www.cems.uwe.ac.uk/rdffold/person/Paul%20Raynor"/>

Others are not:

<p:intranetPage rdf:resource="http://fold.cems.uwe.ac.uk:8080/exist/servlet/db/fold1/prod/person.xql?name=Paul Raynor"/>

I guess these spaces are bad practice anyway but I'm puzzled that they
validated and loaded into Joseki.
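
A quick way to find the offending values before loading is to scan the URI-valued attributes for literal spaces. This is only a minimal sketch using Python's standard xml.etree module. The sample document, the p: namespace URI (http://example.org/p#) and the function name find_uris_with_spaces are made up for illustration and are not taken from the FOLD data.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Attributes whose values are URI references in RDF/XML (not an exhaustive list).
URI_ATTRS = ("{%s}resource" % RDF_NS, "{%s}about" % RDF_NS)

# Hypothetical sample; the p: namespace URI is an assumption.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:p="http://example.org/p#">
  <rdf:Description rdf:about="http://www.cems.uwe.ac.uk/rdffold/person/Paul%20Raynor">
    <p:intranetPage rdf:resource="http://fold.cems.uwe.ac.uk:8080/exist/servlet/db/fold1/prod/person.xql?name=Paul Raynor"/>
  </rdf:Description>
</rdf:RDF>"""


def find_uris_with_spaces(xml_text):
    """Return every rdf:resource / rdf:about value that contains a literal space."""
    root = ET.fromstring(xml_text)
    found = []
    for element in root.iter():
        for attr in URI_ATTRS:
            value = element.get(attr)
            if value is not None and " " in value:
                found.append(value)
    return found


for uri in find_uris_with_spaces(SAMPLE):
    print(uri)
```

Only the value with the unencoded space is reported. The %20 form passes, because it contains no space character at all.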

Chris Wallace
UWE, Bristol

Andy Seaborne

Dec 2, 2009, 4:12:38 PM
to n2-...@googlegroups.com, Chris Wallace


On 02/12/2009 16:07, Chris Wallace wrote:
> Today I attempted to load an RDF dataset:
>
> http://www.cems.uwe.ac.uk/xmlwiki/FOLD/all2008.xml
>
> This validates OK and has been previously loaded into our local Joseki
> server. However it threw up a lot of
>
> Invalid URIs are not permitted in RDF/XML documents. Please replace
> the uri <http://www.cems.uwe.ac.uk/exist/studentperson.xql?name=Carl Ross> with a valid one.
>
> in the RDF some spaces are % encoded:
> <p:Internal_Moderator rdf:resource="http://www.cems.uwe.ac.uk/rdffold/person/Paul%20Raynor"/>
>
> others are not:
>
> <p:intranetPage rdf:resource="http://fold.cems.uwe.ac.uk:8080/exist/servlet/db/fold1/prod/person.xql?name=Paul Raynor"/>
>
> I guess these spaces are bad practice anyway but I'm puzzled that they
> validated and loaded into Joseki.
>
> Chris Wallace
> UWE, Bristol
>

My understanding is that the RDF Core WG, in consultation with the
internationalization WG, defined the "RDF URI Reference" (its own term,
not the URI defined by RFC 2396 / 3986) to anticipate where the IRI spec
was going at the time, and so allowed spaces.

In the end, IRIs did not allow spaces, but that was settled after RDF
Core had ended.

Result: "RDF URI Reference" allows spaces even though URIs and IRIs do
not. Which is a bit of a nuisance for backwards compatibility. The IRI
library Jena uses (written by Jeremy because he couldn't find a robust
one already around) has to specifically have an "RDF mode" to allow this.

Personally, I think it is time to change this in Jena, but it would break
backwards compatibility, which we have always been loath to do.

There is a difference between encoding and escaping: %20 is not a way
to put a space into a URI. The characters %-2-0 really are in the URI
(cf. file URLs). Browsers obscure this by being helpful.
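
To make that concrete, here is a small sketch with Python's urllib.parse (my choice of tool for illustration, nothing to do with Jena). Splitting the URI leaves the three characters %, 2 and 0 in place. A space only appears if the application explicitly asks for decoding.

```python
from urllib.parse import urlsplit, unquote

uri = "http://www.cems.uwe.ac.uk/rdffold/person/Paul%20Raynor"

# Splitting a URI into components does not decode anything.
path = urlsplit(uri).path
last_segment = path.rsplit("/", 1)[-1]

# Decoding is a separate, explicit step taken by the application.
decoded = unquote(last_segment)

print(last_segment)                      # Paul%20Raynor
print(decoded)                           # Paul Raynor
print(len(last_segment), len(decoded))   # 13 11
```

The segment in the URI is 13 characters long. The 11-character name with a space is what an application gets after decoding, and it is not what the URI contains.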

Normalization (RFC 3986) allows %-encoded characters to be replaced by
the characters themselves, but only where the decoded character is legal
at that point, and a space is not. (Normalization is also a job for the
end application, not for a library.)
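
A rough sketch of that rule, assuming the reading of RFC 3986 section 6.2.2.2 in which only percent-encoded unreserved characters (letters, digits, "-", ".", "_", "~") are decoded, plus the uppercase-hex rule from section 6.2.2.1. The function name normalize_percent_encoding is made up, and this covers only one part of normalization.

```python
import re
import string

# RFC 3986 "unreserved" characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
_PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_percent_encoding(uri):
    """Decode %XX only when it encodes an unreserved character.

    Every other escape is kept, with its hex digits uppercased.
    """
    def fix(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        return "%" + match.group(1).upper()

    return _PCT.sub(fix, uri)


print(normalize_percent_encoding("http://example.org/%7Euser/Paul%20Raynor"))
# http://example.org/~user/Paul%20Raynor
```

%7E becomes "~" because "~" is unreserved. %20 stays as it is, because a space is never legal in a URI.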

Andy

Chris Wallace

Dec 3, 2009, 7:52:00 AM
to n2-dev
Thanks, Andy

I thought the Talis platform used the same Jena but they must be
slightly different then in that regard.

Both 'space's replaced with + and dataset loaded fine.

Chris

Andy Seaborne

Dec 3, 2009, 7:58:08 AM
to n2-...@googlegroups.com


On 03/12/2009 12:52, Chris Wallace wrote:
> Thanks, Andy
>
> I thought the Talis platform used the same Jena but they must be
> slightly different then in that regard.

There are more checks than the base system :-)

>
> Both 'space's replaced with + and dataset loaded fine.

"+" is a space in HTML Form encoding (application/x-www-form-urlencoded).

In the URI spec, "+" is an ordinary character (strictly, it is one of the sub-delims).
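
The two conventions can be seen side by side in Python's urllib.parse (used here only as a convenient illustration). The *_plus functions and parse_qs follow the HTML form rules, while plain quote/unquote treat "+" as just another character.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus, parse_qs

# Plain URI percent-decoding: "+" is left alone.
uri_style = unquote("Paul+Raynor")

# Form decoding (application/x-www-form-urlencoded): "+" means space.
form_style = unquote_plus("Paul+Raynor")
form_query = parse_qs("name=Paul+Raynor")

# Going the other way, the two encoders disagree about a space.
uri_encoded = quote("Paul Raynor")
form_encoded = quote_plus("Paul Raynor")

print(uri_style)     # Paul+Raynor
print(form_style)    # Paul Raynor
print(form_query)    # {'name': ['Paul Raynor']}
print(uri_encoded)   # Paul%20Raynor
print(form_encoded)  # Paul+Raynor
```

So name=Paul+Raynor is a perfectly valid URI, but whether the server behind it sees a space or a plus sign depends on how that server decodes the query string.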

Andy
>
> Chris
>