Is xsd:string always a bad idea?


Richard Cyganiak

Nov 2, 2009, 6:26:30 AM
to pedant...@googlegroups.com
A question for the fellow pedants.

I think that using the xsd:string datatype in RDF is always a bad idea
and should be flagged as a warning. Do you agree?

My reasoning:

1. If a literal is specified as xsd:string, then it cannot have a
language tag. Therefore, if you want your application to be
internationalisable, you cannot use xsd:string anyway.

2. If you don't want to use a language tag because the string is not
in a natural language, then declaring it as xsd:string is useless,
because it's semantically equivalent to just using a plain literal.
Thus, parsimony dictates that the literal should be left plain.

3. Many RDF implementations don't do datatype entailment, therefore if
your data uses xsd:string but a SPARQL query uses a plain literal, or
the other way round, then there will be no match, which can probably
be quite annoying and confusing for users. This can be avoided if
everyone, by convention, just uses one of the two alternatives.
Because of (1) above, that alternative can only be the plain literal.
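
A minimal sketch of points 1 and 3, assuming a toy model of literals as discussed in this thread. The `Literal` class and the `matches` function are hypothetical names for illustration, not part of any RDF library. The sketch shows that a typed literal cannot carry a language tag, and that purely syntactic matching treats the plain and the xsd:string form as different terms.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Literal:
    """Toy RDF literal: lexical form plus a language tag OR a datatype."""
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None

    def __post_init__(self):
        # Point 1: a typed literal cannot also carry a language tag.
        if self.lang is not None and self.datatype is not None:
            raise ValueError("literal cannot have both a language tag and a datatype")


def matches(data: Literal, pattern: Literal) -> bool:
    # Point 3: without datatype entailment, terms are compared syntactically,
    # so every field has to be identical.
    return data == pattern


plain = Literal("Amsterdam")
typed = Literal("Amsterdam", datatype=XSD_STRING)

print(matches(plain, plain))  # True
print(matches(typed, plain))  # False: same text, different term
```

If everyone sticks to one of the two forms, the second comparison never comes up.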

Am I missing some benefit of xsd:string?

Richard

Andrew Gibson

Nov 2, 2009, 7:22:04 AM
to pedant...@googlegroups.com
Hi Richard,

Absolutely! I'm glad someone has pointed this out - and I have
encountered all three of these issues in my daily dealings with RDF. As
my own best practice, I try to make sure that xsd:string does not appear
in any RDF I produce.

Specifically for point 1) I have seen resources duplicated in the RDF so
that they can have both the datatype and the language tag specified
separately.

Specifically for point 3), I have seen mixtures of datasets that use
xsd:string in some places and plain literals in others. In practice this
means you have to wrap the variable in the str(?s) function in a SPARQL
query to be sure you catch everything, in a regex filter for example.
This seems redundant, as across datasets you would always need to use
it, and you can't (currently) use datatypes in SPARQL queries anyway.
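
The str() workaround can be sketched roughly as follows, assuming a toy `Literal` tuple and a `lexical_form` helper that stands in for SPARQL's str(). Both names and the sample data are hypothetical. The idea is to reduce every literal to its lexical form before applying the regex, so that plain, typed and language-tagged values are all caught.

```python
import re
from typing import NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


class Literal(NamedTuple):
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


def lexical_form(lit: Literal) -> str:
    # Rough analogue of str(?s): drop the tag or datatype, keep the text.
    return lit.lexical


# Hypothetical mixed dataset: the same kind of value in three shapes.
names = [
    Literal("aspirin"),                       # plain literal
    Literal("aspirin", datatype=XSD_STRING),  # typed literal
    Literal("Aspirin", lang="de"),            # language-tagged literal
    Literal("ibuprofen"),
]

pattern = re.compile(r"^aspirin$", re.IGNORECASE)
hits = [lit for lit in names if pattern.search(lexical_form(lit))]
print(len(hits))  # 3
```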

Cheers,
Andrew


--
Dr Andrew Gibson
Universiteit van Amsterdam

Egon Willighagen

Nov 2, 2009, 7:24:40 AM
to pedant...@googlegroups.com
Hi Richard, Andrew,

On Mon, Nov 2, 2009 at 1:22 PM, Andrew Gibson <a.p.g...@uva.nl> wrote:
> Absolutely! I'm glad someone has pointed this out - and I have encountered
> all three of these issues in my daily dealings with RDF. As my own best
> practice try to make sure that it does not appear in any RDF I produce.

This is very interesting... how should I define the rdf:datatype
instead to allow i18n?

Egon

--
Post-doc @ Uppsala University
Homepage: http://egonw.github.com/
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers

Andrew Gibson

Nov 2, 2009, 7:33:59 AM
to pedant...@googlegroups.com
Hi Egon,

Egon Willighagen wrote:
> Hi Richard, Andrew,
>
> On Mon, Nov 2, 2009 at 1:22 PM, Andrew Gibson <a.p.g...@uva.nl> wrote:
>> Absolutely! I'm glad someone has pointed this out - and I have encountered
>> all three of these issues in my daily dealings with RDF. As my own best
>> practice try to make sure that it does not appear in any RDF I produce.
>
> This is very interesting... how should I define the rdf:datatype
> instead to allow i18n?

If I understand your question, the proposal is that for a string, you
don't - you leave it as a plain literal, and use a language tag for i18n.

Andrew

> Egon

Hogan, Aidan

Nov 3, 2009, 11:17:48 AM
to pedant...@googlegroups.com
Hi folks,

Regarding xsd:string, I think that it might be useful to specify that
the value for a given property should never use a language-tag; i.e.,
that the value is a structured string which transcends language
considerations. In that case Richard, point (2) is invalid because a
plain-literal can have a language tag. For example, you might give the
property 'gender' a range of xsd:string to ensure that people don't
define the value for gender in their own language, but use some
agreed-upon terms. Although I personally do not like constraints for
constraints' sake in RDFS/OWL, I think that xsd:string has some limited
use here to represent the class of literals without language tags.
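
As a rough illustration of that use, here is a sketch of the check an application might run on 'gender' values. The `Literal` tuple, the `acceptable_gender` function and the set of agreed terms are all assumptions made up for this example. The check only accepts values that carry no language tag and come from the agreed list.

```python
from typing import NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


class Literal(NamedTuple):
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


# Hypothetical agreed-upon terms; a real vocabulary would define its own.
AGREED_GENDER_TERMS = {"female", "male"}


def acceptable_gender(value: Literal) -> bool:
    # The value must not be language-tagged and must be one of the agreed terms.
    return value.lang is None and value.lexical in AGREED_GENDER_TERMS


print(acceptable_gender(Literal("female", datatype=XSD_STRING)))  # True
print(acceptable_gender(Literal("weiblich", lang="de")))          # False
```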

Other than the above arguable clause, I agree completely.

Cheers,
Aidan

Jeni Tennison

Nov 3, 2009, 2:48:59 PM
to pedant...@googlegroups.com
Aidan (etc.),

If it were XML, in the case where you're datatyping the literal to
indicate that it's from a constrained vocabulary, you should probably
be using xsd:token instead (since this implies some whitespace
normalisation which is generally useful) or maybe xsd:NMTOKEN to rule
out the possibility of whitespace within the value.

But as far as I can tell, xsd:token isn't supported by SPARQL, so
actually using xsd:token (and relying on its whitespace normalisation)
might be problematic. For example, the triple:

<#a> eg:gender " female "^^xsd:token

isn't going to match the pattern:

?woman eg:gender "female"^^xsd:token

so perhaps the argument should be that in these cases you *should* use
xsd:string since at least that won't lead you to falsely believe that
some kind of whitespace normalisation will take place.
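
The mismatch can be sketched in a few lines, assuming the store compares lexical forms as-is. The `collapse_whitespace` helper is a hypothetical name; it approximates the "collapse" whitespace handling that XML Schema defines for xsd:token, which a store would have to apply before the two values could match.

```python
import re


def collapse_whitespace(lexical: str) -> str:
    # XML Schema "collapse": tabs and line breaks become spaces, runs of
    # spaces shrink to one, and leading/trailing spaces are dropped.
    replaced = re.sub(r"[\t\n\r]", " ", lexical)
    return re.sub(r" +", " ", replaced).strip(" ")


stored = " female "   # lexical form in the data
queried = "female"    # lexical form in the query pattern

print(stored == queried)                       # False: no match as written
print(collapse_whitespace(stored) == queried)  # True: match after normalisation
```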

Actually, I think a better case where xsd:string should be used is if
the literal is something like an XPath expression or a block of
Javascript code. You certainly don't want those to have languages
associated with them (and whitespace is important within them, so no
subtypes are appropriate).

Cheers,

Jeni

--
Jeni Tennison
http://www.jenitennison.com

Richard Cyganiak

Nov 3, 2009, 6:18:16 PM
to pedant...@googlegroups.com
Jeni, Aidan,

Good points all around, thanks. I didn't even know about xsd:token.
It's probably not right to say that SPARQL doesn't support xsd:token,
it's more that your SPARQL processor doesn't know the specific
syntactic equality rules for that datatype. I would be surprised
though if any existing store supported that rule -- if you find such a
store, then we should award its implementer some sort of prize for
Outstanding Pedantry ;-)

Richard

Axel Polleres

Nov 3, 2009, 8:08:13 PM
to pedant...@googlegroups.com
Datatype entailment isn't supported in general by current SPARQL
implementations, in principle because there's no spec defining what
that *means*.

SPARQL 1.1 includes work on Entailment Regimes (time permitting) that
should cover that, but I have to emphasize that we cannot be sure at
this point whether this will in the end evolve to Rec, or how many
endpoints/engines will then support datatypes fully.
It has some non-trivialities, unfortunately.

So, yes, relying on whitespace normalisation is problematic indeed.

Axel
