I think that using the xsd:string datatype in RDF is always a bad idea
and should be flagged as a warning. Do you agree?
My reasoning:
1. If a literal is specified as xsd:string, then it cannot have a
language tag. Therefore, if you want your application to be
internationalisable, you cannot use xsd:string anyway.
2. If you don't want to use a language tag because the string is not
in a natural language, then declaring it as xsd:string is useless,
because it's semantically equivalent to just using a plain literal.
Thus, parsimony dictates that the literal should be left plain.
3. Many RDF implementations don't do datatype entailment, therefore if
your data uses xsd:string but a SPARQL query uses a plain literal, or
the other way round, then there will be no match, which can probably
be quite annoying and confusing for users. This can be avoided if
everyone, by convention, just uses one of the two alternatives.
Because of (1) above, that alternative can only be the plain literal.
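To make points 1 and 3 concrete, here is a minimal Python sketch of literal matching in a store that does no datatype entailment. The `Literal` class and the `simple_match` helper are hypothetical names made up for illustration, not any RDF library's API. The model assumes the semantics discussed in this thread, where a plain literal and an xsd:string literal are distinct terms.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Literal:
    """Hypothetical model of an RDF literal (not a real library class)."""
    lexical: str
    datatype: Optional[str] = None   # None means no datatype (plain literal)
    lang: Optional[str] = None       # None means no language tag

    def __post_init__(self):
        # Point 1: a typed literal cannot also carry a language tag.
        if self.datatype is not None and self.lang is not None:
            raise ValueError("a typed literal cannot have a language tag")


def simple_match(a: Literal, b: Literal) -> bool:
    """Compare terms field by field, with no datatype entailment applied."""
    return a == b


plain = Literal("female")
typed = Literal("female", datatype=XSD_STRING)

print(simple_match(plain, plain))  # True
print(simple_match(plain, typed))  # False: same text, different terms
```

Under these assumptions the data and the query only match when both sides pick the same form, which is the motivation for the convention proposed in point 3.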
Am I missing some benefit of xsd:string?
Richard
Absolutely! I'm glad someone has pointed this out - and I have
encountered all three of these issues in my daily dealings with RDF. As
my own best practice, I try to make sure that it does not appear in any RDF
I produce.
Specifically for point 1) I have seen resources duplicated in the RDF so
that they can have both the datatype and the language tag specified
separately.
Specifically for point 3), I have seen mixtures of datasets that in some
places have xsd:string and in some places do not. In practice, this means
that you have to wrap the variable in str(?s) in a SPARQL query to be
sure that a regex, for example, catches everything. This seems
redundant, since across datasets you would always need to do this,
and you can't (currently) use datatypes in SPARQL queries anyway.
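As a rough illustration of that str() workaround, the Python sketch below filters a mixed list of literals by regex after reducing each one to its lexical form. The tuple layout, the `lexical_form` and `regex_filter` helpers, and the sample values are all made up for this example. It only mimics the effect of SPARQL's str() on literals and is not a SPARQL engine.

```python
import re
from typing import List, Optional, Tuple

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Each literal is (lexical form, datatype IRI or None, language tag or None).
LiteralT = Tuple[str, Optional[str], Optional[str]]


def lexical_form(literal: LiteralT) -> str:
    """Mimic str() on a literal: keep the lexical form, drop datatype and language."""
    return literal[0]


def regex_filter(literals: List[LiteralT], pattern: str) -> List[LiteralT]:
    """Keep the literals whose lexical form matches the pattern."""
    return [lit for lit in literals if re.search(pattern, lexical_form(lit))]


# Invented sample data mixing plain, xsd:string and language-tagged literals.
mixed = [
    ("Amsterdam", None, None),
    ("Amsterdam", XSD_STRING, None),
    ("Amsterdam", None, "nl"),
    ("Uppsala", XSD_STRING, None),
]

matches = regex_filter(mixed, r"^Amst")
print(len(matches))  # 3
```

All three spellings of the same string are caught because the comparison never looks at the datatype or the language tag.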
Cheers,
Andrew
--
Dr Andrew Gibson
Universiteit van Amsterdam
On Mon, Nov 2, 2009 at 1:22 PM, Andrew Gibson <a.p.g...@uva.nl> wrote:
> Absolutely! I'm glad someone has pointed this out - and I have encountered
> all three of these issues in my daily dealings with RDF. As my own best
> practice try to make sure that it does not appear in any RDF I produce.
This is very interesting... how should I define the rdf:datatype
instead to allow i18n?
Egon
--
Post-doc @ Uppsala University
Homepage: http://egonw.github.com/
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
Egon Willighagen wrote:
> Hi Richard, Andrew,
>
> This is very interesting... how should I define the rdf:datatype
> instead to allow i18n?
If I understand your question, the proposal is that for a string, you
don't - you leave it as a plain literal, and use a language tag for i18n.
Andrew
> Egon
Regarding xsd:string, I think that it might be useful to specify that
the value for a given property should never use a language-tag; i.e.,
that the value is a structured string which transcends language
considerations. In that case Richard, point (2) is invalid because a
plain-literal can have a language tag. For example, you might give the
property 'gender' a range of xsd:string to ensure that people don't
define the value for gender in their own language, but use some
agreed-upon terms. Although I personally do not like
constraints for constraints' sake in RDFS/OWL, I think that xsd:string
has some limited use here to represent the class of literals without
language tags.
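A small Python sketch of that idea, treating the declared range as a validation rule. That reading is an assumption of the example, since RDFS itself uses ranges for inference rather than checking. The `RANGES` table, the `range_violations` helper and the sample triples are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# A triple's object is (lexical form, datatype IRI or None, language tag or None).
Obj = Tuple[str, Optional[str], Optional[str]]
Triple = Tuple[str, str, Obj]

# Hypothetical schema: property -> datatype its values must have.
RANGES: Dict[str, str] = {"eg:gender": XSD_STRING}


def range_violations(triples: List[Triple]) -> List[Triple]:
    """Return triples whose object lacks the datatype its property requires."""
    bad = []
    for s, p, o in triples:
        required = RANGES.get(p)
        if required is not None and o[1] != required:
            bad.append((s, p, o))
    return bad


data = [
    ("<#a>", "eg:gender", ("female", XSD_STRING, None)),
    ("<#b>", "eg:gender", ("vrouw", None, "nl")),  # language-tagged, so flagged
]

print(range_violations(data))
```

A check this strict would also reject an untagged plain literal "female", which is Richard's point 3 showing up again.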
Other than the above arguable clause, I agree completely.
Cheers,
Aidan
If it were XML, in the case where you're datatyping the literal to
indicate that it's from a constrained vocabulary, you should probably
be using xsd:token instead (since this implies some whitespace
normalisation which is generally useful) or maybe xsd:NMTOKEN to rule
out the possibility of whitespace within the value.
But as far as I can tell, xsd:token isn't supported by SPARQL, so
actually using xsd:token (and relying on its whitespace normalisation)
might be problematic. For example, the triple:
<#a> eg:gender " female "^^xsd:token
isn't going to match the pattern:
?woman eg:gender "female"^^xsd:token
so perhaps the argument should be that in these cases you *should* use
xsd:string since at least that won't lead you to falsely believe that
some kind of whitespace normalisation will take place.
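For reference, the whitespace handling in question can be sketched in a few lines of Python. The `collapse_whitespace` helper is a made-up name. It follows the XML Schema whiteSpace "collapse" rule that xsd:token uses, and the comparison shows what a processor sees with and without that rule.

```python
import re


def collapse_whitespace(value: str) -> str:
    """Apply the XML Schema whiteSpace=collapse rule used by xsd:token."""
    # Replace tab, line feed and carriage return with a space...
    value = re.sub(r"[\t\n\r]", " ", value)
    # ...then squeeze runs of spaces and trim both ends.
    return re.sub(r" +", " ", value).strip(" ")


stored = " female "   # lexical form in the data
queried = "female"    # lexical form in the query pattern

# Without normalisation the lexical forms differ, so the pattern does not match.
print(stored == queried)  # False

# With the collapse rule applied, both denote the same token value.
print(collapse_whitespace(stored) == collapse_whitespace(queried))  # True
```

A processor that compares lexical forms only behaves like the first comparison, which is why the pattern above fails to match.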
Actually, I think a better case where xsd:string should be used is if
the literal is something like an XPath expression or a block of
Javascript code. You certainly don't want those to have languages
associated with them (and whitespace is important within them, so no
subtypes are appropriate).
Cheers,
Jeni
--
Jeni Tennison
http://www.jenitennison.com
Good points all around, thanks. I didn't even know about xsd:token.
It's probably not right to say that SPARQL doesn't support xsd:token,
it's more that your SPARQL processor doesn't know the specific
syntactic equality rules for that datatype. I would be surprised
though if any existing store supported that rule -- if you find such a
store, then we should award its implementer some sort of prize for
Outstanding Pedantry ;-)
Richard
The SPARQL 1.1 effort is working on Entailment Regimes (time allowed) that
should cover that, but I have to emphasize that we cannot be sure at this
point whether this will in the end evolve to Rec, or how many
endpoints/engines will then support datatypes fully.
It has some non-trivialities, unfortunately.
So, yes, relying on whitespace normalisation is problematic indeed.
Axel