Hey folks,
I don't know whether the address above is the correct contact for this
OpenCalais issue. Could you please forward this as appropriate? I'm also
CC'ing the pedantic-web group, who informally help co-ordinate and offer
practical advice on publishing RDF.
Firstly, warm thanks to the OpenCalais team for contributing to RDF
publishing and for raising the profile of the Semantic Web/Linked Data
community.
However, I just came across some issues in the RDF that OpenCalais
publishes, e.g., at [1]. It would be great to see these issues addressed
if possible.
The first issue relates to the use of rdfs:domain/rdfs:range. It seems from
your usage of rdfs:domain that you consider a 'union' semantics for a
term. That is to say, if you have:
(1) P domain C1 .
(2) P domain C2 .
and
(3) X P Y .
Then X is of type either C1 *or* C2.
This is incorrect. Multiple domains are read as an intersection, so X
will be of type C1 *and* C2. From (1), (2) and (3), RDFS/OWL reasoners
will infer (4) and (5):
(4) X a C1 .
(5) X a C2 .
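The inference above can be sketched in a few lines of Python. This is only an illustration of the RDFS domain rule over plain tuples, not a full reasoner; the function name infer_domain_types and the string constants are made up for this example, and P, C1, C2, X and Y are the placeholders from (1)-(3).

```python
# Minimal sketch of the RDFS domain rule: (P domain C) + (X P Y) => (X type C).
# Triples are plain (subject, predicate, object) tuples; names are illustrative.
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def infer_domain_types(triples):
    """Return the rdf:type triples entailed by the rdfs:domain statements."""
    # Collect every declared domain class per property.
    domains = {}
    for s, p, o in triples:
        if p == RDFS_DOMAIN:
            domains.setdefault(s, set()).add(o)

    # Each use of a property types its subject with ALL of that property's
    # domains (intersection semantics), not with just one of them.
    inferred = set()
    for s, p, o in triples:
        for cls in domains.get(p, ()):
            inferred.add((s, RDF_TYPE, cls))
    return inferred


triples = {
    ("P", RDFS_DOMAIN, "C1"),  # (1)
    ("P", RDFS_DOMAIN, "C2"),  # (2)
    ("X", "P", "Y"),           # (3)
}
inferred = infer_domain_types(triples)
for triple in sorted(inferred):
    print(triple)
```

Both (4) and (5) come out of the single statement (3); there is no way for a reasoner to pick only one of the declared domains.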
To take an example from [1], you state (in RDF/XML):
<rdf:Property rdf:about="http://s.opencalais.com/1/pred/person">
  <rdfs:range rdf:resource="http://s.opencalais.com/1/type/em/e/Person"/>
  <rdfs:label>person</rdfs:label>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/Quotation"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/PersonCommunication"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/Arrest"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/PersonAttributes"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/Trial"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/Conviction"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/PersonEducation"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/PersonEmailAddress"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/PersonRelation"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/Indictment"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/FamilyRelation"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/PersonTravel"/>
  <rdfs:comment>Canonic name of a person entity that is a participant in an event/fact</rdfs:comment>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/PersonCareer"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/EmploymentChange"/>
</rdf:Property>
This means that given the above RDF and a triple of the form
X <http://s.opencalais.com/1/pred/person> Y .
you can infer that:
X a <http://s.opencalais.com/1/type/em/r/Quotation> .
X a <http://s.opencalais.com/1/type/em/r/PersonCommunication> .
X a <http://s.opencalais.com/1/type/em/r/Arrest> .
X a <http://s.opencalais.com/1/type/em/r/PersonAttributes> .
and likewise for every other domain listed above.
I'm sure that that's not what you want. More worryingly, you define
domains and ranges for standard and external terms, such as for
owl:sameAs:
<rdf:Property rdf:about="http://www.w3.org/2002/07/owl#sameAs">
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/e/RadioStation"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/e/Facility"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/e/MusicAlbum"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/er/Geo/Country"/>
  <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/e/Region"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/e/Person"/>
  ...(many more)
</rdf:Property>
and domains for rdf:type:
<rdf:Property rdf:about="http://www.w3.org/1999/02/22-rdf-syntax-ns#type">
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/PersonRelation"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/Conviction"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/CompanyAffiliates"/>
  ...(many more)
</rdf:Property>
The above definitions mean that *every* entity can (naively) be inferred
to be a member of *all* of the domain/range classes defined as above.
You also define a new domain for foaf:img (which should only be used for
images of people -- foaf:depiction would be more suitable):
<rdf:Property rdf:about="http://xmlns.com/foaf/0.1/img">
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/er/Product/Electronics"/>
  <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource"/>
</rdf:Property>
Essentially, the document served by (e.g.) [1] needs a pretty major
overhaul -- probably just simplifying it -- to remove such issues. This
document is important in that it defines the semantics of the terms. It
is important that the OWL/RDFS definitions correctly reflect what you
intend for each term, even if the term is only 'lightly' defined -- a
simple and correct definition of terms is much more valuable than a
complex and incorrect definition of terms.
The second major issue relates to redundancy in the triples published.
Currently, every term published under the namespace of [2] has its own
document, and each such document contains the same information -- for
example, given "Accept application/rdf+xml", [3] redirects (303) to [1].
Ideally, either:
(a) Each term-specific document (e.g., [1]) contains mostly information
relating to that term (e.g., [3]); OR
(b) Each term (e.g., [3]) redirects (303) to *one* document (e.g., [4])
containing information for *all* terms.
This avoids crawlers accessing redundant information, and reduces the
server load/bandwidth required for you to serve such information.
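As a rough illustration of option (b), the redirect target could be computed from the term URI alone, so that every term in a namespace lands on the same document. This is only a sketch: the function name shared_document_for is made up, and the file name index.rdf is an assumption borrowed from the index URL in the reference list below, not a description of how the OpenCalais servers are actually configured.

```python
from urllib.parse import urlsplit, urlunsplit
import posixpath


def shared_document_for(term_uri, index_name="index.rdf"):
    """Return the single document that a term URI would 303-redirect to.

    Hypothetical helper: all terms in the same namespace (directory) map to
    one shared document, so a crawler fetches the definitions only once.
    """
    parts = urlsplit(term_uri)
    # Drop the term's local name and keep only its namespace directory.
    namespace_dir = posixpath.dirname(parts.path)
    shared_path = namespace_dir + "/" + index_name
    return urlunsplit((parts.scheme, parts.netloc, shared_path, "", ""))


target = shared_document_for(
    "http://d.opencalais.com/1/type/em/r/CompanyTechnology"
)
print(target)
```

Any other term under http://d.opencalais.com/1/type/em/r/ would map to the same target, which is what removes the duplicated downloads.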
If you have any questions about the above issues please don't hesitate
to contact us.
Cheers,
Aidan
-
http://pedantic-web.org/
[1] http://d.opencalais.com/1/type/em/r/CompanyTechnology.rdf
[2] http://d.opencalais.com/1/type/em/r/
[3] http://d.opencalais.com/1/type/em/r/CompanyTechnology
[4] http://d.opencalais.com/1/type/em/r/index.rdf