[pedantic-web] OpenCalais... multiple domains for rdf:type/owl:same... redundant publishing


Hogan, Aidan

Apr 19, 2010, 10:55:32 AM
to ques...@opencalais.com, pedant...@googlegroups.com
Hey folks,

I don't know if the above email is the correct contact address for this
OpenCalais issue. Could you please forward it as appropriate? I'm also
CC'ing the pedantic-web group, who informally help co-ordinate and offer
practical advice on publishing RDF.

Firstly, warm thanks to the OpenCalais team for contributing to RDF
publishing and for raising the profile of the Semantic Web/Linked Data
community.

However, I just came across some issues in the RDF that OpenCalais
publishes, e.g., at [1]. It would be great to see some of these issues
addressed if possible.

The first issue relates to the use of rdfs:domain/rdfs:range. It seems
from your usage of rdfs:domain that you assume 'union' semantics when a
property has several domains. That is to say, if you have:

(1) P domain C1 .
(2) P domain C2 .

and

(3) X P Y .

Then X is either of type C1 *or* C2.

This is incorrect. X will be of type C1 *and* C2. So from (1), (2) and
(3), RDFS/OWL reasoners will infer (4) and (5):

(4) X a C1 .
(5) X a C2 .
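To make the intersection behaviour concrete, here is a minimal sketch of the RDFS domain rule (often called "rdfs2") in plain Python. It assumes triples are simple (subject, predicate, object) string tuples and is only an illustration, not a full RDFS/OWL reasoner; the names apply_domain_rule, RDF_TYPE and RDFS_DOMAIN are made up for the example.

```python
# Minimal sketch of the RDFS domain rule ("rdfs2"). Triples are plain
# (subject, predicate, object) string tuples; this is not a full reasoner.
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def apply_domain_rule(triples):
    """Return the rdf:type triples entailed by rdfs:domain declarations.

    For every (P rdfs:domain C) and every (X P Y), infer (X rdf:type C).
    Each declared domain fires independently, so several domains on one
    property act as an intersection (X is a C1 *and* a C2), not a union.
    """
    domains = {}
    for s, p, o in triples:
        if p == RDFS_DOMAIN:
            domains.setdefault(s, set()).add(o)
    return {(s, RDF_TYPE, c)
            for s, p, _ in triples
            for c in domains.get(p, ())}


if __name__ == "__main__":
    data = {
        ("P", RDFS_DOMAIN, "C1"),  # (1)
        ("P", RDFS_DOMAIN, "C2"),  # (2)
        ("X", "P", "Y"),           # (3)
    }
    for triple in sorted(apply_domain_rule(data)):
        print(triple)  # (4) and (5)
```

Running it prints ('X', 'rdf:type', 'C1') and ('X', 'rdf:type', 'C2'): nothing in the rule lets a reasoner pick "one of" the domains.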

To take an example from [1], you state (in RDF/XML):

<rdf:Property rdf:about="http://s.opencalais.com/1/pred/person">
  <rdfs:range rdf:resource="http://s.opencalais.com/1/type/em/e/Person"/>
  <rdfs:label>person</rdfs:label>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/Quotation"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/PersonCommunication"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/Arrest"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/PersonAttributes"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/Trial"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/Conviction"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/PersonEducation"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/PersonEmailAddress"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/PersonRelation"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/Indictment"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/FamilyRelation"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/PersonTravel"/>
  <rdfs:comment>Canonic name of a person entity that is a participant in an event/fact</rdfs:comment>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/PersonCareer"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/EmploymentChange"/>
</rdf:Property>

This means that, given the above RDF and a triple of the form
X <http://s.opencalais.com/1/pred/person> Y .

you can infer that:
X a <http://s.opencalais.com/1/type/em/r/Quotation> .
X a <http://s.opencalais.com/1/type/em/r/PersonCommunication> .
X a <http://s.opencalais.com/1/type/em/r/Arrest> .
X a <http://s.opencalais.com/1/type/em/r/PersonAttributes> .
...and likewise for every other domain class listed above.

I'm sure that that's not what you want. More worryingly, you define
domains and ranges for standard and external terms, such as for
owl:sameAs:

<rdf:Property rdf:about="http://www.w3.org/2002/07/owl#sameAs">
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/e/RadioStation"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/e/Facility"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/e/MusicAlbum"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/er/Geo/Country"/>
  <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/e/Region"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/e/Person"/>

  ...(many more)

</rdf:Property>

and domains for rdf:type:

<rdf:Property rdf:about="http://www.w3.org/1999/02/22-rdf-syntax-ns#type">
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/PersonRelation"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/Conviction"/>
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/em/r/CompanyAffiliates"/>

  ...(many more)

</rdf:Property>

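Since practically every entity is the subject of some rdf:type triple
(and, under OWL semantics, every individual is owl:sameAs itself), the
above definitions mean that *every* entity can (naively) be inferred to
be a member of *all* of the domain/range classes defined as above.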
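A rough illustration of the rdf:type case: the sketch below forward-chains the domain rule to a fixpoint over a few of the domains declared on rdf:type above (class names shortened with a made-up "r:" prefix). The entity "ex:Acme" and its class "ex:Company" are hypothetical; the point is only that a single ordinary rdf:type statement is enough to pull in every domain class.

```python
# Sketch: naive forward chaining of the rdfs:domain rule when domains are
# declared on rdf:type itself. Class names are shortened from the excerpt
# above; "ex:Acme" and "ex:Company" are hypothetical.
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def infer_types(triples):
    """Apply the domain rule repeatedly until no new triples appear."""
    closure = set(triples)
    while True:
        domains = {}
        for s, p, o in closure:
            if p == RDFS_DOMAIN:
                domains.setdefault(s, set()).add(o)
        new = {(s, RDF_TYPE, c)
               for s, p, _ in closure
               for c in domains.get(p, ())}
        if new <= closure:
            return closure
        closure |= new


def types_of(closure, x):
    """All classes that x is (explicitly or implicitly) a member of."""
    return {o for s, p, o in closure if s == x and p == RDF_TYPE}


if __name__ == "__main__":
    schema = {
        (RDF_TYPE, RDFS_DOMAIN, "r:PersonRelation"),
        (RDF_TYPE, RDFS_DOMAIN, "r:Conviction"),
        (RDF_TYPE, RDFS_DOMAIN, "r:CompanyAffiliates"),
    }
    data = {("ex:Acme", RDF_TYPE, "ex:Company")}
    print(sorted(types_of(infer_types(schema | data), "ex:Acme")))
```

This prints ['ex:Company', 'r:CompanyAffiliates', 'r:Conviction', 'r:PersonRelation']: a company becomes a conviction and a person relation simply by having a type.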

Also, you define a new domain for foaf:img, which should only be used for
images of people (foaf:depiction would be more suitable):

<rdf:Property rdf:about="http://xmlns.com/foaf/0.1/img">
  <rdfs:domain rdf:resource="http://s.opencalais.com/1/type/er/Product/Electronics"/>
  <rdfs:range rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource"/>
</rdf:Property>

Essentially, the document served by (e.g.) [1] needs a pretty major
overhaul (probably just a simplification) to remove such issues. This
document is important because it defines the semantics of the terms, so
the OWL/RDFS definitions should correctly reflect what you intend for
each term, even if the term is only 'lightly' defined: a simple and
correct definition of terms is much more valuable than a complex and
incorrect one.

The second major issue relates to redundancy in the published triples.
Currently, every term published under the namespace of [2] has its own
document, yet each such document contains the same information. For
example, given the header "Accept: application/rdf+xml", [3] redirects
(303) to [1].

Ideally, either:
(a) Each term-specific document (e.g., [1]) contains mostly information
relating to that term (e.g., [3]); OR
(b) Each term (e.g., [3]) redirects (303) to *one* document (e.g., [4])
containing information for *all* terms.

This stops crawlers from fetching redundant information, and reduces the
server load and bandwidth you need to serve it.
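For option (b), a server-side sketch might look like the following. It is only an illustration under stated assumptions: the path layout follows [2], [3] and [4], but route() and TermHandler are hypothetical names, the Accept check is a naive substring test, and HTML negotiation is left out. It is not how OpenCalais actually serves these URIs.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical layout for option (b): paths follow [2], [3] and [4], but the
# routing rules are an illustration, not how OpenCalais actually serves them.
NAMESPACE_PATH = "/1/type/em/r/"
SHARED_DOC = "http://d.opencalais.com/1/type/em/r/index.rdf"


def route(path, accept):
    """Return (status, headers) for a GET on `path` with this Accept header."""
    if not path.startswith(NAMESPACE_PATH):
        return 404, {}
    term = path[len(NAMESPACE_PATH):]
    if term == "index.rdf":
        # The one shared document describing *all* terms (body omitted here).
        return 200, {"Content-Type": "application/rdf+xml"}
    # Naive substring check; a real server would parse q-values properly.
    if term and "/" not in term and "application/rdf+xml" in accept:
        return 303, {"Location": SHARED_DOC}
    return 406, {}  # HTML negotiation left out of this sketch


class TermHandler(BaseHTTPRequestHandler):
    """Wires route() into Python's built-in HTTP server."""

    def do_GET(self):
        status, headers = route(self.path, self.headers.get("Accept", ""))
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
```

With this layout, a crawler that dereferences any number of terms in the namespace is sent to the same index.rdf and only needs to fetch it once.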

If you have any questions about the above issues please don't hesitate
to contact us.

Cheers,
Aidan
- http://pedantic-web.org/

[1] http://d.opencalais.com/1/type/em/r/CompanyTechnology.rdf
[2] http://d.opencalais.com/1/type/em/r/
[3] http://d.opencalais.com/1/type/em/r/CompanyTechnology
[4] http://d.opencalais.com/1/type/em/r/index.rdf




sumit

Apr 20, 2010, 11:07:55 AM
to Pedantic Web Group
Hi Aidan,
Recently we released an updated OWL schema at
http://www.opencalais.com/files/owl.opencalais-4.3a.xml. Please refer
to this schema, in which we have fixed the domain/class issues you
brought up; it should also simplify the crawling process a bit.

Let me know if this helps.
Sumit Shah
Open Calais Team

Hogan, Aidan

Apr 20, 2010, 6:47:56 PM
to pedant...@googlegroups.com, shah....@gmail.com
Hi Sumit,

Thanks for the quick response. Yep, I had a quick scan and [1] certainly looks much cleaner than the previous ontology. However, there are still some issues relating to dereferencing and content-type reporting. I don't know how feasible they are to fix on your side...

Firstly, predicates such as [2] dereference (with Accept: application/rdf+xml) to blank documents such as [3].

Secondly, classes such as [4] still dereference to the old document with the problems discussed above (e.g., [5]). This will still cause problems for crawlers and for your servers.

Ideally these terms should dereference (303) to the most up-to-date description of themselves -- e.g., [1].
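As a rough crawler-side illustration of the first two problems, the sketch below classifies what was observed when a term was dereferenced. The Dereference record and dereference_problems() are hypothetical helpers, fetching is deliberately left out, and treating [1] as the single expected redirect target is an assumption based on the suggestion above.

```python
from dataclasses import dataclass

# Treating [1] as the single expected target is an assumption of this sketch.
EXPECTED_DOC = "http://www.opencalais.com/files/owl.opencalais-4.3a.xml"


@dataclass
class Dereference:
    """What a crawler saw when requesting a term with RDF in Accept."""
    status: int     # status of the response for the term URI itself
    location: str   # its Location header ("" if there was none)
    body: bytes     # body of the document finally retrieved


def dereference_problems(d):
    """List the issues raised in this thread that apply to one term."""
    problems = []
    if d.status != 303:
        problems.append(f"expected 303 See Other, got {d.status}")
    elif d.location != EXPECTED_DOC:
        problems.append(f"303 points to {d.location}, not the current ontology")
    if not d.body.strip():
        problems.append("retrieved document is blank")
    return problems
```

For example, a 303 to a blank per-term .rdf document yields two problems, whereas a 303 to [1] with a non-empty body yields none.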

Thirdly, [1] returns a content-type of text/xml. I know that somewhere in the OWL spec, it says that application/rdf+xml and text/xml are suitable content-types for RDF/XML. I personally would ignore that... We have a specific content type for RDF/XML so why not use it? My preference would be to see [1] return content-type application/rdf+xml. You could probably achieve this out-of-the-box by changing the file extension to '.rdf' if possible.
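If the file does get a '.rdf' extension, most static servers only need an extension-to-type mapping. As a sketch, assuming Python's built-in http.server stands in for whatever server actually hosts [1], the hypothetical RDFAwareHandler below maps '.rdf' to application/rdf+xml; on Apache the equivalent is usually an `AddType application/rdf+xml .rdf` directive.

```python
from http.server import SimpleHTTPRequestHandler


class RDFAwareHandler(SimpleHTTPRequestHandler):
    """Static-file handler that serves *.rdf files as application/rdf+xml.

    http.server stands in here for whatever server actually hosts [1];
    the ".rdf" mapping is the only change from the default behaviour.
    """

    extensions_map = {
        **SimpleHTTPRequestHandler.extensions_map,
        ".rdf": "application/rdf+xml",
    }
```

To try it locally, pass RDFAwareHandler to http.server.HTTPServer; files ending in '.rdf' are then labelled application/rdf+xml rather than a generic XML type.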

With the above dereferencing and content-type issues fixed, the old ontology could ideally then be removed to avoid confusion about the definition of terms.

Cheers,
Aidan

[1] http://www.opencalais.com/files/owl.opencalais-4.3a.xml
[2] http://d.opencalais.com/pred/effect
[3] http://d.opencalais.com/pred/effect.rdf
[4] http://d.opencalais.com/1/type/em/e/MarkupEntity
[5] http://d.opencalais.com/1/type/em/e/MarkupEntity.rdf