2009/12/2 Richard Cyganiak <ric...@cyganiak.de>:
> Hi Ed,
>
> A newspaper page (even an abstract one, that is, a Manifestation rather than
> Item in the FRBR sense) is not the same as a web page.
>
> You have a web page whose topic is a newspaper page.
>
> The newspaper page perhaps is an information resource as well, according to
> the AWWW definition of information resource, but knowing that doesn't
> actually change anything; a web page describing an information resource is
> no different from a web page describing, let's say, a person, from the web
> architecture POV.
>
> About Xiaoshu's view, his way of looking at the world is an alternative
> possibility to the one that's commonly used in LOD -- I will call that one
> the “LOD view”, but you might call it “Richard's view” if you prefer. So,
> both the LOD view and Xiaoshu's view are reasonable, but *combining* both
> leads to all sorts of issues, so you should pick one and stick to it. The
> difference, explained on the example of FRBR, is that Xiaoshu would have a
> single identifier, and different properties to distinguish the “facets” of
> the resource:
>
> <http://example.com/book123>
>     frbr:workIssued "1872";
>     frbr:expressionIssued "1878";
>     frbr:manifestationIssued "2006";
>     frbr:itemIssued "2009";
>     .

I was once told that the reason LOD, and the Semantic Web in general,
hadn't chosen this path was that current reasoning techniques can't
cope with it. The range and domain restrictions on the predicates
infer types on the URI, and together those types might confuse a
reasoner that assumes one URI/resource has a set of classes that are
not disjoint with one another. The alternative would be a more liberal
reasoning strategy in which the effects of the classes are confined to
the faceted statements. In this example, a possible domain of
frbr:Expression on frbr:expressionIssued would have its class
implication confined to that statement, instead of being broadcast to
every other statement that has that URI as its subject or object.
Because of these issues, LOD would be better off with explicit single
typing rather than implicit multiple typing. Is that an accurate
reason for the different choice?
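
To make the concern concrete, here is a minimal sketch of the two
reasoning strategies. It assumes, purely for illustration, that each
frbr:*Issued property has one FRBR class as its domain and that the four
classes are pairwise disjoint; neither assumption comes from a published
FRBR vocabulary, and the function names are hypothetical.

```python
from itertools import combinations

# ASSUMPTION: hypothetical domains for the faceted properties.
DOMAINS = {
    "frbr:workIssued": "frbr:Work",
    "frbr:expressionIssued": "frbr:Expression",
    "frbr:manifestationIssued": "frbr:Manifestation",
    "frbr:itemIssued": "frbr:Item",
}

# ASSUMPTION: the four classes are declared pairwise disjoint.
DISJOINT = {frozenset(pair) for pair in combinations(DOMAINS.values(), 2)}

BOOK = "http://example.com/book123"
TRIPLES = [
    (BOOK, "frbr:workIssued", "1872"),
    (BOOK, "frbr:expressionIssued", "1878"),
    (BOOK, "frbr:manifestationIssued", "2006"),
    (BOOK, "frbr:itemIssued", "2009"),
]


def resource_level_types(triples):
    """Domain-style inference: every implied class attaches to the subject URI."""
    types = {}
    for subject, predicate, _ in triples:
        if predicate in DOMAINS:
            types.setdefault(subject, set()).add(DOMAINS[predicate])
    return types


def disjointness_clashes(types):
    """List (subject, class_a, class_b) for each pair of disjoint classes on one URI."""
    clashes = []
    for subject, classes in types.items():
        for a, b in combinations(sorted(classes), 2):
            if frozenset((a, b)) in DISJOINT:
                clashes.append((subject, a, b))
    return clashes


def statement_level_types(triples):
    """The 'liberal' strategy: each class implication stays with its own statement."""
    return [(triple, DOMAINS.get(triple[1])) for triple in triples]
```

Under these assumptions, `resource_level_types(TRIPLES)` puts all four
classes on the one URI and `disjointness_clashes` reports every pair as
a conflict, while `statement_level_types(TRIPLES)` never combines the
classes, so there is nothing to clash.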

In the real world, people are able to distinguish between implicit
types based on properties. For example, they create sentences like
"book 123 had a work issued in 1872, and had an expression issued in
1878", and listeners pick up the implicit typing on the single "book
123" object for each part of the sentence, rather than creating new
objects in their heads just for the sentence to make sense. LOD would
encourage all of the typing parts of the sentence to be pushed from
the predicate URI into the subject URI, partially so that generic
vocabularies can be reused (without the usual pitfalls of generic
vocabularies). In LOD, the sentence that someone would say might be
"the work aspect of book 123 was issued in 1872, and the expression
aspect of book 123 was issued in 1878, etc.", i.e., making it explicit
that a new object is being referred to. If AI can never hope to pick
up on implicit typing, then we may be stuck with separate, pre-issued
URIs for each facet, as in the following example. This modelling,
unlike the previous modelling, would also need rdf:type statements for
each resource, and interlinks describing the relationships between the
URIs, to make queries work the way they are intended.

<http://example.com/book123#work> rdf:type frbr:Work .
<http://example.com/book123#expression> rdf:type frbr:Expression .
<http://example.com/book123#manifestation> rdf:type frbr:Manifestation .
<http://example.com/book123#item> rdf:type frbr:Item .
<http://example.com/book123#work> frbr:hasExpression <http://example.com/book123#expression> .
<http://example.com/book123#work> frbr:hasManifestation <http://example.com/book123#manifestation> .
<http://example.com/book123#work> frbr:hasItem <http://example.com/book123#item> .

I personally do not think it is valuable enough to the web to have new
URIs created for every part of everything, when we could just as
easily distinguish between them using specialised vocabularies. These
would inherit some behaviour from the generic vocabularies where
possible, but provide a specialised view that doesn't conflict with
other predicates, as the generic dc vocabulary inevitably would if you
tried to use it in the other system. If you asked for dc:issued in
Xiaoshu's view, with the frbr:*Issued predicates derived from
dc:issued, then you would legitimately get the issue date of each
facet in the result set, as it was unclear which issued date you were
asking for. You could still get specific frbr:*Issued dates, though,
without looking for an rdf:type statement as you need to if you go
with the LOD view and rely on overall object typing to give context to
generic vocabularies.
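
As a rough sketch of that query behaviour, assume the four frbr:*Issued
predicates from Richard's example are each declared a subproperty of
dc:issued. The declarations and the `query` helper below are
hypothetical and stand in for what a reasoner or query engine would do.

```python
# ASSUMPTION: each faceted predicate is a subproperty of the generic dc:issued.
SUBPROPERTY_OF = {
    "frbr:workIssued": "dc:issued",
    "frbr:expressionIssued": "dc:issued",
    "frbr:manifestationIssued": "dc:issued",
    "frbr:itemIssued": "dc:issued",
}

BOOK = "http://example.com/book123"
TRIPLES = [
    (BOOK, "frbr:workIssued", "1872"),
    (BOOK, "frbr:expressionIssued", "1878"),
    (BOOK, "frbr:manifestationIssued", "2006"),
    (BOOK, "frbr:itemIssued", "2009"),
]


def query(triples, subject, predicate):
    """Return (predicate, object) pairs matching `predicate` or one of its subproperties."""
    return [
        (p, o)
        for s, p, o in triples
        if s == subject and (p == predicate or SUBPROPERTY_OF.get(p) == predicate)
    ]


# The generic question gets one answer per facet; the specific one gets a
# single date, and no rdf:type lookup is needed in either case.
all_dates = query(TRIPLES, BOOK, "dc:issued")
work_date = query(TRIPLES, BOOK, "frbr:workIssued")
```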

The idea of creating a new URI for each facet of an item, if the only
reason is that it enables users to utilise generic vocabularies, seems
like overkill to me when the alternative only creates a single new
property URI when a new facet is defined. If you have a million books
whose work, expression, manifestation and resolved "item/web page/web
document" all have to be described, the LOD system will use at least
4 million and 1 URIs and at least 11 million RDF statements, while the
alternative system uses 1 million and 4 URIs and only 4 million RDF
statements. And if a particular facet of the book wasn't previously
described, LOD requires the creation of another million URIs and 2+N
million RDF statements, whereas the alternative only requires 1 new
URI and 1 million RDF statements. Given that the information content
is equivalent to humans, it is ironic that the information explosion
is predicated only on a current lack of artificial intelligence.
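
The counts above can be reproduced with a small back-of-the-envelope
calculation. The breakdown is my reading of the figures, not something
stated explicitly: one issued date per facet, one rdf:type statement
per facet resource, one interlink from the work to each other facet,
and a single shared generic property URI on the LOD side. Class and
link predicate URIs are left out, as they are in the figures.

```python
def lod_totals(books, facets=4, props_per_facet=1, shared_property_uris=1):
    """URIs and statements when every facet of every book gets its own URI."""
    resource_uris = books * facets
    type_statements = books * facets            # one rdf:type per facet resource
    link_statements = books * (facets - 1)      # work -> each other facet
    property_statements = books * facets * props_per_facet
    uris = resource_uris + shared_property_uris
    return uris, type_statements + link_statements + property_statements


def faceted_totals(books, facets=4, props_per_facet=1):
    """URIs and statements with one URI per book and one property URI per facet."""
    property_uris = facets * props_per_facet
    return books + property_uris, books * facets * props_per_facet


def lod_new_facet(books, n_props):
    """Extra URIs and statements: one type, one interlink and N properties per book."""
    return books, books * (2 + n_props)


def faceted_new_facet(books, n_props=1):
    """Extra URIs and statements: only the new property URIs and their statements."""
    return n_props, books * n_props


MILLION = 1_000_000
lod = lod_totals(MILLION)          # (4_000_001, 11_000_000)
faceted = faceted_totals(MILLION)  # (1_000_004, 4_000_000)
```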

How would someone easily indicate that they owned a copy of the book
using the LOD conventions? Would the only way be for them to check the
particular publication date, along with the version number that the
printers gave the book, and rely on inferencing to derive the link to
the actual work using this information? After all, there can be
copy fixes done between printings without altering the overall book
identifier... ;)

It reminds me of a Yes Minister skit where Jim Hacker gets confused
when someone asks him to respond with his personal "hat" instead of
his ministerial "hat". I am surprised that foaf and sioc haven't bumped
into this issue until recently with the "role" discussion, as it will
inevitably force the simple foaf:Person class to become a lot more
complicated if it follows the LOD practice. No one is just a
foaf:Person; they always have particular properties that specialise
them into classes such as "Prime Minister of the UK" or "Researcher",
and people refer to them using those classes. For one thing, that lets
someone keep their personal life separate from their professional life
and still represent it, for interest's sake, on the Semantic Web.

Ironically, perhaps, the LOD view is semantically compatible with
reasoners based on Xiaoshu's view, but any instances utilising
Xiaoshu's view will destroy the results for current reasoners relying
on world-level class inferences at the resource level in the LOD view.
If reasoners were more advanced, both could live together, as the LOD
view, which cautiously creates a new identifier for an item every time
a new facet is discovered, wouldn't conflict with the statement-level
faceting in reasoners following Xiaoshu's view.

The following set of statements with non-generic predicates is quite
understandable to a human... What are the theory-level bottlenecks
preventing computers from understanding it fully? Is it a fault with
RDF theory that resources are presumed to not be schizophrenic by
nature, as they are in human languages? I.e., the layer-specific
knowledge is given without relying on the URI changing to reflect
this.

<URI> <http:resolvedDate> "2009-11-26" .
<URI> <http:statusCode> "200" .
<URI> <xhtml:title> "Work about semantic conflicts with implicit typing by Researcher B" .
<URI> <frbr:issued> "2009-11-22" .
<URI> <frbr:issuedTitle> "Work about semantic conflicts with implicit typing" .
<URI> <frbr:issuingAuthor> "Researcher B" .

No one has been able to give me a satisfactory answer about 303, by
the way, or explain why the following unique inconsistency is allowed
to occur in the LOD semantics.

<URI> <http:resolvedDate> "2009-11-26" .
<URI> <http:statusCode> "303" .
<URI> <frbr:issued> "2009-11-22" .
<URI> <frbr:issuedTitle> "Work about semantic conflicts with implicit typing" .
<URI> <frbr:issuingAuthor> "Researcher B" .
<URI> <foaf:page> <URI2> .
<URI2> <http:resolvedDate> "2009-11-26" .
<URI2> <http:statusCode> "200" .
<URI2> <xhtml:title> "Work about semantic conflicts with implicit typing by Researcher B" .

Why is it that the <URI> <http:statusCode> "303" triple is valid (or
specifically not valid), but conversely invalid or valid if the code
is "200" or any other number, when the response indicates something
that can't be transmitted across the wire in a response, like
thoughts, actions, and matter? If RDF is serious about keeping track
of provenance metadata, it has to be able to answer this question
consistently.

Cheers,
Peter