Attached are the original queries, with the formatting fixed. Looks
like the angle brackets should be escaped in the original.
Cheers,
L.
2009/10/2 Chris Wallace <kit.w...@googlemail.com>:
> ______________________________________________________________________
> This email has been scanned by the MessageLabs Email Security System.
> For more information please visit http://www.messagelabs.com/email
> ______________________________________________________________________
>
--
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh...@talis.com
http://www.talis.com
The examples are not valid SPARQL (some loss of formatting, I guess),
e.g. the first example
prefix sch-ont: http://education.data.gov.uk/ontology/school#
SELECT ?name WHERE {
?school a sch-ont:School;
sch-ont:establishmentName ?name;
sch-ont:districtAdministrative
http://education.data.gov.uk/placeholder-id/administrativeDistrict/City-of-London;
}
ORDER BY ?name
should be
prefix sch-ont: <http://education.data.gov.uk/ontology/school#>
SELECT ?name WHERE {
?school a sch-ont:School;
sch-ont:establishmentName ?name;
sch-ont:districtAdministrative
<http://education.data.gov.uk/placeholder-id/administrativeDistrict/City-of-London>;
}
ORDER BY ?name
(I hope the added angle brackets get through this interface!)
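The lost brackets are what you would expect if the query was pasted into an HTML page unescaped. A minimal sketch of escaping it first, using Python's standard `html` module (the variable names are only illustrative):

```python
import html

# The corrected query from the message above, held as a plain string.
query = """prefix sch-ont: <http://education.data.gov.uk/ontology/school#>
SELECT ?name WHERE {
  ?school a sch-ont:School;
    sch-ont:establishmentName ?name;
    sch-ont:districtAdministrative
      <http://education.data.gov.uk/placeholder-id/administrativeDistrict/City-of-London>;
}
ORDER BY ?name"""

# Escape <, > and & so a browser displays the IRIs instead of
# swallowing them as unknown tags.
escaped = html.escape(query, quote=False)
print(escaped)
```

Unescaping the result with `html.unescape` gives back the original query text.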
2009/10/2 kit.wallace <kit.w...@googlemail.com>:
> Leigh , It would be great to see the design and rationale for the
> school vocabulary, especially if this is to be a key example of the
> Govt Data RDF approach. There are clearly design choices here -such as
> using only newly minted terms in the vocabulary .
The Edubase data was converted to RDF by a team of people at HP
including Stewart Williams, Dave Reynolds and Brian McBride. I suspect
at least one of them is on this list (or soon will be).
Cheers,
L.
we've already started playing at
http://belocal.com/
Schools
I'll update the group with our experiences as they come in.
S
Chris, these are just general purpose utilities, not part of the gov
activity. Hopefully we'll see lots more like them as more open data is
made available. They are open source if anyone wants to take them and
build on them (I'll send urls later - am writing this on my phone)
ian
> Leigh , It would be great to see the design and rationale for the
> school vocabulary, especially if this is to be a key example of the
> Govt Data RDF approach. There are clearly design choices here -such as
> using only newly minted terms in the vocabulary .
>
> Chris
2009/10/2 Simon Grice (mashup*) <si...@mashupevent.com>:
> is there a list of data available via the SPARQL interface ?
Do you mean metadata for the list of datasets? That's on the roadmap.
The CKAN infrastructure is handling the basic dataset directory, but
we'll be converting that to RDF and loading it into a store with a
SPARQL endpoint so you'll be able to query to find specific datasets.
>
> we've already starting playing at
>
> http://belocal.com/
>
> Schools
Nice!
Cheers,
L.
and structure of that data
S
kit.wallace wrote:
> Leigh , It would be great to see the design and rationale for the
> school vocabulary, especially if this is to be a key example of the
> Govt Data RDF approach. There are clearly design choices here -such as
> using only newly minted terms in the vocabulary .
A couple of things to add to Stuart's response.
The vocabulary is in no way set in stone. It's a very preliminary
version and needs more validation with the data owners, amongst others.
So any suggestions you have for terms we should be reusing, or
cross-linking to, would be great.
For this starting point we erred on the side of keeping things as close
to the original data as possible. Partly because that meant we could
automate a lot of the vocabulary generation (and data generation), and
partly because we didn't have access to the data schema or experts on
the dataset so didn't want to guess at semantics without some validation.
The plan is to cross link the vocabulary with other vocabularies. The
"foundation" ontology has cross links to other upper level ontologies
like opencyc, proton etc. We'll add similar links to the schools
vocabulary itself shortly.
[There are subtleties in there about when you can safely use the rdfs/owl
vocabulary mapping terms. For now we are just cross-linking using an
ontology annotation.]
We did look at other schools related ontologies to see which ones we
could either use or cross link to. There isn't that much out there that
we could find. The most relevant is probably the schools part of the
dbpedia ontology [1]. There is surprisingly little overlap with that.
The main properties we could use would be things like
dbpedia:headteacher. However, the person-specific information in Edubase
is being held back from the RDF right now (both data protection issues
and the fact that representing people gets into a whole can of worms
that we wanted to avoid for the alpha release).
There are some areas that are known to need further work, especially the
representation of location. We've started to do some cross-linking with
the OS administrative geography ontology which is a start in that direction.
Cheers,
Dave
1. Express triples within HTML docs using RDFa
2. Use <link/> within <head/> to expose resource URLs for a variety of
metadata representations (where each item in the metadata doc is
endowed with a generic HTTP URI, i.e. a dereferenceable URI)
3. Transparent Content Negotiation rules on the server that allow user
agents and servers to arrive at preferred metadata representations based
on quality of service algorithms within HTTP headers.
Even if you don't have an RDF model based data representation, nothing
stops you using the methods above to expose the structured data for a
given data space.
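As a rough illustration of point 3, here is a simplified sketch of server-side selection driven by the q values in an Accept header. It is not a full Transparent Content Negotiation implementation: it ignores partial wildcards such as text/* and specificity rules, and the list of available representations is an assumption made up for the example.

```python
def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def choose(header, available):
    """Pick the first available representation the client accepts, or None."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type in available:
            return media_type
        if media_type == "*/*":
            return available[0]  # server default
    return None


# Hypothetical representations a data space might offer.
AVAILABLE = ["text/html", "application/rdf+xml", "text/turtle"]
chosen = choose("application/rdf+xml;q=0.9, text/html;q=0.5, */*;q=0.1", AVAILABLE)
print(chosen)
```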
Kingsley
--
Regards,
Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software Web: http://www.openlinksw.com
I see there is a lot of SPARQL in the air today.
Note:
http://demo.openlinksw.com/sparql_demo/
It has a number of things for new and advanced SPARQL users.
[Or should I use "Kit"?]
kit.wallace wrote:
> Thanks Dave and Stuart for providing those insights to your first cut
> design and the vocab. If I can take up your offer to provide a bit of
> feedback, mindful that especially in RDF there are many ways to skin
> the cat.
Thanks. All feedback useful, I'll respond inline.
> This is predicated by a few principles which I have at the back of my
> mind:
>
> a) The data structure should be discoverable from the data itself -
> separate vocabs are nice but don't describe the usage of all the
> vocabs in a given dataset. Navigating the different structures for
> vocabs can present its own problems.
Somewhat agree but vocabs express things that the data itself can't,
like type hierarchies so you should expect to check vocabs as well.
In the medium term I'd hope to see good tools for browsing and reusing
the vocabs relevant to the data.gov data.
> b) The conceptual model should be independent of the particular data
> model or serialisation used - the same data will be available in
> various formats, of which RDF is just one. RDF creates its own
> problems in representation of course
Yes.
> c) RDF and linked data are separate concerns - linking is about using
> globally unique identifiers for entities and properties and these can
> be used in XML or CSV as well as RDF
Yes, though I find thinking about it from the RDF point of view first
helps. It forces you to think about links between global identifiers and
not accidentally rely on file structure.
> 1) type/category conflation
>
> The bootstrapping query select distinct ?x where { ?s a ?x} lists 41
> different values. However one can guess that only 4 of these are
> types in the entity type(class) sense ( School, CensusRecord,
> vcard:Address and vcard:VCard). By inspection the rest appear to be a
> number of School categories - TypeOfEstablishment,
> IndependentSchoolType, BoardingEstablishment and TrainingSchool.
>
> Reading the vocab, it appears that these types could be equivalently
> represented as boolean (if not mutually exclusive) or enumerated
> values (if exclusive). There are many other school properties which
> could represented as types in this way, so it's not clear to me why
> some properties have been promoted to types and others left as
> predicates.
>
> Looking at the eduBase site, it appears that TypeofEstablishment is an
> enumerated value and hence a set of exclusive categories, but
> represented as rdf:types they are independent boolean categories, so
> some semantics have been lost. It could be put back with owl
> but ... On the other hand, BoardingEstablishment seems to be
> equivalent to the hasBoarders predicate.
>
> The use of rdf:type instead of boolean or enumerated value predicates
> seems to have lost some meaning as well as inflating the type space.
> An export as say CSV could use columns for these properties, so a
> design which created URIs for the properties and values would be
> reusable in the CSV exports as well.
This is a tricky one so apologies if the response gets a bit long.
First, I should say that we did this initial work without a data
dictionary and without access to the people who own the data. Our plan
is to sit down with them "real soon now", get more insight into the
semantics of the data, and then update the model to match. However, it
was better to get something usable out there to get feedback like this
rather than wait for perfection ;-)
Second, there are some principles to apply when deciding to represent
something as a class rather than a property. There's lots written on
this that I'm not about to do justice to but basically the checks I tend
to use are:
- is this intrinsic to the nature of the entity?
- would this classification affect the relevance of other properties?
- do these form a natural hierarchy?
The classic example is the Ontology 101 example wine ontology. Should
you represent Red Wine as a subClass of Wine or as a colour property of
Wine? The common answer to this is to use subClass because one thinks of
Red and White wines so differently and different properties apply - you
care about tannin levels in Red wine and sweetness in White.
Now let's separate the TypeOfEstablishment case from the two definitely
Boolean cases (BoardingEstablishment and TrainingSchool).
For TypeOfEstablishment we had a few reasons for going with the
class-based representation. Firstly, the name "TypeOf" seemed like a
give-away to how the data owners might think of it :-) Secondly, it
seems like these types would affect the relevance of other properties,
for example an LA_Nursery_School would have nursery provision and would
not be expected to have a sixth form. Thirdly there is the possibility
that the different fine-grained types of Independent school might fit
under the top level classes of Independent School (though we haven't
managed to do that).
I agree that in the Edubase data these types are mutually exclusive and
this information has been lost. If we stick with the use of classes then
we should add allDisjoint axioms in the ontology. I didn't do that
initially because that is easier to do in OWL 2 but OWL 2 support is not
widespread. But also I wondered if some of these are really mutually
exclusive or could in principle overlap in other datasets.
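Until disjointness axioms are in the ontology, the exclusivity can at least be checked against the data. A minimal sketch over plain (subject, predicate, object) tuples; the prefixed names and the sample triples are invented for illustration and are not taken from the published dataset:

```python
from collections import defaultdict

# Hypothetical members of the TypeOfEstablishment family.
ESTABLISHMENT_TYPES = {"sch:LA_Nursery_School", "sch:Independent_School"}

# Made-up sample data; school:2 deliberately carries two types.
triples = [
    ("school:1", "rdf:type", "sch:School"),
    ("school:1", "rdf:type", "sch:LA_Nursery_School"),
    ("school:2", "rdf:type", "sch:LA_Nursery_School"),
    ("school:2", "rdf:type", "sch:Independent_School"),
]


def disjointness_violations(triples, disjoint_classes):
    """Map each subject to its types when it has more than one of the given classes."""
    by_subject = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type" and o in disjoint_classes:
            by_subject[s].add(o)
    return {s: sorted(ts) for s, ts in by_subject.items() if len(ts) > 1}


violations = disjointness_violations(triples, ESTABLISHMENT_TYPES)
print(violations)
```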
As for "inflating the type space" ... well yes but I'm not sure that's a
problem. I would argue that for data like this the SPARQL end point
should contain both the data and the ontology, but as separate named
graphs. That way you could find all types and separate root types from
subtypes but still query just the data if you need to.
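A small sketch of that idea, with the two named graphs modelled as a list of (graph, subject, predicate, object) tuples. The graph names and sample triples are assumptions for illustration only; a real store would answer this with a SPARQL query over named graphs.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# (graph, subject, predicate, object); all names are illustrative.
quads = [
    ("g:ontology", "sch:LA_Nursery_School", SUBCLASS, "sch:School"),
    ("g:ontology", "sch:BoardingEstablishment", SUBCLASS, "sch:School"),
    ("g:data", "school:1", RDF_TYPE, "sch:School"),
    ("g:data", "school:1", RDF_TYPE, "sch:LA_Nursery_School"),
    ("g:data", "school:2", RDF_TYPE, "sch:School"),
    ("g:data", "school:2", RDF_TYPE, "sch:BoardingEstablishment"),
]


def types_in(graph):
    """All classes used as rdf:type objects in one named graph."""
    return {o for g, s, p, o in quads if g == graph and p == RDF_TYPE}


def subtypes():
    """Classes the ontology graph declares as a subclass of something."""
    return {s for g, s, p, o in quads if g == "g:ontology" and p == SUBCLASS}


all_types = types_in("g:data")          # what "?s a ?x" would list
root_types = all_types - subtypes()     # types with no declared parent
print(sorted(root_types))
```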
None of that is definitive of course. In particular it is not clear how
"intrinsic" these types are. So if, after a little more consultation,
the consensus is to shift those to classification labels (i.e. yet more
SKOS Concepts) instead of classes then that'd be fine.
Then there are a couple of Boolean cases, such as BoardingEstablishment. In that
case I went with a class on the grounds that BoardingEstablishments have
boarders and there are several properties about boarders, numbers of
boarders and you could imagine more (such as the boarding component of
fees). So on the "it changes what properties you would look for"
argument we made that a class. This is even more subjective than for
TypeOfEstablishment so I definitely wouldn't die in a ditch over that one!
> 2) rdfs:label - this is a handy predicate to use because it gives
> every significant subject a descriptive, humanly readable label. Some
> types have an rdfs:label but notably school does not - I think it
> would be good practice if entities did include this predicate to
> assist generic browsers. Following on from that, it would be easy to
> add a label to types and predicates to provide a basic level of
> documentation retrievable from the dataset.
Odd. In the SVN copy of the ontology, and as far as I can see in the
version Stuart posted, :School does have an rdfs:label; certainly
everything in the ontology is supposed to.
Or do you mean the school instances ... checks ... argh you are right
they don't and should do. Will get that fixed.
> 3) which properties are treated as time-dependent seems rather
> arbitrary - I would be less interested in what it used to cost to go
> to a public school than I would be to see which schools had moved in
> or out of specialMeasures. As I can see from other posts to this list,
> handling time is a core problem for RDF (and any data model) but I
> doubt that a piecemeal approach would be useful. In any case, it would
> surely present more problems when refreshing the dataset since it now
> can't be a simple replacement. Perhaps separate dated snapshots as
> separate graphs would be a preferable initial approach?.
Yes, tricky, a little arbitrary perhaps but not done without thought.
Essentially absolutely any property could change - name, headmaster,
location, ward. Part of the design pattern for government URI sets
includes some notion of dated versions so you can get the latest version
but also go back to earlier ones. So indeed in the medium term I'd
expect the sort of structure you suggest.
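To make the dated-snapshot idea concrete, here is a sketch that keeps each snapshot as its own set of triples and reports which schools changed their specialMeasures value between two dates. The dates, school identifiers and values are invented for the example.

```python
# One set of triples per dated snapshot (all values hypothetical).
snapshots = {
    "2009-09-01": {
        ("school:1", "sch:specialMeasures", "false"),
        ("school:2", "sch:specialMeasures", "true"),
    },
    "2009-10-01": {
        ("school:1", "sch:specialMeasures", "true"),
        ("school:2", "sch:specialMeasures", "true"),
    },
}


def changed(prop, old, new):
    """Return {subject: (old_value, new_value)} for values that differ."""
    old_vals = {s: o for s, p, o in snapshots[old] if p == prop}
    new_vals = {s: o for s, p, o in snapshots[new] if p == prop}
    return {
        s: (old_vals.get(s), v)
        for s, v in new_vals.items()
        if old_vals.get(s) != v
    }


moved = changed("sch:specialMeasures", "2009-09-01", "2009-10-01")
print(moved)
```

Refreshing the dataset then means adding a new dated graph rather than editing existing triples in place.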
However, I'd argue that there should also be a first class historical
record in the RDF (but again ideally in a different graph) to make it
easier to do queries over that history. There are various ways of
representing such historical values and we didn't want to hold up the
publication while deciding on the right ones for those.
However, if every single property were represented as a full
context-dependent n-ary relation the data would be a lot harder to
understand and query so going with a "now" snapshot, which is the data
we had, seemed reasonable.
Then there are properties which are explicitly time dependent: they
are sampled at known times and we might expect to have multiple samples
eventually even if we don't know about other changes. Specifically the
census data like class sizes and probably fees are in that category. I
think you are right that specialMeasures probably should be as well.
So the question is: do we make absolutely everything a time-dependent
property and gain uniformity at the cost of ease of use? Or do we pick
out the ones that seem most like time-varying statistics and treat those
as a special case? We went with the latter but as the feedback
accumulates that decision should definitely be revisited.
As you've probably seen from the statistics discussion we should also
publish the time-varying statistics SCOVO style, so perhaps that would be
the right solution. Either way this is an area that will have to change
(or at least be augmented).
> I hope these comments make a bit of sense and are helpful.
Yes, thanks very much. I hope the responses are useful too.
Dave
> Second, there are some principles to apply when deciding to represent something as a class rather than a property. There's lots written on this that I'm not about to do justice to but basically the checks I tend to use are:
> - is this intrinsic to the nature of the entity?
> - would this classification affect the relevance of other properties?
> - do these form a natural hierarchy?
> The classic example is the Ontology 101 example wine ontology. Should you represent Red Wine as a subClass of Wine or as a colour property of Wine? The common answer to this is to use subClass because one thinks of Red and White wines so differently and different properties apply - you care about tannin levels in Red wine and sweetness in White.
For cases like this I prefer to follow Alan Rector's methodology of untangling hierarchies. That is, instead of creating hierarchies that may potentially contain lots of multiple inheritance, you explicitly model the properties of a class and let the reasoner worry about the hierarchy. Here the idea is to separate out classes into their atomic components (so for wine this might be the colour). Create class hierarchies for the atomic classes and then write OWL axioms for the non-atomic classes, e.g.:
RedWine = Wine and hasColour value red (in Manchester OWL syntax)
> Now let's separate the TypeOfEstablishment case from the two definitely Boolean cases (BoardingEstablishment and TrainingSchool).
> For TypeOfEstablishment then we had a few reasons for going with the class-based representation. Firstly, the name "TypeOf" seemed like a give-away to how the data owners might think of it :-) Secondly, it seems like these types would affect the relevance of other properties, for example an LA_Nursery_School would have nursery provision and would not be expected to have a sixth form. Thirdly there is the possibility that the different fine-grained types of Independent school might fit under the top level classes of Independent School (though we haven't managed to do that).
So in this case (as we did at OS) I'd be inclined to separate out the form of the object from its function and again let the reasoner worry about the hierarchy. In this case you can create a hierarchy for building structures (though this will be quite shallow) and another one for function/purpose. You can then assemble these classes when defining the LA_Nursery_School. So assuming you are talking about the school building:
LA_Nursery_School = Building and hasPurpose some NurseryEducation
You can then add in other information for the purpose/function classes:
SixthFormEducation subClassOf Education
NurseryEducation subClassOf Education
NurseryEducation disjointFrom SixthFormEducation
Education disjointFrom Retail ...
etc.
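To show roughly what "let the reasoner worry about the hierarchy" means, here is a toy classifier over the axioms above. It is only a closed-world lookup, not a description logic reasoner; purposes are given as class names standing in for individuals of those classes, and the EducationBuilding definition is a made-up extra to show two defined classes matching at once.

```python
# Asserted hierarchy of the atomic purpose classes (from the axioms above).
SUBCLASS_OF = {
    "SixthFormEducation": "Education",
    "NurseryEducation": "Education",
}


def is_a(cls, ancestor):
    """True if cls equals ancestor or reaches it via SUBCLASS_OF."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


# Defined classes: name -> (base class, property, filler class).
DEFINITIONS = {
    "LA_Nursery_School": ("Building", "hasPurpose", "NurseryEducation"),
    "EducationBuilding": ("Building", "hasPurpose", "Education"),  # hypothetical
}

# Made-up individuals described only by base type and purpose.
individuals = {
    "b1": {"type": "Building", "hasPurpose": ["NurseryEducation"]},
    "b2": {"type": "Building", "hasPurpose": ["SixthFormEducation"]},
}


def classify(ind):
    """Return every defined class whose definition the individual satisfies."""
    inferred = set()
    for name, (base, prop, filler) in DEFINITIONS.items():
        has_filler = any(is_a(v, filler) for v in ind.get(prop, []))
        if is_a(ind["type"], base) and has_filler:
            inferred.add(name)
    return inferred


print(sorted(classify(individuals["b1"])))
print(sorted(classify(individuals["b2"])))
```

Nothing asserts that b1 is an LA_Nursery_School; that membership falls out of the definitions, which is the point of the approach.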
I know OWL 2 support is not very widespread amongst linked data applications (but I'm sure it will get there - hint hint :)). But we can at least publish OWL ontologies to give explicit definitions of classes to make it clear what we mean by them. I personally think the linked data community (correct me if I'm wrong) doesn't see the full importance of this yet. There have been many data integration examples where assumptions were made that things called the same thing were the same thing. I've found modelling classes in the way mentioned above is useful for even just computing your main class hierarchy without getting yourself in a tangled mess. Happy to write more later as I wrote this way too early on my day off :)
John
Apologies if I'm missing the point, but...
Thinking about this from an OO software point of view there's another
approach that might fit this sort of thing. Instead of a School being a
"type of" "BoardingEstablishment", or having a "BoardingEstablishment"
property that's TRUE/FALSE, couldn't the School "have a" (link to a)
"BoardingEstablishment" that in turn has properties about the number of
boarders, different location/building/name, additional fees, etc.?
In other words, don't try to lump all these boarding properties and
information into the School object itself, but link to another object that
holds the boarding-specific information. In OO terms this would be a
zero-to-many link, zero for a School with no boarding facilities. A
University-type boarding school, that might have different "halls of
residence" with different student numbers/locations/names/fees, could be
handled as a School that has multiple BoardingEstablishments.
"Inheritance versus composition" is the technical phrase for what I'm
trying to say. The "best" solution will depend on the available data, both
now and in the future...
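A minimal sketch of the "has a" version, assuming made-up field names and figures; a school with no boarding simply has an empty list:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BoardingEstablishment:
    name: str
    boarders: int
    annual_fee: float


@dataclass
class School:
    name: str
    # Zero-to-many link: empty for a school with no boarding facilities.
    boarding: List[BoardingEstablishment] = field(default_factory=list)

    def has_boarders(self) -> bool:
        return any(b.boarders > 0 for b in self.boarding)

    def total_boarders(self) -> int:
        return sum(b.boarders for b in self.boarding)


day_school = School("Example Day School")
halls = School(
    "Example Boarding School",
    [
        BoardingEstablishment("North Hall", 40, 9000.0),
        BoardingEstablishment("South Hall", 25, 9500.0),
    ],
)
print(day_school.has_boarders(), halls.total_boarders())
```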
Cheers!
Anthony
--
www.fonant.com - Quality web sites
Fonant Ltd is registered in England and Wales, company No. 7006596
Registered office: Grafton Lodge, 15 Grafton Road, Worthing, West Sussex,
BN11 1QR
I agree that this is a nice way to work, but the resulting ontology
can be a bit opaque if you're not used to description logic notation.
Could be that current tools have better visualisers though, I've not
tried for some time. My experience was that the output RDF was a bit
hard for other people to interpret in RDF syntax too.
- Steve
You have to be a bit careful when thinking about OO practice, a lot of
the OO use of the word "class" is at best misleading. E.g. it's common
to see in OO tutorials something like starting with a square class,
with one size value (e.g. x), and then adding rectangle as a subclass
of square with an additional size value (y).
From a mathematical point of view this is clearly incorrect (not all
rectangles are squares), but OO doesn't typically enforce that, it's
often not really a problem in applications, and a lot of practitioners
don't fully understand the meaning of classes and taxonomies.
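For concreteness, the tutorial pattern in question looks something like this sketch. It runs fine, which is exactly why the inverted taxonomy tends to go unnoticed:

```python
class Square:
    def __init__(self, x):
        self.x = x

    def area(self):
        return self.x * self.x


class Rectangle(Square):
    """Subclass chosen for code reuse, not because rectangles are squares."""

    def __init__(self, x, y):
        super().__init__(x)
        self.y = y

    def area(self):
        return self.x * self.y


r = Rectangle(2, 3)
# The language is happy to call this rectangle a Square.
print(isinstance(r, Square), r.area())
```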
- Steve
John Goodwin wrote:
> Dave Reynolds <dave.e....@googlemail.com> wrote
>
>
>> Second, there are some principles to apply when deciding to represent something as a class rather than a property. There's lots written on this that I'm not about to do justice to but basically the checks I tend to use are:
>> - is this intrinsic to the nature of the entity?
>> - would this classification affect the relevance of other properties?
>> - do these form a natural hierarchy?
>
>> The classic example is the Ontology 101 example wine ontology. Should you represent Red Wine as a subClass of Wine or as a colour property of Wine? The common answer to this is to use subClass because one thinks of Red and White wines so differently and different properties apply - you care about tannin levels in Red wine and sweetness in White.
>
> For cases like this I prefer to follow Alan Rector's methodology of untangling hierarchies. That is instead of creating hierarchies that may potentially contain lots of multiple inheritance you explicitly model the properties of a class and let the reasoner worry about the hierarchy.
Interesting. I've read a few of Alan's papers and I had always thought
he was advocating simple class hierarchies (each concept has one parent,
at each level concepts are distinct not exhaustive, stick to elementary
concepts) rather than doing away with classes altogether.
> Here the idea is to separate out classes into their atom components (so for wine this might be the colour). Create class hierarchies for the atom classes and then for OWL axioms for the non-atomic classes, e.g.:
>
> RedWine = Wine and hasColour value red (in Manchester OWL syntax)
Sure that works technically but in the wine case the argument is that
RedWine and WhiteWine are really different elementary concepts (which
you then place below Wine). Then you would have colours like "tawny red"
etc for RedWine and "straw" etc for WhiteWine. Again this is because the
range of values for other properties, indeed their cardinality, is
affected by this classification. That can be expressed in OWL and
reasoned over perfectly well without picking RedWine/WhiteWine out as
classes but part of the point of an ontology is to communicate how you
are conceptualizing the world.
What's more, from a linked data point of view, you would presumably
publish the results of the class inference closure as part of the
dataset. So a data browser would still see the "rdf:type RedWine"
assertions whether or not they were generated by inference or by direct
assertion. Whereas Chris' discussion is partly about whether it's useful
to see them as classes at all.
>> Now let's separate the TypeOfEstablishment case from the two definitely Boolean cases (BoardingEstablishment and TrainingSchool).
>
>> For TypeOfEstablishment then we had a few reasons for going with the class-based representation. Firstly, the name "TypeOf" seemed like a give-away to how the data owners might think of it :-) Secondly, it seems like these types would affect the relevance of other properties, for example an LA_Nursery_School would have nursery provision and would not be expected to have a sixth form. Thirdly there is the possibility that the different fine-grained types of Independent school might fit under the top level classes of Independent School (though we haven't managed to do that).
>
> So in this case (as we did at OS) I'd be inclined to separate out the form of the object from its function and again let the reasoner worry about the hierarchy. In this case you can create a hierarchy for building structures (though this will be quite shallow) and another one for function/purpose. You can then assemble these classes when defining the LA_Nursery_School. So assuming you are talking about the school building:
>
> LA_Nursery_School = Building and hasPurpose some NurseryEducation
I don't think we have any information on the buildings. I think all the
information is about the nature of the legal entity. However, we are
hampered by not yet understanding the semantics of this data - are these
categories fundamental (e.g. created by legislation) or simply descriptive?
> You can then add in other information for the purpose/function classes:
>
> SixthFormEducation subClassOf Education
> NurseryEducation subClassOf Education
> NurseryEducation disjointFrom SixthFormEducation
> Education disjointFrom Retail ...
>
> etc.
Yes, that's a nice way of doing it. I tried to keep our structure
isomorphic to what we had with Edubase but perhaps should look at
sketching something more general like this (and just preserving the
Edubase classification as a SKOS classification as, I think, Chris would
prefer).
It all comes down to purpose as usual. Our purpose was to reflect
Edubase. A better purpose would be to represent school information for
some range of usages and a representation that e.g. cleanly separates
notions of nursery, secondary, sixth form provision may be better.
Trouble is that mapping what's in Edubase to an intuitive approach like
this will require some help from people who understand why it's the way
it is. Why are Welsh_Establishments disjoint? What's the structural and
legal relationship between Primary Referral Units and the rest of the
School? Etc.
Maybe a good plan would be to switch TypeOfEstablishment back to a SKOS
categorization, sketch a more intuitive classification approach and then
put the two together once the meaning of some of the data is a bit
clearer.
> I know OWL 2 support is not very wide spread amongst linked data applications (but I'm sure it will get there - hint hint :)) But we can at least publish OWL ontologies to give explicit definitions of classes to make it clear what we mean by them.
Publish OWL definitely, the compromise was to avoid OWL 2 constructs for
now.
> I personally think the linked data community (correct me if I'm wrong) don't see the full importance of this yet. There have been many data integration examples where assumptions were made that things called the same thing were the same thing.
Oh I agree. The challenge is to make sure the data makes enough sense
for people just coming at the data without studying the ontology but
also to lower the barrier to people using the associated ontology.
> I've found modelling classes in the way mentioned above is useful for even just computing your main class hierarchy without getting yourself in a tangled mess.
I'd claim that thinking of everything as a property can lead to just as
tangled a hierarchy, you just create the tangle through inference rather
than manually :-) Perhaps this is a discussion to have over a pint of
beer sometime. Perhaps we need a uk-gov-data-developers MeetUp.
> Happy to write more later as I wrote this way too early on my day off :)
I think we have different ontologies for the concept "day off" :-)
Cheers,
Dave
You do need to be careful mixing OO and ontologies but I see Steve has
already mentioned that.
> In other words, don't try to lump all these boarding properties and
> information into the School object itself, but link to another object
> that holds the boarding-specific information. In OO terms this would be
> a zero-to-many link, zero for a School with no boarding facilities. A
> University-type boarding school, that might have different "halls of
> residence" with different student numbers/locations/names/fees, could be
> handled as a School that has multiple BoardingEstablishments.
>
> "Inheritance versus composition" is the technical phrase for what I'm
> trying to say. The "best" solution will depend on the available data,
> both now and in the future...
Yes, definitely. That's why we separated out things like the Trust
associated with a trust school as a different entity.
I agree with your example, if you were trying to model a university you
would have separate notions of the University legal entity, the
University Sites and the Halls of Residence.
In the schools case then we don't really have the data to say much about
different sub-parts of the schools or really be sure they are different
parts. In particular, I had been assuming that "BoardingEstablishment"
was a matter of status of the School as a legal entity. I.e. you could
be approved for boarding independent of whether you have any actual
boarders right now. So I do think that is a classification of school
independent of whether in the long term you might be able to model the
"boarding unit" of the school as a separate entity, perhaps associated
with a separate set of buildings which in turn would have different
locations.
The nice thing about Linked data is that we don't have to do all this
ourselves. What the Edubase data gives us is a core set of URIs and
associated reference data for the Schools. Now people can hang more data
and more refined modelling off that backbone.
Supposing that someone somewhere does have information on the boarding
parts of boarding schools and could describe those as separate entities
with associated buildings, administrator etc. They could then publish
that as a dataset with links "boardingUnitOf" pointing back to the
Schools URIs. The web of data can grow to encompass these more refined
pictures without necessarily having to go back to some central authority.
This is one way that linked data/semantic web stuff differs from OO
practice. You can do external compositions.
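A sketch of that external composition, with both datasets as plain triples. Only the "boardingUnitOf" link name comes from the paragraph above; the ex: prefix, the numberOfBoarders property and all the sample values are hypothetical.

```python
# Core reference data, as published centrally (sample values invented).
edubase = [
    ("school:100", "rdf:type", "sch:School"),
    ("school:100", "sch:establishmentName", "Example School"),
]

# A separate dataset published by someone else, linking back to the school URI.
third_party = [
    ("unit:A", "ex:boardingUnitOf", "school:100"),
    ("unit:A", "ex:numberOfBoarders", 40),
    ("unit:B", "ex:boardingUnitOf", "school:100"),
    ("unit:B", "ex:numberOfBoarders", 25),
]

# Merging is just a union of triples; neither publisher edits the other's data.
merged = edubase + third_party


def boarding_units(school):
    """Units that point back at the given school URI."""
    return sorted(s for s, p, o in merged if p == "ex:boardingUnitOf" and o == school)


def total_boarders(school):
    """Sum boarder counts over every unit linked to the school."""
    units = set(boarding_units(school))
    return sum(o for s, p, o in merged if p == "ex:numberOfBoarders" and s in units)


print(boarding_units("school:100"), total_boarders("school:100"))
```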
Cheers,
Dave
Yes, it's important to remember that OO analysis is all about the problem
domain, and not real life. In software a "rectangle" can quite
legitimately be a "type of" "square", even though it isn't in the "real
world" of mathematics. :)
> This is one way that linked data/semantic web stuff differs from OO
> practice. You can do external compositions.
Yes, and I like that: I'm a fan of composition over inheritance for most
things. Which is why I suggested links to describe boarding facilities at
schools, rather than using particular classes of school or school
properties. Although I can see that if there isn't much information about
boarding facilities then a class or property provides the information
simply as part of the school data, without needing a more complex query to
extract it.
> Hi Steve,
>
> > I agree that this is a nice way to work, but the resulting
> > ontology can be a bit opaque if you're not used to description logic
> > notation.
>
> Sure. Protege has a number of ways to visualise the axioms and some
> people at OS (and the OWL community) have been working on CNLs for
> OWL ontologies. Rabbit is the one that Glen and others at OS started
> to develop. Sydney syntax is another one.
Great, I will give some newer tools a go.
> > Could be that current tools have better visualisers though, I've
> > not tried for some time. My experience was that the output RDF was a
> > bit hard for other people to interpret in RDF syntax too.
>
> Personally I quite like the Manchester OWL syntax... once you get used
> to it. There's been a lot of debate about whether RDF is really an
> appropriate syntax for OWL.
To me OWL's utility vanishes if you can no longer represent it in RDF.
In triples it can be relevant, even if you can't reason with it at
runtime, say you have a really large dataset, just for example :) If
there's some other syntax it can no longer be queried with SPARQL, so
can't be used to aid/inform querying.
That's a bit offtopic for this list though.
- Steve
Backward-chaining reasoning (at runtime) across very large data sets is
something Virtuoso has offered for a pretty long time now, covering
sameAs, subClassOf, equivalentClass, equivalentProperty, and IFPs [1].
Your statement above is really Quad Store specific re. reasoning over
large data sets.
Links:
1. http://bit.ly/38Jlw4 -- Expanding disparate data about Michael
Jackson using a 7.5B+ live data set by reasoning using "sameAs" or
fuzzier foaf:name designated as an Inverse Functional Property based
local context rule
Re. the above, note the co-reference and indirect co-reference tabs,
also note how the HTTP URIs resolve to the same union of disparate data
about the subject in line with the precision that either approach
enables (so the fuzzier one is less accurate as demonstrated via some
obvious entries).
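
For readers unfamiliar with the co-reference idea, a rough sketch follows. It is not how Virtuoso implements it: this version groups subjects eagerly with a small union-find, whereas the above is about doing it at query time. The function name smush and all the sample identifiers are assumptions for illustration. Two subjects end up in the same group if they are linked by owl:sameAs, or if they share a value for a property treated as inverse functional.

```python
def smush(triples, ifps):
    """Group subjects linked by owl:sameAs or by a shared value for any
    property in `ifps`. Returns a dict: subject -> representative id."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smaller id as the representative.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    seen = {}  # (property, value) -> first subject seen with that pair
    for s, p, o in triples:
        find(s)  # register the subject even if it never merges
        if p == "owl:sameAs":
            union(s, o)
        elif p in ifps:
            if (p, o) in seen:
                union(s, seen[(p, o)])
            else:
                seen[(p, o)] = s
    return {x: find(x) for x in parent}


# Hypothetical sample data.
triples = [
    ("a:mj", "foaf:mbox", "mailto:x@example.org"),
    ("b:jackson", "foaf:mbox", "mailto:x@example.org"),
    ("c:mj", "owl:sameAs", "a:mj"),
    ("d:other", "foaf:mbox", "mailto:y@example.org"),
]
groups = smush(triples, {"foaf:mbox"})
```

Passing {"foaf:name"} as the second argument instead gives the fuzzier variant: anything sharing a name is merged, which is where the less accurate entries come from.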
Kingsley
> If there's some other syntax it can no longer be queried with SPARQL,
> so can't be used to aid/inform querying.
>
> That's a bit offtopic for this list though.
>
> - Steve
>
It's been a few years since I worked on back-chaining systems, but
IIRC you didn't get to do efficient subsumption reasoning over large
datasets that way, which is what we were discussing. Could well be
that it's now commonplace for all I know.
But yes, you can do transitive stuff and IFPs pretty easily, and in
quad stores too.
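
As a sketch of the "transitive stuff" done at query time, here is a small Python example that walks rdfs:subClassOf on demand when asked for the instances of a class, rather than materialising inferred types up front. It covers only the transitive subClassOf case, not general subsumption reasoning over class expressions. The ex: class names and instance ids are hypothetical.

```python
def subclasses_of(schema, cls):
    """All classes reachable from `cls` by following rdfs:subClassOf
    downwards, including `cls` itself. Computed on demand."""
    children = {}
    for s, p, o in schema:
        if p == "rdfs:subClassOf":
            children.setdefault(o, set()).add(s)
    found, stack = {cls}, [cls]
    while stack:
        for c in children.get(stack.pop(), ()):
            if c not in found:  # also guards against subclass cycles
                found.add(c)
                stack.append(c)
    return found


def instances_of(data, schema, cls):
    """Subjects typed with `cls` or with any of its subclasses."""
    wanted = subclasses_of(schema, cls)
    return sorted(s for s, p, o in data if p == "rdf:type" and o in wanted)


# Hypothetical schema and data.
schema = [
    ("ex:BoardingSchool", "rdfs:subClassOf", "ex:School"),
    ("ex:PrimarySchool", "rdfs:subClassOf", "ex:School"),
    ("ex:InfantSchool", "rdfs:subClassOf", "ex:PrimarySchool"),
]
data = [
    ("ex:s1", "rdf:type", "ex:InfantSchool"),
    ("ex:s2", "rdf:type", "ex:BoardingSchool"),
    ("ex:s3", "rdf:type", "ex:Trust"),
]
schools = instances_of(data, schema, "ex:School")
```

Because the schema is itself just triples, the same lookup can be phrased as a query over the store, which is the point made earlier about keeping OWL in RDF.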
- Steve
Sure, and OWL still works fine as an ontology language if that
happens, but it may be harder to use it for Linked Data. The other
issue being inconsistencies. While it would be nice if all ontologies
and data were consistent w.r.t. each other, it seems a little
optimistic.
> As for large datasets - I know that two of the three profiles of OWL
> 2 are optimised for large datasets. Perhaps linked data apps should
> stick to those - when it comes to really large datasets. I think
> people are writing inference engines especially optimised for those
> languages (e.g. CEL, OWLGres etc.). No idea if they are forward or
> backward chaining though.
Yes, OWL2-RL seems particularly appropriate for that, I was looking at
the rec. the other day. I can't see us implementing it anytime soon,
unless someone has a really strong use case. Some people
are working with our store and OWL, but I think OWL1.
Anyway, wandering really off topic now, I'll shut up :)
- Steve
I just came across this in the news, looks like NYC has exposed a
whole bunch of data about restaurants, public health records, public
parks, and so on. The dataset has been released alongside a $20,000
prize for applications making use of the data.
http://www.nyc.gov/html/datamine/html/home/home.shtml
Regards,
Mischa
_________________________________
Mischa Tuffield
Email: mis...@mmt.me.uk
Homepage: http://mmt.me.uk/
FOAF: http://mmt.me.uk/foaf.rdf#mischa