SPARQL

Chris Wallace

Oct 2, 2009, 10:55:39 AM

to UK Government Data Developers

Looks like a good start to what will be a massive project as more
datasets are referenced directly rather than via the department sites,
and more is available in some machine-readable format - amazing how
much PDF and XLS is used.

I've started to play with the SPARQL educational data

http://data.hmg.gov.uk/blog/using-sparql-our-education-datasets

and have a few queries.

The examples are not valid SPARQL - some loss of formatting, I guess -
e.g. the first example

prefix sch-ont: http://education.data.gov.uk/ontology/school#
SELECT ?name WHERE {
?school a sch-ont:School;
sch-ont:establishmentName ?name;
sch-ont:districtAdministrative
http://education.data.gov.uk/placeholder-id/administrativeDistrict/City-of-London;
}
ORDER BY ?name

should be

prefix sch-ont: <http://education.data.gov.uk/ontology/school#>
SELECT ?name WHERE {
?school a sch-ont:School;
sch-ont:establishmentName ?name;
sch-ont:districtAdministrative <http://education.data.gov.uk/placeholder-id/administrativeDistrict/City-of-London>;
}
ORDER BY ?name

(I hope the added angle brackets get through this interface!)

The output is only shown in SPARQL result format - options to
transform it to a table with XSLT would be good.
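
Until a table view exists, the transformation can be done client-side. The following is a minimal sketch in Python rather than XSLT, assuming the endpoint returns the standard W3C SPARQL Query Results XML Format. The sample document and the school names in it are made up for illustration, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C SPARQL Query Results XML Format.
SPARQL_NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Hypothetical sample response; the school names are invented.
SAMPLE_RESULTS = """
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="name"/></head>
  <results>
    <result>
      <binding name="name"><literal>Example Primary School</literal></binding>
    </result>
    <result>
      <binding name="name"><literal>Sample High School</literal></binding>
    </result>
  </results>
</sparql>
"""


def results_to_rows(xml_text):
    """Return (header, rows) extracted from a SPARQL XML result document."""
    root = ET.fromstring(xml_text)
    header = [v.get("name") for v in root.findall("sr:head/sr:variable", SPARQL_NS)]
    rows = []
    for result in root.findall("sr:results/sr:result", SPARQL_NS):
        values = {}
        for binding in result.findall("sr:binding", SPARQL_NS):
            # Each binding wraps a single <uri>, <literal> or <bnode> element.
            child = next(iter(binding), None)
            values[binding.get("name")] = (child.text or "") if child is not None else ""
        # Unbound variables become empty cells.
        rows.append([values.get(name, "") for name in header])
    return header, rows


def format_table(header, rows):
    """Render the header and rows as a plain-text table."""
    widths = [max(len(cell) for cell in column) for column in zip(header, *rows)]
    lines = []
    for row in [header] + rows:
        line = " | ".join(cell.ljust(width) for cell, width in zip(row, widths))
        lines.append(line.rstrip())
    return "\n".join(lines)


header, rows = results_to_rows(SAMPLE_RESULTS)
print(format_table(header, rows))
```

The same two functions work for queries with several variables, since the header is read from the `<head>` element rather than hard-coded.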

Also, I could not find any link to the school ontology itself.

Leigh Dodds

Oct 2, 2009, 11:10:00 AM

to uk-government-...@googlegroups.com

Hi,

Attached are the original queries, with the formatting fixed. Looks
like the angle brackets should have been escaped in the original.

Cheers,

L.

2009/10/2 Chris Wallace <kit.w...@googlemail.com>:

> ______________________________________________________________________
> This email has been scanned by the MessageLabs Email Security System.
> For more information please visit http://www.messagelabs.com/email
> ______________________________________________________________________
>

--
Leigh Dodds
Programme Manager, Talis Platform
Talis
leigh...@talis.com
http://www.talis.com

queries.txt

Ian Davis

Oct 2, 2009, 11:13:40 AM

to uk-government-...@googlegroups.com

On Fri, Oct 2, 2009 at 3:55 PM, Chris Wallace <kit.w...@googlemail.com> wrote:

> The examples are not valid SPARQL - some loss of formatting I guess-
> e.g. the first example
>
> prefix sch-ont: http://education.data.gov.uk/ontology/school#
> SELECT ?name WHERE {
> ?school a sch-ont:School;
> sch-ont:establishmentName ?name;
> sch-ont:districtAdministrative
> http://education.data.gov.uk/placeholder-id/administrativeDistrict/City-of-London;
> }
> ORDER BY ?name
>
> should be
>
> prefix sch-ont: <http://education.data.gov.uk/ontology/school#>
> SELECT ?name WHERE {
> ?school a sch-ont:School;
> sch-ont:establishmentName ?name;
> sch-ont:districtAdministrative <http://education.data.gov.uk/placeholder-id/administrativeDistrict/City-of-London>;
> }
> ORDER BY ?name
>
> (I hope the added angle brackets get through this interface!)

Yes it looks like angle brackets were not escaped properly in the blog post.
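
To illustrate the escaping problem: if a query is pasted into HTML unescaped, a browser treats `<http://...>` as an unknown tag and drops it, which matches the broken examples. This is a minimal sketch using Python's standard `html` module. It is not how the blog is actually generated, and the query is shortened from the example quoted in this thread.

```python
from html import escape, unescape

# Shortened from the first example query in this thread.
query = (
    "prefix sch-ont: <http://education.data.gov.uk/ontology/school#>\n"
    "SELECT ?name WHERE { ?school a sch-ont:School; "
    "sch-ont:establishmentName ?name }"
)

# Escape &, < and > so the IRI survives when embedded in an HTML page.
escaped = escape(query, quote=False)
print(escaped)

# Unescaping recovers the original, valid query text.
assert unescape(escaped) == query
```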

My understanding is that the data should be available as browseable Linked Data at some point soon, but the focus right now is on actually getting access to the data and getting early feedback. (Disclosure/disclaimer: I work for Talis, who are hosting the RDF, but I am not involved in the planning or implementation of the data.gov.uk site, so this is my opinion only.)

In the meantime you can explore the data a bit more using some little utilities that I have written. Dipper is a JavaScript-based browser for data in the Talis Platform (it has some problems with older versions of IE). See <http://api.talis.com/stores/iand-dev1/items/dipper.html#s=govuk-education&q=http%3A%2F%2Feducation.data.gov.uk%2Fid%2Fschool%2F_522678> for an example.

Another tool I wrote is called lodgrid, which just does a simple search over the data and formats the results in a grid, picking the most frequently occurring properties for the columns. You can try it at <http://iandavis.com/2009/lodgrid/?store=govuk-environment&query=horticultural&columns=5>

Note that until the linked data parts of data.gov.uk are fully in place the links to the resources won't work. However you can click on the image of the bird in the first column to get to the Dipper utility.

Hope this helps,

Ian
--
Ian Davis, Chief Technology Officer, Talis
tel: +44 (0) 870 400 5000
cell: +44 (0) 7525 941 919

I'm trialling Google Apps using a temporary email address. Email sent to my usual address (ian....@talis.com) will still reach me.

kit.wallace

Oct 2, 2009, 11:32:00 AM

to UK Government Data Developers

Ian, I really like Dipper. Are these tools going to be part of the
final delivery?

Leigh, it would be great to see the design and rationale for the
school vocabulary, especially if this is to be a key example of the
Govt Data RDF approach. There are clearly design choices here, such as
using only newly minted terms in the vocabulary.

Chris

Leigh Dodds

Oct 2, 2009, 11:35:51 AM

to uk-government-...@googlegroups.com

Hi,

2009/10/2 kit.wallace <kit.w...@googlemail.com>:

> Leigh , It would be great to see the design and rationale for the
> school vocabulary, especially if this is to be a key example of the
> Govt Data RDF approach. There are clearly design choices here -such as
> using only newly minted terms in the vocabulary .

The Edubase data was converted to RDF by a team of people at HP
including Stuart Williams, Dave Reynolds and Brian McBride. I suspect
at least one of them is on this list (or soon will be).

Cheers,

L.

Simon Grice (mashup*)

Oct 2, 2009, 11:46:30 AM

to uk-government-...@googlegroups.com

Hi,
is there a list of data available via the SPARQL interface?

We've already started playing at

http://belocal.com/

Schools

-

I'll update the group with our experiences as they come in.

-

S

Ian Davis

Oct 2, 2009, 11:53:02 AM

to uk-government-...@googlegroups.com

On Friday, October 2, 2009, kit.wallace <kit.w...@googlemail.com> wrote:
>
> Ian, I really like Dipper are these tools going to be part of the
> final delivery?
>

Chris, these are just general-purpose utilities, not part of the gov
activity. Hopefully we'll see lots more like them as more open data is
made available. They are open source if anyone wants to take them and
build on them (I'll send URLs later - am writing this on my phone).

ian

> Leigh , It would be great to see the design and rationale for the
> school vocabulary, especially if this is to be a key example of the
> Govt Data RDF approach. There are clearly design choices here -such as
> using only newly minted terms in the vocabulary .
>
> Chris
>
>


--

Leigh Dodds

Oct 2, 2009, 11:54:22 AM

to uk-government-...@googlegroups.com

Hi,

2009/10/2 Simon Grice (mashup*) <si...@mashupevent.com>:

> is there a list of data available via the SPARQL interface ?

Do you mean metadata for the list of datasets? That's on the roadmap.
The CKAN infrastructure is handling the basic dataset directory, but
we'll be converting that to RDF and loading it into a store with a
SPARQL endpoint so you'll be able to query to find specific datasets.

>
> we've already starting playing at
>
> http://belocal.com/
>
> Schools

Nice!

Cheers,

L.

Simon Grice (mashup*)

Oct 2, 2009, 12:10:46 PM

to uk-government-...@googlegroups.com

Well, I was actually thinking of just the list of data that can be accessed
at the moment through SPARQL

and the structure of that data

S

Stuart Williams

Oct 2, 2009, 4:26:42 PM

to uk-government-...@googlegroups.com

Hello Leigh, Simon,

Sorry to be slow to pick up, it's been a bit of a heads down day...

It's unfortunate that the linked data URIs and ontology files aren't up yet (afaik) at URIs where you can pull them down and take a look. I've attached "school.owl" which should in time be served up from the appropriate location.

We started the work from a CSV dump of the Edubase data set. Apart from staring at it for a while, we ran some tools over it to spot some patterns: apparent use of controlled vocabularies; numeric values; booleans and pseudo-booleans (true/false/not-applicable); type markers (classes); and the like. We also gave some light thought to change over time. Some properties represent values that may change over time, e.g. a fee payable to an independent school:

school:_100001
      rdf:type school-ont:School ,
                  school-ont:TypeOfEstablishment_Other_Independent_School ;
.....
      school-ont:highDayFee
              [ foundation:datum
                        [ foundation:date "2009-05-19"^^xsd:date ;
                          foundation:val "11565"^^xsd:int
                        ]
              ] ;

Whilst at present the Edubase data set only seems to present one value, we have some provision to represent the change of such values over time. There is more work to be done there.
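
The dated-value pattern in the snippet above can be sketched outside RDF as follows. This is an illustrative model only, assuming each time-varying property holds a list of (date, value) observations. The `Datum` class and helper names are hypothetical, and the second observation is invented to show how a later value would sit alongside the one in the data.

```python
import datetime
from dataclasses import dataclass


@dataclass(frozen=True)
class Datum:
    """One dated observation, mirroring foundation:date and foundation:val."""
    when: datetime.date
    value: int


def latest_value(observations):
    """Return the value of the most recent observation."""
    return max(observations, key=lambda d: d.when).value


def value_on(observations, day):
    """Return the value in force on a given day, or None if none is known yet."""
    known = [d for d in observations if d.when <= day]
    if not known:
        return None
    return max(known, key=lambda d: d.when).value


high_day_fee = [
    Datum(datetime.date(2009, 5, 19), 11565),  # the value shown in the RDF above
    Datum(datetime.date(2010, 5, 18), 12000),  # hypothetical later observation
]

print(latest_value(high_day_fee))
print(value_on(high_day_fee, datetime.date(2009, 12, 31)))
```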

Likewise we have collected the census numbers into a structure stamped with the date the census was taken. Originally we'd modelled the census data as a number of independent time-varying properties, but that led (as you might expect) to a triple explosion and also seemed possibly wrong depending on how the values were updated: individually or en masse.

We have used SKOS for the controlled vocabulary terms. These have been given URI names that are derived from the column name and the values appearing in the corresponding CSV columns.

The analysis phase basically proposed which patterns to apply to which columns. Then a bit of manual supervision cleaned things up. The ontology was then generated from the tuned result. The ontology, together with some rules, was then used to process the data to generate the RDF that you are seeing.

I hope that's of some help, I'm writing this a bit quickly and at the end of a long day. I'm sure that my colleagues will probably have more to add - but I hope this is at least of some help for now.

Best regards

Stuart
--

2009/10/2 Leigh Dodds <leigh...@talis.com>

schoolOntology.ttl

Dave Reynolds

Oct 3, 2009, 8:47:59 AM

to uk-government-...@googlegroups.com

Hi Kit,

kit.wallace wrote:
> Leigh , It would be great to see the design and rationale for the
> school vocabulary, especially if this is to be a key example of the
> Govt Data RDF approach. There are clearly design choices here -such as
> using only newly minted terms in the vocabulary .

A couple of things to add to Stuart's response.

The vocabulary is in no way set in stone. It's a very preliminary
version and needs more validation with the data owners, amongst others.
So any suggestions you have for terms we should be reusing, or
cross-linking to, would be great.

For this starting point we erred on the side of keeping things as close
to the original data as possible. Partly because that meant we could
automate a lot of the vocabulary generation (and data generation), and
partly because we didn't have access to the data schema or experts on
the dataset so didn't want to guess at semantics without some validation.

The plan is to cross link the vocabulary with other vocabularies. The
"foundation" ontology has cross links to other upper level ontologies
like opencyc, proton etc. We'll add similar links to the schools
vocabulary itself shortly.
[There are subtleties in there about when you can safely use the rdfs/owl
vocabulary mapping terms. For now we are just cross-linking using an
ontology annotation.]

We did look at other schools related ontologies to see which ones we
could either use or cross link to. There isn't that much out there that
we could find. The most relevant is probably the schools part of the
dbpedia ontology [1]. There is surprisingly little overlap with that.
The main properties we could use would be things like
dbpedia:headteacher. However, the person-specific information in Edubase
is being held back from the RDF right now (both data protection issues
and the fact that representing people gets into a whole can of worms
that we wanted to avoid for the alpha release).

There are some areas that are known to need further work, especially the
representation of location. We've started to do some cross-linking with
the OS administrative geography ontology which is a start in that direction.

Cheers,
Dave

[1] http://wiki.dbpedia.org/Ontology

Kingsley Idehen

Oct 3, 2009, 11:47:38 AM

to uk-government-...@googlegroups.com

Stuart Williams wrote:
> Hello Leigh, Simon,
>
> Sorry to be slow to pick up, it's been a bit of a heads down day...
>
> It's unfortuante the that the linked data URIs and ontology files
> aren't up yet (afaik) at URIs wher you can pull them down and take a
> look. I've attached "school.owl" which should in time be served up
> from the appropriate location.

There are some basic Linked Data discovery patterns that need to be
adhered to here that include any combination of the following:

1. Express triples within HTML docs using RDFa
2. Use of <link/> within <head/> to expose a variety of resource URLs
that expose a variety of metadata representations (where each item in
the metadata doc is endowed with a generic HTTP URI i.e., a
de-referencable URI)
3. Transparent Content Negotiation rules on the server that allow user
agents and servers to arrive at preferred metadata representations based on
quality of service algorithms within HTTP headers.

Even if you don't have an RDF model based data representation, nothing
stops you using the methods above to expose the structured data for a
given data space.
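
As a rough illustration of the third pattern, here is a minimal sketch of how a server might pick a representation from the quality values in an HTTP `Accept` header. It is a simplified, server-driven version, not full transparent content negotiation: it ignores wildcards such as `text/*` and media type parameters other than `q`. The function names and the list of available types are assumptions for the example.

```python
def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    preferences = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        quality = 1.0  # default when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat a malformed q as "not acceptable"
        preferences.append((media_type, quality))
    # sorted() is stable, so equal q values keep their order in the header.
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)


def choose_representation(header, available):
    """Return the best available media type for the header, or None."""
    for media_type, quality in parse_accept(header):
        if quality <= 0:
            continue
        if media_type == "*/*":
            return available[0]  # the server's own default
        if media_type in available:
            return media_type
    return None


available = ["text/html", "application/rdf+xml", "text/turtle"]
print(choose_representation("text/html;q=0.5, application/rdf+xml", available))
```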

Kingsley

--

Regards,

Kingsley Idehen Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO
OpenLink Software Web: http://www.openlinksw.com

kit.wallace

Oct 6, 2009, 4:06:41 PM

to UK Government Data Developers

Thanks Dave and Stuart for providing those insights to your first cut
design and the vocab. If I can take up your offer to provide a bit of
feedback, mindful that especially in RDF there are many ways to skin
the cat.

This is predicated on a few principles which I have at the back of my
mind:

a) The data structure should be discoverable from the data itself -
separate vocabs are nice but don't describe the usage of all the
vocabs in a given dataset. Navigating the different structures for
vocabs can present its own problems.

b) The conceptual model should be independent of the particular data
model or serialisation used - the same data will be available in
various formats, of which RDF is just one. RDF creates its own
problems in representation of course

c) RDF and linked data are separate concerns - linking is about using
globally unique identifiers for entities and properties and these can
be used in XML or CSV as well as RDF

1) type/category conflation

The bootstrapping query select distinct ?x where { ?s a ?x} lists 41
different values. However one can guess that only 4 of these are
types in the entity type(class) sense ( School, CensusRecord,
vcard:Address and vcard:VCard). By inspection the rest appear to be a
number of School categories - TypeOfEstablishment,
IndependentSchoolType, BoardingEstablishment and TrainingSchool.

Reading the vocab, it appears that these types could be equivalently
represented as boolean (if not mutually exclusive) or enumerated
values (if exclusive). There are many other school properties which
could be represented as types in this way, so it's not clear to me why
some properties have been promoted to types and others left as
predicates.

Looking at the eduBase site, it appears that TypeofEstablishment is an
enumerated value and hence a set of exclusive categories, but
represented as rdf:types they are independent boolean categories, so
some semantics have been lost. It could be put back with owl
but ... On the other hand, BoardingEstablishment seems to be
equivalent to the hasBoarders predicate.

The use of rdf:type instead of boolean or enumerated value predicates
seems to have lost some meaning as well as inflating the type space.
An export as say CSV could use columns for these properties, so a
design which created URIs for the properties and values would be
reusable in the CSV exports as well.
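
The effect of the bootstrapping query can be sketched on a toy set of triples. This is only an illustration in plain Python: the triples are invented apart from the class names already mentioned in this thread, and the stem-based split is a guess at how one might separate category-like types from entity types, not a rule taken from the vocabulary.

```python
RDF_TYPE = "rdf:type"

# Toy data; subjects and the school name are illustrative only.
triples = [
    ("school:_100001", RDF_TYPE, "school-ont:School"),
    ("school:_100001", RDF_TYPE,
     "school-ont:TypeOfEstablishment_Other_Independent_School"),
    ("school:_100001", "school-ont:establishmentName", "Example School"),
    ("school:_100002", RDF_TYPE, "school-ont:School"),
    ("school:_100002", RDF_TYPE, "school-ont:BoardingEstablishment"),
]


def distinct_types(data):
    """Equivalent of: select distinct ?x where { ?s a ?x }"""
    return sorted({o for s, p, o in data if p == RDF_TYPE})


def split_by_stem(types, stem="school-ont:TypeOfEstablishment_"):
    """Separate types that look like enumerated category values from the rest."""
    categories = [t for t in types if t.startswith(stem)]
    others = [t for t in types if not t.startswith(stem)]
    return categories, others


types = distinct_types(triples)
categories, others = split_by_stem(types)
print(categories)
print(others)
```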

2) rdfs:label - this is a handy predicate to use because it gives
every significant subject a descriptive, humanly readable label. Some
types have an rdfs:label but notably school does not - I think it
would be good practice if entities did include this predicate to
assist generic browsers. Following on from that, it would be easy to
add a label to types and predicates to provide a basic level of
documentation retrievable from the dataset.

3) which properties are treated as time-dependent seems rather
arbitrary - I would be less interested in what it used to cost to go
to a public school than I would be to see which schools had moved in
or out of specialMeasures. As I can see from other posts to this list,
handling time is a core problem for RDF (and any data model) but I
doubt that a piecemeal approach would be useful. In any case, it would
surely present more problems when refreshing the dataset since it now
can't be a simple replacement. Perhaps separate dated snapshots as
separate graphs would be a preferable initial approach?
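
A minimal sketch of the snapshot idea, assuming each dated snapshot is held as its own named graph (here just a Python set of triples keyed by date). The graph names, the second school identifier and all the values are hypothetical. The point is only that a question such as "which schools moved in or out of specialMeasures" becomes a set difference between two graphs.

```python
# Each key stands for a named graph holding one dated snapshot.
snapshots = {
    "2009-05": {
        ("school:_100001", "specialMeasures", "false"),
        ("school:_100002", "specialMeasures", "true"),
    },
    "2009-10": {
        ("school:_100001", "specialMeasures", "true"),
        ("school:_100002", "specialMeasures", "false"),
    },
}


def subjects_with(graph, predicate, value):
    """Return the subjects that have the given predicate/value in one graph."""
    return {s for s, p, o in graph if p == predicate and o == value}


def movement(old_graph, new_graph, predicate="specialMeasures", value="true"):
    """Compare two snapshots and report which subjects entered or left a state."""
    before = subjects_with(old_graph, predicate, value)
    after = subjects_with(new_graph, predicate, value)
    return {"entered": sorted(after - before), "left": sorted(before - after)}


print(movement(snapshots["2009-05"], snapshots["2009-10"]))
```

Refreshing the dataset then means adding a new graph rather than editing existing triples, which keeps the update a simple replacement.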

I hope these comments make a bit of sense and are helpful.

Chris Wallace
Senior Lecturer
Department of Information Science and Digital Media
UWE Bristol

Kingsley Idehen

Oct 6, 2009, 4:22:44 PM

to uk-government-...@googlegroups.com

All,

I see there is a lot of SPARQL in the air today.

Note:
http://demo.openlinksw.com/sparql_demo/

It has a number of things for new and advanced SPARQL users.

Dave Reynolds

Oct 6, 2009, 5:39:04 PM

to uk-government-...@googlegroups.com

Hi Chris,

[Or should I use "Kit"?]

kit.wallace wrote:
> Thanks Dave and Stuart for providing those insights to your first cut
> design and the vocab. If I can take up your offer to provide a bit of
> feedback, mindful that especially in RDF there are many ways to skin
> the cat.

Thanks. All feedback useful, I'll respond inline.

> This is predicated by a few principles which I have at the back of my
> mind:
>
> a) The data structure should be discoverable from the data itself -
> separate vocabs are nice but don't describe the usage of all the
> vocabs in a given dataset. Navigating the different structures for
> vocabs can present its own problems.

Somewhat agree, but vocabs express things that the data itself can't,
like type hierarchies, so you should expect to check vocabs as well.

In the medium term I'd hope to see good tools for browsing and reusing
the vocabs relevant to the data.gov data.

> b) The conceptual model should be independent of the particular data
> model or serialisation used - the same data will be available in
> various formats, of which RDF is just one. RDF creates its own
> problems in representation of course

Yes.

> c) RDF and linked data are separate concerns - linking is about using
> globally unique identifiers for entities and properties and these can
> be used in XML or CSV as well as RDF

Yes, though I find thinking about it from the RDF point of view first
helps. It forces you to think about links between global identifiers and
not accidentally rely on file structure.

> 1) type/category conflation
>
> The bootstrapping query select distinct ?x where { ?s a ?x} lists 41
> different values. However one can guess that only 4 of these are
> types in the entity type(class) sense ( School, CensusRecord,
> vcard:Address and vcard:VCard). By inspection the rest appear to be a
> number of School categories - TypeOfEstablishment,
> IndependentSchoolType, BoardingEstablishment and TrainingSchool.
>
> Reading the vocab, it appears that these types could be equivalently
> represented as boolean (if not mutually exclusive) or enumerated
> values (if exclusive). There are many other school properties which
> could represented as types in this way, so it's not clear to me why
> some properties have been promoted to types and others left as
> predicates.
>
> Looking at the eduBase site, it appears that TypeofEstablishment is an
> enumerated value and hence a set of exclusive categories, but
> represented as rdf:types they are independent boolean categories, so
> some semantics have been lost. It could be put back with owl
> but ... On the other hand, BoardingEstablishment seems to be
> equivalent to the hasBoarders predicate.
>
> The use of rdf:type instead of boolean or enumerated value predicates
> seems to have lost some meaning as well as inflating the type space.
> An export as say CSV could use columns for these properties, so a
> design which created URIs for the properties and values would be
> reusable in the CSV exports as well.

This is a tricky one so apologies if the response gets a bit long.

First, I should say that we did this initial work without a data
dictionary and without access to the people who own the data. Our plan
is to sit down with them "real soon now", get more insight into the
semantics of the data, and then update the model to match. However, it
was better to get something usable out there to get feedback like this
rather than wait for perfection ;-)

Second, there are some principles to apply when deciding to represent
something as a class rather than a property. There's lots written on
this that I'm not about to do justice to but basically the checks I tend
to use are:
- is this intrinsic to the nature of the entity?
- would this classification affect the relevance of other properties?
- do these form a natural hierarchy?

The classic example is the Ontology 101 example wine ontology. Should
you represent Red Wine as a subClass of Wine or as a colour property of
Wine? The common answer to this is to use subClass because one thinks of
Red and White wines so differently and different properties apply - you
care about tannin levels in Red wine and sweetness in White.

Now let's separate the TypeOfEstablishment case from the two definitely
Boolean cases (BoardingEstablishment and TrainingSchool).

For TypeOfEstablishment then we had a few reasons for going with the
class-based representation. Firstly, the name "TypeOf" seemed like a
give-away to how the data owners might think of it :-) Secondly, it
seems like these types would affect the relevance of other properties,
for example an LA_Nursery_School would have nursery provision and would
not be expected to have a sixth form. Thirdly there is the possibility
that the different fine-grained types of Independent school might fit
under the top level classes of Independent School (though we haven't
managed to do that).

I agree that in the Edubase data these types are mutually exclusive and
this information has been lost. If we stick with the use of classes then
we should add allDisjoint axioms in the ontology. I didn't do that
initially because that is easier to do in OWL 2 but OWL 2 support is not
widespread. But also I wondered if some of these are really mutually
exclusive or could in principle overlap in other datasets.
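
To show why this is easier in OWL 2: in OWL 1 mutual exclusivity has to be stated pairwise with `owl:disjointWith`, so the number of axioms grows quadratically with the number of classes, whereas OWL 2 allows a single `owl:AllDisjointClasses` axiom. The sketch below generates both forms as Turtle strings. Only the first class name appears in this thread in full; the others are assumed names for illustration.

```python
from itertools import combinations

# The first name is taken from the sample data; the rest are assumed.
establishment_types = [
    "school-ont:TypeOfEstablishment_Other_Independent_School",
    "school-ont:TypeOfEstablishment_LA_Nursery_School",
    "school-ont:TypeOfEstablishment_Community_School",
    "school-ont:TypeOfEstablishment_Academy",
]


def pairwise_disjoint(classes):
    """OWL 1 style: one owl:disjointWith triple per unordered pair."""
    return [f"{a} owl:disjointWith {b} ." for a, b in combinations(classes, 2)]


def all_disjoint(classes):
    """OWL 2 style: a single owl:AllDisjointClasses axiom."""
    members = " ".join(classes)
    return f"[] a owl:AllDisjointClasses ; owl:members ( {members} ) ."


axioms = pairwise_disjoint(establishment_types)
print(len(axioms))  # n * (n - 1) / 2 axioms for n classes
print(all_disjoint(establishment_types))
```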

As for "inflating the type space" ... well yes but I'm not sure that's a
problem. I would argue that for data like this the SPARQL end point
should contain both the data and the ontology, but as separate named
graphs. That way you could find all types and separate root types from
subtypes but still query just the data if you need to.

None of that is definitive of course. In particular it is not clear how
"intrinsic" these types are. So if, after a little more consultation,
the consensus is to shift those to classification labels (i.e. yet more
SKOS Concepts) instead of classes then that'd be fine.

Then there are a couple of boolean cases, such as BoardingEstablishment. In that
case I went with a class on the grounds that BoardingEstablishments have
boarders and there are several properties about boarders, numbers of
boarders and you could imagine more (such as the boarding component of
fees). So on the "it changes what properties you would look for"
argument we made that a class. This is even more subjective than for
TypeOfEstablishment so I definitely wouldn't die in a ditch over that one!

> 2) rdfs:label - this is a handy predicate to use because it gives
> every significant subject a descriptive, humanly readable label. Some
> types have an rdfs:label but notably school does not - I think it
> would be good practice if entities did include this predicate to
> assist generic browsers. Following on from that, it would be easy to
> add a label to types and predicates to provide a basic level of
> documentation retrievable from the dataset.

Odd. In the SVN copy of the ontology, and as far as I can see in the
version Stuart posted, then :School does have an rdfs:label, certainly
everything in the ontology is supposed to.

Or do you mean the school instances ... checks ... argh you are right
they don't and should do. Will get that fixed.

> 3) which properties are treated as time-dependent seems rather
> arbitrary - I would be less interested in what it used to cost to go
> to a public school than I would be to see which schools had moved in
> or out of specialMeasures. As I can see from other posts to this list,
> handling time is a core problem for RDF (and any data model) but I
> doubt that a piecemeal approach would be useful. In any case, it would
> surely present more problems when refreshing the dataset since it now
> can't be a simple replacement. Perhaps separate dated snapshots as
> separate graphs would be a preferable initial approach?.

Yes, tricky, a little arbitrary perhaps but not done without thought.

Essentially absolutely any property could change - name, headmaster,
location, ward. Part of the design pattern for government URI sets
includes some notion of dated versions so you can get the latest version
but also go back to earlier ones. So indeed in the medium term I'd
expect the sort of structure you suggest.

However, I'd argue that there should also be a first class historical
record in the RDF (but again ideally in a different graph) to make it
easier to do queries over that history. There are various ways of
representing such historical values and we didn't want to hold up the
publication while deciding on the right ones for those.
However, if every single property were represented as a full
context-dependent n-ary relation the data would be a lot harder to
understand and query so going with a "now" snapshot, which is the data
we had, seemed reasonable.

Then there are properties which are explicitly time dependent: they
are sampled at known times and we might expect to have multiple samples
eventually even if we don't know about other changes. Specifically the
census data like class sizes and probably fees are in that category. I
think you are right that specialMeasures probably should be as well.

So the question is: do we make absolutely everything a time-dependent
property and gain uniformity at the cost of ease of use? Or do we pick
out the ones that seem most like time-varying statistics and treat those
as a special case? We went with the latter but as the feedback
accumulates that decision should definitely be revisited.

As you've probably seen from the statistics discussion, we should also
publish the time-varying statistics SCOVO style, so perhaps that would be
the right solution. This is an area that will have to change anyway
(or at least be augmented).

> I hope these comments make a bit of sense and are helpful.

Yes, thanks very much. I hope the responses are useful too.

Dave

John Goodwin

Oct 7, 2009, 2:27:51 AM

to uk-government-...@googlegroups.com

Dave Reynolds <dave.e....@googlemail.com> wrote

> Second, there are some principles to apply when deciding to represent something as a class rather than a property. There's lots written on this that I'm not about to do justice to but basically the checks I tend to use are:
> - is this intrinsic to the nature of the entity?
> - would this classification affect the relevance of other properties?
> - do these form a natural hierarchy?

> The classic example is the Ontology 101 example wine ontology. Should you represent Red Wine as a subClass of Wine or as a colour property of Wine? The common answer to this is to use subClass because one thinks of Red and White wines so differently and different properties apply - you care about tannin levels in Red wine and sweetness in White.

For cases like this I prefer to follow Alan Rector's methodology of untangling hierarchies. That is, instead of creating hierarchies that may potentially contain lots of multiple inheritance, you explicitly model the properties of a class and let the reasoner worry about the hierarchy. Here the idea is to separate out classes into their atomic components (so for wine this might be the colour). Create class hierarchies for the atomic classes and then write OWL axioms for the non-atomic classes, e.g.:

RedWine = Wine and hasColour value red (in Manchester OWL syntax)
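
What a reasoner does with such a definition can be imitated in a few lines. This is a toy sketch, not a real OWL reasoner: it only handles definitions of the form "base class and property value", and the WhiteWine definition and the individual are assumptions added for illustration.

```python
# Defined classes in the form: name -> (base class, property, required value).
DEFINITIONS = {
    "RedWine": ("Wine", "hasColour", "red"),
    "WhiteWine": ("Wine", "hasColour", "white"),  # assumed, by analogy
}


def classify(individual):
    """Return the asserted types plus any defined classes the individual satisfies."""
    inferred = set(individual["types"])
    for name, (base, prop, value) in DEFINITIONS.items():
        if base in inferred and individual["properties"].get(prop) == value:
            inferred.add(name)
    return inferred


# Only the atomic facts are asserted; membership of RedWine is derived.
bottle = {"types": {"Wine"}, "properties": {"hasColour": "red"}}
print(sorted(classify(bottle)))
```

Nothing asserts that the bottle is a RedWine; the membership follows from the definition, which is the sense in which the reasoner is left to "worry about the hierarchy".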

> Now let's separate the TypeOfEstablishment case from the two definitely Boolean cases (BoardingEstablishment and TrainingSchool).

> For TypeOfEstablishment then we had a few reasons for going with the class-based representation. Firstly, the name "TypeOf" seemed like a give-away to how the data owners might thing of it :-) Secondly, it seems like these types would affect the relevance of other properties, for example an LA_Nursery_School would have nursery provision and would not be expected to have a sixth form. Thirdly there is the possibility that the different fine-grained types of Independent school might fit under the top level classes of Independent School (though we haven't managed to do that).

So in this case (as we did at OS) I'd be inclined to separate out the form of the object from its function and again let the reasoner worry about the hierarchy. In this case you can create a hierarchy for building structures (though this will be quite shallow) and another one for function/purpose. You can then assemble these classes when defining the LA_Nursery_School. So assuming you are talking about the school building:

LA_Nursery_School = Building and hasPurpose some NurseryEducation

You can then add in other information for the purpose/function classes:

SixthFormEducation subClassOf Education
NurseryEducation subClassOf Education
NurseryEducation disjointFrom SixthFormEducation
Education disjointFrom Retail ...

etc.

I know OWL 2 support is not very widespread amongst linked data applications (but I'm sure it will get there - hint hint :)). But we can at least publish OWL ontologies to give explicit definitions of classes to make it clear what we mean by them. I personally think the linked data community (correct me if I'm wrong) doesn't see the full importance of this yet. There have been many data integration examples where assumptions were made that things called the same thing were the same thing. I've found modelling classes in the way mentioned above is useful for even just computing your main class hierarchy without getting yourself in a tangled mess. Happy to write more later as I wrote this way too early on my day off :)

John

kit.wallace

Oct 7, 2009, 2:42:22 AM10/7/09

to UK Government Data Developers

Dave, many thanks for sharing these insights into the modelling.

Just a quick addition to the type/predicate debate: the problem I see
is that in promoting TypeOfEstablishment values to subclasses, you
lose the concept of TypeOfEstablishment itself (now only residual in
the stem of the subclass names) as well as the exclusivity constraint,
without gaining anything useful in the current dataset. I can't now
list the possible values of this conceptual property, and to describe
the concept I'd have to create an (abstract) subclass in the vocab to
hang the description on. I can't create an interface which allows a
visitor to select a specific type (without the vocab). There may be
gains in some future if, as you say, it turns out that future Edubase
modellers change their minds, but there are so many possible futures.
Better, I think, to retain the alignment with the current Edubase
conceptual model as long as it is the master from which the RDF is one
serialisation.
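
A small sketch of the "list the possible values" point, in Python over plain tuples. The identifiers are made up for illustration and are not the real Edubase or data.gov.uk URIs.

```python
# Hypothetical identifiers throughout; this only contrasts the two modelling styles.

# Style 1: TypeOfEstablishment kept as a property.
AS_PROPERTY = [
    ("school/1", "typeOfEstablishment", "LA_Nursery_School"),
    ("school/2", "typeOfEstablishment", "Community_School"),
    ("school/3", "typeOfEstablishment", "Community_School"),
]

# Style 2: the same values promoted to classes, mixed with other rdf:type facts.
AS_CLASS = [
    ("school/1", "rdf:type", "LA_Nursery_School"),
    ("school/1", "rdf:type", "School"),
    ("school/2", "rdf:type", "Community_School"),
    ("school/2", "rdf:type", "BoardingEstablishment"),
]


def values_of(prop, triples):
    """Return the distinct objects used with a given property, sorted."""
    return sorted({o for s, p, o in triples if p == prop})
```

With the property style, `values_of("typeOfEstablishment", AS_PROPERTY)` returns just the two establishment types. With the class style, `values_of("rdf:type", AS_CLASS)` also returns School and BoardingEstablishment, so an interface cannot tell which classes are "types of establishment" without consulting the vocabulary.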

Chris

Anthony Cartmell

Oct 7, 2009, 3:52:09 AM10/7/09

to uk-government-...@googlegroups.com

>> 1) type/category conflation
<snip>

> This is a tricky one so apologies if the response gets a bit long.

<snip>

> Then there are a couple of boolean cases BoardingEstablishment. In that
> case I went with a class on the grounds that BoardingEstablishments have
> boarders and there are several properties about boarders, numbers of
> boarders and you could imagine more (such as the boarding component of
> fees). So on the "it changes what properties you would look for"
> argument we made that a class. This is even more subjective than for
> TypeOfEstablishment so I definitely wouldn't die in a ditch over that
> one!

Apologies if I'm missing the point, but...

Thinking about this from an OO software point of view there's another
approach that might fit this sort of thing. Instead of a School being a
"type of" "BoardingEstablishment", or having a "BoardingEstablishment"
property that's TRUE/FALSE, couldn't the School "have a" (link to a)
"BoardingEstablishment" that in turn has properties about the number of
boarders, different location/building/name, additional fees, etc.?

In other words, don't try to lump all these boarding properties and
information into the School object itself, but link to another object that
holds the boarding-specific information. In OO terms this would be a
zero-to-many link, zero for a School with no boarding facilities. A
University-type boarding school, that might have different "halls of
residence" with different student numbers/locations/names/fees, could be
handled as a School that has multiple BoardingEstablishments.

"Inheritance versus composition" is the technical phrase for what I'm
trying to say. The "best" solution will depend on the available data, both
now and in the future...
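
In OO terms the composition idea might look like the following Python sketch. The class and field names are invented for illustration and do not come from the Edubase data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BoardingEstablishment:
    """Holds the boarding-specific information (hypothetical fields)."""
    name: str
    boarders: int


@dataclass
class School:
    """A school links to zero or more boarding establishments."""
    name: str
    boarding: List[BoardingEstablishment] = field(default_factory=list)

    def is_boarding(self) -> bool:
        # No subclass or TRUE/FALSE flag needed: the links carry the fact.
        return len(self.boarding) > 0

    def total_boarders(self) -> int:
        return sum(b.boarders for b in self.boarding)


day_school = School("Example Day School")
halls = School(
    "Example Boarding School",
    [BoardingEstablishment("North Hall", 40), BoardingEstablishment("South Hall", 25)],
)
```

Here `day_school` has the zero end of the zero-to-many link, while `halls` has two establishments with their own names and numbers.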

Cheers!

Anthony
--
www.fonant.com - Quality web sites
Fonant Ltd is registered in England and Wales, company No. 7006596
Registered office: Grafton Lodge, 15 Grafton Road, Worthing, West Sussex,
BN11 1QR

Steve Harris

Oct 7, 2009, 5:02:31 AM10/7/09

to uk-government-...@googlegroups.com

On 7 Oct 2009, at 07:27, John Goodwin wrote:
>
>> Second, there are some principles to apply when deciding to
>> represent something as a class rather than a property. There's lots
>> written on this that I'm not about to do justice to but basically
>> the checks I tend to use are:
>> - is this intrinsic to the nature of the entity?
>> - would this classification affect the relevance of other properties?
>> - do these form a natural hierarchy?
>
>> The classic example is the Ontology 101 example wine ontology.
>> Should you represent Red Wine as a subClass of Wine or as a colour
>> property of Wine? The common answer to this is to use subClass
>> because one thinks of Red and White wines so differently and
>> different properties apply - you care about tannin levels in Red
>> wine and sweetness in White.
>
> For cases like this I prefer to follow Alan Rector's methodology of
> untangling hierarchies. That is instead of creating hierarchies that
> may potentially contain lots of multiple inheritance you explicitly
> model the properties of a class and let the reasoner worry about the
> hierarchy. Here the idea is to separate out classes into their atomic
> components (so for wine this might be the colour). Create class
> hierarchies for the atomic classes and then write OWL axioms for the
> non-atomic classes, e.g.:
>
> RedWine = Wine and hasColour value red (in Manchester OWL syntax)

I agree that this is a nice way to work, but the resulting ontology
can be a bit opaque if you're not used to description logic notation.

Could be that current tools have better visualisers though, I've not
tried for some time. My experience was that the output RDF was a bit
hard for other people to interpret in RDF syntax too.

- Steve

Steve Harris

Oct 7, 2009, 5:19:39 AM10/7/09

to uk-government-...@googlegroups.com

On 7 Oct 2009, at 08:52, Anthony Cartmell wrote:
>
>>> 1) type/category conflation
> <snip>
>> This is a tricky one so apologies if the response gets a bit long.
> <snip>
>> Then there are a couple of boolean cases BoardingEstablishment. In
>> that case I went with a class on the grounds that
>> BoardingEstablishments have boarders and there are several
>> properties about boarders, numbers of boarders and you could
>> imagine more (such as the boarding component of fees). So on the
>> "it changes what properties you would look for" argument we made
>> that a class. This is even more subjective than for
>> TypeOfEstablishment so I definitely wouldn't die in a ditch over
>> that one!
>
> Apologies if I'm missing the point, but...
>
> Thinking about this from an OO software point of view there's another
> approach that might fit this sort of thing. Instead of a School
> being a "type of" "BoardingEstablishment", or having a
> "BoardingEstablishment" property that's TRUE/FALSE, couldn't the
> School "have a" (link to a) "BoardingEstablishment" that in turn has
> properties about the number of boarders, different location/building/
> name, additional fees, etc.?

You have to be a bit careful when thinking about OO practice, a lot of
the OO use of the word "class" is at best misleading. E.g. it's common
to see in OO tutorials something like starting with a square class,
with one size value (e.g. x), and then adding rectangle as a subclass
of square with an additional size value (y).

From a mathematical point of view this is clearly incorrect (not all
rectangles are squares), but OO doesn't typically enforce that, it's
often not really a problem in applications, and a lot of practitioners
don't fully understand the meaning of classes and taxonomies.
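
The tutorial pattern described above looks roughly like this in Python. It is a deliberately flawed sketch, written to show the problem rather than to recommend the design.

```python
class Square:
    """A shape with a single size value x."""

    def __init__(self, x):
        self.x = x

    def area(self):
        return self.x * self.x


class Rectangle(Square):
    """'Extends' Square with a second size value y: reuse, not taxonomy."""

    def __init__(self, x, y):
        super().__init__(x)
        self.y = y

    def area(self):
        return self.x * self.y


r = Rectangle(2, 3)
```

The language is perfectly happy: `isinstance(r, Square)` is True and `r.area()` is 6. Read as a statement about classes in the ontology sense, though, it claims every rectangle is a square, which is the wrong way round.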

- Steve

Dave Reynolds

Oct 7, 2009, 5:53:53 AM10/7/09

to uk-government-...@googlegroups.com

Hi John,

John Goodwin wrote:
> Dave Reynolds <dave.e....@googlemail.com> wrote
>
>
>> Second, there are some principles to apply when deciding to represent something as a class rather than a property. There's lots written on this that I'm not about to do justice to but basically the checks I tend to use are:
>> - is this intrinsic to the nature of the entity?
>> - would this classification affect the relevance of other properties?
>> - do these form a natural hierarchy?
>
>> The classic example is the Ontology 101 example wine ontology. Should you represent Red Wine as a subClass of Wine or as a colour property of Wine? The common answer to this is to use subClass because one thinks of Red and White wines so differently and different properties apply - you care about tannin levels in Red wine and sweetness in White.
>
> For cases like this I prefer to follow Alan Rector's methodology of untangling hierarchies. That is instead of creating hierarchies that may potentially contain lots of multiple inheritance you explicitly model the properties of a class and let the reasoner worry about the hierarchy.

Interesting. I've read a few of Alan's papers and I had always thought
he was advocating simple class hierarchies (each concept has one parent,
at each level concepts are distinct not exhaustive, stick to elementary
concepts) rather than doing away with classes altogether.

> Here the idea is to separate out classes into their atomic components (so for wine this might be the colour). Create class hierarchies for the atomic classes and then write OWL axioms for the non-atomic classes, e.g.:
>
> RedWine = Wine and hasColour value red (in Manchester OWL syntax)

Sure that works technically but in the wine case the argument is that
RedWine and WhiteWine are really different elementary concepts (which
you then place below Wine). Then you would have colours like "tawny red"
etc for RedWine and "straw" etc for WhiteWine. Again this is because the
range of values for other properties, indeed their cardinality, is
affected by this classification. That can be expressed in OWL and
reasoned over perfectly well without picking RedWine/WhiteWine out as
classes but part of the point of an ontology is to communicate how you
are conceptualizing the world.

What's more, from a linked data point of view, you would presumably
publish the results of the class inference closure as part of the
dataset. So a data browser would still see the "rdf:type RedWine"
assertions whether or not they were generated by inference or by direct
assertion. Whereas Chris' discussion is partly about whether it's useful
to see them as classes at all.
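
A minimal sketch of that point in Python: once the closure is materialised, a consumer sees the same triple either way. The rule is hard-coded for the RedWine definition and the identifiers are hypothetical; a real publisher would run a reasoner instead.

```python
RDF_TYPE = "rdf:type"


def materialise(triples):
    """Add (s, rdf:type, RedWine) for every Wine whose hasColour is red."""
    out = set(triples)
    wines = {s for s, p, o in triples if p == RDF_TYPE and o == "Wine"}
    reds = {s for s, p, o in triples if p == "hasColour" and o == "red"}
    for s in wines & reds:
        out.add((s, RDF_TYPE, "RedWine"))
    return out


# Dataset A asserts the class directly.
asserted = {("wine/1", RDF_TYPE, "RedWine")}

# Dataset B only describes the properties and relies on inference.
described = {("wine/1", RDF_TYPE, "Wine"), ("wine/1", "hasColour", "red")}
published = materialise(described)
```

A data browser looking for `("wine/1", "rdf:type", "RedWine")` finds it in both `asserted` and `published`, so the modelling choice is invisible at that level.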

>> Now let's separate the TypeOfEstablishment case from the two definitely Boolean cases (BoardingEstablishment and TrainingSchool).
>
>> For TypeOfEstablishment then we had a few reasons for going with the class-based representation. Firstly, the name "TypeOf" seemed like a give-away to how the data owners might think of it :-) Secondly, it seems like these types would affect the relevance of other properties, for example an LA_Nursery_School would have nursery provision and would not be expected to have a sixth form. Thirdly there is the possibility that the different fine-grained types of Independent school might fit under the top level classes of Independent School (though we haven't managed to do that).
>
> So in this case (as we did at OS) I'd be inclined to separate out the form of the object from its function and again let the reasoner worry about the hierarchy. In this case you can create a hierarchy for building structures (though this will be quite shallow) and another one for function/purpose. You can then assemble these classes when defining the LA_Nursery_School. So assuming you are talking about the school building:
>
> LA_Nursery_School = Building and hasPurpose some NurseryEducation

I don't think we have any information on the buildings. I think all the
information is about the nature of the legal entity. However, we are
hampered by not yet understanding the semantics of this data - are these
categories fundamental (e.g. created by legislation) or simply descriptive?

> You can then add in other information for the purpose/function classes:
>
> SixthFormEducation subClassOf Education
> NurseryEducation subClassOf Education
> NurseryEducation disjointFrom SixthFormEducation
> Education disjointFrom Retail ...
>
> etc.

Yes, that's a nice way of doing it. I tried to keep our structure
isomorphic to what we had with Edubase but perhaps should look at
sketching something more general like this (and just preserving the
Edubase classification as a SKOS classification as, I think, Chris would
prefer).

It all comes down to purpose as usual. Our purpose was to reflect
Edubase. A better purpose would be to represent school information for
some range of usages and a representation that e.g. cleanly separates
notions of nursery, secondary, sixth form provision may be better.

Trouble is that mapping what's in Edubase to an intuitive approach like
this will require some help from people who understand why it's the way
it is. Why are Welsh_Establishments disjoint? What's the structural and
legal relationship between Primary Referral Units and the rest of the
School? Etc.

Maybe a good plan would be to switch TypeOfEstablishment back to a SKOS
categorization, sketch a more intuitive classification approach and then
put the two together once the meaning of some of the data is a bit
clearer.

> I know OWL 2 support is not very widespread amongst linked data applications (but I'm sure it will get there - hint hint :)) But we can at least publish OWL ontologies to give explicit definitions of classes to make it clear what we mean by them.

Publish OWL definitely, the compromise was to avoid OWL 2 constructs for
now.

> I personally think the linked data community (correct me if I'm wrong) don't see the full importance of this yet. There have been many data integration examples where assumptions were made that things called the same thing were the same thing.

Oh I agree. The challenge is to make sure the data makes enough sense
for people just coming at the data without studying the ontology but
also to lower the barrier to people using the associated ontology.

> I've found modelling classes in the way mentioned above is useful for even just computing your main class hierarchy without getting yourself in a tangled mess.

I'd claim that thinking of everything as a property can lead to just as
tangled a hierarchy, you just create the tangle through inference rather
than manually :-) Perhaps this is a discussion to have over a pint of
beer sometime. Perhaps we need a uk-gov-data-developers MeetUp.

> Happy to write more later as I wrote this way too early on my day off :)

I think we have different ontologies for the concept "day off" :-)

Cheers,
Dave

Dave Reynolds

Oct 7, 2009, 6:20:36 AM10/7/09

to uk-government-...@googlegroups.com

Anthony Cartmell wrote:
>
>>> 1) type/category conflation
> <snip>
>> This is a tricky one so apologies if the response gets a bit long.
> <snip>
>> Then there are a couple of boolean cases BoardingEstablishment. In
>> that case I went with a class on the grounds that
>> BoardingEstablishments have boarders and there are several properties
>> about boarders, numbers of boarders and you could imagine more (such
>> as the boarding component of fees). So on the "it changes what
>> properties you would look for" argument we made that a class. This is
>> even more subjective than for TypeOfEstablishment so I definitely
>> wouldn't die in a ditch over that one!
>
> Apologies if I'm missing the point, but...
>
> Thinking about this from an OO software point of view there's another
> approach that might fit this sort of thing. Instead of a School being a
> "type of" "BoardingEstablishment", or having a "BoardingEstablishment"
> property that's TRUE/FALSE, couldn't the School "have a" (link to a)
> "BoardingEstablishment" that in turn has properties about the number of
> boarders, different location/building/name, additional fees, etc.?

You do need to be careful mixing OO and ontologies but I see Steve has
already mentioned that.

> In other words, don't try to lump all these boarding properties and
> information into the School object itself, but link to another object
> that holds the boarding-specific information. In OO terms this would be
> a zero-to-many link, zero for a School with no boarding facilities. A
> University-type boarding school, that might have different "halls of
> residence" with different student numbers/locations/names/fees, could be
> handled as a School that has multiple BoardingEstablishments.
>
> "Inheritance versus composition" is the technical phrase for what I'm
> trying to say. The "best" solution will depend on the available data,
> both now and in the future...

Yes, definitely. That's why we separated out things like the Trust
associated with a trust school as a different entity.

I agree with your example, if you were trying to model a university you
would have separate notions of the University legal entity, the
University Sites and the Halls of Residence.

In the schools case then we don't really have the data to say much about
different sub-parts of the schools or really be sure they are different
parts. In particular, I had been assuming that "BoardingEstablishment"
was a matter of status of the School as a legal entity. I.e. you could
be approved for boarding independent of whether you have any actual
boarders right now. So I do think that is a classification of school
independent of whether in the long term you might be able to model the
"boarding unit" of the school as a separate entity, perhaps associated
with a separate set of buildings which in turn would have different
locations.

The nice thing about Linked data is that we don't have to do all this
ourselves. What the Edubase data gives us is a core set of URIs and
associated reference data for the Schools. Now people can hang more data
and more refined modelling off that backbone.

Supposing that someone somewhere does have information on the boarding
parts of boarding schools and could describe those as separate entities
with associated buildings, administrator etc. They could then publish
that as a dataset with links "boardingUnitOf" pointing back to the
Schools URIs. The web of data can grow to encompass these more refined
pictures without necessarily having to go back to some central authority.

This is one way that linked data/semantic web stuff differs from OO
practice. You can do external compositions.
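
To illustrate external composition, here is a small Python sketch that joins a core dataset with a separately published one. Both datasets, the `boardingUnitOf` usage and every identifier are hypothetical.

```python
# Core reference data, as the school publisher might provide it.
CORE = {
    ("school/100", "name", "Example School"),
}

# A third party's dataset, published elsewhere, pointing back at the school URI.
EXTERNAL = {
    ("unit/1", "boardingUnitOf", "school/100"),
    ("unit/1", "boarders", 42),
}


def boarding_units(school, *datasets):
    """Find units linked to a school across any number of datasets."""
    merged = set().union(*datasets)
    return sorted(s for s, p, o in merged if p == "boardingUnitOf" and o == school)
```

`boarding_units("school/100", CORE)` finds nothing, while `boarding_units("school/100", CORE, EXTERNAL)` finds `unit/1`. The core data was never edited; the extra detail arrives simply by merging in someone else's triples.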

Cheers,
Dave

Anthony Cartmell

Oct 7, 2009, 6:42:47 AM10/7/09

to uk-government-...@googlegroups.com

> You have to be a bit careful when thinking about OO practice, a lot of
> the OO use of the word "class" is at best misleading. E.g. it's common
> to see in OO tutorials something like starting with a square class, with
> one size value (e.g. x), and then adding rectangle as a subclass of
> square with an additional size value (y).
>
> From a mathematical point of view this is clearly incorrect (not all
> rectangles are squares), but OO doesn't typically enforce that, it's
> often not really a problem in applications, and a lot of practitioners
> don't fully understand the meaning of classes and taxonomies.

Yes, it's important to remember that OO analysis is all about the problem
domain, and not real life. In software a "rectangle" can quite
legitimately be a "type of" "square", even though it isn't in the "real
world" of mathematics. :)

> This is one way that linked data/semantic web stuff differs from OO
> practice. You can do external compositions.

Yes, and I like that: I'm a fan of composition over inheritance for most
things. Which is why I suggested links to describe boarding facilities at
schools, rather than using particular classes of school or school
properties. Although I can see that if there isn't much information about
boarding facilities then a class or property provides the information
simply as part of the school data, without needing a more complex query to
extract it.

John Goodwin

Oct 7, 2009, 9:37:00 AM10/7/09

to uk-government-...@googlegroups.com

Hi Steve,

> I agree that this is a nice way to work, but the resulting ontology can be a bit opaque if you're not used to description logic notation.

Sure. Protege has a number of ways to visualise the axioms and some people at OS (and the OWL community) have been working on CNLs for OWL ontologies. Rabbit is the one that Glen and others at OS started to develop. Sydney syntax is another one.

> Could be that current tools have better visualisers though, I've not tried for some time. My experience was that the output RDF was a bit hard for other people to interpret in RDF syntax too.

Personally I quite like the Manchester OWL syntax..once you get used to it. There's been a lot of debate about whether RDF is really an appropriate syntax for OWL.

John

John Goodwin

Oct 7, 2009, 9:52:14 AM10/7/09

to uk-government-...@googlegroups.com

Hi Dave,

>> For cases like this I prefer to follow Alan Rector's methodology of untangling hierarchies. That is instead of creating hierarchies that may potentially contain lots of multiple inheritance you explicitly model the properties of a class and let the reasoner worry about the hierarchy.

> Interesting. I've read a few of Alan's papers and I had always thought he was advocating simple class hierarchies (each concept has one parent, at each level concepts are distinct not exhaustive, stick to elementary concepts) rather than doing away with classes altogether.

Yes indeed. That's what I meant, but typing at 6:30 from my bed was possibly not the best time to discuss OWL ontologies :) But yes, using single inheritance for atomic classes and then constructing other more complex classes using the conceptual lego.

> What's more, from a linked data point of view, you would presumably publish the results of the class inference closure as part of the dataset. So a data browser would still see the "rdf:type RedWine" assertions whether or not they were generated by inference or by direct assertion. Whereas Chris' discussion is partly about whether it's useful to see them as classes at all.

Agreed (I'll use the 6:30am disclaimer again :)).

> I don't think we have any information on the buildings. I think all the information is about the nature of the legal entity. However, we are hampered by not yet understanding the semantics of this data - are these categories fundamental (e.g. created by legislation) or simply descriptive?

I've always been descriptive in the definitions of classes, but it would be interesting to see if there is any legislation defining these categories. We have found in the past that concepts such as "multiple occupancy address" have different meanings at OS, Royal Mail and the Valuation Office - so I guess there may not be any legislation. That, or we are free to ignore it.

>Yes, that's a nice way of doing it. I tried to keep our structure isomorphic to what we had with Edubase but perhaps should look at sketching something more general like this (and just preserving the Edubase classification as a SKOS classification as, I think, Chris would prefer).

[snip]

I've not seen the Edubase data so it's hard to comment. I think what you've described is often an issue when converting scruffy legacy data to neat OWL/RDF. There have been a few times where it has pained me to literally convert legacy data as I see the creation of the RDF as a chance to do things properly. Not that I'm saying Edubase wasn't done properly though... So I appreciate what you're saying.

> Publish OWL definitely, the compromise was to avoid OWL 2 constructs for now.

I see - sorry misunderstood. I'd probably say that OWL (never mind OWL 2) is not widely adopted in the linked data community yet.

>Oh I agree. The challenge is to make sure the data makes enough sense for people just coming at the data without studying the ontology but also to lower the barrier to people using the associated ontology.

yes, and of course having lots of explicit detailed OWL axioms doesn't necessarily make anything easier to understand.

>I'd claim that thinking of everything as a property can lead to just as tangled a hierarchy, you just create the tangle through inference rather than manually :-) Perhaps this is a discussion to have over a pint of beer sometime. Perhaps we need a uk-gov-data-developers MeetUp.

that sounds like a plan :) I think there is a balance between an over-engineered OWL ontology with complex axioms and just simple class hierarchies. Not quite sure what it is yet. Personally I've always found it easy to think about how I was to describe my domain (e.g. for buildings it might be form and function). Create hierarchies for the form and functions with disjoint siblings and then build the complex Building classes from these atomic classes.

> I think we have different ontologies for the concept "day off" :-)

:)

John

Steve Harris

Oct 7, 2009, 10:43:15 AM10/7/09

to uk-government-...@googlegroups.com

On 7 Oct 2009, at 14:37, John Goodwin wrote:

> Hi Steve,
>
> > I agree that this is a nice way to work, but the resulting
> ontology can be a bit opaque if you're not used to description logic
> notation.
>
> Sure. Protege has a number of ways to visualise the axioms and some
> people at OS (and the OWL community) have been working on CNLs for
> OWL ontologies. Rabbit is the one that Glen and others at OS started
> to develop. Sydney syntax is another one.

Great, I will give some newer tools a go.

> > Could be that current tools have better visualisers though, I've
> not tried for some time. My experience was that the output RDF was a
> bit hard for other people to interpret in RDF syntax too.
>
> Personally I quite like the Manchester OWL syntax..once you get used
> to it. There's been a lot of debate about whether RDF is really an
> appropriate syntax for OWL.

To me OWL's utility vanishes if you can no longer represent it in RDF.
In triples it can be relevant, even if you can't reason with it at
runtime, say you have a really large dataset, just for example :) If
there's some other syntax it can no longer be queried with SPARQL, so
can't be used to aid/inform querying.
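
As a sketch of what "aid/inform querying" can mean without a runtime reasoner: if the subclass axioms sit in the same triples as the data, a query layer can read them and widen a type lookup. The Python below uses plain tuples and hypothetical identifiers; a real system would do this against a triple store.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

# Ontology axioms and instance data held together as triples.
TRIPLES = {
    ("RedWine", SUBCLASS, "Wine"),
    ("TawnyPort", SUBCLASS, "RedWine"),
    ("bottle/1", TYPE, "TawnyPort"),
    ("bottle/2", TYPE, "Wine"),
}


def subclasses(cls, triples):
    """Return cls plus all direct and indirect subclasses found in the triples."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == SUBCLASS and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def instances_of(cls, triples):
    """Find instances of cls, using the stored axioms to expand the lookup."""
    wanted = subclasses(cls, triples)
    return sorted({s for s, p, o in triples if p == TYPE and o in wanted})
```

Asking for instances of Wine returns both bottles, even though `bottle/1` is only typed as TawnyPort. If the axioms lived in a separate non-RDF syntax, the query layer would have nothing to read them from.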

That's a bit offtopic for this list though.

- Steve

Kingsley Idehen

Oct 7, 2009, 12:04:16 PM10/7/09

to uk-government-...@googlegroups.com

Steve Harris wrote:
>
> On 7 Oct 2009, at 14:37, John Goodwin wrote:
>
>> Hi Steve,
>>
>> > I agree that this is a nice way to work, but the resulting ontology
>> can be a bit opaque if you're not used to description logic notation.
>>
>> Sure. Protege has a number of ways to visualise the axioms and some
>> people at OS (and the OWL community) have been working on CNLs for
>> OWL ontologies. Rabbit is the one that Glen and others at OS started
>> to develop. Sydney syntax is another one.
>
> Great, I will give some newer tools a go.
>
>> > Could be that current tools have better visualisers though, I've
>> not tried for some time. My experience was that the output RDF was a
>> bit hard for other people to interpret in RDF syntax too.
>>
>> Personally I quite like the Manchester OWL syntax..once you get used
>> to it. There's been a lot of debate about whether RDF is really an
>> appropriate syntax for OWL.
>
> To me OWL's utility vanishes if you can no longer represent it in RDF.
> In triples it can be relevant, even if you can't reason with it at
> runtime, say you have a really large dataset, just for example :)

Steve,

Backward-chaining based reasoning (at runtime) across very large data
sets is something Virtuoso has offered for a pretty long time now re.
sameAs, subClassOf, equivalentClass, equivalentProperty, plus IFPs [1].

Your statement above is really Quad Store specific re. reasoning over
large data sets.

Links:

1. http://bit.ly/38Jlw4 -- Expanding disparate data about Michael
Jackson using a 7.5B+ live data set by reasoning using "sameAs" or
fuzzier foaf:name designated as an Inverse Functional Property based
local context rule

Re. the above, note the co-reference and indirect co-reference tabs,
also note how the HTTP URIs resolve to the same union of disparate data
about the subject in line with the precision that either approach
enables (so the fuzzier one is less accurate as demonstrated via some
obvious entries ).

Kingsley

> If there's some other syntax it can no longer be queried with SPARQL,
> so can't be used to aid/inform querying.
>
> That's a bit offtopic for this list though.
>
> - Steve
>

Steve Harris

Oct 7, 2009, 12:22:18 PM10/7/09

to uk-government-...@googlegroups.com

On 7 Oct 2009, at 17:04, Kingsley Idehen wrote:
>>
>> To me OWL's utility vanishes if you can no longer represent it in
>> RDF. In triples it can be relevant, even if you can't reason with
>> it at runtime, say you have a really large dataset, just for
>> example :)
> Steve,
>
> Backward-chaining based reasoning (at runtime) across a very large
> data sets is something Virtuoso has offered for a while re. sameAs,
> subClassOf, equivalentClass, equivalentProperty, plus IFPs for a
> pretty long time now [1].

It's been a few years since I worked on back-chaining systems, but
IIRC you didn't get to do efficient subsumption reasoning over large
datasets that way, which is what we were discussing. Could well be
that it's now commonplace for all I know.

But yes, you can do transitive stuff and IFPs pretty easily, and in
quad stores too.
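
For readers unfamiliar with IFPs, a minimal Python sketch of the idea: subjects that share a value for an inverse functional property are treated as the same thing. This is simple grouping over made-up data, not how Virtuoso or any particular store implements it, and it ignores chaining across several properties.

```python
from collections import defaultdict


def smush(triples, ifp):
    """Group subjects that share a value for the given inverse functional property."""
    groups = defaultdict(set)
    for s, p, o in triples:
        if p == ifp:
            groups[o].add(s)
    # Only groups with more than one subject represent a co-reference.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical data: two URIs share one mailbox, a third has its own.
PEOPLE = [
    ("person/a", "foaf:mbox", "mailto:x@example.org"),
    ("person/b", "foaf:mbox", "mailto:x@example.org"),
    ("person/c", "foaf:mbox", "mailto:y@example.org"),
]
```

`smush(PEOPLE, "foaf:mbox")` groups `person/a` and `person/b` together and leaves `person/c` alone. Declaring a looser property such as foaf:name as the IFP works the same way mechanically but will merge unrelated people who happen to share a name.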

- Steve

John Goodwin

Oct 7, 2009, 12:40:56 PM10/7/09

to uk-government-...@googlegroups.com

> To me OWL's utility vanishes if you can no longer represent it in RDF. In triples it can be relevant, even if you can't reason with it at runtime, say you have a really large dataset, just for example :) If there's some other syntax it can no longer be queried with SPARQL, so can't be used to aid/inform querying.

Good point. I think a lot of the people that aren't worried about OWL being in RDF are the ones mainly interested in TBox reasoning.

As for large datasets - I know that two of the three profiles of OWL 2 are optimised for large datasets. Perhaps linked data apps should stick to those - when it comes to really large datasets. I think people are writing inference engines especially optimised for those languages (e.g. CEL, OWLGres etc.). No idea if they are forward or backward chaining though.

John

Steve Harris

Oct 7, 2009, 12:50:40 PM10/7/09

to uk-government-...@googlegroups.com

On 7 Oct 2009, at 17:40, John Goodwin wrote:
>
> > To me OWL's utility vanishes if you can no longer represent it in
> RDF. In triples it can be relevant, even if you can't reason with it
> at runtime, say you have a really large dataset, just for
> example :) If there's some other syntax it can no longer be
> queried with SPARQL, so can't be used to aid/inform querying.
>
> Good point. I think a lot of the people that aren't worried about
> OWL being in RDF are the ones mainly interested in TBox reasoning.

Sure, and OWL still works fine as an ontology language if that
happens, but it may be harder to use it for Linked Data. The other
issue being inconsistencies. While it would be nice if all ontologies
and data were consistent w.r.t. each other, it seems a little
optimistic.

> As for large datasets - I know that two of the three profiles of OWL
> 2 are optimised for large datasets. Perhaps linked data apps should
> stick to those - when it comes to really large datasets. I think
> people are writing inference engines especially optimised for those
> languages (e.g. CEL, OWLGres etc.). No idea if they are forward or
> backward chaining though.

Yes, OWL2-RL seems particularly appropriate for that, I was looking at
the rec. the other day. I can't see us implementing it anytime soon
though, unless someone has a really strong use case. Some people
are working with our store and OWL, but I think OWL1.

Anyway, wandering really off topic now, I'll shut up :)

- Steve

Mischa Home

Oct 7, 2009, 1:02:35 PM10/7/09

to uk-government-...@googlegroups.com

Hello All,

I just came across this in the news, looks like NYC has exposed a
whole bunch of data about restaurants, public health records, public
parks, and so on. The dataset has been released alongside a $20,000
prize for applications making use of the data.

http://www.nyc.gov/html/datamine/html/home/home.shtml

Regards,

Mischa

_________________________________
Mischa Tuffield
Email: mis...@mmt.me.uk
Homepage: http://mmt.me.uk/
FOAF: http://mmt.me.uk/foaf.rdf#mischa
