qb:DimensionProperty subClassOf qb:CodedProperty ?


BillRoberts

Sep 22, 2011, 11:18:14 AM
to Publishing Statistical Data
Hi there

In the RDF Data Cube ontology, we have:

qb:DimensionProperty rdfs:subClassOf qb:CodedProperty;

Why is that? It seems wrong to me. If I understand correctly, coded
properties are those properties whose values can be described with a
code list. But there are many dimension properties with values that
are not coded or codelist-able.

Thanks for any clarification

Bill

Dave Reynolds

Sep 22, 2011, 12:48:32 PM
to publishing-st...@googlegroups.com
Hi Bill,

This is intended to mirror the SDMX Information Model, which has a
notion of CodedArtefact and requires all Dimensions to be
CodedArtefacts.

However, this is not really restrictive.

CodedArtefact and thus qb:CodedProperty just means that the values are
from some well defined set. It doesn't *require* those to be represented
as a SKOS code list. They can equally well be some other class of
resources (e.g. http://reference.data.gov.uk intervals) or indeed
literal values.

The qb:codeList property allows us to point to skos:ConceptSchemes but
there's no cardinality restriction requiring all qb:CodedProperty
instances to have an associated qb:codeList. It's optional. The value
set of a qb:CodedProperty is more commonly given via rdfs:range so you
can use any suitable rdfs:Class, owl:Class or indeed rdfs:Datatype to
define the ... um ... defined set of values.
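As a rough Turtle sketch of that point (the eg: namespace and both property names are invented for illustration):

```turtle
@prefix qb:   <http://purl.org/linked-data/cube#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix eg:   <http://example.org/def/> .

# Value set given explicitly as a SKOS code list:
eg:sex a qb:DimensionProperty, qb:CodedProperty ;
    qb:codeList eg:sexScheme ;
    rdfs:range  skos:Concept .

# Value set given only via rdfs:range -- no code list attached:
eg:refPeriod a qb:DimensionProperty ;
    rdfs:range xsd:gYear .
```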

This could be made clearer in the documentation :)

Dave


Richard Cyganiak

Sep 22, 2011, 2:08:24 PM
to publishing-st...@googlegroups.com
On 22 Sep 2011, at 16:18, BillRoberts wrote:
> But there are many dimension properties with values that are not coded or codelist-able.

With the exception of time, I don't think that's true.

Can you give an example of some other dimension whose values don't come from a controlled/managed set of terms that ought to be represented as a SKOS concept scheme or RDFS class?

Best,
Richard

BillRoberts

Sep 22, 2011, 2:33:24 PM
to Publishing Statistical Data
I suppose position/location is the other common/obvious one apart from
time.

But you might also have, say, body weight as a dimension and blood
pressure as a measure.

Or air temperature as a dimension and sales of ice cream as a measure.

There are lots!

I suppose since the values of such dimensions are typically measured
with some kind of measuring instrument of finite accuracy, you could
think of always putting such things into 'bins', but it seems to me
that such things as time, position etc are effectively continuous.

BillRoberts

Sep 23, 2011, 3:05:28 AM
to Publishing Statistical Data
But I see your point Richard. Maybe I'm thinking too much like a
physicist instead of a statistician!

In practice most of these continuous variables are 'chunked': time
into years or months, space into a list of points or regions, age into
5 year bands etc etc

So it's making a bit more sense to me now.


Richard Cyganiak

Sep 23, 2011, 2:02:11 PM
to publishing-st...@googlegroups.com
Hi Bill,

On 23 Sep 2011, at 08:05, BillRoberts wrote:
> But I see your point Richard. Maybe I'm thinking too much like a
> physicist instead of a statistician!
>
> In practice most of these continuous variables are 'chunked': time
> into years or months, space into a list of points or regions, age into
> 5 year bands etc etc

Exactly. Statistics tend to be aggregate data, where many individual “events” or “facts” (which often have continuous attributes) have been lumped together into a single observation. The values along a number of dimensions have been “classified” into discrete ranges, and everything that falls into the same bucket (cube cell) has been “tabulated” into a single total or average number, and we're interested only in these totals.

This aggregation can remove a lot of valuable detail, but also makes it easier to ask higher-level questions (especially for dimensions where the classification is hierarchical), and may make the datasets smaller and may anonymize the data to some extent.

If you have some values that you *truly* want to model as continuous, then you should ask yourself if you aren't really looking at a measure rather than a dimension.

Best,
Richard

Benedikt Kämpgen

Oct 19, 2011, 5:01:47 AM
to publishing-st...@googlegroups.com
Hi,

I have a follow-up question regarding dimensions and code lists:

In QB, a dimension value used by an observation typically is an instance of
skos:Concept from a skos:ConceptScheme. I have seen some examples of
datasets [1,2] that then link such instances of skos:Concept via
owl:sameAs to the entities they represent, e.g.
<http://dbpedia.org/resource/Spain>. I guess this is fine from a practical
point of view, but is it not semantically incorrect? I am wondering whether
this is really intended and whether it will lead to problems later, e.g., if
we want to define hierarchies on dimension values.

Regards,

Benedikt


[1] <http://estatwrap.ontologycentral.com/data/tsieb010>
[2] <http://estatwrap.ontologycentral.com/dic/geo#ES>


--
AIFB, Karlsruhe Institute of Technology (KIT)
Phone: +49 721 608-47946
Email: benedikt...@kit.edu
Web: http://www.aifb.kit.edu/web/Hauptseite/en

Dave Reynolds

Oct 19, 2011, 5:16:00 AM
to publishing-st...@googlegroups.com
On Wed, 2011-10-19 at 11:01 +0200, Benedikt Kämpgen wrote:
> Hi,
>
> I have a follow-up question regarding dimensions and code lists:
>
> In QB, a dimension value used by an observation typically is an instance of
> skos:Concept from a skos:ConceptScheme.

Not required; they can also be instances of some defined [rdfs|owl]:Class.

> I have seen some examples of
> datasets [1,2], that then link from such instances of skos:Concept with
> owl:sameAs to entities they represent, e.g.
> <http://dbpedia.org/resource/Spain>. I guess this is fine from a practical
> point of view, but is it not semantically incorrect;

Indeed, not correct.

> I am wondering whether
> this is really intended and will lead to problems, later,

In the datasets we've published we've tended to use "normal" resources
directly for things like geographies and time periods and only use
skos:Concepts for things that are definitely classification schemes -
e.g. gender or age groups.

> e.g., if we want
> to define hierarchies on dimension values.

This is one area where I think the current QB vocabulary could do with
some extension. It would be nice to be able to define the property that
is used for hierarchical relationships between dimension values when
those are not skos:Concepts (and thus cannot use skos:broader/narrower).
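A sketch of the gap (the ex: names are invented): codes in a concept scheme get hierarchy "for free" via skos:broader, but for plain resources there is currently no QB way to declare which property carries the hierarchy:

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.org/> .

# Hierarchy between codes: standard SKOS.
ex:ageUnder5 a skos:Concept ;
    skos:broader ex:ageUnder16 .

# Hierarchy between plain dimension values: some ad hoc property,
# which QB gives us no way to point to as "the" hierarchy property.
ex:Bristol a ex:Region ;
    ex:containedBy ex:SouthWestEngland .
```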

Dave

--
Epimorphics Ltd www.epimorphics.com
Court Lodge, 105 High Street, Portishead, Bristol BS20 6PT
Tel: 01275 399069 Mobile: 07906 628814

Epimorphics Ltd. is a limited company registered in England (number
7016688)
Registered address: Court Lodge, 105 High Street, Portishead, Bristol
BS20 6PT, UK

Benedikt Kämpgen

Oct 28, 2011, 6:39:13 AM
to publishing-st...@googlegroups.com, Dominik Siegele
Hi Dave,

Thanks for your answer.

> This is one area where I think the current QB vocabulary could do with
> some extension. It would be nice to be able to define the property that
> is used for hierarchical relationships between dimensions values when
> those are not skos:Concepts (and thus skos:broader/narrower).

Ditto.

For example, we have now tried to model this correctly for Eurostat, not using a skos:ConceptScheme but the actual regions from nuts:NUTSRegion; see [1] and the definition of the geo dimension.

However, with this approach we can no longer say that only certain regions are used in the dataset.

Best,

Benedikt


[1] http://estatwrap.ontologycentral.com/dsd/tsieb010

Benedikt Kämpgen

Dec 6, 2011, 3:37:49 PM
to publishing-st...@googlegroups.com, Dominik Siegele
Dear all,

I would like to ask for your opinions on the following approach to using qb:codeList together with non-information URIs and literal values (which is related to the problem described so far in this thread):

The task is to represent date as a date literal, geo as specific instances of NUTSRegion, and sex as instances of skos:Concept for male/female/total. We face this task, e.g., at [1], where we are representing Eurostat [2] data using the RDF Data Cube Vocabulary (QB).

The approach that we now consider implementing:
* Optional: rdfs:range for the DimensionProperty, in order to have an understanding of what kinds of things are represented by the members, e.g., xsd:date for dc:date and NUTSRegion for geo.
* qb:codeList for the DimensionProperty, in order to list the possible skos:Concepts that represent values of the dimension, e.g., estat:y2003 for one specific year, estat:AT for one specific country, and estat:F for one specific gender.
* The skos:Concepts link via rdfs:seeAlso to the instances they represent, e.g., estat:AT rdfs:seeAlso dbpedia:Austria, and via rdfs:label to the literal values they represent, e.g., estat:y2003 rdfs:label "2003"^^xsd:date.
* The observations can either use the represented instances directly, e.g., dbpedia:Austria and "2003"^^xsd:date, or they can use the skos:Concept representations, e.g., estat:F.
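A rough Turtle rendering of those bullets (the prefixes and URIs here are illustrative, not the actual Eurostat ones):

```turtle
@prefix qb:      <http://purl.org/linked-data/cube#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
@prefix estat:   <http://example.org/estat/> .
@prefix dbpedia: <http://dbpedia.org/resource/> .

estat:geo a qb:DimensionProperty ;
    rdfs:range  estat:NUTSRegion ;   # optional range
    qb:codeList estat:geoScheme .    # enumerates the permitted codes

estat:AT a skos:Concept ;
    skos:inScheme estat:geoScheme ;
    rdfs:seeAlso  dbpedia:Austria .  # link to the represented entity

estat:y2003 a skos:Concept ;
    skos:inScheme estat:dateScheme ;
    rdfs:label "2003"^^xsd:gYear .   # "2003" is not a valid xsd:date lexical
                                     # form, so xsd:gYear is used in this sketch
```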

This approach brings the following advantages:
* We can limit the number of literal values of a specific dimension.
* We can have relationships between dimension values, e.g., for hierarchies, and still use the literal values or the non-information URIs in the observations.
* Publishers may still represent skos:Concepts as possible dimension values and link them using owl:sameAs to the actual represented values. Although this may be wrong, since it would state that the term (e.g., the skos:Concept for Germany) and the actual thing (dbpedia:Germany) are the same, applications that work with the approach explained above would also work here.

I would be glad to hear your opinions on this.

Regards,

Benedikt

[1] <http://estatwrap.ontologycentral.com/page/teilm020>
[2] <http://estatwrap.ontologycentral.com/>

Richard Cyganiak

Dec 6, 2011, 5:05:41 PM
to publishing-st...@googlegroups.com, Dominik Siegele
Hi Benedikt,

On 6 Dec 2011, at 20:37, Benedikt Kämpgen wrote:
> Given the task to represent date as Date Literal, geo as specific instances of NUTSRegion, and sex as instances of skos:Concept for the male/female/total. We have this task e.g. at [1] where we are representing Eurostat [2] data using the RDF Data Cube Vocabulary (QB).
>
> The approach that we now consider to implement:
> *Optional: rdfs:range for DimensionProperty in order to have an understanding of what kinds of things are represented by the members, e.g., xsd:date for dc:date and NUTSRegion for geo.

That makes sense. I would always specify this when no qb:codeList is present.

> *qb:codeList for DimensionProperty in order to list the possible skos:Concepts that represent values of the dimension, e.g., estat:y2003 for one specific year, estat:AT for one specific country, and estat:F for one specific gender

I would use qb:codeList only with skos:ConceptSchemes. It looks like your intention is to create concept schemes for all dimensions, including time. I think that's ok.

> *skos:Concepts have as rdfs:seeAlso instances linked that they represent, e.g., estat:AT rdfs:seeAlso dbpedia:Austria

I would use skos:closeMatch (or skos:exactMatch if you're a radical; or skos:relatedMatch if you're a coward) instead of rdfs:seeAlso.

This has the consequence of typing dbpedia:Austria as a skos:Concept, but that surely is fine, given the definition of skos:Concept:

[[
A SKOS concept can be viewed as an idea or notion; a unit of thought. However, what constitutes a unit of thought is subjective, and this definition is meant to be suggestive, rather than restrictive.
]]

Some might say: “A country is not an idea! It exists in the real world!” But I don't find that such arguments hold water. Countries are created and abolished through legislation and treaties; and decades can pass where large parts of mankind disagree on the question whether a particular entity is a country or not. Countries are really just the taxonomist's business objects of political geographers.

> and as rdfs:label Literal values linked that they represent, e.g., estat:y2003 rdfs:label "2003"^^xsd:date

Use skos:notation instead of rdfs:label. Note that "2003"^^xsd:date is ill-typed. It has to be "2003-01-01"^^xsd:date, or "2003"^^xsd:gYear.

> * The observations can either use the represented instances directly, e.g., dbpedia:Austria and "2003"^^xsd:date, or they can use the skos:Concept representations, e.g., estat:F

I agree that this makes sense in the case of literals (dates in particular). For URIs, it seems overly complicated. Why not just define a concept scheme that directly includes dbpedia:Austria as a concept using skos:inScheme?

> This approach brings the following advantages:
> * We can limit the number of literal values of a specific dimension

Right, and I like this. The logic would be: If a dimension property has a qb:codeList and is used with literal values, then assume that the literal values are the skos:notations of the concepts in the code list.
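That logic could be applied in SPARQL roughly like this (the estat: names are illustrative):

```sparql
PREFIX qb:    <http://purl.org/linked-data/cube#>
PREFIX skos:  <http://www.w3.org/2004/02/skos/core#>
PREFIX estat: <http://example.org/estat/>

# Map a literal dimension value in an observation back to the concept
# in the code list whose skos:notation it is.
SELECT ?obs ?concept WHERE {
  ?obs        estat:date    ?value .
  estat:date  qb:codeList   ?scheme .
  ?concept    skos:inScheme ?scheme ;
              skos:notation ?value .
}
```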

> * We can have relationships between dimension values, e.g., for hierarchies, and still use the literal values or the non-information URIs in the observations

Yup.

> * Publishers may still represent skos:Concepts as possible dimension values and can link them using owl:sameAs to the actual represented values.

Do not *EVER* link to a skos:Concept using owl:sameAs! ;-)

Seriously, skos:xxxMatch is always better for that purpose.

Best,
Richard

Dave Reynolds

Dec 6, 2011, 5:52:44 PM
to publishing-st...@googlegroups.com, Dominik Siegele
Hi Benedikt,

I generally agree with your approach and with Richard's comments.

The notion of defining skos:Concepts but then using the literal values
in the data is a little odd but I can see some point to it.

The one thing I would point out is that for dates the Interval URI Set
[1] and associated service may be useful to you. We've tended to use
that for all the Data Cube sets that we've published. One advantage to
using the resources as the dimension values instead of date literals is
that it makes it possible to query the data via other properties of
those resources. For example with data at a day resolution we can
include the Interval Set properties in the published data and so pick
out values for a month or year or government year without having to do
time calculations in the sparql. If your data is only at calendar year
resolution that may be less relevant to you.
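For example, a query of the shape below is what this buys you; note that interval:ordinalYear is a hypothetical property name for illustration, not necessarily the actual term from the Interval URI Set, and the eg: properties are invented:

```sparql
PREFIX eg:       <http://example.org/def/>
PREFIX interval: <http://reference.data.gov.uk/def/intervals/>

# Pick out all daily observations falling in 2003 by following a property
# of the interval resource -- no date arithmetic in the query.
SELECT ?obs ?value WHERE {
  ?obs eg:refPeriod ?day ;
       eg:obsValue  ?value .
  ?day interval:ordinalYear 2003 .
}
```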

Cheers,
Dave

[1]
http://www.epimorphics.com/web/wiki/using-interval-set-uris-statistical-data

Richard Cyganiak

Dec 6, 2011, 7:29:44 PM
to publishing-st...@googlegroups.com, Dominik Siegele
Hi Dave,

On 6 Dec 2011, at 22:52, Dave Reynolds wrote:
> The one thing I would point out is that for dates the Interval URI Set
> [1] and associated service may be useful to you. We've tended to use
> that for all the Data Cube sets that we've published.

I was wondering about this practice.

One concern I have here is the following: I'd like to be able to query using date literals, even if using the interval URIs in observations. To make that possible, I'd have to load these date literals into my SPARQL store. Essentially I'd need RDF descriptions of the interval resources that occur in my data. (I know that the interval URIs are resolvable, and each has an associated RDF description. But that doesn't really help me – I don't want to crawl them all to load them into my store.)

Do you have any advice how to deal with this?

Are there downloads of (parts of) the interval data?

Best,
Richard

BillRoberts

Dec 7, 2011, 3:26:53 AM
to Publishing Statistical Data
I've had the same experience/difficulties as Richard with use of the
interval URIs and so I'm also interested to hear suggestions on good
ways to tackle this.

Bill


Dave Reynolds

Dec 7, 2011, 3:38:20 AM
to publishing-st...@googlegroups.com, Dominik Siegele
Hi Richard,

On Wed, 2011-12-07 at 00:29 +0000, Richard Cyganiak wrote:
> Hi Dave,
>
> On 6 Dec 2011, at 22:52, Dave Reynolds wrote:
> > The one thing I would point out is that for dates the Interval URI Set
> > [1] and associated service may be useful to you. We've tended to use
> > that for all the Data Cube sets that we've published.
>
> I was wondering about this practice.
>
> One concern I have here is the following: I'd like to be able to query using date literals, even if using the interval URIs in observations. To make that possible, I'd have to load these date literals into my SPARQL store. Essentially I'd need RDF descriptions of the interval resources that occur in my data. (I know that the interval URIs are resolvable, and each has an associated RDF description. But that doesn't really help me – I don't want to crawl them all to load them into my store.)
>
> Do you have any advice how to deal with this?

In the sets we have published we include some core properties of the
time resources, including the dateTime literals, as part of the dataset
to make it possible to query that way without having to go live to the
reference time service. For us the space cost of this has been
manageable and it has been a reasonable trade-off.

If you have to reference government years or intervals other than
calendar gYear/gMonth then some approach like this seems particularly
worthwhile.

For purely calendar-aligned standard intervals it obviously buys you
less, other than uniformity with the non-neatly-aligned cases.
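Concretely, the published dataset might carry a few triples per interval alongside the observations; a sketch using OWL-Time terms (the observation URI and dimension property are invented):

```turtle
@prefix time: <http://www.w3.org/2006/time#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix eg:   <http://example.org/def/> .

eg:obs1 eg:refPeriod <http://reference.data.gov.uk/id/year/2003> .

# Core properties of the interval shipped with the data, so queries can
# filter on the literal without going to the reference time service:
<http://reference.data.gov.uk/id/year/2003>
    time:hasBeginning [ time:inXSDDateTime "2003-01-01T00:00:00"^^xsd:dateTime ] .
```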

> Are there downloads of (parts of) the interval data?

No, it's all dynamically generated.

Cheers,
Dave


Benedikt Kämpgen

Dec 12, 2011, 4:15:04 PM
to publishing-st...@googlegroups.com, Dominik Siegele
Hi Dave,

Thanks a lot for your answer.

We were thinking of a similar approach using the OWL Time Ontology [1], so your hint and the web service come in quite handy. I see that you have even reused OWL Time a little.

Best,

Benedikt

PS: Some URIs do not resolve, e.g., interval:monthOfYear <http://reference.data.gov.uk/def/intervals/June>. I am not sure whether this is done on purpose.

[1] <http://www.w3.org/TR/owl-time/>

Dave Reynolds

Dec 12, 2011, 4:39:32 PM
to publishing-st...@googlegroups.com, Dominik Siegele
On Mon, 2011-12-12 at 22:15 +0100, Benedikt Kämpgen wrote:
> Hi Dave,
>
> Thanks a lot for your answer.
>
> We were thinking of a similar approach using the OWL Time Ontology [1], so your hint and web service comes in quite handy. I see that you even have reused OWL Time a little.

Indeed.

BTW all credit for design and implementation of the service is down to
Stuart Williams.

> PS: Some URIs do not resolve, e.g., interval:monthOfYear <http://reference.data.gov.uk/def/intervals/June>. I am not sure whether this is done on purpose.

No. I thought they used to resolve. We seem to get a redirect to an
internal server, possibly somewhere in the depths of the organisation
doing the hosting. I will flag up the problem.

Dave


Benedikt Kämpgen

Dec 12, 2011, 4:45:46 PM
to publishing-st...@googlegroups.com, Dominik Siegele
Hi Richard,

Thanks a lot for the elaborate answer. I especially liked:

> Countries are really just the taxonomist's business objects of
> political geographers.

Though this might not hold for concrete things, e.g., if you refer to a
human being. Are you saying one should avoid using concrete objects for
classification? We think it may sometimes be useful to represent a
concrete object in the data using different skos:Concepts, e.g., for
defining separate hierarchies.

Best,

Benedikt

--
AIFB, Karlsruhe Institute of Technology (KIT)
Phone: +49 721 608-47946
Email: benedikt...@kit.edu
Web: http://www.aifb.kit.edu/web/Hauptseite/en

> -----Original Message-----
> From: publishing-st...@googlegroups.com [mailto:publishing-
> statisti...@googlegroups.com] On Behalf Of Richard Cyganiak
> Sent: Tuesday, December 06, 2011 10:06 PM
> To: publishing-st...@googlegroups.com
> Cc: Dominik Siegele
> Subject: Re: [publishing-statistical-data] Re: qb:DimensionProperty
> subClassOf qb:CodedProperty ?
>

Stuart Williams

Dec 13, 2011, 5:26:56 AM
to Bill Roberts, Richard Cyganiak, publishing-st...@googlegroups.com
On 07/12/2011 08:26, BillRoberts wrote:
> I've had the same experience/difficulties as Richard with use of the
> interval URIs and so I'm also interested to hear suggestions on good
> ways to tackle this.
>
> Bill
I think this is a more general problem for any reference data (e.g. OS linked
data admin geo, ONS admin areas, Companies House info...). You have to choose
whether you are going to pull it all together into one place (triple soup) so
that you can make more useful queries over the combination, or pull smaller
amounts toward your computation as needed.

The degenerate case is everything of interest in one large (logical) store so
that you can query it; the interval data stresses that a little because there's
quite a lot of it. But I think we need patterns for using reference datasets in
place rather than *having* to proliferate copies (or worse, aliases) that fail
to be kept up to date. I think it is essential that in the long run we have good
patterns for retrieval and computation over multiple distributed 'little'
graphs; otherwise what is the 'link' in linked data!

Stuart

Richard Cyganiak

Dec 13, 2011, 7:54:08 AM
to publishing-st...@googlegroups.com, Dominik Siegele
On 12 Dec 2011, at 21:45, Benedikt Kämpgen wrote:
> Thanks a lot for the elaborate answer, I especially liked:
>
>> country or not. Countries are really just the taxonomist's business
> objects of
>> political geographers."
>
> Though, this might not hold for concrete things, e.g., if you refer to a
> human being. Are you saying, one should avoid using concrete objects for
> classification?

I'm saying that you should use skos:Concepts for classification. But I'm also saying that I don't believe that skos:Concept needs to be treated as disjoint from any other class, such as ex:Country or, for that matter, foaf:Person.

> We are thinking that it sometimes may be useful to represent
> an concrete object in the data using different skos:Concepts, e.g., for
> defining separate hierarchies.

Sure. Re-use if the model fits your purposes; define your own and map to the existing entities if it doesn't.

Best,
Richard

Richard Cyganiak

Dec 13, 2011, 8:01:56 AM
to publishing-st...@googlegroups.com, Bill Roberts
On 13 Dec 2011, at 10:26, Stuart Williams wrote:
> I think this is a more general problem for any reference data (eg. OS linked data admin geo; ONS admin areas; companies house info...). You have to make a choice whether you are going to pull it all together into one place (triple soup) so that you can make more useful queries over the combination - or pull smaller amounts toward your computation as needed.

Ok, right.

> The degenerate case becomes everything of interest in a one large (logical) store so that you can query it - the interval data stresses that a little because there's quite a lot of it. But I think we need patterns for using reference datasets in place rather than *having* to proliferate copies (or worse, aliases) that fail to be kept up to date.

What you call “copies”, I call “caches”; and caching is a great way of solving performance problems.

> I think it essential that in the long run we have good patterns for retrieval and computation over multiple distributed 'little' graph - otherwise what is the 'link' in linked data!

The easy way to do that is to cache everything in one place. That's conceptually *much* simpler and makes computation *much* faster. So what's needed first of all is good patterns for keeping cached copies up-to-date.

Completely distributed querying with on-the-fly retrieval is the holy grail of RDF-based data integration, and it's an important area of work and of research, but I'm not holding my breath for it to become a practical everyday solution.

Best,
Richard

BillRoberts

Dec 13, 2011, 8:25:10 AM
to Publishing Statistical Data

> Completely distributed querying with on-the-fly retrieval is the holy grail of RDF-based data integration, and it's an important area of work and of research, but I'm not holding my breath for it to become a practical everyday solution.
>
> Best,
> Richard
Yes, effective cache management is really what this is all about.

And I suspect that achieving the 'holy grail' of fast federated
queries will actually come down to transparent/automatic caching
approaches, so that it either looks like the query is being run in a
distributed way, or we can treat it as such when it comes to
distributed management of the data.

In the short term, I'll be using xsd:date literals and/or copying or
recreating parts of the Interval URI set!

One idea that could maybe help as a short term workaround: would it be
possible to provide downloads of subsets of the intervals, eg all UK
financial years since 1980 or something, another one for calendar
years, another one for months etc?

Cheers

Bill
