Expressing "part of" relationships between observations


BillRoberts

Dec 27, 2011, 12:42:02 PM
to Publishing Statistical Data
It often comes up in statistical data that you have some kind of
'overall' figure, which is then broken down into parts. Suppose I
have a set of population observations, expressed with the Data Cube
vocabulary, something like this (in pseudo-Turtle):

ex:obs1
    sdmx:refArea <UK>;
    sdmx:refPeriod "2011";
    ex:population "60" .

ex:obs2
    sdmx:refArea <England>;
    sdmx:refPeriod "2011";
    ex:population "50" .

ex:obs3
    sdmx:refArea <Scotland>;
    sdmx:refPeriod "2011";
    ex:population "5" .

ex:obs4
    sdmx:refArea <Wales>;
    sdmx:refPeriod "2011";
    ex:population "3" .

ex:obs5
    sdmx:refArea <NorthernIreland>;
    sdmx:refPeriod "2011";
    ex:population "2" .

What is the best way (in the context of the RDF/Data Cube/SDMX
approach) to express that the values for England/Scotland/Wales/
Northern Ireland ought to add up to the value for the UK and
constitute a more detailed breakdown of the overall UK figure?
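
As a concrete check of the arithmetic in the example above, here is a minimal Python sketch. The figures are the ones from the five observations; the `parts_of` mapping and the function name are assumptions made for illustration and are not part of Data Cube or SDMX.

```python
# The five observations from the post, keyed by area.
population_2011 = {
    "UK": 60,
    "England": 50,
    "Scotland": 5,
    "Wales": 3,
    "NorthernIreland": 2,
}

# Assumed breakdown: which areas are meant to add up to which total.
parts_of = {"UK": ["England", "Scotland", "Wales", "NorthernIreland"]}


def breakdown_discrepancy(values, parts_of, whole):
    """Return the whole's value minus the sum of its parts (0 means they agree)."""
    return values[whole] - sum(values[part] for part in parts_of[whole])


print(breakdown_discrepancy(population_2011, parts_of, "UK"))  # 0
```

The open question in this thread is where a mapping like `parts_of`, and the claim that the parts should sum to the whole, would live in the published RDF.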

I might also have population figures for France, Germany, EU27,
etc., so it's not as simple as just taking a qb:Slice where you fix
the time period and the measure.

Suggestions welcome!

Bill


Jindřich Mynarz

Dec 27, 2011, 12:59:38 PM
to publishing-st...@googlegroups.com
Hi Bill,

I would use hierarchical code lists to express this. Given that all
code lists in Data Cube are skos:ConceptSchemes, you can express
hierarchy with SKOS (e.g., skos:narrower, skos:broader). In your case,
you would have:

<UK> skos:narrower <England>, <Scotland>, <Wales>, <NorthernIreland> .
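
To show what a consumer could do with such a hierarchy, here is a rough Python sketch that follows skos:narrower from an area to the observations for its sub-areas. The list of tuples is only a stand-in for an RDF store, and the helper names are hypothetical.

```python
# Triples as (subject, predicate, object) tuples; a stand-in for an RDF store.
triples = [
    ("UK", "skos:narrower", "England"),
    ("UK", "skos:narrower", "Scotland"),
    ("UK", "skos:narrower", "Wales"),
    ("UK", "skos:narrower", "NorthernIreland"),
    ("ex:obs1", "sdmx:refArea", "UK"),
    ("ex:obs2", "sdmx:refArea", "England"),
    ("ex:obs3", "sdmx:refArea", "Scotland"),
    ("ex:obs4", "sdmx:refArea", "Wales"),
    ("ex:obs5", "sdmx:refArea", "NorthernIreland"),
]


def objects(subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj):
    """All subjects of triples matching the given predicate and object."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def breakdown_observations(area):
    """Observations whose refArea is directly skos:narrower than `area`."""
    found = []
    for child in objects(area, "skos:narrower"):
        found.extend(subjects("sdmx:refArea", child))
    return found


print(breakdown_observations("UK"))
# ['ex:obs2', 'ex:obs3', 'ex:obs4', 'ex:obs5']
```

This finds the candidate parts, but nothing in the hierarchy itself says that their values must sum to the UK value.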

Best,

Jindrich

--
Jindrich Mynarz
@jindrichmynarz
<http://keg.vse.cz/resource/person/jindrich-mynarz>

Landong Zuo

Dec 30, 2011, 6:53:13 AM
to Publishing Statistical Data
You might already have the hierarchy of geographical areas if you
follow the Ordnance Survey URIs.

Regards,

Landong

Keith Alexander

Dec 30, 2011, 7:53:22 AM
to publishing-st...@googlegroups.com
The remaining question is how to tie the containment relationship of the geographic areas to the relationship between the measure values (the total population of the super area is, or should be, the sum of the populations of its parts).

Something like this?

<ukPopulation2009Observation> ex:calculatedFrom (<scotlandPopulation2009Observation> <englandPopulation2009Observation> <northernIrelandPopulation2009Observation> <walesPopulation2009Observation>) .

ex:calculatedFrom ex:observationMeasure ex:population .

?

Is there a more idiomatic Data Cube way of expressing that?
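
One way to read the ex:calculatedFrom idea is as an explicit derivation record: a target observation, its source observations, the measure, and the combining operation. The Python sketch below models that under loud assumptions: the `Derivation` class is hypothetical, and the figures are the round numbers from Bill's first post, reused purely as placeholders for the 2009 observations named above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass(frozen=True)
class Derivation:
    """Hypothetical record of how one observation is computed from others."""
    target: str
    sources: Sequence[str]
    measure: str
    combine: Callable[[Sequence[float]], float]


uk_from_parts = Derivation(
    target="ukPopulation2009Observation",
    sources=(
        "scotlandPopulation2009Observation",
        "englandPopulation2009Observation",
        "northernIrelandPopulation2009Observation",
        "walesPopulation2009Observation",
    ),
    measure="ex:population",
    combine=sum,
)

# Placeholder values only; not real 2009 figures.
observations: Dict[str, Dict[str, float]] = {
    "ukPopulation2009Observation": {"ex:population": 60},
    "scotlandPopulation2009Observation": {"ex:population": 5},
    "englandPopulation2009Observation": {"ex:population": 50},
    "northernIrelandPopulation2009Observation": {"ex:population": 2},
    "walesPopulation2009Observation": {"ex:population": 3},
}


def recompute(derivation, observations):
    """Apply the derivation's operation to the measure of each source observation."""
    values = [observations[s][derivation.measure] for s in derivation.sources]
    return derivation.combine(values)


stated = observations[uk_from_parts.target][uk_from_parts.measure]
print(recompute(uk_from_parts, observations) == stated)  # True
```

Making the operation explicit (`combine=sum` here) leaves room for measures that are combined in some other way.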



Jindřich Mynarz

Dec 30, 2011, 8:02:55 AM
to publishing-st...@googlegroups.com
Hi Keith,

the simple answer is to use broader skos:Concepts (from the code lists
coding the Data Cube dimensions) for the aggregated observations.
I.e., for your and Bill's combined example, it would be something
like:

<ukPopulation2009Observation> ex:geoArea <UK> .

where ...

<UK> skos:narrower <England>, <Scotland>, <Wales>, <NorthernIreland> .

In this way, the measures are comparable through the code list
(skos:ConceptScheme).

Best,

Jindrich


BillRoberts

Dec 30, 2011, 8:51:36 AM
to Publishing Statistical Data
Thanks everyone for your suggestions.

While the use of skos:narrower seems very sensible, it is not in itself
a strong enough statement of what I want to say. Keith's suggestion
alongside the skos:narrower relationships might do the job.

Ideally I would like to assert the 'pie-chart-like' nature of the data:
that the subsets (England, Scotland, Wales, NI) do not intersect and
that the superset (UK) is the union of those non-intersecting subsets
(and hence the population of the UK should be the sum of the populations
of the subsets).
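
The 'pie-chart-like' condition is a disjoint cover: the parts are pairwise non-intersecting and their union equals the whole. A small Python sketch of that test follows, treating each area as a set of smaller units. The unit identifiers are invented for illustration.

```python
from itertools import combinations


def is_disjoint_cover(whole, parts):
    """True if the parts are pairwise disjoint and their union equals the whole."""
    parts = list(parts)
    no_overlap = all(a.isdisjoint(b) for a, b in combinations(parts, 2))
    covers_whole = set().union(*parts) == whole
    return no_overlap and covers_whole


# Hypothetical unit identifiers standing in for the territory of each area.
england = {"E1", "E2"}
scotland = {"S1"}
wales = {"W1"}
northern_ireland = {"N1"}
uk = england | scotland | wales | northern_ireland

print(is_disjoint_cover(uk, [england, scotland, wales, northern_ireland]))  # True
print(is_disjoint_cover(uk, [england | {"S1"}, scotland, wales, northern_ireland]))  # False (overlap)
print(is_disjoint_cover(uk, [england, scotland, wales]))  # False (not exhaustive)
```

Only when this holds, and the measure is additive, does it follow that the whole's value should equal the sum of the parts' values.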

Bill




Keith Alexander

Dec 30, 2011, 8:53:02 AM
to publishing-st...@googlegroups.com
Hi Jindřich,

Yes, I read your suggestion the first time, but it doesn't go all the way to explicitly express "that the values for the England/Scotland/Wales/Northern Ireland ought to add up to the value for the UK".

For one thing, skos:narrower doesn't guarantee that there will be no overlaps between the geographies. For another, it is not always true that the value of a measure for the UK will be the total of the values of that measure for Scotland, England, Northern Ireland and Wales: the measure might be an average or a ratio.
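
To illustrate the point about averages, here is a short Python sketch. The population weights are the round figures from Bill's first post; the mean-age values are made up for illustration only. Summing the regional averages gives a meaningless number, whereas a population-weighted mean gives a plausible overall figure.

```python
# Population figures from the first post, used here as weights.
population = {"England": 50, "Scotland": 5, "Wales": 3, "NorthernIreland": 2}

# Made-up mean ages, for illustration only.
mean_age = {"England": 40, "Scotland": 42, "Wales": 42, "NorthernIreland": 38}


def weighted_mean(values, weights):
    """Mean of `values` weighted by `weights` (both keyed by area)."""
    total_weight = sum(weights[area] for area in values)
    return sum(values[area] * weights[area] for area in values) / total_weight


naive_sum = sum(mean_age.values())
uk_mean_age = weighted_mean(mean_age, population)

print(naive_sum)              # 162, clearly not a mean age
print(round(uk_mean_age, 1))  # 40.2
```

So any statement tying a parent observation to its children needs to say which operation applies, not just that the areas nest.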

I guess also part of what could be expressed is whether some observations (or sets of observations) are derived from others, or if the figures are arrived at independently.

Cheers

Keith


Franck

Dec 30, 2011, 9:22:56 AM
to Publishing Statistical Data
Hi Bill

Please have a look at the SKOS extensions that some of us are
currently working on
(http://www.w3.org/2011/gld/wiki/ISO_Extensions_to_SKOS). It is
intended to add some features to SKOS in order to be able to represent
statistical classifications, and also some semantic relations between
concepts explained in ISO 1087.

In particular, we define partitive relations, and introduce properties
to indicate that a concept scheme covers a given field exhaustively
and/or mutually exclusively.

This is work in progress (see https://github.com/FranckCo/SKOSExt),
but we would be glad to know if it suits your needs.

Franck

Landong Zuo

Dec 30, 2011, 9:34:25 AM
to Publishing Statistical Data
Hi Keith and Jindřich,

I feel skos:narrower might be too vague for this kind of relationship.
What you need is a vocabulary to specify the mapping from
geographic containment to statistical aggregation.

Firstly, this mapping may or may not be deterministic. It is
deterministic only if the original dataset says so; otherwise the
meaning of a statistical measurement varies with context, for
example how the data was collected, who did the survey, and what survey
method was used. I am not sure that a simple aggregation over the
values of the contained areas would always tell a meaningful story.
(Some datasets state explicitly that measurements are not supposed to
be recalculated.)

Secondly, I don't know whether Data Cube or SDMX-RDF allows this
aggregation relationship to be specified. To keep the semantics precise
(mathematically, not statistically), would you be interested in
looking at embedding a rule language or SPARQL aggregation syntax into
Data Cube? I don't know if there is a standard way to handle it, but it
is worth a try. Hope this is helpful.

Regards,

Landong










Dave Reynolds

Dec 31, 2011, 8:23:20 AM
to publishing-st...@googlegroups.com
The general area of expressing aggregation relationships in Data Cube is
something I put on the proposed work programme for the next phase when
last discussing it with Richard. Hopefully we can get this done under
the W3C GLD task group [1].

I think a specific extension vocabulary is needed for this. There are
several requirements:

(1) Need to be able to specify a relationship to use for hierarchical
dimensions other than the skos relationships. In particular we
frequently want to use geospatial data where there is already a
containment relation that should be reused. [2]

(2) Need to be able to specify when such a relationship gives a disjoint
cover so that aggregation would be meaningful.

(3) There may also be a need for defining a specific type of slice to
allow aggregations to be asserted about data which isn't strictly
hierarchical.

(4) We need a general way to express relationships between measures.
E.g. it is common to have a measure expressed as a count and then have a
separate measure to indicate the % of that measure against some
denominator. It's possible that the same machinery could/should be used
to express aggregations.
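
Requirement (4) can be illustrated with a small Python sketch: a count measure and a second measure derived from it as a percentage of a denominator. The figures are the round numbers from Bill's first post; the function and variable names are assumptions for illustration, not a proposed vocabulary.

```python
# Count measure: population by area (figures from the first post in the thread).
population_2011 = {
    "UK": 60,
    "England": 50,
    "Scotland": 5,
    "Wales": 3,
    "NorthernIreland": 2,
}


def percent_of(count, denominator):
    """Derived measure: `count` as a percentage of `denominator`."""
    return 100.0 * count / denominator


# Second measure: each part's share of the UK total.
share_of_uk = {
    area: percent_of(value, population_2011["UK"])
    for area, value in population_2011.items()
    if area != "UK"
}

for area, share in share_of_uk.items():
    print(f"{area}: {share:.1f}%")
```

If the parts form a disjoint cover of the UK, the shares should total 100%, which is the same constraint as the counts summing to the UK figure, just expressed on the derived measure.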

Dave

[1] Though since it is currently unfunded "spare time" effort the
timescales aren't guaranteed :)

[2] Yes, I know that geospatial areas *can* be treated as SKOS concepts
but the skos:narrower/broader relations aren't really appropriate
and in any case the requirement is to reuse the existing relationships
which have been specified and asserted by third parties.

A. Gregory

Dec 31, 2011, 9:35:45 AM
to publishing-st...@googlegroups.com
Dave:

This is a subject which has also been much-discussed within the SDMX
community as a whole. There has been (to my knowledge) at least one
implementation of a system designed to capture these relationships, and I
think more.

What has been done in the past is to attach an attribute to a slice (or
entire data set) containing an equation expressing the relationship between
the various parts.
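
A very rough Python sketch of that idea follows: a slice carries an attribute holding an equation, and a consumer checks the observations against it. The equation syntax (`total = part + part + ...`), the attribute name and the dictionary layout are all invented for illustration and are not SDMX notation; the figures are the round numbers from Bill's first post.

```python
# Hypothetical slice: observations keyed by area, plus an equation attribute.
slice_2011 = {
    "equation": "UK = England + Scotland + Wales + NorthernIreland",
    "observations": {
        "UK": 60,
        "England": 50,
        "Scotland": 5,
        "Wales": 3,
        "NorthernIreland": 2,
    },
}


def check_sum_equation(equation, values):
    """Check an equation of the form 'total = a + b + ...' against the values."""
    left, right = equation.split("=")
    total_key = left.strip()
    part_keys = [term.strip() for term in right.split("+")]
    return values[total_key] == sum(values[key] for key in part_keys)


print(check_sum_equation(slice_2011["equation"], slice_2011["observations"]))  # True
```

Anything beyond plain sums would need a richer expression syntax than this parser handles.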

While the SDMX information model contains a model for how statistical data
is processed, this was never given a standard notation of any sort. I know
that at least one organization has developed a "syntax-neutral" way of
expressing the processing being performed, based on the SDMX model -
essentially a programming language with the capabilities of addressing
specific observations as variables within the equation.

This will probably not help with your geography example - SDMX has never had
a concept of geographical hierarchies, although this is included in the DDI
model, based on the ISO 19115 (etc.) family of standards. I would comment
that we want to have as exact a notation of specific aggregation
relationships as possible - we are dealing with statistics, and they tend to
have quite precise and well-known relationships.

I guess my main concern is that whatever Data Cube does in this area is
aligned with how SDMX is used more broadly, so that it is always possible to
produce valid Data Cube RDF from any SDMX data set. If it would be useful,
as Data Cube moves forward through W3C, I can put you in touch with the guy
who is heading up the SDMX Technical Working Group, so that input could be
more formally collected from within the SDMX community on this subject.

Cheers - and happy New Year!

Arofan

Dave Reynolds

Dec 31, 2011, 10:13:46 AM
to publishing-st...@googlegroups.com
Hi Arofan,

All good input, thanks.

The notion of a complete embedded programming language would be beyond
what I had in mind - *definitely* wouldn't want to create yet another
rule language in W3C!

I agree we want to keep Data Cube as aligned with SDMX as we can. Though
I do want to make sure this doesn't turn into too ambitious an exercise
otherwise it won't get done.

Cheers,
Dave

BillRoberts

Dec 31, 2011, 10:49:34 AM
to Publishing Statistical Data
Hi Dave

Good to hear this issue is already on the W3C GLD agenda. And that
essentially answers my question: currently there is not an established
way to represent this kind of information in the Data Cube context,
but it is recognised that it would be useful.

If I can assist with that process let me know - whether through use
cases, review, trying candidate solutions out in practice or whatever.

Best regards

Bill