How to interpret measures attached to slices?


Jindřich Mynarz

Oct 1, 2015, 4:54:15 AM
to publishing-st...@googlegroups.com
Hi,

I'm wondering what the intended interpretation is when a component specifying a measure (qb:MeasureProperty) is attached to a slice (qb:componentAttachment qb:Slice). Based on the RDF formalization of the Data Cube Vocabulary (DCV), I see this is possible. Moreover, the Payments Ontology, which is based on the Data Cube Vocabulary, defines the measure properties payment:totalNetAmount and payment:totalGrossAmount, which are attached to slices (payment:Payment is a qb:Slice); see https://data.gov.uk/resources/payments. In this case, a measure attached to a slice seems to represent an aggregation of the measures on the observations included in the slice.

Contrary to this interpretation, my understanding of component attachment at the slice level (based on reading the DCV specification) is that the value of the component is fixed for all observations included in the slice. Viewed this way, all observations in a slice with an attached measure would share the same measure value (which probably does not make much sense).

I'd like to learn how to interpret measures attached to slices and how much of it is formalized and described in the specification vs. being wishful thinking.

Best,

Jindřich

-- 
Jindřich Mynarz

Dave Reynolds

Oct 2, 2015, 6:20:29 AM
to publishing-st...@googlegroups.com
Hi Jindřich,

On 01/10/15 09:53, Jindřich Mynarz wrote:

> I'm wondering what is the intended interpretation of cases in which you
> have a component specifying a measure (qb:MeasureProperty) attached to
> slice (qb:componentAttachment qb:Slice). Based on the Data Cube
> Vocabulary formalization in RDF I see this is possible. Moreover, the
> Payments Ontology that is based on the Data Cube Vocabulary defines
> payment:totalNetAmount and payment:totalGrossAmount measure properties
> that are attached to slices (payment:Payment is a qb:Slice) (see
> https://data.gov.uk/resources/payments).

These examples are misleading, see below.

> Contrary to this
> interpretation, my understanding of component attachment on the slice
> level (based on reading the DCV specification) is that the value of the
> component is fixed for all observations included in the slice. Viewed in
> this way, all observation in a slice with attached measure would share
> the same measure value (which probably does not make much sense).

Correct, and indeed doesn't make much sense there.

> I'd like to learn how to interpret measures attached to slices and how
> much of it is formalized and described in the specification vs. being
> wishful thinking.

If the measure (qb:MeasureProperty) is a measure *in that cube* then
there is no point in attaching it to the slice, as you say.
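
To make the two readings concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption: dicts stand in for RDF resources, the line-level property name payment:netAmount and the amounts are made up, and the functions are not part of any library. It only contrasts "push the slice-level value down to every observation" (the attachment reading) with "summarize the observations" (the aggregate reading).

```python
# Hypothetical, simplified data: plain dicts stand in for RDF resources.
slice_ = {
    "payment:totalNetAmount": 150,  # value attached at slice level
    "observations": [
        {"id": "line1", "payment:netAmount": 100},  # assumed line-level measure
        {"id": "line2", "payment:netAmount": 50},
    ],
}


def push_down(slice_, component):
    """Attachment reading: copy the slice-level value onto every observation."""
    value = slice_[component]
    return [{**obs, component: value} for obs in slice_["observations"]]


def aggregate(slice_, line_measure):
    """Aggregate reading: the slice value summarizes its observations."""
    return sum(obs[line_measure] for obs in slice_["observations"])


normalized = push_down(slice_, "payment:totalNetAmount")
# Under the attachment reading every observation carries the same value.
same_everywhere = {obs["payment:totalNetAmount"] for obs in normalized}
# Under the aggregate reading the value is derived from the observations.
total = aggregate(slice_, "payment:netAmount")
```

Under the attachment reading each expenditure line would claim the full total as its own measure value, which is why attaching a cube measure to the slice is not useful.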

However, it is often convenient to put aggregate information on a slice
*using other properties*. It would have been nice to formalize this in
the QB spec, and there were various proposals around at the time,
but it seemed too early and we lacked time to get it done.

What makes the Payments ontology confusing is that it provides some
convenience properties for such aggregate values (the ones you point
out, payment:totalNetAmount and payment:totalGrossAmount). That's fine
but it goes one step further and declares those as being of type
qb:MeasureProperty, which is where the confusion begins!

Why does it do that?

If you look at the section of [1] on "Technical Note on Data Cubes"
under "Extensions and well-formedness" there is the explanation.
Normally a payments cube is a cube of expenditure lines where you might
aggregate up the lines to a single payment amount in the slices. However,
sometimes people only wanted to publish the aggregate information and
not the more detailed breakdown. For that they would use a different
cube structure with a different DSD, as illustrated in that section.

So those two aggregation properties can be used in two roles.

On a normal expenditure cube they are not in the DSD, they are not
measures in the cubes, they are just convenient RDF properties for
recording aggregate information on slices.

On a summarized payments-only cube they are measures in the DSD and they
are used on observations, not on slices.
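
As a rough illustration of the two cube shapes, the sketch below uses plain Python data rather than RDF. The identifiers ex:payment1 and ex:payment2, the dict keys, and the amounts are hypothetical; only payment:totalNetAmount is taken from the ontology discussed here. It assumes the total is a simple sum of the line amounts.

```python
from decimal import Decimal

# Shape 1 (detailed cube): each slice is a payment and its observations are
# expenditure lines. The total is a plain annotation on the slice and is not
# listed as a measure in the DSD.
detailed_slices = [
    {"payment": "ex:payment1", "lines": [Decimal("100.00"), Decimal("50.00")]},
    {"payment": "ex:payment2", "lines": [Decimal("20.50")]},
]


def annotate_slices(slices):
    """Attach the aggregate to each slice as a convenience property."""
    return [
        {**s, "payment:totalNetAmount": sum(s["lines"], Decimal("0"))}
        for s in slices
    ]


def project_summary(slices):
    """Shape 2 (payments-only cube): one observation per payment, where
    payment:totalNetAmount now plays the role of a measure in the DSD."""
    return [
        {
            "payment": s["payment"],
            "payment:totalNetAmount": sum(s["lines"], Decimal("0")),
        }
        for s in slices
    ]


annotated = annotate_slices(detailed_slices)
summary_observations = project_summary(detailed_slices)
```

The same property name appears in both results, but only in the second is it a measure on an observation; in the first it sits next to the breakdown it summarizes.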

Whether this option to allow two different sorts of cubes, one projected
from the other, was a good one is hard to tell. The payments ontology
was developed before the Data Cube Vocabulary went through
standardization and before there was much experience around using it.

Dave


[1] https://data.gov.uk/resources/payments

Jindřich Mynarz

Oct 2, 2015, 7:03:39 AM
to publishing-st...@googlegroups.com
Hi Dave,

As always, thank you for the helpful explanation! I hope you don't mind me digging into the design decisions behind the Payments Ontology. I believe learning about them is helpful for all users of the Data Cube Vocabulary. Moreover, it helps me in a similar effort to use DCV for budget data in the OpenBudgets.eu project (https://openbudgets.eu).

On Fri, Oct 2, 2015 at 12:20 PM, Dave Reynolds <dave.e....@gmail.com> wrote:
> However, it is often convenient to put aggregate information on a slice *using other properties*. It would have been nice to formalize this in the QB spec and there are various proposals that were around at the time but it seemed too early and we lacked time to get it done.

Do you know if there is a plan to revisit DCV and extend it with regard to aggregations? Alternatively, this could also be approached as a separate vocabulary that builds on top of DCV.

For example, I see that in the DCV Turtle file at <http://purl.org/linked-data/cube#> there is still a commented out definition of qb:AggregatableHierarchy, so it seems like something unfinished.

> So those two aggregation properties can be used in two roles.
>
> On a normal expenditure cube they are not in the DSD, they are not measures in the cubes, they are just convenient RDF properties for recording aggregate information on slices.
>
> On a summarized payments-only cube they are measures in the DSD and they are used on observations, not on slices.
>
> Whether this option to allow two different sorts of cubes, one projected from the other, was a good one is hard to tell. The payments ontology was developed before the Data Cube Vocabulary went through standardization and before there was much experience around using it.

This is perfectly clear.

However, it seems to me that, represented in this way, the link between the aggregations and the individual expenditure lines is not explicit. You end up with 2 different data cubes: one for the aggregations, one for the breakdown. There are no explicit links between the data cubes that would allow one to reproduce the aggregation. It seems to me that the aggregation semantics is implied by the property names (total*) rather than by some formalization. Or am I missing something?

- Jindřich

Dave Reynolds

Oct 2, 2015, 9:14:07 AM
to publishing-st...@googlegroups.com
On 02/10/15 12:03, Jindřich Mynarz wrote:
> Hi Dave,
>
> as always, thank you for the helpful explanation! I hope you don't mind
> me digging into the design decisions behind the Payments Ontology. I
> believe it is helpful to learn about them for all users of the Data Cube
> Vocabulary. Moreover, for me it helps in a similar effort of using DCV
> for budget data in OpenBudgets.eu project (https://openbudgets.eu).
>
> On Fri, Oct 2, 2015 at 12:20 PM, Dave Reynolds
> <dave.e....@gmail.com <mailto:dave.e....@gmail.com>> wrote:
>
> However, it is often convenient to put aggregate information on a
> slice *using other properties*. It would have been nice to formalize
> this in the QB spec and there are various proposals that were around
> at the time but it seemed too early and we lacked time to get it done.
>
>
> Do you know if there is a plan to revisit DCV and extend it with regards
> to aggregations?

Not that I'm aware of.

> Alternatively, this can be also approached as a
> separate vocabulary that builds on top of DCV.

Sure.

> For example, I see that in the DCV Turtle file at
> <http://purl.org/linked-data/cube#> there is still a commented out
> definition of qb:AggregatableHierarchy, so it seems like something
> unfinished.
>
> So those two aggregation properties can be used in two roles.
>
> On a normal expenditure cube they are not in the DSD, they are not
> measures in the cubes, they are just convenient RDF properties for
> recording aggregate information on slices.
>
> On a summarized payments-only cube they are measures in the DSD and
> they are used on observations, not on slices.
>
> Whether this option to allow two different sorts of cubes, one
> projected from the other, was a good one is hard to tell. The
> payments ontology was developed before the Data Cube Vocabulary went
> through standardization and before there was much experience around
> using it.
>
>
> This is perfectly clear.
>
> However, I wonder that represented in this way the link between
> aggregations and individual expenditure lines is not explicit. You end
> up with 2 different data cubes; one for the aggregations, one for the
> break-down. There are no explicit links between the data cubes that
> would allow to reproduce the aggregation. It seems to me that the
> aggregation semantics is implied by the property names (total*) rather
> than some formalization. Or am I missing something?

For the payments ontology the idea is that you publish one or the other,
not both. So if you want to publish the detailed expenditure line breakdown
you do so and put the aggregate information on the slices. The slice
structure then tells you what these "totalX" apply to. It's true that
their full semantics is only in the text but that's true of 90% of this
stuff :)
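
Since that rule lives only in the documentation, a consumer who wants to verify it has to hard-code it. A minimal sketch of such a check follows, assuming the total is a plain sum over the lines in the slice; the function name, the dict layout, and the line-level property payment:netAmount are illustrative assumptions, not something the vocabulary defines.

```python
from decimal import Decimal


def check_slice_total(slice_, total_prop="payment:totalNetAmount",
                      line_prop="payment:netAmount"):
    """Return True if the slice-level total equals the sum of its lines.

    The "sum" rule is an assumption read from the property name and the
    ontology text; nothing in the data itself states it.
    """
    expected = sum((obs[line_prop] for obs in slice_["observations"]),
                   Decimal("0"))
    return slice_[total_prop] == expected


# Hypothetical slices: one consistent, one with a wrong total.
consistent = {
    "payment:totalNetAmount": Decimal("150.00"),
    "observations": [
        {"payment:netAmount": Decimal("100.00")},
        {"payment:netAmount": Decimal("50.00")},
    ],
}
inconsistent = {**consistent, "payment:totalNetAmount": Decimal("140.00")}

ok = check_slice_total(consistent)
bad = check_slice_total(inconsistent)
```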

You only publish an explicit payments-only cube if you are not
publishing the breakdown.

However, if people did publish both cubes there is indeed no way to
explain the relationship between them. That was the substance of the
postponed issue [1] from the working group. The trouble is that this
gets into expressing arbitrary computational relationships. The trouble
with that is not that it's hard as such but that there's so much else in
that space that getting the right tasteful balance between reuse and
reinvention is hard.

Dave

[1] http://www.w3.org/2011/gld/track/issues/30