Mapping of SDMX COG to Attributes, Dimensions


Leigh Dodds

Oct 11, 2012, 4:25:47 AM
to publishing-st...@googlegroups.com
Hi,

How is the mapping from the SDMX Content-Oriented Guidelines (COG) to the DataCube vocabulary carried out? I'm curious about the representation of a couple of components.

For example, FREQ in the COG is mapped to a dimension, whereas FREQ_DETAIL, etc. are all attributes. I would have thought that FREQ, which describes the frequency at which observations are made, would have been an attribute too, rather than a dimension.

In contrast, DISS_FORMAT and related items like MICRO_DATA_ACC are all attributes, which seems correct.

The FREQ case might be some nuance of SDMX that I'm not understanding, but I'd also like to understand how the mappings are done, as presumably it's done using more information than is available in [1]?

Cheers,

L.

A. Gregory

Oct 11, 2012, 4:34:41 AM
to publishing-st...@googlegroups.com

Leigh:

There is a reason Frequency is used as a dimension in SDMX – the time series data produced and used by central banks (especially) is often captured in multiple parallel frequencies. Thus, we will have the same data expressed in quarterly, monthly, and annual series. The way time impacts the data means that these are really different data sets (you can’t compute the annual series from the quarterly in a direct fashion).

In order to disambiguate these otherwise-identical keys for the series, Frequency needs to be a dimension – otherwise you have a clash in the identifiers of the data.

I am not commenting here on how this functions in Data Cube, but just giving you the background on what is in the SDMX specification.

Cheers,

Arofan
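
The key clash described above can be illustrated with a minimal Python sketch. It assumes a series key is simply the tuple of dimension values; the observations, the INDICATOR dimension, and all values are hypothetical and are not taken from any SDMX dataset. The FREQ codes A, Q and M stand for annual, quarterly and monthly.

```python
# Hypothetical observations: one indicator for one area, published at
# annual (A), quarterly (Q) and monthly (M) frequency. Values are made up.
OBSERVATIONS = [
    {"FREQ": "A", "REF_AREA": "UK", "INDICATOR": "GDP",
     "TIME_PERIOD": "2011", "OBS_VALUE": 100.0},
    {"FREQ": "Q", "REF_AREA": "UK", "INDICATOR": "GDP",
     "TIME_PERIOD": "2011-Q1", "OBS_VALUE": 24.0},
    {"FREQ": "M", "REF_AREA": "UK", "INDICATOR": "GDP",
     "TIME_PERIOD": "2011-01", "OBS_VALUE": 8.0},
]


def series_key(observation, dimensions):
    """Build a series key from the values of the given dimensions."""
    return tuple(observation[name] for name in dimensions)


def distinct_series(observations, dimensions):
    """Return the set of series keys found in the observations."""
    return {series_key(obs, dimensions) for obs in observations}


# With FREQ in the key, each frequency gets its own series.
with_freq = distinct_series(OBSERVATIONS, ("FREQ", "REF_AREA", "INDICATOR"))

# With FREQ demoted to an attribute, it no longer contributes to the key.
without_freq = distinct_series(OBSERVATIONS, ("REF_AREA", "INDICATOR"))

print(len(with_freq))     # three separate series
print(len(without_freq))  # one key shared by all three series
```

When FREQ is left out of the key, the annual, quarterly and monthly series collapse onto a single identifier, which is the clash that making Frequency a dimension avoids.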

Leigh Dodds

Oct 11, 2012, 4:42:00 AM
to publishing-st...@googlegroups.com
Hi,

On Thu, Oct 11, 2012 at 9:34 AM, A. Gregory
<arofan....@earthlink.net> wrote:
> Leigh:
>
> There is a reason Frequency is used as a dimension in SDMX – the time series
> data produced and used by central banks (especially) is often captured in
> multiple parallel frequencies. Thus, we will have the same data expressed in
> quarterly, monthly, and annual series. The way time impacts the data means
> that these are really different data sets (you can’t compute the annual
> series from the quarterly in a direct fashion).

Thank you, that clarifies it for me!

Cheers,

L.
--
Leigh Dodds
Freelance Technologist
Open Data, Linked Data Geek
t: @ldodds
w: ldodds.com
e: le...@ldodds.com

Dave Reynolds

Oct 11, 2012, 5:07:03 AM
to publishing-st...@googlegroups.com
To do the RDF mapping we manually classified each COG term, based on
our reading of the COG plus some clarifying discussions. The mapping file
is attached - though I can't remember if attachments work in Google
Groups :)

The bulk of them are pretty unambiguous.

Where a term could plausibly be used in multiple roles (e.g. CURRENCY),
we allowed it to take on all the plausible roles, since in the RDF
representation there's no clash in the key.

FREQ was a special case: it is so important in SDMX that it is
mentioned explicitly in the information model, and it is expected to be a
dimension, as Arofan says.

However, in Data Cube there would be no technical difficulty in also having
an sdmx-attribute:freq which points to the same COG concept. The URI
doesn't clash, so it can't get confused with the sdmx-dimension:freq.
That would be in keeping with the "if in doubt have both" approach.

Unless anyone sees a problem with that, I could run a new conversion and
update the RDF docs.
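
The "if in doubt have both" approach can be sketched roughly as follows in Python. This is an illustration under stated assumptions, not the actual conversion: the CLASSIFICATION table and its layout are hypothetical (the format of the attached mapping file is not shown in this thread), the namespace URIs are quoted from memory and should be checked against the published sdmx-* files, and FREQ in the attribute role is the proposal under discussion rather than an existing property.

```python
# Namespace URIs for the SDMX-RDF vocabularies (quoted from memory;
# verify against the published files before relying on them).
ROLE_NAMESPACES = {
    "dimension": "http://purl.org/linked-data/sdmx/2009/dimension#",
    "attribute": "http://purl.org/linked-data/sdmx/2009/attribute#",
}
CONCEPT_NAMESPACE = "http://purl.org/linked-data/sdmx/2009/concept#"

# Hypothetical classification: COG term -> (local name, plausible roles).
# FREQ as an attribute is the proposed addition, not a published property.
CLASSIFICATION = {
    "FREQ": ("freq", ("dimension", "attribute")),
    "CURRENCY": ("currency", ("dimension", "attribute")),
    "FREQ_DETAIL": ("freqDetail", ("attribute",)),
}


def component_properties(term):
    """Return {role: (property URI, concept URI)} for one COG term."""
    local_name, roles = CLASSIFICATION[term]
    concept_uri = CONCEPT_NAMESPACE + local_name
    return {
        role: (ROLE_NAMESPACES[role] + local_name, concept_uri)
        for role in roles
    }


freq = component_properties("FREQ")
for role, (property_uri, concept_uri) in freq.items():
    print(role, property_uri, "->", concept_uri)
```

Because each role lives in its own namespace, the dimension and attribute properties get different URIs while both point at the same concept URI, so no identifier clash arises.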

Dave

concepts-classified.txt

Leigh Dodds

Oct 16, 2012, 5:06:33 AM
to publishing-st...@googlegroups.com
Hi Dave,

It might be useful to have FREQ as an attribute too, but I admit to
being hesitant, because I think it's useful to lean on the SDMX
documentation and modelling wherever possible, if only because
examples of DataCube in use are still few and far between. I
think anywhere that DataCube differs from SDMX adds ambiguity, which
would then need to be documented.

While the mapping of COG terms to properties is relatively unambiguous,
I think it'd be useful to include your mapping file in the project
somewhere, perhaps in the wiki or even as a non-normative
section in the specification. I found myself reading the COG
and then grepping the various sdmx-* files to look for the
terms. Having a ready reference would be useful.
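
As a rough idea of what such a ready reference could look like, here is a small Python sketch that automates the grepping step. The file names and their contents are hypothetical excerpts standing in for the sdmx-* files, and matching on the local name after a prefix colon is an assumption about how the terms appear in them.

```python
import re

# Hypothetical excerpts standing in for the sdmx-* vocabulary files.
FILES = {
    "sdmx-dimension.ttl": (
        "sdmx-dimension:freq a qb:DimensionProperty .\n"
        "sdmx-dimension:currency a qb:DimensionProperty .\n"
    ),
    "sdmx-attribute.ttl": (
        "sdmx-attribute:freqDetail a qb:AttributeProperty .\n"
        "sdmx-attribute:currency a qb:AttributeProperty .\n"
    ),
}


def build_reference(files, local_names):
    """Map each local name to the sorted list of files that mention it."""
    reference = {}
    for name in local_names:
        # Require a prefix colon before the name and a word boundary after,
        # so that "freq" does not also match "freqDetail".
        pattern = re.compile(":" + re.escape(name) + r"\b")
        reference[name] = sorted(
            filename for filename, text in files.items()
            if pattern.search(text)
        )
    return reference


reference = build_reference(FILES, ["freq", "freqDetail", "currency"])
for name, filenames in reference.items():
    print(name, filenames)
```

The resulting table shows at a glance which roles a term has been given, which is the information otherwise recovered by searching each file by hand.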

Cheers,

L.