attributes


Bob Jolliffe

Mar 16, 2010, 8:58:11 PM
to sdm...@googlegroups.com
Hi Gary

I have spent the afternoon trying to wrap up the schema for the DSD
but I am a bit stuck on attributes.

1. We have had a number of discussions around frequency and
periodicity. The current concepts and codelists do not reflect where
we are on this. Probably the most important discussion is here:
http://groups.google.com/group/sdmx_hd/browse_thread/thread/be43a95a45349949.

The consequence of this is that we require a
(i) FREQUENCY concept in the SDMX-HD mandatory dimension concepts.
(ii) CL_PERIODICITY renamed CL_FREQUENCY. Not really a requirement
but it makes the naming more uniform.

Also ..

In the CS_ATTRIBUTE conceptscheme we have:
FREQ | Frequency | Frequency refers to the time interval between the observations of a time series.
FREQ_DISS | String | Expected Frequency of Data Dissemination | Expected frequency of updates of WHO estimates.
FREQ_RECOM | String | Recommended Collection | Expected frequency of updates of national-level data.

Can I suggest that (i) FREQ_DISS and (ii) FREQ_RECOM are AgencyID=WHO
concepts and so should be removed? I also think the definition of FREQ is
misleading. Presumably this concept is meant to be used with the MSD
to describe the expected frequency of collection for a
dataelement/indicator. It cannot refer to the actual time interval
between actual observations because it is metadata about the indicator
not the dataset. For the former we already have the frequency
dimension on the series.

That then should restore some sanity to the time dimensions/attributes.

2. Regarding the remaining attributes I am having some trouble
specifying exactly what we actually require. For the sake of
progress I am leaving it open for now in the sense that implementors
which produce a DSD are free to define whatever attributes they wish.
They should, where it makes sense, make use of the existing concepts in
CS_Attributes and existing SDMX-HD codelists to do so. And consumers
have to consume what the DSD tells them they have to consume :-)

Tomorrow is Paddy's Day here in Ireland so I am off work. I'll try
and look at my mail in the evening.

Regards
Bob.

Gary Patchen

Mar 17, 2010, 5:22:43 AM
to sdm...@googlegroups.com

Hi Bob,

 

OK, a quick recap is needed:

 

"While frequency refers to the time interval between the observations of a time series, periodicity refers to the frequency of compilation of the data (e.g., a time series could be available at annual frequency but the underlying data are compiled monthly, thus have a monthly periodicity)."

 

ID=FREQ = The interval between observations, e.g. YEARLY.

ID=PERIODICITY = The frequency with which observational data was compiled, e.g. MONTHLY.

 

Actions:

 

1. I will delete these two concepts as suggested, as they should be agency-specific:

 

FREQ_DISS = Expected frequency of updates of WHO estimates.

FREQ_RECOM = Expected frequency of updates of national-level data.

 

2. For me, what you call FREQUENCY is FREQ, which uses CL_FREQ.

 

In fact when you look at some of the SDMX samples they also use FREQ not FREQUENCY:

 

 

As for its metadata use, I have no problem using the same concept “FREQ”: the metadata can state “FREQ of observations should be Quarterly” while the actual observations show that they are “Monthly”.

 

3. FREQUENCY concept in the SDMX-HD mandatory dimension concepts?

 

      As previously stated I think this is ID=FREQ not ID=FREQUENCY.

      I initially thought that FREQ should be a mandatory attribute, but not a dimension as it is not used to key the observations.

However, the SDMX samples do have this as a Dimension:

 

<structure:Dimension conceptRef="FREQ" codelist="CL_FREQ" isFrequencyDimension="true" />

 

As the name implies, “isFrequencyDimension” is a dimension.

 

Are you therefore suggesting that the FREQ concept needs to be moved to the CS_COMMON ConceptScheme?

 

4. CL_PERIODICITY renamed CL_FREQUENCY?

 

      As previously stated I think this is CL_FREQ not CL_FREQUENCY.

     

      I fail to see why CL_PERIODICITY should be renamed CL_FREQUENCY.

 

a)  Are you suggesting by renaming CL_PERIODICITY to CL_FREQUENCY we will end up with CL_FREQUENCY and the existing CL_FREQ?

Confusing!

 

b)  What happens to CL_FREQ if we actually rename CL_PERIODICITY to CL_FREQ?

 

c)  CL_FREQ is published as a cross-domain code list (http://sdmx.org/wp-content/uploads/2009/01/02_sdmx_cog_annex_2_cl_2009.pdf#9 ), so for me we should not change this.

 

Suggest we leave CL_FREQ and CL_PERIODICITY as is for now – OK?

 

Regards

 

gary patchen


r.friedman

Mar 17, 2010, 6:48:47 AM
to sdm...@googlegroups.com
List --
    As I recall, SDMX supports 2 data formats, one longitudinal and the other cross-sectional, and the compact dataset is the cross-sectional version.  Do I have that right?  Are we still not supporting the longitudinal one?  It seems there are a lot of AVR applications for which the longitudinal format would be preferred.  It also seems we are supporting a push model and not a pull model.  Is this rest for v.2 or what?
Thanks, Roger

Bob Jolliffe

Mar 17, 2010, 6:02:30 PM
to sdm...@googlegroups.com
Hi Roger

SDMX defines 3 formats - cross-sectional, compact and generic. The
compact dataset format is the time series one (is that what you mean
by longitudinal?). In the routine reporting of routine data there is
no real benefit in using the compact form as the observations will
typically all be for one time period. In the proof of concept we did
with OpenMRS reporting monthly ART summary data to DHIS using SDMX, we
used the cross-sectional format and that seemed to meet our
requirements well. Ryan might have additional comments to make. It
was a relatively straightforward modification to his OpenMRS module to
allow it to produce a cross-sectional report instead of a compact
dataset one.
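
The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration of the grouping idea, not the SDMX-ML message structure itself: the indicator names, periods and values below are made up, and the function names are hypothetical.

```python
from collections import defaultdict

# Made-up observations: (indicator, time period, value).
observations = [
    ("ART_NEW", "2010-01", 12),
    ("ART_TOTAL", "2010-01", 340),
    ("ART_NEW", "2010-02", 15),
    ("ART_TOTAL", "2010-02", 355),
]

def group_by_series(obs):
    """Time-series (compact-style) view: one series per key, observations by period."""
    series = defaultdict(dict)
    for indicator, period, value in obs:
        series[indicator][period] = value
    return dict(series)

def group_by_period(obs):
    """Cross-sectional view: one group per period, observations by key."""
    sections = defaultdict(dict)
    for indicator, period, value in obs:
        sections[period][indicator] = value
    return dict(sections)

# A monthly routine report covers a single period, so the cross-sectional
# view yields one group holding every indicator for that month.
january_only = [o for o in observations if o[1] == "2010-01"]
print(group_by_period(january_only))
print(group_by_series(observations))
```

With a single reporting period, the time-series view degenerates into many series of one observation each, which is the reason given above for preferring the cross-sectional form in routine reporting.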

Given that we are not yet sitting on a pool of experience of SDMX-HD
deployment for data transfer I have argued that we should not be over
prescriptive about this. We have some fairly tight rules about the
definition of datasets (the DSD). I strongly feel that if there are
scenarios which are better supported by one data format or the other
we should let the requirements dictate which format to use. The
schema for both formats is generated off the DSD. There is nothing
really to be gained by saying you must use a particular dataset
format. Ultimately it should be the publisher of the DSD in question
who dictates which format it would like to see datasets coming
back in.

Regards
Bob

Bob Jolliffe

Mar 17, 2010, 6:51:43 PM
to sdm...@googlegroups.com
Hi

On 17 March 2010 09:22, Gary Patchen <gary.p...@b-i.com> wrote:

Hi Bob,

 

OK, a quick recap is needed:

 

"While frequency refers to the time interval between the observations of a time series, periodicity refers to the frequency of compilation of the data (e.g., a time series could be available at annual frequency but the underlying data are compiled monthly, thus have a monthly periodicity)."

 

ID=FREQ = The interval between observations, e.g. YEARLY.

ID=PERIODICITY = The frequency with which observational data was compiled, e.g. MONTHLY.


OK.  I'll come back on this below ..

 

Actions:

 

1. I will delete these two concepts as suggested, as they should be agency-specific:

 

FREQ_DISS = Expected frequency of updates of WHO estimates.

FREQ_RECOM = Expected frequency of updates of national-level data.

 

2. For me, what you call FREQUENCY is FREQ, which uses CL_FREQ.

 

In fact when you look at some of the SDMX samples they also use FREQ not FREQUENCY:

 

 

As for its metadata use, I have no problem using the same concept “FREQ”: the metadata can state “FREQ of observations should be Quarterly” while the actual observations show that they are “Monthly”.

 

3. FREQUENCY concept in the SDMX-HD mandatory dimension concepts?

 

      As previously stated I think this is ID=FREQ not ID=FREQUENCY.

      I initially thought that FREQ should be a mandatory attribute, but not a dimension as it is not used to key the observations.

However, the SDMX samples do have this as a Dimension:

 

<structure:Dimension conceptRef="FREQ" codelist="CL_FREQ" isFrequencyDimension="true" />

 

As the name implies, “isFrequencyDimension” is a dimension.

 

Are you therefore suggesting that the FREQ concept needs to be moved to the CS_COMMON ConceptScheme?


This is really the central point we have to put right.  Quite apart from the samples you refer to (which are correct) it is made clear in the standard that a single dimension with an attribute of isFrequencyDimension=true is a requirement for any KeyFamily which defines a TimeDimension - essentially any KeyFamily which can be used to generate a Compact DataSet schema.  So yes we need this dimension.  In which case, yes we do need to have a concept for it in CS_COMMON.  I am quite happy to call it FREQ or FREQUENCY.  The name is of little importance.  I shall call it FREQ. 
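
The rule described in the paragraph above (a KeyFamily that defines a TimeDimension needs exactly one dimension flagged with isFrequencyDimension) is easy to check mechanically. What follows is a minimal sketch using Python's standard library, not a validator for the full standard. The KeyFamily fragment is a made-up example, the INDICATOR and TIME_PERIOD concepts and the function name are hypothetical, and the namespace URI is assumed to be the SDMX 2.0 structure namespace.

```python
import xml.etree.ElementTree as ET

# Assumed SDMX 2.0 structure namespace.
STRUCTURE_NS = "http://www.SDMX.org/resources/SDMXML/schemas/v2_0/structure"

# Made-up minimal KeyFamily, for illustration only.
SAMPLE = f"""
<structure:KeyFamily xmlns:structure="{STRUCTURE_NS}" id="KF_EXAMPLE" agencyID="SDMX-HD">
  <structure:Components>
    <structure:Dimension conceptRef="FREQ" codelist="CL_FREQ" isFrequencyDimension="true" />
    <structure:Dimension conceptRef="INDICATOR" codelist="CL_INDICATOR" />
    <structure:TimeDimension conceptRef="TIME_PERIOD" />
  </structure:Components>
</structure:KeyFamily>
"""

def frequency_dimension_problems(key_family_xml):
    """Return a list of problems; an empty list means the rule is satisfied."""
    root = ET.fromstring(key_family_xml.strip())
    ns = {"s": STRUCTURE_NS}
    has_time_dimension = root.find(".//s:TimeDimension", ns) is not None
    # xs:boolean allows both "true" and "1".
    frequency_dims = [
        dim for dim in root.findall(".//s:Dimension", ns)
        if dim.get("isFrequencyDimension") in ("true", "1")
    ]
    problems = []
    if has_time_dimension and len(frequency_dims) != 1:
        problems.append(
            f"expected exactly 1 frequency dimension, found {len(frequency_dims)}"
        )
    return problems

print(frequency_dimension_problems(SAMPLE))
```

A KeyFamily without a TimeDimension passes this check regardless, which matches the point that the requirement only applies to key families that can produce a time-series dataset.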

 

4. CL_PERIODICITY renamed CL_FREQUENCY?

 

      As previously stated I think this is CL_FREQ not CL_FREQUENCY.

     

      I fail to see why CL_PERIODICITY should be renamed CL_FREQUENCY.

 

a)  Are you suggesting by renaming CL_PERIODICITY to CL_FREQUENCY we will end up with CL_FREQUENCY and the existing CL_FREQ?

Confusing!

 

b)  What happens to CL_FREQ if we actually rename CL_PERIODICITY to CL_FREQ?

 

c)  CL_FREQ is published as a cross-domain code list (http://sdmx.org/wp-content/uploads/2009/01/02_sdmx_cog_annex_2_cl_2009.pdf#9 ), so for me we should not change this.

 

Suggest we leave CL_FREQ and CL_PERIODICITY as is for now – OK?


We don't need to have two codelists both describing the same thing - daily, weekly, monthly etc.  The fact that we may have a dimension and one or more attributes which all refer to period types only means that we need to have different concepts for them.  But logically we only need to maintain one codelist.  So we can have:

<structure:Dimension conceptRef="FREQ" isFrequencyDimension="true"
    conceptSchemeRef="CS_COMMON" conceptVersion="1.0" conceptSchemeAgency="SDMX-HD"
    codelist="CL_PERIODICITY" codelistVersion="1.0" codelistAgency="SDMX-HD" />

as well as

<structure:Attribute conceptRef="PERIODICITY"
    conceptSchemeRef="CS_COMMON" conceptVersion="1.0" conceptSchemeAgency="SDMX-HD"
    codelist="CL_PERIODICITY" codelistVersion="1.0" codelistAgency="SDMX-HD"
    attachmentLevel="Series" assignmentStatus="Mandatory" />

as well as any number of other attributes (FREQ_DISS, FREQ_RECOM etc) one might dream up which make use of period types from a codelist.  They don't each need their own codelists.  They should all simply refer to the SDMX-HD frequency codelist.  I am relatively neutral as to whether it is called CL_FREQ, CL_FREQUENCY or CL_PERIODICITY.  I had suggested CL_FREQUENCY but am happy to stick with CL_FREQ.

I am in favour of making use of the SDMX COG where possible - that is excellent.  I do see that there are a few extra period types defined currently in CL_PERIODICITY which are not in CL_FREQ.  That is fine - we should simply add them.  This will be much more maintainable and easier for implementors to get their heads around than having two competing codes for saying weekly, monthly etc.
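
The "one codelist, several concepts" idea can be sketched as follows. The code values and labels are illustrative placeholders rather than the published contents of CL_FREQ or CL_PERIODICITY, the BIENNIAL code is invented to stand in for an "extra period type", and the function name is hypothetical.

```python
# Illustrative placeholder for the cross-domain frequency codelist.
CL_FREQ = {
    "A": "Annual",
    "S": "Half-yearly",
    "Q": "Quarterly",
    "M": "Monthly",
    "W": "Weekly",
    "D": "Daily",
}

# Illustrative placeholder for CL_PERIODICITY, with one invented extra code.
CL_PERIODICITY = {
    "A": "Annual",
    "M": "Monthly",
    "W": "Weekly",
    "BIENNIAL": "Every two years",
}

def merge_codelists(base, extra):
    """Add codes from `extra` that `base` lacks; existing base codes win."""
    merged = dict(base)
    for code, label in extra.items():
        merged.setdefault(code, label)
    return merged

merged_cl_freq = merge_codelists(CL_FREQ, CL_PERIODICITY)

# Different concepts (one dimension, several attributes) all point at the
# same merged codelist instead of each maintaining its own.
components = {
    "FREQ": "CL_FREQ",         # dimension
    "PERIODICITY": "CL_FREQ",  # attribute
    "FREQ_DISS": "CL_FREQ",    # agency-specific attribute
}

print(sorted(merged_cl_freq))
print(set(components.values()))
```

Keeping the base codes unchanged and only appending the missing ones means the cross-domain codes stay as published, which addresses the concern about not changing CL_FREQ.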

Regards
Bob


Ryan

Mar 18, 2010, 3:42:26 AM
to sdm...@googlegroups.com
Hi,

The cross-sectional format definitely seems to suit our OpenMRS to DHIS integration pilot very well. As Bob says, for routine reporting it is the format that makes the most sense. In most cases, when data is being reported regularly for a specific time period, it makes sense to use the cross-sectional messages.

There isn't a huge difference in the structure of the two types of formats, so my module can actually produce either format at the moment (The compact format may not be 100% correct though as I haven't been concentrating my efforts there). I think both formats have a part to play and I agree that it should be left to the user to decide which format suits the requirements.

Regards,
Ryan
--
Ryan Crichton
Software Developer
ry...@jembi.org
http://www.jembi.org