Code lists for units of measure?

Dave Reynolds

Mar 10, 2010, 5:36:30 PM
to publishing-st...@googlegroups.com
Next dumb question ...

I'm trying to represent some statistics which are essentially counts.

The SDMX cross-domain concept "UNIT_MEASURE" is the right property to
use for this, and its values are required to come from a code list. Yet
the sdmx.org site has no code list for UNIT_MEASURE; it seems to be
listed in the "future work" section.

Is there a source of agreed SDMX code lists other than those on
sdmx.org?

I assume counts are pretty common in existing SDMX usage. Is there a de
facto common practice for the UNIT_MEASURE code list that includes this
case?

Dave

Richard Cyganiak

Mar 10, 2010, 7:41:54 PM
to publishing-st...@googlegroups.com
Dave,

I don't know what the common practice in SDMX is.

I looked into the general question of units of measurement though, to
see if there's a good set of URIs for units that we could use straight
away. I'll share what I found below.

The quick summary: As a stopgap measure, we could use the UN/CEFACT
Rec20 codes listed in this Excel file here, as literals:
http://www.unece.org/cefact/recommendations/rec20/rec20_rev4E_2006.xls

Or, if we really really want to use URIs straight away, we could use
http://data.nasa.gov/qudt/owl/unit#UnitName
where the unit name comes from here:
http://www.qudt.org/qudt/owl/1.0.0/unit/index.html

The code for a unit-less count would be "C62" or http://data.nasa.gov/qudt/owl/unit#Number
respectively.
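
To make the difference between the two stopgaps concrete, here is a minimal Python sketch that writes the unit of a count observation as one N-Triples line, once with the Rec20 literal and once with the QUDT URI. The observation URI is hypothetical, and the property URI (unitMeasure from the SDMX-RDF attribute vocabulary) is an assumption that is not named in this thread; treat both as placeholders.

```python
# Assumption: property URI taken from the SDMX-RDF attribute vocabulary;
# it is not named in this thread.
UNIT_MEASURE = "http://purl.org/linked-data/sdmx/2009/attribute#unitMeasure"

# Namespace quoted in the thread for the NASA QUDT unit ontology.
QUDT_UNIT_NS = "http://data.nasa.gov/qudt/owl/unit#"

# Hypothetical observation URI, for illustration only.
OBS = "http://example.org/obs/1"


def unit_triple(observation_uri, unit, as_uri):
    """Return one N-Triples line attaching a unit to an observation.

    as_uri=False writes the unit as a plain literal (Rec20 style);
    as_uri=True writes it as a URI in the QUDT unit namespace.
    """
    if as_uri:
        obj = "<" + QUDT_UNIT_NS + unit + ">"
    else:
        obj = '"' + unit + '"'
    return "<" + observation_uri + "> <" + UNIT_MEASURE + "> " + obj + " ."


print(unit_triple(OBS, "C62", as_uri=False))    # Rec20 code as literal
print(unit_triple(OBS, "Number", as_uri=True))  # QUDT unit as URI
```

The literal form only records an opaque code, while the URI form gives consumers something they could in principle dereference once the URIs resolve.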

More gory details pasted below -- I plan to write this up as a blog
post later.

Best,
Richard


----------------

The UN/CEFACT Recommendation 20 list of units of measurement looks
great. It's available as an Excel sheet here:
http://www.unece.org/cefact/recommendations/rec20en.htm

The code for “one; piece; unit” is “C62”. So the codes are not the
most friendly.

Unfortunately, in the absence of a licensing statement I don't think it's
legally possible to create derivatives (like a SKOS ConceptScheme,
which is the intended RDF representation of codelists).

As a precedent: These codes, as literal values, are used in
GoodRelations, a popular RDF vocabulary for eCommerce. The maintainer
of GoodRelations told me that he thoroughly investigated the area and
settled on this approach.

There is a code for Pint (PTI), but none for Teaspoon.

Another prominent code list is UCUM:
http://unitsofmeasure.org/

This looks very well thought out. I didn't dig to find out what the
UCUM Organization is and what the license conditions are.

In UCUM it appears that the unit for count is “1” (the “default
unit”), but it's not clear to me whether that's a “real” unit in the
code list.

Teaspoon is [tsp_us], a pint is [pt_br].

SI units are of course as standard as one could possibly wish. They
have standard symbols, but no standard codes. Many of the symbols are
outside of the US-ASCII character set and thus cannot easily be typed
or become part of URIs. Cubic micrometers, for example. There is no
symbol for the dimensionless unit (count).

Moving on to URI sets, there's the NASA QUDT Unit Ontology:
http://www.qudt.org/qudt/owl/1.0.0/unit/index.html

The namespace is <http://data.nasa.gov/qudt/owl/unit#>, commonly
abbreviated as “unit:”, and for counts you'd use unit:Number. This
looks pretty good, but the data.nasa.gov URIs don't resolve and hence
are not exactly linked data friendly. Someone involved in the project
told me on Twitter that they are working on making them resolvable,
but registering a subdomain takes a while at nasa.gov.

A nice thing is that this contains currencies as well (common in
statistical data). Coverage there is limited to currencies that are
still in use, so it has the Euro but doesn't have the German Mark.

And unit:Teaspoon exists, and so does unit:PintImperial!
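
The codes mentioned so far can be lined up side by side. The Python sketch below is only a rough comparison table: it contains nothing but the values quoted in this thread, the grouping of the three pint codes into one row is an assumption, and the helper name unit_value is made up for illustration.

```python
# Namespace quoted in the thread for the NASA QUDT unit ontology.
QUDT_UNIT_NS = "http://data.nasa.gov/qudt/owl/unit#"

# Only values quoted in this thread. None means "no code was found".
# Assumption: PTI, [pt_br] and PintImperial all denote the same pint.
# Note: whether UCUM's "1" is a real unit in the code list is unclear.
UNIT_CODES = {
    "count":    {"rec20": "C62", "ucum": "1",        "qudt": "Number"},
    "pint":     {"rec20": "PTI", "ucum": "[pt_br]",  "qudt": "PintImperial"},
    "teaspoon": {"rec20": None,  "ucum": "[tsp_us]", "qudt": "Teaspoon"},
}


def unit_value(unit, scheme):
    """Return the code (or full URI for QUDT) for a unit, or None if missing."""
    code = UNIT_CODES[unit][scheme]
    if code is None:
        return None
    if scheme == "qudt":
        # QUDT identifies units by URI, so expand the local name.
        return QUDT_UNIT_NS + code
    return code


for unit in UNIT_CODES:
    print(unit, [unit_value(unit, s) for s in ("rec20", "ucum", "qudt")])
```

A missing entry such as the Rec20 teaspoon shows up as None, which makes coverage gaps easy to spot before committing to one scheme.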

The Open Geospatial Consortium has registered a URN namespace for
units of measure. The W3C's Semantic Sensor Networks Incubator Group
will be using them. Two sub-namespaces are registered, one for SI
units and one for the UCUM code list. I could not find an
authoritative list of the units recognised by the OGC; they appear to
just defer to the authorities for the sub-namespaces. I could not find
information on how to encode special characters that are common in SI
unit symbols and not allowed in URNs. The relevant web pages:

http://www.opengeospatial.org/ogcUrnPolicy (OGC URNs)
http://www.bipm.org/en/si/ (SI units)
http://unitsofmeasure.org/ (UCUM)

OGC unit URNs look like this:

<urn:ogc:def:uom:SI:2000:kg>
<urn:ogc:def:uom:UCUM::[pt_br]>

I couldn't find out how symbols outside of US-ASCII should be handled
in the SI namespace.
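
The shape of these URNs can be sketched in a few lines of Python. The helper below (the name ogc_uom_urn is made up) only reproduces the pattern visible in the two examples above; it is not derived from the OGC policy text. Because the handling of non-ASCII symbols is unknown, the sketch refuses them rather than guessing at an encoding.

```python
def ogc_uom_urn(authority, version, code):
    """Build an OGC unit-of-measure URN following the pattern seen above.

    authority: sub-namespace such as "SI" or "UCUM"
    version:   version string; empty in the UCUM example
    code:      the unit symbol or code within that sub-namespace
    """
    # Assumption: reject non-ASCII symbols, since no encoding rule was found.
    if not code.isascii():
        raise ValueError("no known encoding for non-ASCII unit symbol: " + code)
    return "urn:ogc:def:uom:" + authority + ":" + version + ":" + code


print(ogc_uom_urn("SI", "2000", "kg"))
print(ogc_uom_urn("UCUM", "", "[pt_br]"))
```

Called with the two examples from this message, it yields the same strings as shown above; a symbol containing a micro sign would raise ValueError instead.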

SI and UCUM don't really have a code for counts. I found the following
in the OGC's own URN resolver, but according to the OGC's official URN
policy there is no :OGC: sub-namespace:

<urn:ogc:def:uom:OGC:1.0:unity>

Altogether, the experience with the OGC URNs strengthens my dislike of
URNs as identifiers. The namespace is underspecified, documentation is
lacking, and management of the namespace seems to be a bit lax for a
standards organisation.

There are a few other options, but they all have various shortcomings
(most significantly, no major organisational backing) and I would
consider them inferior to the options above:
http://idi.fundacionctic.org/muo/muo-vocab.html
http://www.w3.org/2007/ont/unit

--
Linked Data Technologist • Linked Data Research Centre
Digital Enterprise Research Institute (DERI), NUI Galway, Ireland
http://linkeddata.deri.ie/
skype:richard.cyganiak
tel:+353-91-49-5711

Dave Reynolds

Mar 11, 2010, 3:30:25 AM
to publishing-st...@googlegroups.com, publishing-st...@googlegroups.com
That's great, thanks Richard. I'll use the QUDT option for now. I'm
still intrigued as to how this is typically handled in SDMX.

Dave

[Sent from phone.]

On 11 Mar 2010, at 00:41, Richard Cyganiak <richard....@deri.org>
wrote:
