AQE machine tags: what metadata conventions already exist?

Martin Dittus

May 18, 2012, 9:00:20 AM
to airqua...@googlegroups.com
Starting a new thread, since we have a bad habit of hijacking other people's threads :)

On 17 May 2012, at 00:53, Chris Nafis wrote:

> We could sort all the Feed Tags/units and come up with a suggested list?
> The sooner we do it the better. I don't know how many live feeds there are, but I would bet they are growing fast.

Please post links to every variant of air quality metadata you can find, with egg or without. I'll summarise it on the wiki (unless César/Sara are interested in taking that on.)

m.

Martin Dittus

May 18, 2012, 10:03:18 AM
to airqua...@googlegroups.com
Starting with a few basic ones: tags for feeds and datastreams.

People currently tag their feeds with "air quality", "airqualityegg", "air quality egg", "aqe", and other variants
https://cosm.com/feeds?tag=air+quality
https://cosm.com/feeds?q=airqualityegg
https://cosm.com/feeds?tag=air+quality+egg
https://cosm.com/feeds?q=aqe

In a separate thread Usman suggested the "project:id:aqe" machine tag, which appears to be unused so far:
https://cosm.com/feeds?q=project%3Aid%3Daqe
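
The search link above decodes to `project:id=aqe`, i.e. a namespace:predicate=value shape. As a rough sketch of how such tags could be told apart from free-form ones, here is a small Python parser. The exact syntax is an assumption: it accepts both `=` and `:` before the value because both spellings appear in this discussion, and the `MachineTag` type and `parse_machine_tag` name are made up for illustration.

```python
import re
from typing import NamedTuple, Optional


class MachineTag(NamedTuple):
    namespace: str
    predicate: str
    value: str


# Assumed syntax: namespace:predicate=value (or namespace:predicate:value).
# This is not an official Cosm format, only a guess based on the examples here.
_MACHINE_TAG = re.compile(
    r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)[=:](.+)$"
)


def parse_machine_tag(tag: str) -> Optional[MachineTag]:
    """Return the parsed machine tag, or None for a free-form tag."""
    match = _MACHINE_TAG.match(tag.strip())
    if match is None:
        return None
    namespace, predicate, value = match.groups()
    # Namespace and predicate are lower-cased; the value is kept as written.
    return MachineTag(namespace.lower(), predicate.lower(), value)


if __name__ == "__main__":
    for tag in ["project:id=aqe", "air quality egg", "aqe"]:
        print(repr(tag), "->", parse_machine_tag(tag))
```

Free-form tags such as "air quality egg" simply come back as `None`, so they can still be handled separately.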

Can someone do a tag analysis of these search results to summarise which kinds of datastream annotations and units are popular?

(Sidenote to the Cosm product team: making existing/emergent tagging conventions discoverable would help greatly with collaboration. This could be as simple as implementing autocomplete-type tagging suggestions. As far as I can tell the current "Tags" field is free-form text without any assistance.)

m.

Martin Dittus

May 18, 2012, 10:14:13 AM
to airqua...@googlegroups.com

On 18 May 2012, at 15:03, Martin Dittus wrote:

> Can someone do a tag analysis of these search results to summarise which kinds of datastream annotations and units are popular?

On second thought, hold that -- I just remembered that I produced a TSV of all Cosm feeds a week ago, will post a summary later.

m.

Simone Cortesi

May 18, 2012, 10:16:00 AM
to airqua...@googlegroups.com
On Fri, May 18, 2012 at 4:03 PM, Martin Dittus <dek...@gmail.com> wrote:
> (Sidenote to the Cosm product team: making existing/emergent tagging conventions discoverable would help greatly with collaboration. This could be as simple as implementing autocomplete-type tagging suggestions. As far as I can tell the current "Tags" field is free-form text without any assistance.)

+1 on this!

--
-S

Martin Dittus

May 18, 2012, 11:49:51 AM
to airqua...@googlegroups.com

Just published a summary of Cosm's air quality feed metadata here:
https://gist.github.com/2725914

It's not the most readable form but it'll do for a first scan. This is based on the Cosm feeds as of 26 April 2012.

I extracted all feeds (environments) and their datastreams that refer to any of these keywords: air quality, airquality, airqualityegg, air quality egg, aqe, project:id:aqe. The selection is case-insensitive, but I did not normalise the case of any of the metadata summarised below. Feeds and streams with multiple tags occur more than once in these rankings (once per tag).

The ranked summaries are as follows:

Unique number of AQ feeds with a certain tag:
https://gist.github.com/2725914#file_aq_feed_tags.txt

Unique number of AQ data streams using a certain ID:
https://gist.github.com/2725914#file_aq_datastream_ids.txt

Unique number of AQ data streams with a certain tag:
https://gist.github.com/2725914#file_aq_datastream_tags.txt

Unique number of AQ data streams using a certain unit identifier:
https://gist.github.com/2725914#file_aq_datastream_units.txt
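
For anyone wanting to repeat this kind of ranking, the selection and counting described above could be sketched roughly as follows in Python. This is not the script behind the gist: the `(feed_id, tags)` input shape, the function names, and the choice to match keywords against whole tags only are all assumptions for illustration.

```python
from collections import defaultdict

# Keywords from the description above, compared case-insensitively.
KEYWORDS = {
    "air quality", "airquality", "airqualityegg",
    "air quality egg", "aqe", "project:id:aqe",
}


def is_aq_feed(tags):
    """True if any tag equals one of the keywords, ignoring case (assumed rule)."""
    return any(tag.lower() in KEYWORDS for tag in tags)


def rank_feed_tags(feeds):
    """Count unique feeds per tag among air quality feeds.

    `feeds` is an iterable of (feed_id, tags) pairs, a hypothetical shape.
    Tag case is preserved, so "Air Quality" and "air quality" rank separately.
    """
    feeds_per_tag = defaultdict(set)
    for feed_id, tags in feeds:
        if not is_aq_feed(tags):
            continue
        for tag in tags:
            feeds_per_tag[tag].add(feed_id)  # a feed counts once per tag
    counts = [(tag, len(ids)) for tag, ids in feeds_per_tag.items()]
    # Most used first; ties broken alphabetically for a stable listing.
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    sample = [
        (1, ["Air Quality", "aqe"]),
        (2, ["air quality"]),
        (3, ["weather"]),
        (4, ["aqe", "London"]),
    ]
    for tag, count in rank_feed_tags(sample):
        print(count, tag)
```

The same counting works for datastream IDs, tags and units by swapping what goes into the second element of each pair.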

m.

Martin Dittus

May 18, 2012, 2:30:33 PM
to airqua...@googlegroups.com
César's older post about contextual metadata is worth bringing into this as well, though it's more a proposal and (I understand) not currently in use:
https://groups.google.com/d/msg/airqualityegg/BhRBQYeDqLc/3bM3ZcXSPSsJ

This suggests metadata for sensor placement (indoors/outdoors, height above street level, etc) and device (type, version, etc.)

m.



Ed Borden

May 19, 2012, 4:30:59 PM
to airqua...@googlegroups.com
I put the suggestion about autocompleting tags to the product team.

I also want to say that "AirQualityEgg" or
"Project:ID:AirQualityEgg" should be the overall tag. It needs to
stay consistent with how we are talking about it on the
web/twitter/etc. Also, acronyms are never good. Every time I have to
write "IoT", I cringe.

Daniel Nüst

May 21, 2012, 5:26:04 AM
to airqua...@googlegroups.com, Martin Dittus
Am 18.05.2012 15:00, schrieb Martin Dittus:
> On 17 May 2012, at 00:53, Chris Nafis wrote:
>> We could sort all the Feed Tags/units and come up with a suggested list?
>> The sooner we do it the better. I don't know how many live feeds there are, but I would bet they are growing fast.
>
> Please post links to every variant of air quality metadata you can find, with egg or without. I'll summarise it on the wiki (unless César/Sara are interested in taking that on.)

Just to give a glimpse into the non-Internet of Things world: this is how an
"official" air quality monitoring station can (!) be modelled in OGC's
SensorML [0].

http://geoviqua.dev.52north.org/SIR/sir?service=SIR&version=0.3.1&REQUEST=DescribeSensor&SENSORIDINSIR=416

It follows the guidelines presented in section 7 here [1], and section
12 here [2].

Please don't get me wrong, I don't want to advocate that model for the
AirQualityEgg, but I think we can learn a bit from those conventions in
the way that we have a good collection of required elements, may they be
fixed or generated, such as identifiers, classification (AirQualityEgg
model and/or version), time interval with data, contact information, ...
Having conventions that allow us (or: me) to map AirQualityEgg metadata
to the referenced (or other, admittedly complicated and potentially
bloated) standards will allow integration of the data to the OGC
information system.

And of course, we must provide a very simple way for users to create the
non-automatic part of the metadata conventions we come up with.

I am in favour of a model that is partly fixed now to ensure a minimal
set of information, but of course open for dynamic development and
evolution, but maybe not as much as say the OpenStreetMap tagging
system. Comments?

Best,
Daniel

[0] http://www.opengeospatial.org/standards/sensorml
[1] http://portal.opengeospatial.org/files/?artifact_id=46693
[2] http://portal.opengeospatial.org/files/?artifact_id=37944

--
Daniel Nüst
52° North Initiative for Geospatial Open Source Software GmbH
Martin-Luther-King-Weg 24
48155 Münster, Germany
E-Mail: d.n...@52north.org
Fon: +49-(0)-251–396371-36
Fax: +49-(0)-251–396371-11
http://52north.org/
General Managers: Dr. Albert Remke, Dr. Andreas Wytzisk
Local Court Muenster HRB 10849

Martin Dittus

May 21, 2012, 7:53:32 AM
to Daniel Nüst, airqua...@googlegroups.com
Thank you very much for this. I think looking at such precedents will be well worth the time. I'm curious about any other such models people have seen.

And yes, your suggestion sounds good: "a model that is partly fixed now to ensure a minimal set of information, but of course open for dynamic development and evolution", while keeping it simple enough so it remains accessible for moderate beginners.

m.




Cesar Garcia

May 21, 2012, 9:23:16 PM
to airqua...@googlegroups.com, Daniel Nüst
Hi,

The OGC Sensor Observation Service also looks quite interesting for our purpose:

http://www.opengeospatial.org/standards/sos

I'll look into it further to see how it could fit.
Best,
César

--
Cesar García - @elsatch

I queue my email and process it on Mondays, Wednesdays and Fridays. If it's something urgent or quick, contact me on Twitter. Thanks!

Daniel Nüst

May 22, 2012, 2:47:45 AM
to airqua...@googlegroups.com, Cesar Garcia
Hi Cesar!

Am 22.05.2012 03:23, schrieb Cesar Garcia:
> This OGC Sensor Observation Service also looks quite interesting for our
> purpose
>
> http://www.opengeospatial.org/standards/sos

First things first, a disclaimer: I work as a researcher for the company
that develops an SOS implementation (open source, of course!), see [0].

I agree it is quite interesting :-). Most useful features imho are the
filtering (spatial, temporal) and the available clients. In fact, we/I
plan to develop a cosm > SOS feeder to have the data available, but it
would be great if others could work with me on that.

So I guess you can understand why I am convinced the SOS can be useful
for Air Quality Egg data. (See also [1].) I hope I can convince others, too.

Please let me know if you would like to participate in that development!

Best regards,
Daniel

[0] http://52north.org/communities/sensorweb/sos/
[1] https://groups.google.com/d/msg/airqualityegg/BhRBQYeDqLc/P1ZPk-elbbUJ

Cesar Garcia

May 22, 2012, 4:37:41 AM
to Daniel Nüst, airqua...@googlegroups.com
Hi,

This could solve one of my questions about storing metadata. Maybe we could use machine-readable tags for Cosm feeds that could later be parsed to create SensorML objects or SOS networks. Obviously that information must be stored in another repository. Is there any free service provider that could be used to store them? I'm thinking of the Wikimedia Foundation or the Open Knowledge Foundation's Datahub.

BTW, some people from the Ontology Engineering Group at UPM will attend this afternoon's session about air quality in Madrid. They work all day long with linked data, so I will also forward them these questions.

Best,
César

Daniel Nüst

May 24, 2012, 2:20:17 AM
to Cesar Garcia, airqua...@googlegroups.com
Am 22.05.2012 10:37, schrieb Cesar Garcia:
> This could solve one of my questions about storing metadata. Maybe we could
> use machine-readable tags for Cosm feeds that could later be parsed
> to create SensorML objects or SOS networks. Obviously that information
> must be stored in another repository. Is there any free service provider
> that could be used to store them? I'm thinking of the Wikimedia Foundation
> or the Open Knowledge Foundation's Datahub.

There is a specific sensor catalogue that handles SensorML called Sensor
Instance Registry, which we developed (see different clients here [1]
and here [2]). Do you mean something like that?

Admittedly, it is all XML Schema based for now and not extremely attractive for
hackers/mashups, but it is ongoing research, and testing Linked Data or
JSON output has been discussed for a while.

If you mean CKAN, that looks great: very accessible to users and good for bulk
downloads of data. Do you know how well it supports real-time datasets
such as the AirQualityEgg?

> BTW, some people from the Ontology Engineering group from UPM will
> attend this afternoon session about Air Quality in Madrid. They work all
> day long with linked data, so I will also forward them these questions.

Yes, good thinking. We should definitely explore modelling the data as
linked data.

Should we start a wiki page and collect the information from this
thread? Just to make all the ideas more readily understandable for others.

So my suggestion here would be to have a list of machine-readable tags
in Cosm that can then be mapped to SensorML, JSON encodings and linked
data formats (e.g. [3]). The sooner this list is available, the earlier
we can prepare mappings to different outputs so we have some idea of
metadata as soon as the eggs actually hit the shelves.
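
To make the idea of such a mapping a little more concrete, here is a minimal Python sketch that folds a feed's tag list into a nested structure and serialises it as JSON. Everything specific in it is hypothetical: the `aqe:placement` and `aqe:height_m` tag names are invented placeholders, not an agreed convention, and the `namespace:predicate=value` syntax is an assumption. A SensorML or linked data mapping would need its own, more involved translation step.

```python
import json


def tags_to_metadata(tags):
    """Split a tag list into machine tags (nested by namespace) and free tags.

    Assumes machine tags look like namespace:predicate=value; anything else
    is kept untouched in "free_tags".
    """
    machine_tags = {}
    free_tags = []
    for tag in tags:
        head, equals, value = tag.partition("=")
        namespace, colon, predicate = head.partition(":")
        if equals and colon and namespace and predicate and value:
            by_predicate = machine_tags.setdefault(namespace.lower(), {})
            by_predicate[predicate.lower()] = value
        else:
            free_tags.append(tag)
    return {"machine_tags": machine_tags, "free_tags": free_tags}


if __name__ == "__main__":
    # Hypothetical tags for one feed; the names are placeholders only.
    example = ["aqe:placement=outdoors", "aqe:height_m=3", "air quality"]
    print(json.dumps(tags_to_metadata(example), indent=2, sort_keys=True))
```

Keeping the free-form tags alongside the parsed ones means nothing a user typed is lost while the convention is still settling.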


/Daniel

[1] http://geoviqua.dev.52north.org/SIR/
[2] http://geoviqua.dev.52north.org/SIR/client.jsp
[3] http://www.w3.org/2005/Incubator/ssn/wiki/Semantic_Sensor_Net_Ontology

Cesar Garcia

May 24, 2012, 3:51:15 AM
to Daniel Nüst, airqua...@googlegroups.com

Mapping SensorML to machine tags in Cosm that could be prepopulated would be great. In fact, after thinking about solutions, we basically arrived at the same idea :)

Martin Dittus

May 30, 2012, 8:01:58 AM
to airqua...@googlegroups.com

On 24 May 2012, at 07:20, Daniel Nüst wrote:
>
> Should we start a wiki page and collect the information from this
> thread? Just to make all the ideas more readily understandable for others.

A thousand times yes :) Even a simple starting point on metadata needs and initial concepts would be excellent.

(Generally I think that good & living documentation goes a long way in fostering community and collaboration. A wiki is a good mechanism for inviting basic contributions.)

m.

Cesar Garcia

Jul 15, 2012, 10:28:15 PM
to airqua...@googlegroups.com
After investigating a bit, our plan is to map Cosm machine-readable
tags to W3C SSN (http://www.w3.org/2005/Incubator/ssn/) instead of
SensorML. Any thoughts about this?

There seems to be a lot of information on this topic:

http://www.planet-data.eu/sites/default/files/pr-material/deliverables/D1.1_Characterisation_mechanics_for_unknown_data_sources.pdf

Examples and comparison:
http://wiki.planet-data.eu/web/PlanetDataVocabulary

Final report for W3C SSN:
http://www.w3.org/2005/Incubator/ssn/XGR-ssn-20110628/

We will be having a big metadata "party" next Wednesday at
Medialab-Prado (Intermediae), Madrid. Does anyone want to join in?
http://www.meetup.com/iotmadrid/events/73022172/

Best,
César