Anyone interested in building a pattern recognition system for the air quality data?


chduke

Apr 15, 2012, 1:02:34 PM
to airqualityegg
Hi all,

Based on the number of sensors (and data) that will be available and
the fact that data will be 'raw' (uncalibrated, etc.) this looks like
a very interesting project and challenge for applying pattern
recognition techniques in order to either identify patterns (based on
location of eggs, sensors used, etc.) and/or classify air quality
(based on user estimation, correlation with other data, etc.).

Apart from collecting the data this project would need some additional
information from users (such as air quality estimation, etc.) and a
mechanism for analyzing the data. It could be implemented as part of
the back-end infrastructure of Pachube or completely as an external
application. The outcome could be used to provide an automated
estimation of air quality, show what sensor data have the highest
impact on air quality etc. It could also be used by the scientific
community to perform further experiments in the domain since data will
be somewhat structured and annotated with air quality information.

I am wondering how many people would be interested in participating in
such a project (web developers, data contributors, etc.).

Matthew Dance

Apr 15, 2012, 1:43:16 PM
to airqua...@googlegroups.com

Count me in!

matthew dance | 780.554.9222
sent from my mobile device

Sergey Vlasov

Apr 15, 2012, 1:48:12 PM
to airqua...@googlegroups.com
It is a very interesting subject for me.
I'd like to point out that there are two related parts:
1) Pattern discovery from large historical data set.
2) Near-real time pattern detection.
I have some expertise in the pattern detection and will be happy to contribute.

Sergey

Joseph Saavedra

Apr 15, 2012, 2:08:19 PM
to airqua...@googlegroups.com
absolutely. 

there's a ton of work to be done in this space -- and it is the number one reason we are including temperature and humidity *on the board*.

those are measurements you can get from any web based weather service, but both are big factors in the data coming from metal oxide sensors, so by including them on the board, we can have precise temp/humidity measurements to then consider when analyzing the data coming from any of the metal oxide gas sensors of any given unit.

I think the best approach would be to have Pachube holding all raw, un-processed data. We can build apps using Pachube's API that in real-time use algorithms we develop to include the temp/humidity data in processing the data and then output refined datastreams.  We could then even push those back to Pachube, or build our own apps to display it where/in the format we like.

this is the most exciting part of the open data and community aspects of the project, so glad we're already thinking in this vein!!

joe
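
A rough sketch of the kind of temperature/humidity compensation Joe describes above. It assumes, purely for illustration, that a raw metal oxide reading depends linearly on temperature and humidity; real sensors are usually nonlinear, and this is not the Egg's actual algorithm. The function names, the reference conditions (20 °C, 50 % RH) and the numbers in the usage note are all made up.

```python
def solve_linear_system(a, b):
    """Solve a small dense system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_environment_model(raw, temp, hum):
    """Least-squares fit of raw ~ b0 + b1*temp + b2*hum (assumed linear model)."""
    rows = [[1.0, t, h] for t, h in zip(temp, hum)]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, raw)) for i in range(3)]
    return solve_linear_system(xtx, xty)


def compensate(raw, temp, hum, beta, ref_temp=20.0, ref_hum=50.0):
    """Remove the modelled temp/humidity effect, re-referenced to fixed conditions."""
    _, b1, b2 = beta
    return [y - b1 * (t - ref_temp) - b2 * (h - ref_hum)
            for y, t, h in zip(raw, temp, hum)]
```

Usage would be something like `beta = fit_environment_model(raw, temp, hum)` on a stretch of raw data pulled from Pachube, then `compensate(raw, temp, hum, beta)` to produce a refined datastream. One caveat: if the gas concentration itself happens to vary with temperature or humidity during the fitting period, the fit will absorb part of the real signal.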

Martin Dittus

Apr 15, 2012, 4:24:35 PM
to airqua...@googlegroups.com
I'm very interested in any data modelling projects people will start with this; not necessarily even going as far as detecting patterns, even just building environmental models under consideration of uncertainty factors would already be interesting. E.g. with the aim of optimising the sensor network structure (knowing where best to place new sensors for best coverage), or as a starting point for comparative studies with related data sets.

m.

Ed Borden

Apr 15, 2012, 5:16:45 PM
to airqua...@googlegroups.com
This is super-important. How can we help move this forward, I wonder?
Where is everyone on this thread located? Is this the type of thing
where an in-person session might be beneficial? Maybe I can help
facilitate bringing us all together to work on this.

Cesar Garcia

Apr 15, 2012, 5:34:51 PM
to airqua...@googlegroups.com
Hi everyone,

Sara and I have been talking extensively in Madrid about the need to tag data properly to make it meaningful (among hundreds of other things, lately). We've been thinking about the following tags:

-In/Out: Is the sensor on the outer side of the house, near a street, or on the inner side, near a yard?
-Road/Pedestrian: Is it near a street/road or near a pedestrian area?
-Floor: Which floor is the sensor located on? For some official measurements, the sensor must be 4 meters above ground level.

Other data that may need to be tracked:
-Longitude, Latitude (already tracked in Pachube). Maybe we could use a web service to add altitude if it's needed.
-AQE Version: As versions evolve, it could be useful to know the version people are using.
-Available Sensors: Since this is a modular platform, each datastream will hold data for a single component, but it might be useful to be able to filter using tags (e.g. AQE + Radiation sensor to get only the eggs equipped with radiation sensors).
-AQE serial number: If a sensor is moved, it could be interesting to track measurements in other areas using exactly the same equipment.

We are also thinking about taking photos of the "best" installation options, to try to get consistent placements. We are planning to create some manuals to explain and cover all these options, so people become knowledgeable about their AQE.

What do you think about these tags? What else would you add?

Best.
César

--
Cesar García - @elsatch

I queue my email and process it on Mondays, Wednesdays and Fridays. If it's something urgent or quick, contact me on Twitter. Thanks!

Usman Haque

Apr 15, 2012, 5:37:44 PM
to airqua...@googlegroups.com
cesar
i would recommend using machine tags, perhaps with the 'aqe' namespace. that way it's continually extensible and can be defined by the community as a whole. here's something i wrote a long while back which explains why and how machine tags are useful: http://community.pachube.com/node/542
usman
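
To make the machine tag suggestion concrete, here is a minimal sketch assuming the usual `namespace:predicate=value` machine tag form with the 'aqe' namespace Usman mentions. The predicate names (`placement`, `surroundings`, `floor`, `sensor`) and the feed IDs are hypothetical placeholders loosely based on César's list, not an agreed vocabulary.

```python
import re

# namespace:predicate=value, e.g. "aqe:floor=3"
MACHINE_TAG = re.compile(
    r"^(?P<namespace>[A-Za-z][A-Za-z0-9_]*):"
    r"(?P<predicate>[A-Za-z][A-Za-z0-9_]*)="
    r"(?P<value>.+)$"
)


def make_tag(predicate, value, namespace="aqe"):
    """Build a machine tag string in the given namespace."""
    return f"{namespace}:{predicate}={value}"


def parse_tag(tag):
    """Return (namespace, predicate, value), or None for a plain (non-machine) tag."""
    match = MACHINE_TAG.match(tag)
    if match is None:
        return None
    return match.group("namespace"), match.group("predicate"), match.group("value")


def filter_feeds(feeds, predicate, value, namespace="aqe"):
    """Return the IDs of feeds that carry the given machine tag."""
    wanted = make_tag(predicate, value, namespace)
    return [feed_id for feed_id, tags in feeds.items() if wanted in tags]


# Hypothetical feeds and tags, for illustration only.
example_feeds = {
    "feed-a": ["aqe:placement=outdoor", "aqe:surroundings=road", "aqe:floor=3"],
    "feed-b": ["aqe:placement=indoor", "aqe:sensor=radiation", "madrid"],
}
```

With this, `filter_feeds(example_feeds, "sensor", "radiation")` would pick out only the eggs tagged as carrying a radiation sensor, and plain tags such as "madrid" are simply ignored by `parse_tag`.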


--

.....................................................
Usman Haque
http://www.pachube.com/

3 Scrutton Street
London EC2A 4HF
United Kingdom
Direct: +44 20 3441 1343
Mobile: +44 7796 507 162

Martin Dittus

Apr 15, 2012, 5:40:08 PM
to airqua...@googlegroups.com
I'm currently preparing a dissertation on the topic, I'm very interested :)

(I'm in London, as you know.)

m.

Simone Cortesi

Apr 15, 2012, 5:50:14 PM
to airqua...@googlegroups.com
Count me in too....

-Simone.

--
-S

Cesar Garcia

Apr 15, 2012, 6:00:09 PM
to airqua...@googlegroups.com
Hi Usman,

Thanks for the tip! In fact, you're our original inspiration to use them, as you presented them to us at the Visualizar workshop at Medialab Prado last year, and we took good note of it. Machine tags are our goal, but I preferred to keep this first explanation in plain words. Once we agree on what data is needed, we should move to a more technical/operational definition. Ed met some people in Madrid who are working at the Open Ontology Group at the Polytechnic University, specialized in Linked Data/Semantic Web. Maybe they would like to join this conversation!

Best,
César


chduke

Apr 15, 2012, 6:11:23 PM
to airqualityegg
Hello guys,

Thank you for the responses. It's really exciting to see such a
positive response and people willing to contribute! It is also nice to
have the support and collaboration of Ed and Usman.

I have some very specific and detailed ideas on how to annotate and
process the data in order to a) provide a continuously updated air
quality dataset to the scientific community for further experimenting
with air quality and pattern recognition b) provide a mechanism to
users that can generate estimations about the air quality from raw
data, and potentially make predictions, automatically group sensor
data, etc.

I am happy to share my ideas about how to build the essential back-end
mechanism that will perform the aforementioned and how it can be
integrated with Pachube or any other platform and also to become part
of the team that will develop/integrate it. I can also prepare a small
demo that will demonstrate the overall idea using a very well
established data classification engine.

I leave the communication and the 'bring people together' process to
Ed. By the way, I am located in Athens, Greece.

Charalampos

Sergey Vlasov

Apr 15, 2012, 9:13:35 PM
to airqua...@googlegroups.com
Hi César,

Different sensors have different lifespans. I think it is important
to add metadata about each sensor (when it was calibrated, date of
deployment, maintenance, etc.), especially for those with a short lifespan.
I'm from Montreal, Quebec, Canada.

Sergey

Daniel Nüst

Apr 16, 2012, 4:30:37 AM
to airqua...@googlegroups.com
Hi!

I am interested in the pattern recognition aspect, too.

Especially in combination with "official" data streams. I'll gladly
discuss this aspect and would appreciate pointers to previous
discussions if there have been any.

On 16.04.2012 03:13, Sergey Vlasov wrote:
> Different sensors have a different lifespan. I think it is important
> to add meta data about sensor (when it was calibrated, date of
> deployment, maintenance, etc), especially ones with a short lifespan.

I am involved in sensor metadata handling a bit (non-Pachube only, for
now!) and this is very interesting for me, too.

If I change the temperature sensor on my egg, I would probably add it as
a new feed and stop the old one, right? If we want to do longer term
pattern recognition, would we need a mechanism to connect those
different feeds?


Best regards,
Daniel



--
Daniel Nüst
52° North Initiative for Geospatial Open Source Software GmbH
Martin-Luther-King-Weg 24
48155 Münster, Germany
E-Mail: d.n...@52north.org
Fon: +49-(0)-251-396371-36
Fax: +49-(0)-251-396371-11
http://52north.org/
General Managers: Dr. Albert Remke, Dr. Andreas Wytzisk
Local Court Muenster HRB 10849

Daniel Nüst

Apr 16, 2012, 4:34:55 AM
to airqua...@googlegroups.com
On 16.04.2012 10:30, Daniel Nüst wrote:
> If I change the temperature sensor on my egg, I would probably add it as
> a new feed and stop the old one, right? If we want to do longer term
> pattern recognition, would we need a mechanism to connect those
> different feeds?

Correction: I would add it as a new datastream.
Extension: Do we really want multiple devices over time as input?

Apologies for the many questions, just trying to find the issues from a
pattern recognition perspective and where I can contribute.
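
One possible shape for the "mechanism to connect those datastreams" Daniel asks about, as a sketch only: keep each physical sensor's readings as a separate segment and merge them into one series that still remembers which sensor produced each point. The record layout, the sensor IDs and the sample values are assumptions for illustration; nothing here reflects how Pachube actually stores feeds. Timestamps are assumed to be ISO 8601 strings in the same timezone, so they sort correctly as text.

```python
def stitch_datastreams(segments):
    """Merge per-sensor segments into one time-ordered series.

    segments: list of dicts with 'sensor_id' and 'points' [(iso_time, value), ...].
    Returns a list of (iso_time, value, sensor_id) tuples sorted by time.
    """
    merged = [
        (timestamp, value, segment["sensor_id"])
        for segment in segments
        for timestamp, value in segment["points"]
    ]
    merged.sort(key=lambda point: point[0])
    return merged


def changeover_times(merged):
    """Timestamps at which the sensor behind the merged series changes."""
    return [cur[0] for prev, cur in zip(merged, merged[1:]) if prev[2] != cur[2]]


# Hypothetical example: a temperature sensor swapped on 12 April.
old_sensor = {"sensor_id": "temp-old", "points": [("2012-04-10T12:00:00Z", 21.5),
                                                  ("2012-04-11T12:00:00Z", 22.0)]}
new_sensor = {"sensor_id": "temp-new", "points": [("2012-04-12T12:00:00Z", 20.9)]}
series = stitch_datastreams([new_sensor, old_sensor])
```

Keeping the sensor ID on every point means a later analysis can still treat the changeover as a possible step change (different offset, different drift) instead of silently assuming one continuous instrument.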

chduke

Apr 16, 2012, 7:44:52 AM
to airqualityegg
I can see that many issues arise, that need to be clarified/defined
before we start designing such a system, this is good actually.

@Sergey: Initially I think we should focus on collecting data,
building and evaluating a trained model, and then providing a mechanism for
real-time data classification.

@Cesar: Very good points and tag suggestions! These could actually
make perfect class candidates for data classification as well.
My initial idea for a 'contextual' tag would be a user tag for
manually (whenever possible) recording what the air quality feels like.
Since there is no direct and easy way to determine air quality
automatically from the raw data, in the very beginning we could take
that information from users' estimates (good, bad, etc.). Such
an approach is highly subjective (and can easily be biased), but I think
that, given the (hopefully) great number of users and data, a valid
estimation model could be built.
This does not contradict Usman's suggestion about machine tags.
Data from sensors should be annotated with machine tags, but for the
pattern recognition a 'user' tag could also be useful until we develop
a good classification model.

Some more questions (and my answers) from a data classification
perspective:
1) Data needs to be identified/annotated in the same way: simple
solution -> use machine tags, advanced solution -> use an ontology
2) Eggs (will) have different sensors -> different streams of data.
Most likely though the basic versions will carry the minimum standard
sensors (temperature, humidity, CO2?, etc.).
So, how do you deal with different data streams? We group them based
on sensors used and define/use them as separate data sets? Could be
the initial approach for collecting data, performing basic analysis
and then move on to something more advanced (like feature selection,
etc.)
3) Do we collect data from Pachube and make an initial analysis
offline? Do we implement a platform for classification that integrates
with Pachube? (Ed, Usman please give ideas!)

I've counted 8 people (counting myself) interested so far in
participating. That is already a good number; we can still wait for
some more, and for feedback from Ed/Usman on how to proceed.
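
As a toy illustration of the user-tag idea above (not the classification engine chduke has in mind, which the thread does not name): several users' subjective votes for the same time window are reduced to one label by majority, and a nearest-centroid classifier is then trained on (sensor readings, label) pairs. The feature choice, the labels and all numbers are made up, and features are assumed to be on comparable scales.

```python
from collections import Counter


def majority_label(votes):
    """Most common label among user votes; ties go to the label seen first."""
    return Counter(votes).most_common(1)[0][0]


def train_centroids(samples):
    """samples: list of (features, label). Returns {label: mean feature vector}."""
    sums = {}
    counts = Counter()
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, x in enumerate(features):
            sums[label][i] += x
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def classify(features, centroids):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centre):
        return sum((x - y) ** 2 for x, y in zip(features, centre))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Hypothetical training data: (NO2 raw, CO raw) readings with user-estimated labels.
training = [
    ([1.0, 1.0], majority_label(["good", "good", "bad"])),
    ([2.0, 1.0], majority_label(["good"])),
    ([8.0, 9.0], majority_label(["bad", "bad"])),
    ([9.0, 9.0], majority_label(["bad", "good", "bad"])),
]
model = train_centroids(training)
```

`classify([1.5, 1.2], model)` then returns the label of the nearest centroid. With many users and eggs, the majority vote is one simple way to dampen the individual bias mentioned above, though it does nothing about bias shared by most users.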

Cesar Garcia

Apr 16, 2012, 8:46:35 AM
to airqua...@googlegroups.com
Hi,

Regarding data that is created by the Egg upon initial setup, here is my AQE data from London workshop:


There you can see which datastreams are created, and other related metadata. In our setup we had fixed temp/humidity sensors (DHT22), a CO sensor (MQ-7) and an e2v MiCS-2170 NO2 sensor. I expect Joe could shed some light on the changes being made to the original prototype to accommodate new sensors, but I suspect the data structure will be similar.

Best,
César

P.S. As soon as I have time to get it into a Tupperware, I'll be submitting data again from Madrid :)




Félix Pedrera García

Apr 16, 2012, 9:34:38 AM
to airqua...@googlegroups.com
Hi there,

Count me in.

Like Daniel, I am specifically interested in how official data and AQE data can be combined, but for this to be meaningful I believe some work also needs to be done on aggregating the official data.

At least in Spain, the information is scattered across the websites of the different autonomous regions and municipalities, in different formats and at different levels of detail. For some cities, happily, historical data is available in friendly formats (CSV), as hourly and daily averages. For most of them, HTML pages and request forms are the only way to get some data, and even if you manage to get it, you have to deal with the variety of formats. Judged against the W3C 5-star open data criteria, the outlook for up-to-date Spanish air quality data is bleak.

It seems the only way to aggregate this information is to scrape the data available on the websites and transform it into a common format, so that it can be analyzed globally and not just per municipality. I'm told this is how some air quality scientists deal with the official data. Maybe others will find smarter ways.

Fortunately, at least for the EU, historical data from 1969 to 2010 is available for every official air quality station [1]. Only 1 year and four months of data needs to be aggregated in order to have an up-to-date (European) global air quality dataset to play with and combine with the AQE data.

Developing a scraper for every local air quality site without a full open data policy is an enormous challenge, but I believe it's possible with tools like ScraperWiki [2] or GitHub [3], which catalyze collaboration, and the work can be distributed among local groups.

Once the data is aggregated and in a common format (CSV, JSON, whatever), it could be pushed to Pachube, combined with data pulled from the AQE on Pachube, and analyzed and visualized with tools like King's College's OpenAir R toolkit [4] or Vizzuality's CartoDB [5], letting citizens tell stories about the air we breathe, which is what this is all about.

There are some aggregation sites, like AirQualityNow [6] and AirNow [7]. It seems data availability is better for the US. For the EU it is incomplete, with no API or data access for up-to-date data, as far as I know.

What's the situation regarding official air quality data openness in your country?

Kind regards,

--
Félix.

[1] http://www.eea.europa.eu/data-and-maps/data/airbase-the-european-air-quality-database-6
[2] https://scraperwiki.com/
[3] http://www.github.com
[4] http://www.openair-project.org/
[5] http://cartodb.com/
[6] http://www.airqualitynow.eu/
[7] http://airnow.gov/
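
A small sketch of the "transform it into a common format" step described above, assuming the scraped data has already been saved as delimited text. The two input layouts, the column names and the station names are invented for illustration; real regional sites will each need their own mapping.

```python
import csv
import io
from datetime import datetime


def normalise_rows(text, columns, time_format, delimiter=","):
    """Convert one source's delimited text into common records.

    columns maps the common field names ('station', 'pollutant', 'time',
    'value') to that source's own column headers.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    records = []
    for row in reader:
        stamp = datetime.strptime(row[columns["time"]], time_format)
        records.append({
            "station": row[columns["station"]].strip(),
            "pollutant": row[columns["pollutant"]].strip().upper(),
            "time": stamp.isoformat(),
            # Accept a decimal comma as well as a decimal point.
            "value": float(row[columns["value"]].replace(",", ".")),
        })
    return records


# Two hypothetical sources with different headers, delimiters and date formats.
source_a = "station,pollutant,time,value\nMadrid-04,no2,2012-04-16 09:00,41.5\n"
source_b = "estacion;contaminante;fecha;valor\nSevilla-01;NO2;16/04/2012 09:00;38,2\n"

common = (
    normalise_rows(source_a,
                   {"station": "station", "pollutant": "pollutant",
                    "time": "time", "value": "value"},
                   "%Y-%m-%d %H:%M")
    + normalise_rows(source_b,
                     {"station": "estacion", "pollutant": "contaminante",
                      "time": "fecha", "value": "valor"},
                     "%d/%m/%Y %H:%M", delimiter=";")
)
```

Each local group would then only need to contribute a column mapping and a date format for its own site, while everything downstream works on the common records.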





chduke

Apr 16, 2012, 10:35:03 AM
to airqualityegg
@César: Thank you for the info about the initial sensor setup!

@Félix: Good points and great information and links, thank you!

I think we all understand the complexity and effort needed for
collecting official AQ data and correlating the latter with AQE Data.
My suggestion is still to begin with asking users to provide such
information (either based on what they feel or based on manually
tracking official information), or a similar mechanism, then utilize
the data from users and Pachube to evaluate classification models and
then move on with official data acquisition/processing for improving
the accuracy of the results, etc.

What do you think?

Martin Dittus

Apr 17, 2012, 8:36:34 AM
to airqua...@googlegroups.com
Lots of good ideas, and good to see that everyone's interests cover most of the core areas.

One thing that hasn't been brought up yet is the crucial step between measurement and analysis: data cleaning/preparation. Cf. the "Calibration" thread and Ed's "the network calibrates itself" vision, and the loads of suggestions others made there.

The outcomes of that discussion should have a direct impact on the opportunities and limitations of your pattern detection aims.

m.

chduke

Apr 17, 2012, 11:36:40 AM
to airqualityegg
Hello Martin,

indeed data preparation is a critical phase between data collection
and analysis. Most common issues are missing data, data noise, wrong/
irrelevant data, etc. Hopefully we will be able to apply some well
established data filtering techniques and resolve such issues.
Solutions (as well as problems) will come by first collecting and
analyzing data. In the long run, pattern recognition might not have
any effect at all, but at least there will be a great repository of
annotated air quality data.

NeilH

Apr 17, 2012, 12:24:22 PM
to airqua...@googlegroups.com

Hi all, 
I've had some real world experience with a few sensors, and coaxed a number to live up to the claims of their manufacturers, 
I've taken them through their birthing tantrums, 
introducing them to their big brother electronics, 
local cleanup filtering - narrow band pass filtering removing  50/60Hz
analog to digital conversion anti-aliasing checks,
temperature range adjustments.

One of the most common problems is flat lining. 
The measurement output flat lines - the raw data may be there, but it can't be seen, and the amplified output goes to one extreme or the other.
Coaxing them into defining a usable range in a reliable way is a challenge, especially when targeting the most interesting areas.

Once they are working then all sorts of things can happen,
One simple circuit I bought had interference from the radio - for 1 in  ??? readings the radio received something while taking a measurement and the sw responded with an ACK, it took the voltage rail down and caused a blip in a temperature measurement. The battery being used had too high an impedance for the radio. 
Sensors can deteriorate - get covered in gunk (bacteria, particles?), get wet - and flat line, or even worse - output a merry jig - probably not a random number generator, almost realistic - but utterly false readings.

Then again for sensitive sensors (all bridges) there is standard electronic noise  and there is 1/f noise - the electronics have to be chosen to mitigate against it.
There is ADC noise, and making sure the front-end process is correctly anti-aliasing the incoming analog signal. That is, if the analog signal varies faster than the digitizing process is designed to cope with, it effectively produces sampling errors.

One of the most fun to chase down is RF breakthrough - a taxi (or other high watt mobile radio RF source) goes by. 
The effect is akin to this -think you're trying to listen for a cricket to chirp with a sensitive microphone, and a motorbike with its muffler off roars by. 
The signal goes all over the place, hits its rails - was it real or was it an anomaly. 
Taxi arrivals are a known unpredictable occurrence in most urban settings. They don't occur as predictably in other settings, so a sensor that works one place may have issues somewhere else. 
It's utterly pointless to add after-manufacturing protection (copper foil round sensitive areas) - the circuits have to be redesigned.

Then there are standard design issues that don't interfere with "a signal" in the data and that can maybe be filtered out later - signal offsets, signal drift.

Just thought I would share a bit of experience from the trenches of hot sensors  translating raw data to polite readings.

cheers
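
The flat-lining and "hits its rails" symptoms Neil describes are the kind of thing that can at least be flagged automatically on the data side. A minimal sketch, assuming readings arrive as a plain list of numbers; the run length, the tolerance and the 10-bit ADC limits (0 and 1023) in the example are assumptions, not values from the Egg hardware.

```python
def find_flatlines(values, min_run=5, tolerance=0.0):
    """Return (start, end) index pairs, end exclusive, of suspected flat lines.

    A run counts as flat when each reading differs from the previous one by
    at most `tolerance`, for at least `min_run` consecutive samples.
    """
    runs = []
    start = 0
    for i in range(1, len(values) + 1):
        if i < len(values) and abs(values[i] - values[i - 1]) <= tolerance:
            continue  # still inside the current run
        if i - start >= min_run:
            runs.append((start, i))
        start = i
    return runs


def at_rail(values, low, high):
    """Indices of readings pinned at or beyond the measurement limits."""
    return [i for i, v in enumerate(values) if v <= low or v >= high]


readings = [300, 310, 1023, 1023, 1023, 1023, 1023, 305]
suspect_runs = find_flatlines(readings)                  # stuck output
suspect_rail = at_rail(readings, low=0, high=1023)       # saturated output
```

Flagging is all this does: it cannot tell a taxi's RF burst from a genuine event, and with a non-zero tolerance a slow drift can also be reported as flat, so flagged spans would still need a human or a second sensor to judge.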

Gustavo Olivares

Apr 17, 2012, 6:57:04 PM
to airqua...@googlegroups.com
Indeed the quality assurance of the data is very important, but this is where I think we need to leverage the numbers. Ed's vision of a "self-calibrating network" is what I think we should be aiming for.
There are many mathematical techniques to "assimilate" data, and lately "crowdsourcing" is a popular buzzword that has been applied to weather networks; I think it is ripe to be applied to other fields.
I'm not a specialist in the different techniques, but I do "know people" ;-) and it is a fertile research area, so there is something to be gained by trying to involve universities through their mathematics and earth sciences departments (course projects, theses, dissertations, etc.).
Anyone with a better handle on Kalman filters and the like?

/El Gus

chduke

Apr 18, 2012, 6:32:47 AM
to airqualityegg
Yes, this is exactly what I have in mind for data pre-processing, I
just did not want to mention yet such terms (most likely unknown to
people). I do have experience with data filtering and implementation
of Kalman filters and will explore such options when having some data.
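
For readers who have not met the term, here is about the smallest possible Kalman filter: one scalar state that is assumed to follow a random walk, observed through noisy measurements. It is a textbook sketch for orientation only, not the pre-processing chduke plans; the two variance parameters are tuning assumptions that would have to be estimated from real Egg data.

```python
def kalman_1d(measurements, process_var, measurement_var,
              initial_estimate=None, initial_var=1.0):
    """Scalar Kalman filter with a random-walk state model.

    process_var: how much the true value is expected to wander per step.
    measurement_var: variance of the sensor noise.
    Returns the filtered estimate after each measurement.
    """
    estimate = measurements[0] if initial_estimate is None else initial_estimate
    variance = initial_var
    filtered = []
    for z in measurements:
        # Predict: the state is assumed unchanged, but uncertainty grows.
        variance += process_var
        # Update: blend prediction and measurement according to their variances.
        gain = variance / (variance + measurement_var)
        estimate += gain * (z - estimate)
        variance *= (1.0 - gain)
        filtered.append(estimate)
    return filtered


noisy = [10.0, 12.0, 8.0, 11.0, 9.0, 10.0, 12.0, 8.0]
smooth = kalman_1d(noisy, process_var=0.01, measurement_var=4.0)
```

A small `process_var` relative to `measurement_var` makes the filter trust its running estimate and smooth heavily; raising it makes the output follow the measurements more closely, at the cost of passing more noise through.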

chduke

Apr 19, 2012, 3:55:01 AM
to airqualityegg
Hello guys,

I have made a simple survey so that we can track people who want to
contribute. Feel free to disseminate it; the statistics are public (except
for the contact details and information about you). At the end we will
deliver the information to Pachube and we can all start working
together.

http://www.buildinginternetofthings.com/survey/index.php?sid=48474&lang=en

Ed Borden

Apr 25, 2012, 4:32:00 PM
to airqualityegg
We talked quite a bit about this at the recent EcoHackNYC. I took
some video in the middle of the conversation.

https://vimeo.com/40867990

Conversation at Parsons DT Lab in NYC 4/21/12 with:
#Sensemakers: @jmsaavedra, @lpercifield, @edborden
Airnow.gov: @timsdye
Univ Colorado: Ricardo Piedrahita
@HabitatMap: Michael Heimbinder

chduke

Apr 26, 2012, 1:42:24 PM
to airqualityegg
Hello Ed,

Thanks for the vid!

I have made some initial analysis on some of the egg data found on
Pachube:

http://blog.buildinginternetofthings.com/2012/04/26/pattern-recognition-for-the-air-quality-egg-part-two/

Ed Borden

Apr 26, 2012, 3:46:58 PM
to airqua...@googlegroups.com
This is great! First time we've had analysis done on real Egg data.

Can you reiterate your conclusions here in layman's terms?

chduke

Apr 27, 2012, 2:34:13 AM
to airqualityegg
Well, the most important conclusion is that more data is needed, and
hopefully it will arrive soon! :-)

So far, based on the feeds from the 2 eggs, the data from the air
quality sensor seem to fall into 3 different groups (whatever that
may mean). Also, it looks like there have been no significant changes
in the NO2 and CO sensor readings that could cause important changes
in the AQ sensor readings. Humidity seems to be what affected the
sensor most, producing the 3 different value ranges.
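
The grouping method used in the blog post is not restated in this thread. As one way such a grouping could be found, here is a sketch of plain one-dimensional k-means with k=3. The function name and the deterministic choice of starting centroids are assumptions made for illustration.

```python
def kmeans_1d(values, k=3, iterations=50):
    """Plain 1-D k-means on a non-empty list; returns (centroids, labels)."""
    data = sorted(values)
    # Deterministic start: spread the initial centroids over the sorted data.
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                  for v in values]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(
                sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels
```

Running this on the AQ readings and then comparing each group's humidity values would be one way to check whether humidity really separates the three ranges.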


Pattern analysis can give an answer to the famous calibration issue:
once we have adequate data, we could easily identify the range of
sensor readings that have an impact on air quality as sensed by both
the sensors and humans.


JP de Vooght

Apr 27, 2012, 5:08:53 AM
to airqua...@googlegroups.com
Hello @elsatch! all!

I have been tinkering with some sensor data and R libraries and wanted to share a couple of scripts I use to fetch data from Pachube. I based them on work done for the World Bank open data (WDI).

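library(rjson)
library(RCurl)

# List the datastream ids of a Pachube feed.
Pachube.query <- function(feed='504')
{
  json<-getURL(
    paste('http://api.pachube.com/v2/feeds/',feed,sep=''),
    netrc='optional')
  data<-fromJSON(json)
  # Test for an error first, so a missing 'private' field cannot break the check.
  if (!is.null(data$error) || identical(data$private,"true")) stop(data)
  return (sapply(data$datastreams,"[[",i="id"))
}

# Fetch the last 'hours' hours of one datastream as a data frame (ts, val).
Pachube.fetch <- function(feed='504',datastream='0',hours=6)
{
  # Range thresholds in hours and the sampling interval in seconds paired with each.
  max_range=c(0,6,12,24,5*24,14*24,31*24,90*24,180*24,365*24)
  max_value=c(0,30,60,300,900,3600,10800,21600,43200,86400)
  # Use the interval paired with the largest threshold strictly below 'hours'.
  interval=max_value[max(which(max_range<hours))]
  start=format(Sys.time()-hours*3600+60,"%Y-%m-%dT%H:%M:%OSZ",tz="UTC")
  # Each request returns at most 1000 datapoints, so page through the range.
  pages=ceiling(hours*3600/max(1,interval)/1000)
  res=matrix(nrow=0,ncol=2)
  for (p in 1:pages) {
    json<-getURL(paste(
      'http://api.pachube.com/v2/feeds/',feed,
      '/datastreams/',datastream,
      '?start=',start,
      '&page=',p,
      '&per_page=1000',
      '&interval=',interval,
      sep=''),netrc='optional')
    data<-fromJSON(json)
    if (!is.null(data$error)) stop(data)
    if (length(data$datapoints)==0) break
    # Each datapoint contributes two fields: the value, then the timestamp.
    m<-matrix(unlist(data$datapoints),ncol=2,byrow=TRUE)
    res<-rbind(res,m)
  }
  return (data.frame(
    ts=strptime(res[,2],"%Y-%m-%dT%H:%M:%OS",tz="UTC"),
    val=as.numeric(res[,1])))
}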
Pachube.R

Félix Pedrera García

Apr 27, 2012, 5:15:15 AM
to airqua...@googlegroups.com
Hi all!

It would be great to extend the OpenAir open source toolkit to read data directly from Pachube, so that we could use its specific air quality functions.

It already reads data directly from some London stations.

The source code is hosted on R-Forge: http://r-forge.r-project.org/projects/openair/

Regards, 

--
Félix.

Attachments:
- Pachube.R

Nafis

Apr 27, 2012, 11:04:22 AM
to airqualityegg
We should look at what has been done in the weatherstation arena. CWOP
(Citizen Weather Observer Program, http://wxqa.com/) has been going on
for quite a while. Phil Gladstone set up a Weather Quality Reporter
(http://weather.gladstonefamily.net/). It uses surrounding stations to
look for outliers. For example, here is the report from my old 1-wire
weatherstation (http://weather.gladstonefamily.net/site/C3725).
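
The same idea could be tried on Egg feeds. A minimal sketch, assuming simultaneous readings from nearby eggs are already at hand: flag a reading that sits far from the neighbours' median, measured in units of the median absolute deviation (MAD). This only illustrates the idea and is not claimed to be the Weather Quality Reporter's actual method; the function name and the threshold of 3.5 are assumptions.

```python
from statistics import median

def is_outlier(reading, neighbour_readings, threshold=3.5):
    """Return True if `reading` deviates strongly from nearby stations."""
    if len(neighbour_readings) < 3:
        return False  # too few neighbours to judge
    med = median(neighbour_readings)
    mad = median(abs(v - med) for v in neighbour_readings)
    if mad == 0:
        # At least half of the neighbours sit exactly on the median.
        return reading != med
    # 0.6745 rescales the MAD so the score is comparable to a z-score.
    score = 0.6745 * abs(reading - med) / mad
    return score > threshold
```

For uncalibrated sensors the comparison would probably work better on changes over time than on absolute values, since each egg may have its own offset.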

Pachube is nice for storing raw data. I just wish there were more
filters available (and that when you make a historical data request,
you could get the full start/end date range instead of a maximum of
1000 records :-).
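
Until such an option exists, a client can work around the cap by splitting the requested range into windows that each hold at most 1000 points. Below is a sketch of the window arithmetic only, with no network calls. The function name is hypothetical and the 1000-point cap is taken from the remark above.

```python
from datetime import datetime, timedelta

def split_range(start, end, interval_s, max_points=1000):
    """Yield (window_start, window_end) pairs covering [start, end)."""
    if interval_s <= 0 or max_points <= 0:
        raise ValueError("interval_s and max_points must be positive")
    # Longest span one request can cover at this sampling interval.
    step = timedelta(seconds=interval_s * max_points)
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        yield cursor, window_end
        cursor = window_end
```

Each pair could then be passed as the start and end of one history request, and the results concatenated.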

It would be nice if Weather Underground or CWOP could add additional
air quality fields. Maybe this would happen if we could get
weatherstation vendors such as Davis to accept/sell additional
sensors. I wonder how hard it would be to interface some of the Egg
sensors to a Vantage Pro2? Hmmm.....

In my mind I think weatherstations and air quality hardware should be
more integrated. Of course it is a "chicken and EGG" problem... should
the EGG get data from a weatherstation, should the weatherstation get
data from the egg, or should we let an Internet App like Pachube
integrate the data :-)

I suppose in the short-term we can scarf our weather data back off
Weather Underground and populate Pachube.

chduke

Apr 27, 2012, 5:26:22 PM
to airqualityegg
Hello Nafis,

interesting points!

My thoughts:
1) Eggs are sensory systems; they should stick to sensing data.
2) Pachube is great for storing data in the cloud and providing an API
for external applications (I also vote for lifting the data history
limit).
3) There is a lot that can be done with the collected data by experts
on air quality, data analysts, etc. Let them build the applications
that will analyze data, share them, combine them with other data, and
do more. Pachube can act as a gateway.


Carsten Dannat

Apr 27, 2012, 6:15:11 PM
to airqua...@googlegroups.com
Hi chduke,
Some additional info on the AQEgg feed 48307:

The AQEgg data that you analyzed was captured indoors. I had difficulty getting NO2 and CO readings outdoors. You might have a look at the discussion I had with Joe:

https://groups.google.com/forum/m/?fromgroups#!topic/airqualityegg/W9e5sbK9_Aw


Cheers,
Carsten

Carsten Dannat

Apr 27, 2012, 6:35:23 PM
to airqua...@googlegroups.com
The link to the discussion is not working. Hopefully, this one will do:
 
 
However, I haven't changed the resistor as suggested in the thread yet.
 
Carsten

 

Joseph Saavedra

Apr 27, 2012, 6:38:42 PM
to airqua...@googlegroups.com
Hi all -

It is also important to note that Carsten has an old CO sensor (MQ7 from Hanwei, not E2V) as well as an old NO2 circuit.

Also, I believe the temp and humidity values on that unit are not from a DHT22, but from SF breakouts.

Thanks, Carsten, for maintaining one of the first protos this long!

_ _ _
Joseph Saavedra
Creative Technologist, Developer

Adjunct Faculty,
School of Art, Media, and Technology,
Parsons the New School for Design

Carsten Dannat

Apr 27, 2012, 6:46:13 PM
to airqua...@googlegroups.com
@Joe: Yes, temp and humidity are not read from a DHT22, but SF breakouts are used instead.
 
@chduke: On March 7th, the room was pretty crowded and, as far as I remember, there was a pretty steep rise in the NO2 or CO signal.
 
Carsten

chduke

Apr 28, 2012, 2:51:54 AM
to airqualityegg
Hello guys,

thank you Carsten and Joseph for the information!

I had some suspicion that feed 48307 is indoors, because the
temperature readings look very stable throughout the day! Still, the
same amount of data was used for the analysis from both sensors, and
in these two particular cases there have not been significant NO2 and
CO changes that affected the AQ reading.

Carsten, I will retrieve the feed data from March 7th, run the
analysis again, and post the results!

Daniel Nüst

May 2, 2012, 11:19:28 AM
to airqua...@googlegroups.com
On 27.04.2012 17:04, Nafis wrote:
> Pachube is nice for storing raw data. I just wish there were more
> filters available (and when you make a historical data request, you
> could get the full start/end date range instead of the maximum 1000
> records :-).

We are thinking about porting the data to our Sensor Observation
Service [0] (which might be challenging, because with Pachube it is
far easier to set up dynamic networks such as the AQE). It would offer
such functionality and make the data available to a variety of
clients [1].

Let me know if that would be interesting for you!

Best regards,
Daniel

[0] http://52north.org/communities/sensorweb/sos/
[1] http://52north.org/communities/sensorweb/clients/index.html




chduke

May 2, 2012, 4:55:12 PM
to airqualityegg
Hello Daniel,

Looks interesting, thanks for sharing! I will look at the service in
detail and come back to you with questions.

Charalampos
