CPCB AQI All Parameters Data


Saketha Ramanujam

Jun 27, 2020, 9:16:55 AM
to datameet
Hello all,


I've managed to acquire 24 Hr average AQI data for dates between 01-01-2020 and 15-06-2020 from 227 monitoring stations out of 231 across India as listed on  

I'm currently looking at trends in various parameters before and during the different phases of lockdown. I'm posting this because the datasets might be useful to others.

I'm also working on a wrapper over the original CPCB API to make the datasets easier to download.


Thanks,
Saketha Ramanujam

Sarath Guttikunda

Jun 27, 2020, 10:19:35 AM
to data...@googlegroups.com
Dear All,

Good evening.

Here is our assessment of air quality during the 4 lockdown periods.
City averages through the periods for all the pollutants are downloadable.

If you are looking for raw monitoring data, the main site is:
https://app.cpcbccr.com/ccr/#/caaqm-dashboard-all/caaqm-landing

Use the advanced search option to download the data by city and station. This is a cumbersome process, and there are limits on the number of days you can request in a single search, depending on the time step you choose (1-hour or 2-hour averages allow longer periods than 15-minute averages). The 15-minute data is best, because it lets you clean the data; if you ask for a daily average instead, the program gives you a value that averages everything it has, including negative numbers and unexplainable values such as 999s.
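To illustrate why cleaning before averaging matters, here is a minimal sketch in Python. The readings are made-up values, and the validity rule (non-negative and below 999) is only an assumption based on the problem values mentioned above; the right thresholds depend on the pollutant and the instrument.

```python
from statistics import mean

def is_valid(value, upper_limit=999.0):
    """Assumed rule: keep non-negative readings below the 999 sentinel.

    Zeros are kept here; tighten the lower bound if you treat them as invalid.
    """
    return value is not None and 0 <= value < upper_limit

def daily_mean(readings, min_valid=1):
    """Average only the valid readings; return None if too few remain."""
    valid = [v for v in readings if is_valid(v)]
    if len(valid) < min_valid:
        return None
    return mean(valid)

# Hypothetical 15-minute PM2.5 readings (not real CPCB data).
readings = [42.0, 45.0, -5.0, 999.0, 48.0, None, 45.0]

# What a blind average over everything available would give.
naive = mean(v for v in readings if v is not None)
# The same average after dropping the negative value and the 999.
cleaned = daily_mean(readings)

print(f"naive mean:   {naive:.1f}")
print(f"cleaned mean: {cleaned:.1f}")
```

A single 999 among six readings is enough to pull the naive average far away from the cleaned one, which is the reason to start from the 15-minute data.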

Note that this is Air Quality (AQ) data and NOT Air Quality Index (AQI) data.

Alternatively, you can access the official data at Openaq.org, which stores the last 90 days of data for access via its API.

One way is to search by city name
https://api.openaq.org/v1/measurements?city=Ahmedabad&parameter=pm25&date_from=2020-04-01&date_to=2020-04-10&format=csv
Ahmedabad has only one site, as of 2020.
The second way is to search by radius around a coordinate (the radius is given in meters, so 30000 below means 30 km), which will give stations in and around Ahmedabad, including Gandhi Nagar.
https://api.openaq.org/v1/measurements?coordinates=23,72.6&radius=30000&date_from=2020-04-01&date_to=2020-04-10&parameter=pm25&format=csv
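The two query styles above can be assembled programmatically. The following is a sketch only: the helper name `measurements_url` and its arguments are hypothetical, it only builds the URL (no request is sent), and it assumes the v1 endpoint and query parameters exactly as they appear in the URLs above, with the radius taken to be in meters.

```python
from urllib.parse import urlencode

# Endpoint copied from the URLs above; it may change or be retired over time.
BASE_URL = "https://api.openaq.org/v1/measurements"

def measurements_url(parameter, date_from, date_to, city=None,
                     coordinates=None, radius_m=None, fmt="csv"):
    """Build a measurements query URL for a city or a coordinate + radius."""
    params = {}
    if city is not None:
        params["city"] = city
    if coordinates is not None:
        lat, lon = coordinates
        params["coordinates"] = f"{lat},{lon}"
        if radius_m is not None:
            params["radius"] = radius_m  # assumed to be in meters
    params.update(parameter=parameter, date_from=date_from,
                  date_to=date_to, format=fmt)
    # Keep the comma in "lat,lon" unescaped, as in the hand-written URL.
    return f"{BASE_URL}?{urlencode(params, safe=',')}"

by_city = measurements_url("pm25", "2020-04-01", "2020-04-10",
                           city="Ahmedabad")
by_radius = measurements_url("pm25", "2020-04-01", "2020-04-10",
                             coordinates=(23, 72.6), radius_m=30000)
print(by_city)
print(by_radius)
```

The city query reproduces the first URL above; the radius query carries the same parameters as the second URL, in a slightly different order.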

All the openaq.org data (up until the last hour) is stored in AWS S3 buckets:
https://openaq-fetches.s3.amazonaws.com/index.html (click on real time, to access the global csv files)

There is also an instructional video (using JavaScript), if you want to dig deeper: https://youtu.be/Tiot877orkU

With best wishes,
Sarath



--
Datameet is a community of Data Science enthusiasts in India. Know more about us by visiting http://datameet.org
---
You received this message because you are subscribed to the Google Groups "datameet" group.
To unsubscribe from this group and stop receiving emails from it, send an email to datameet+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/datameet/2de06254-ad0f-4905-bb25-5cf1a5013afao%40googlegroups.com.

Saketha Ramanujam

Jul 10, 2020, 9:13:53 AM
to data...@googlegroups.com
Hello all,

I've just uploaded updated data for all pollution parameters from the 221 available stations under cpcbccr here.
Previously the datasets covered the period up to 15th June; they now extend to 30th June.


Thanks,
Saketh.

Sarath Guttikunda

Jul 10, 2020, 9:30:03 AM
to data...@googlegroups.com
Thank you, Saketha.

One cautionary note: from the CPCB site, it is best to download the raw 15-minute data and weed out all the suspect numbers yourself, such as 0's, negative numbers, 999's, repeats, or any other number that does not meet the quality check criteria (we found at least 15 instances of random patterns and numbers that need weeding out). You can request hourly or daily average data, which allows downloading longer time periods in one search, but these averages will likely not have passed through the same level of quality check for representativeness. For example, a 24-hour request will simply aggregate and average the data for the day without any deletions or corrections for quality.
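As one illustration of such a check, the sketch below flags long runs of identical consecutive readings (a common sign of a stuck sensor or a filled-in value). The function name, the sample values, and the `max_run` threshold are all hypothetical; this is not the actual set of criteria we applied, only an example of how one of them might look in Python.

```python
from itertools import groupby

def flag_repeats(values, max_run=4):
    """Return one boolean per reading: True if it belongs to a run of
    identical consecutive values longer than max_run."""
    flags = []
    for _, run in groupby(values):
        run_length = len(list(run))
        flags.extend([run_length > max_run] * run_length)
    return flags

# Hypothetical 15-minute readings: six identical 50.0 values in a row.
readings = [41.0, 43.5, 50.0, 50.0, 50.0, 50.0, 50.0, 50.0, 47.2, 47.2]

flags = flag_repeats(readings, max_run=4)
kept = [v for v, bad in zip(readings, flags) if not bad]
print(kept)
```

Short repeats (the two 47.2 values) are kept, since neighboring readings can legitimately coincide; only the run longer than `max_run` is dropped.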


With best wishes,
Sarath


Saketha Ramanujam

Jul 10, 2020, 9:39:39 AM
to data...@googlegroups.com
Hello Sarath!


I've found a programmatic way to avoid the hassle of using the CPCB's UI.
It's true that the averages across multiple frequencies do not match at times.

I'm working on a client that returns a dataframe for a given date range, sampling frequency, and station.
I'll push the code over the weekend and post an update here.


Alongside it, I'm also writing a simple interface that makes it easy to see which stations exist and which parameters are available at a particular station.

1. https://love-the-air.herokuapp.com/states lists the states which have stations
2. http://love-the-air.herokuapp.com/state/<state> e.g., http://love-the-air.herokuapp.com/state/Assam returns a list of cities that have active stations within that state.
3. https://love-the-air.herokuapp.com/city/<city> e.g., https://love-the-air.herokuapp.com/city/Guwahati returns a list of available stations in that city.
4. https://love-the-air.herokuapp.com/station/<station_id> e.g., https://love-the-air.herokuapp.com/station/site_5073 returns a list of parameters that are specific to that particular station.
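For anyone scripting against these four routes, a small URL builder might look like the sketch below. The `endpoint` helper is hypothetical, it only constructs URLs without sending a request, and it assumes that https works for every route and that names containing spaces should be percent-encoded; the response format is not covered here.

```python
from urllib.parse import quote

BASE_URL = "https://love-the-air.herokuapp.com"

def endpoint(kind, name=None):
    """Build the URL for one of the four routes listed above.

    kind is "states", "state", "city", or "station"; every kind except
    "states" needs a name (state name, city name, or station id).
    """
    if kind == "states":
        return f"{BASE_URL}/states"
    if kind in ("state", "city", "station"):
        if not name:
            raise ValueError(f"'{kind}' needs a name or id")
        # Percent-encode the path segment (assumed behavior for spaces).
        return f"{BASE_URL}/{kind}/{quote(name, safe='')}"
    raise ValueError(f"unknown endpoint kind: {kind!r}")

print(endpoint("states"))
print(endpoint("state", "Assam"))
print(endpoint("city", "Guwahati"))
print(endpoint("station", "site_5073"))
```

The natural drill-down is states, then a state's cities, then a city's stations, then a station's parameters.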

 
Thanks,
Saketh.