How to Scrape National Air Quality Data

Thejesh GN

Feb 14, 2020, 6:41:13 AM
to datameet
I had wanted to write this how-to for a long time, and today I finally did :)



Thej
--
Thejesh GN  ತೇಜೇಶ್ ಜಿ.ಎನ್
http://thejeshgn.com
GPG ID :  0xBFFC8DD3C06DD6B0

Thejesh GN

Feb 14, 2020, 6:43:40 AM
to datameet

Thej

Sarath Guttikunda

Feb 14, 2020, 7:06:33 AM
to data...@googlegroups.com
Dear Thej,

FYI, the site you are accessing gives the Air Quality Index (AQI), not the underlying air quality data.

As a researcher, I am interested in the air quality data itself.
The main site is this
Use the advanced search option to download the data. It is a cumbersome process: the number of days you can request is limited, depending on the time step you choose. The best option is the 15-minute time step, and then you do the weeding out yourself. If you ask for a daily average instead, the program averages everything it has, including -999 values and some random numbers.
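
A minimal sketch of that weeding-out step, assuming the 15-minute readings are already downloaded as (timestamp, value) pairs. The -999 marker comes from the message above; the ceiling MAX_PLAUSIBLE for the "random numbers" is a hypothetical threshold you would have to choose yourself. The function also returns how many valid readings each day had (a full day has 96 fifteen-minute slots), so you can judge whether a daily mean is trustworthy.

```python
from collections import defaultdict
from datetime import datetime

# -999 is the missing-value marker mentioned above.
SENTINEL = -999.0
# Hypothetical upper bound for discarding implausible readings;
# choose a value suited to the pollutant and station.
MAX_PLAUSIBLE = 1000.0


def daily_means(readings):
    """Average 15-minute readings per calendar day, skipping invalid values.

    readings: iterable of (datetime, float) pairs.
    Returns {date: (mean, valid_count)}; days with no valid data are omitted.
    """
    buckets = defaultdict(list)
    for ts, value in readings:
        if value is None or value == SENTINEL or value < 0 or value > MAX_PLAUSIBLE:
            continue  # drop the -999 marker and out-of-range values
        buckets[ts.date()].append(value)
    return {day: (sum(vals) / len(vals), len(vals)) for day, vals in buckets.items()}


readings = [
    (datetime(2020, 2, 14, 0, 0), 80.0),
    (datetime(2020, 2, 14, 0, 15), -999.0),
    (datetime(2020, 2, 14, 0, 30), 100.0),
]
print(daily_means(readings))  # {datetime.date(2020, 2, 14): (90.0, 2)}
```

A naive average of the same three values would give about -273, which is the kind of result Sarath warns about.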

The alternative is openaq.org.
We keep getting blocked once in a while from fetching and archiving.
Only the last 90 days are online, and here is a post on how to access it
The rest is all stored in AWS S3 buckets.
There are patches of missing days from when the server was blocked from fetching by CPCB, and there is no backlog correction method in place.
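
Because of those missing patches, it is worth listing the gaps before any analysis. A minimal sketch, assuming you already have the set of dates for which archived data exists (how you list them from S3 is not shown); missing_days and group_runs are hypothetical helper names.

```python
from datetime import date, timedelta


def missing_days(available, start, end):
    """Return the dates in [start, end] (inclusive) that have no data.

    available: iterable of datetime.date objects for which data exists.
    """
    have = set(available)
    gaps = []
    day = start
    while day <= end:
        if day not in have:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


def group_runs(days):
    """Collapse a sorted list of dates into (first, last) runs of consecutive days."""
    runs = []
    for d in days:
        if runs and d - runs[-1][1] == timedelta(days=1):
            runs[-1] = (runs[-1][0], d)  # extend the current run
        else:
            runs.append((d, d))  # start a new run
    return runs


archived = [date(2020, 1, 1), date(2020, 1, 2), date(2020, 1, 5), date(2020, 1, 6)]
gaps = missing_days(archived, date(2020, 1, 1), date(2020, 1, 7))
print(group_runs(gaps))
# [(datetime.date(2020, 1, 3), datetime.date(2020, 1, 4)), (datetime.date(2020, 1, 7), datetime.date(2020, 1, 7))]
```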

AQI is useful for PR purposes, but beyond that not so much. Since it is a calculated value based on 24-hour averages (for PM2.5, for example), it misses the raw, instantaneous story.
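
A toy calculation with made-up numbers (not real data) shows why a 24-hour average misses the instantaneous story: a two-hour evening spike barely moves the daily mean.

```python
# Hypothetical day of 15-minute PM2.5 readings (µg/m³): a steady 40
# with a two-hour spike to 300 in the evening.
readings = [40.0] * 96  # 96 fifteen-minute slots in a day
for slot in range(76, 84):  # 19:00 to 20:45, i.e. 8 slots = 2 hours
    readings[slot] = 300.0

daily_mean = sum(readings) / len(readings)
print(f"24-hour mean: {daily_mean:.1f}")  # 24-hour mean: 61.7
print(f"Peak reading: {max(readings):.1f}")  # Peak reading: 300.0
```

Anything computed from the daily mean alone only sees about 62; the 15-minute data shows the spike to 300.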

With best wishes,
Sarath


--
Datameet is a community of Data Science enthusiasts in India. Know more about us by visiting http://datameet.org
---
You received this message because you are subscribed to the Google Groups "datameet" group.
To unsubscribe from this group and stop receiving emails from it, send an email to datameet+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/datameet/CAABnYsX%3DzYCL-9g26AS69HbNz1qD5Aj%2Be%3D0YsCPRt7sdup1AbQ%40mail.gmail.com.

Thejesh GN

Feb 14, 2020, 7:25:02 AM
to datameet
No, I am getting the air quality data.


I am downloading it from the same site you mentioned:

https://app.cpcbccr.com/ccr/#/caaqm-dashboard-all/caaqm-landing/caaqm-comparison-data

As of now, I am fetching 15-minute averages of PM2.5 and PM10.

Each request covers 2 days.

I can go back years, but that means thousands of requests. Hence my note about being gentle on the site.
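
The arithmetic behind "thousands of requests": at 2 days per request, one year for a single station is already 183 requests, and that multiplies with every station and every year. A minimal sketch of splitting a date range into 2-day windows with a pause between requests. This is not the code from the blog post; fetch and DELAY_SECONDS are hypothetical placeholders for whatever performs the actual HTTP request and whatever delay you consider polite.

```python
import time
from datetime import datetime, timedelta

# Each request covers at most 2 days, as described above.
WINDOW = timedelta(days=2)
# Hypothetical pause between requests to stay gentle on the server.
DELAY_SECONDS = 5


def request_windows(start, end, window=WINDOW):
    """Yield (from, to) pairs covering [start, end) in chunks of `window`."""
    current = start
    while current < end:
        upper = min(current + window, end)
        yield current, upper
        current = upper


def fetch_all(start, end, fetch, delay=DELAY_SECONDS):
    """Call fetch(from, to) for every window, sleeping between calls.

    `fetch` is a placeholder supplied by the caller; it is not part of any library.
    """
    results = []
    for i, (lo, hi) in enumerate(request_windows(start, end)):
        if i > 0:
            time.sleep(delay)  # be soft on the site
        results.append(fetch(lo, hi))
    return results


windows = list(request_windows(datetime(2019, 1, 1), datetime(2020, 1, 1)))
print(len(windows))  # 183 requests for one year, one station
```

Running the loop sequentially with a fixed sleep is deliberately slow; that is the point when the server is shared.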

You can get other parameters, like benzene, toluene, etc., just by modifying this line. I was interested only in PM2.5 and PM10, hence I am fetching only those. Check
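
To illustrate the idea of changing only one line to fetch more pollutants, here is a hedged sketch where the parameter list is the single thing you edit. Every field name, the station ID, the date format and the parameter spellings are hypothetical placeholders, not the dashboard's actual API; match them to the request your browser really sends.

```python
import json

# The one line to edit: which pollutants to request.
# Only PM2.5 and PM10 are fetched in the post; add e.g. "Benzene", "Toluene".
PARAMETERS = ["PM2.5", "PM10"]


def build_request(station_id, parameters, from_date, to_date, interval="15 Minute"):
    """Build a JSON request body.

    All keys and values here are hypothetical placeholders.
    """
    body = {
        "station": station_id,
        "parameters": list(parameters),
        "fromDate": from_date,
        "toDate": to_date,
        "interval": interval,
    }
    return json.dumps(body)


print(build_request("site_XXXX", PARAMETERS + ["Benzene", "Toluene"], "14-02-2020", "16-02-2020"))
```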



Ultimately the request goes to 


Note:
The first screenshot in the blog post is just for getting the site IDs, not the actual data. I think that is what is causing the confusion. I will add a new screenshot to remove it.


Regards,
Thej



Sarath Guttikunda

Feb 18, 2020, 12:03:28 AM
to data...@googlegroups.com
Dear Thej,

> Note: The first screenshot in the blog post is just for getting the site IDs, not the actual data. I think that is what is causing the confusion. I will add a new screenshot to remove it.

This was it. Thank you for clarifying. I did not check the code, so I did not see the actual link in use.
This is most useful. Thank you.

With best wishes,
Sarath

