Extract district wise rainfall prediction data from NWP site

a.ja...@gmail.com

May 26, 2021, 6:02:45 PM
to datameet
Hi,


On the IMD NWP page (https://nwp.imd.gov.in/blf/blf_temp/), we can see the 5-day weather prediction by district. However, it takes a lot of clicks to reach any one particular district. Please help in identifying a method to collate the all-India rainfall data into a single table.

I tried to use Octoparse. However, either I am using it incorrectly or it doesn't work on this site. Any other source of the same data is also welcome.

regards,
Abhishek

Nikhil VJ

May 26, 2021, 9:53:50 PM
to datameet
Hi Abhishek,

Right-click on the page -> Inspect (this opens the browser's developer tools) -> go to the Network tab, then browse around the site and watch the requests and responses going between the site and the server.

I was able to fetch the same data the website shows from my command prompt (which means it can be captured) with this simplified cURL command:
curl 'https://nwp.imd.gov.in/blf/blf_temp/block.php' --compressed --data-raw 'dis=22AMRAVATI'

How to get these district codes: inspect the request made one page earlier:
curl --compressed 'https://nwp.imd.gov.in/blf/blf_temp/dis.php?value=22maharashtra'

How to get the state codes: just scrape from the html of the page you shared: https://nwp.imd.gov.in/blf/blf_temp/
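
The two requests above can also be made without curl. Below is a minimal Python sketch using only the standard library, assuming the endpoints behave exactly as in the curl commands above (a GET to dis.php with a `value` parameter, and a form POST to block.php with a `dis` parameter). The function names are hypothetical, and the response format has not been checked here.

```python
import urllib.parse
import urllib.request

BASE = "https://nwp.imd.gov.in/blf/blf_temp/"


def district_list_request(state_code):
    """Build the GET request for one state's districts, e.g. '22maharashtra'."""
    query = urllib.parse.urlencode({"value": state_code})
    return urllib.request.Request(BASE + "dis.php?" + query)


def block_forecast_request(district_code):
    """Build the form POST for one district's forecast, e.g. '22AMRAVATI'.

    Passing data= makes urllib send a POST, like curl's --data-raw.
    """
    body = urllib.parse.urlencode({"dis": district_code}).encode("ascii")
    return urllib.request.Request(BASE + "block.php", data=body)


def fetch(request, timeout=30):
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Example (needs network access, so it is left commented out):
# html = fetch(block_forecast_request("22AMRAVATI"))
# print(html[:500])
```

Looping `fetch(block_forecast_request(code))` over every district code, with a short pause between calls, would collect the pages for all districts.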

Before my foray into Python, I used to use Notepad++ and LibreOffice Calc (its raw text import dialog leaves Excel in the dust) to separate out the data I needed from the HTML tags etc. I'd even use spreadsheet formulas to generate command-prompt commands in bulk. Fun times. For limited jobs, there are things you can do there in a few minutes that would take hours of coding.
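
For separating the data from the HTML tags in code, a standard-library Python sketch might look like the following. It assumes the forecast page is an ordinary HTML table built from `<tr>`, `<th>` and `<td>` tags, which is not confirmed for this site. The class and function names are made up for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            # Join the pieces and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th"):
            self._close_cell()  # tolerate a missing </td>
            if self._row is None:
                self._row = []
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()


def html_table_to_csv(html):
    """Return all table rows found in the HTML as CSV text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()
```

The returned text can be saved to a .csv file and opened directly in Excel or LibreOffice Calc.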

Maybe you can set these up in Octoparse (I have no experience with it, as I just roll my own code) or in some other tool to get the data you need. All the best!

--
Cheers,
Nikhil VJ
https://nikhilvj.co.in


--
Datameet is a community of Data Science enthusiasts in India. Know more about us by visiting http://datameet.org
---
You received this message because you are subscribed to the Google Groups "datameet" group.
To unsubscribe from this group and stop receiving emails from it, send an email to datameet+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/datameet/00fa6267-9dcd-489a-b579-58be5d62b383n%40googlegroups.com.

Dammalapati Sai Krishna

May 26, 2021, 10:06:46 PM
to data...@googlegroups.com
I wrote some Python code that does similar web crawling - 

You should be able to get what you want with minor changes to the code. Let me know if you need any help.

Regards,
Dammalapati Sai Krishna, 
7981548365



Abhishek Jain

May 27, 2021, 9:19:24 PM
to data...@googlegroups.com
Hi Nikhil, 

Thanks very much for your help. It gave me some clarity on where I should be looking. However, when I try the curl command, I get a couple of errors. If I run the command you gave above, which includes '--compressed', I get an error stating that the installed libcurl version does not support this. If I run it without the --compressed option, I get the error "Protocol "'https" not supported or disabled in libcurl". I saw online that I need to install some other software, but I don't really want to go down that path as my primary laptop is my company laptop.

Also, assuming I get this working, the next page, which has the actual data, is a table with no discernible fields. How can I extract it into a usable format? If I can get the data into Excel, I can still write code to work with it; however, I am not proficient in other programming languages.
I would really appreciate some help here.

Regards,
Abhishek



--
Regards,

Abhishek Jain