Hi,
I'm looking at Maharashtra's land records portal:
https://mahabhulekh.maharashtra.gov.in
and wondering whether it's possible to scrape data from it.
Here is the workflow:
choose 7/12 (७/१२) > select any जिल्हा (district) > तालुका (taluka) > गाव (village)
select शोध (search): सर्वे नंबर / गट नंबर (survey number / gat number; the first option)
type 1 in the text box and press the "शोधा" (Search) button
A dropdown then appears with options like 1/1, 1/2, 1/3, etc.
On selecting any option and clicking "७/१२ पहा" (View 7/12),
a new window/tab opens (you have to enable popups) containing static
HTML content (some tables). I need to capture this content.
The URL is always the same:
https://mahabhulekh.maharashtra.gov.in/Konkan/pg712.aspx
but the content changes depending on the options chosen.
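Once the HTML of that record page has been captured somehow, pulling the tables out of it is the easy part. Here is a minimal sketch using only Python's standard library. I have not looked at the actual markup of pg712.aspx, so the names `TableExtractor` and `extract_tables` are just illustrative, and the sketch assumes simple, non-nested tables:

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell strings.

    Assumption: tables are not nested inside one another.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # list of tables -> list of rows -> list of cells
        self._row = None   # row currently being filled, or None
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html: str):
    """Return all tables found in an HTML string."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Each table comes back as a plain list of rows, which can then be written out with the `csv` module. If the real page nests tables for layout, this would need a stack instead of a single current row.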
With the browser's "Inspect Element" > Network tab open, clicking the
final button sends a request to this URL:
https://mahabhulekh.maharashtra.gov.in/Konkan/Home.aspx/call712
and the request params / payload look like this:
{'sno':'1','vid':'273200030398260000','dn':'रत्नागिरी','tn':'खेड','vn':'वाळंजवाडी','tc':'3','dc':'32','did':'32','tid':'3'}
When you change the survey/gat number to 1/10, the params change like so:
{'sno':'1#10','vid':'273200030398260000','dn':'रत्नागिरी','tn':'खेड','vn':'वाळंजवाडी','tc':'3','dc':'32','did':'32','tid':'3'}
For 1/1अ:
{'sno':'1#1अ','vid':'273200030398260000','dn':'रत्नागिरी','tn':'खेड','vn':'वाळंजवाडी','tc':'3','dc':'32','did':'32','tid':'3'}
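Going by these three examples, the 'sno' value looks like the dropdown label with "/" replaced by "#", and the payload is a single-quoted string rather than strict JSON. A small sketch of building it is below. The function names are made up for illustration, the "/" to "#" rule is only inferred from the three samples above, and no escaping of quotes inside values is attempted:

```python
def sno_from_option(option: str) -> str:
    """Turn a dropdown label like '1/10' into the 'sno' form '1#10'.

    Assumption: '/' simply becomes '#', as in the three captured examples.
    """
    return option.strip().replace("/", "#")


def build_call712_body(sno, vid, dn, tn, vn, tc, dc, did, tid) -> str:
    """Build the single-quoted payload string in the captured field order."""
    fields = [
        ("sno", sno), ("vid", vid), ("dn", dn), ("tn", tn), ("vn", vn),
        ("tc", tc), ("dc", dc), ("did", did), ("tid", tid),
    ]
    # Values are inserted as-is; quotes inside a value are not escaped.
    return "{" + ",".join(f"'{key}':'{value}'" for key, value in fields) + "}"


body = build_call712_body(
    sno=sno_from_option("1/1अ"),
    vid="273200030398260000",
    dn="रत्नागिरी", tn="खेड", vn="वाळंजवाडी",
    tc="3", dc="32", did="32", tid="3",
)
```

Encoded as UTF-8, this body comes to 170 bytes, which matches the Content-Length in the captured request headers, so the single-quoted form appears to be what the browser really sends.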
I tried some wget and curl commands but have had no luck so far. Do let me know
if you can make some headway.
It would also be great to learn how to extract the list of
districts, the talukas (subdistricts) in each district, and the villages in
each taluka.
I'm dumping the other info at the bottom in case it helps.
Why do this:
At present it's just an exploration following on from our work on
village shapefiles.
The district > taluka > village mapping data from official Land
Records data could serve as a good source for triangulation.
Then, while I don't see myself going deeper into this right now, I am
aware that land records and ownership suffer from major corruption,
entanglements and other issues precisely because of the lack of
transparency. The mahabhulekh website itself is a significant step
forward in making this sector a little more transparent, and a further push
in this direction would probably do more good IMHO. At some point
GIS/lat-long info might come in, and it would be good to bring the
data to a level that is ready for it.
Data dump:
When we press the button to fetch the 7/12 (saatbarah) record, the
console records a POST with these parameters:
Copy as cURL:
curl 'https://mahabhulekh.maharashtra.gov.in/Konkan/Home.aspx/call712' \
  -H 'Host: mahabhulekh.maharashtra.gov.in' \
  -H 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:42.0) Gecko/20100101 Firefox/42.0' \
  -H 'Accept: application/json, text/plain, */*' \
  -H 'Accept-Language: en-US,en;q=0.5' \
  --compressed \
  -H 'Content-Type: application/json;charset=utf-8' \
  -H 'Referer: https://mahabhulekh.maharashtra.gov.in/Konkan/Home.aspx' \
  -H 'Content-Length: 170' \
  -H 'Cookie: ASP.NET_SessionId=3ozsnwd3nhh4py4hmiqcjeoc' \
  -H 'Connection: keep-alive' \
  -H 'Pragma: no-cache' \
  -H 'Cache-Control: no-cache'
Copy POST data:
{'sno':'1#1अ','vid':'273200030398260000','dn':'रत्नागिरी','tn':'खेड','vn':'वाळंजवाडी','tc':'3','dc':'32','did':'32','tid':'3'}
Request headers:
POST /Konkan/Home.aspx/call712 HTTP/1.1
Host: mahabhulekh.maharashtra.gov.in
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:42.0) Gecko/20100101 Firefox/42.0
Accept: application/json, text/plain, */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Content-Type: application/json;charset=utf-8
Referer: https://mahabhulekh.maharashtra.gov.in/Konkan/Home.aspx
Content-Length: 170
Cookie: ASP.NET_SessionId=3ozsnwd3nhh4py4hmiqcjeoc
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Response headers:
HTTP/1.1 200 OK
Cache-Control: private, max-age=0
Content-Type: application/json; charset=utf-8
Server: Microsoft-IIS/8.0
X-Powered-By: ASP.NET
Date: Mon, 24 Oct 2016 15:31:40 GMT
Content-Length: 10
Copy Response:
{"d":null}
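One possible reading of this trace, which I have not been able to verify: since call712 only answers {"d":null} and the record page URL never changes, the POST may just store the selection in the server-side session (the ASP.NET_SessionId cookie), and pg712.aspx then renders whatever that session holds. Note also that the "Copy as cURL" text above carries no request body, so replaying it as-is sends an empty POST. Below is a rough sketch of that three-step flow using only the standard library. The function names, the step order, and the UTF-8 decoding of the page are all assumptions:

```python
import http.cookiejar
import urllib.request

BASE = "https://mahabhulekh.maharashtra.gov.in/Konkan/"


def make_opener() -> urllib.request.OpenerDirector:
    """Opener that keeps cookies, so every request shares one session."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def build_call712_request(body: str) -> urllib.request.Request:
    """POST request carrying the payload string, with the captured headers."""
    return urllib.request.Request(
        BASE + "Home.aspx/call712",
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json;charset=utf-8",
            "Accept": "application/json, text/plain, */*",
            "Referer": BASE + "Home.aspx",
        },
        method="POST",
    )


def fetch_712_html(body: str, timeout: float = 30.0) -> str:
    """Guessed flow: get a session, post the selection, read the record page."""
    opener = make_opener()
    # Step 1: load the home page so the server issues a session cookie.
    opener.open(BASE + "Home.aspx", timeout=timeout).close()
    # Step 2: post the selection; the captured reply was just {"d":null}.
    with opener.open(build_call712_request(body), timeout=timeout) as resp:
        resp.read()
    # Step 3: fetch the record page with the same session cookie.
    with opener.open(BASE + "pg712.aspx", timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

`fetch_712_html` would be called with the exact payload string shown under "Copy POST data". If the site also checks other state (for example hidden form fields on Home.aspx), this will not be enough, and a browser-automation tool may be the more practical route.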
--
Cheers,
Nikhil
+91-966-583-1250
Pune, India
Self-designed learner at Swaraj University <http://www.swarajuniversity.org>
Blog <http://nikhilsheth.blogspot.in> | Contribute <https://www.payumoney.com/webfronts/#/index/NikhilVJ>