Indian Gazette archives


sreeram kandimalla

Jun 11, 2025, 3:18:54 AM
to data...@googlegroups.com
Hi all,

We now have most of the Indian central and state gazettes archived at https://archive.org/details/gazetteofindia?sort=-date

Crawlers run daily from the code in the egazette repo and my temporary fork of it.

One of the advantages of having the data at archive.org is that it comes with automatic OCR (using Tesseract), a free-text search engine, and the option of getting an RSS feed based on a search query. I hope people build some useful things with it.
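As a starting point for building on the archive, here is a minimal sketch of listing items in the collection through archive.org's advancedsearch.php endpoint. Note that this endpoint queries item metadata (title, date and so on), not the OCR text itself. The helper names, the example query and the chosen fields are illustrative assumptions, not part of the egazette code.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ADVANCED_SEARCH = "https://archive.org/advancedsearch.php"


def build_search_url(query, collection="gazetteofindia", rows=50):
    """Build a metadata search URL for one collection, newest items first.

    `query` is a Lucene-style metadata query (hypothetical example:
    "title:Kerala"); it is ANDed with the collection filter.
    """
    params = [
        ("q", f"collection:{collection} AND ({query})"),
        ("fl[]", "identifier"),   # fields to return for each item
        ("fl[]", "title"),
        ("fl[]", "date"),
        ("sort[]", "date desc"),  # same ordering as ?sort=-date on the site
        ("rows", str(rows)),
        ("output", "json"),
    ]
    return f"{ADVANCED_SEARCH}?{urlencode(params)}"


def parse_results(payload):
    """Return (identifier, title) pairs from a decoded JSON response."""
    docs = payload.get("response", {}).get("docs", [])
    return [(d["identifier"], d.get("title", "")) for d in docs]


if __name__ == "__main__":
    # Network call; only runs when the script is executed directly.
    with urlopen(build_search_url("title:Kerala", rows=5)) as resp:
        for identifier, title in parse_results(json.load(resp)):
            print(f"https://archive.org/details/{identifier}  {title}")
```

Each identifier maps to an item page at https://archive.org/details/<identifier>, where the PDF and the OCR text can be downloaded.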

The following states and union territories currently have problems:
  1. Andaman and Nicobar Islands: Site doesn't have current data.
  2. Jammu and Kashmir: Site is offline, hopefully temporarily.
  3. Mizoram: Data is not being updated at source.
  4. Meghalaya: Data is delayed by 3 months.
  5. West Bengal: No gazette site could be found. I would appreciate it if anyone can locate it (https://www.wbgazettepart2.in/ is not it).
Thanks,
Sreeram K

Arun Ganesh

Jun 11, 2025, 4:22:35 AM
to data...@googlegroups.com
This is heroic work and an extraordinary resource for the Government itself!

Thank you Sreeram for making this happen.


sreeram kandimalla

Jun 11, 2025, 5:21:20 AM
to data...@googlegroups.com
Thanks Arun, but this is an almost decade-old project started by Carl Malamud from Public Resource and Sushant Sinha from IndianKanoon.

More websites have shown up since they started the work, so I helped fill in the missing states.

The plan is to consolidate everything in one repo and maintain it. This is going to be painful because the sites keep disappearing, changing their backend software, changing domain names and, in one case, corrupting their DNS entries (all of which happened while I was trying to get the crawlers running).



Rohini

Jun 12, 2025, 1:26:57 AM
to datameet
Many thanks for doing this. This is invaluable! 

In 2013, I tried to get CDs of the Maharashtra Gazette because of some of the same issues with their website that you have mentioned. I have quite a story to tell. (I intended to use the copy of the gazette to maintain references on several Wikipedia articles that had suffered link rot.)

Thanks,
Rohini