Mapping Local Government Directory to WikiData


Thejesh GN

Mar 20, 2021, 1:16:35 AM
to datameet
LGD publishes some important IDs that can be useful. I also think of the WikiData item ID as a primary key. I have just started syncing the two locally so that I can update WikiData with missing Census location IDs. States were easy, but districts turned out to be not so easy.

I have blogged here


But here are the differences. Let me know what you think.

WikiDataId | Label | Description | Comments
Q955977 | South Arcot | Former district in Tamil Nadu, India | Needs to be marked as dissolved in WikiData
Q1900496 | Bangalore | Former district in Karnataka, India | Needs to be marked as dissolved in WikiData
Q1606061 | Andaman | Former district of the Andaman and Nicobar Islands | Needs to be marked as dissolved in WikiData
Q24949801 | Shahbazwan | District of Bihar in India | Is this the same as GOPALGANJ district? Marked by mistake in WikiData. Should be removed as a district.
Q6007135 | Imphal | Wikimedia disambiguation page | Is an ex-district. Was split. Needs to be marked as dissolved in WikiData
Q48731903 | Noklak | District in India, Nagaland | New district. LGD needs update. January 20, 2021.
Q61746013 | Narayanapet | District of Telangana, India | There seems to be a duplicate Narayanpet district (Q85787759), but Q61746013 was created earlier. DataCommons also uses the same. It also has
Q29025081 | East Karbi Anglong | District of Assam, India | When KARBI ANGLONG was split, the western part became the new "West Karbi Anglong" and the rest remained part of "Karbi Anglong". There is no "East Karbi Anglong" as such. Should be removed in WikiData?
Q101088203 | Bajali | District of Assam, India | New district formed on 12 January 2021. LGD needs an update
DONT KNOW | Vijayanagara | District of Karnataka in India | New district formed in 2020/21. Needs an addition to LGD. Maybe mark Q1611788 as district in WikiData?
DONT KNOW | Chachaura | District of MP | Missing on LGD, WikiData and OSM. No gazette yet
DONT KNOW | Maihar | District of MP | Missing on LGD, WikiData and OSM. No gazette yet
DONT KNOW | Nagda | District of MP | Missing on LGD and WikiData. No gazette yet.
Q61439260 | Pakke-Kessang | District of Arunachal Pradesh in India | It was missing from WikiData query results because it was not tagged as a district. I updated WikiData.
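
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of how a local SQLite diff could surface such differences. It is not the actual schema or script used for the list above: the table names `lgd_districts` and `wikidata_districts`, their columns, the match on state plus lowercased name, and the sample rows are all assumptions made for illustration.

```python
import sqlite3


def find_unmatched(conn):
    """Return (only_in_wikidata, only_in_lgd), matching on state + lowercased name."""
    only_wd = conn.execute(
        """
        SELECT w.wikidata_id, w.label
        FROM wikidata_districts AS w
        LEFT JOIN lgd_districts AS l
          ON lower(l.district_name) = lower(w.label)
         AND l.state_name = w.state_name
        WHERE l.district_code IS NULL
        ORDER BY w.wikidata_id
        """
    ).fetchall()
    only_lgd = conn.execute(
        """
        SELECT l.district_code, l.district_name
        FROM lgd_districts AS l
        LEFT JOIN wikidata_districts AS w
          ON lower(l.district_name) = lower(w.label)
         AND l.state_name = w.state_name
        WHERE w.wikidata_id IS NULL
        ORDER BY l.district_code
        """
    ).fetchall()
    return only_wd, only_lgd


def build_demo_db():
    """Create an in-memory database with made-up rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE lgd_districts (
            district_code INTEGER PRIMARY KEY, district_name TEXT, state_name TEXT);
        CREATE TABLE wikidata_districts (
            wikidata_id TEXT PRIMARY KEY, label TEXT, state_name TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO lgd_districts VALUES (?, ?, ?)",
        [(1, "ALPHA", "State A"), (2, "BETA", "State A")],
    )
    conn.executemany(
        "INSERT INTO wikidata_districts VALUES (?, ?, ?)",
        [("QX1", "Alpha", "State A"), ("QX2", "Gamma", "State A")],  # placeholder IDs
    )
    return conn


if __name__ == "__main__":
    only_wd, only_lgd = find_unmatched(build_demo_db())
    print("Only in WikiData:", only_wd)
    print("Only in LGD:", only_lgd)
```

Rows that appear on only one side are the ones that need a manual look, as in the Comments column above. An exact name match like this will also flag pure spelling variants, so a real run would likely need a fuzzier comparison.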


Thej
--
Thejesh GN  ತೇಜೇಶ್ ಜಿ.ಎನ್
http://thejeshgn.com
GPG ID :  0xBFFC8DD3C06DD6B0

Bodhisattwa Mandal

Mar 20, 2021, 10:28:43 AM
to data...@googlegroups.com
Hi Thejesh,

The best place to discuss this is here - https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_India

There are Wikidata contributors who have been working on this, and they might respond there.

Thanks,
Bodhisattwa


--
Datameet is a community of Data Science enthusiasts in India. Know more about us by visiting http://datameet.org
---
You received this message because you are subscribed to the Google Groups "datameet" group.
To unsubscribe from this group and stop receiving emails from it, send an email to datameet+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/datameet/CAABnYsUTfHZnmisWitBKBAGRBkYQ2OA8%2BuuK46MwRy8uNqiWTg%40mail.gmail.com.

Arun Ganesh

Mar 20, 2021, 2:39:19 PM
to datameet
Very cool Thejesh! The LGD dataset is definitely super useful to help reconcile various other datasets that reference any territory.

I have been maintaining a dump of all the other LGD lookups here: https://github.com/planemad/india-local-government-directory . It would be great to have it merged with the datameet repo and to see how we can maintain an easy-to-access dump of https://lgdirectory.gov.in

Thejesh GN

Mar 21, 2021, 12:30:32 AM
to datameet

Arun - Sure. How do we proceed?


I also have udise_districts and udise_blocks in the same SQLite database. udise_districts uses a completely different udise_dist_code. I will try to map the WikiData ID to this as well.

udise_blocks are completely different from blocks as a geographical area. I am not going to pick them up for now.

My plan is to pick up sub-districts after this.

Thej

Naveen Francis

Mar 25, 2021, 10:26:25 PM
to datameet
Hello 

To maintain the country subdivision data model, there is a task force in Wikidata. 

Thanks,
naveenpf

sreeram kandimalla

Sep 22, 2023, 1:23:35 AM
to datameet
Just an FYI: LGD mappings have been asserted in Wikidata up to the district level based on Thejesh's work, and I verified them independently.

Moving on to lower divisions (Tehsils/CD blocks) now. The Wikidata hierarchy for these is unclear and needs to be cleaned up.

Thejesh GN

Sep 22, 2023, 1:57:17 AM
to data...@googlegroups.com
Thank you for letting us know, Sreeram.

I had started working on Taluks. It's not that straightforward. I will keep the list informed.

Thej

Arun Ganesh

Sep 22, 2023, 3:37:58 AM
to data...@googlegroups.com
Sharing some of the LGD-Wikidata mapping that I did two years ago. Hopefully it's of some use and can be a start.


sreeram kandimalla

Sep 22, 2023, 5:25:47 AM
to data...@googlegroups.com
@Thejesh: Calling it not straightforward would be an understatement :)

@Arun : I will try and use the data.

BTW, I did add LGD codes to Tehsils in Wikidata based on their existing census codes in Wikidata. I need to verify the names. 

Of the close to 4000 entries already in Wikidata, around 800 Tehsils and 1000 CD blocks are still left to be reconciled.

I have been using Paul Novosad's masala-merge to reconcile names. I wonder if libindic's inexactsearch can also be used for this. 
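
To illustrate the general idea of fuzzy name reconciliation, here is a minimal sketch using Python's standard-library `difflib`. It is a stand-in for illustration only, not masala-merge or inexactsearch, and both the 0.8 cutoff and the normalization rule are assumptions that would need tuning against real data.

```python
import difflib


def normalize(name):
    """Lowercase and keep only letters and digits so spelling variants compare closer."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def match_names(lgd_names, wikidata_labels, cutoff=0.8):
    """Map each LGD name to its closest Wikidata label, or None if nothing clears the cutoff."""
    by_norm = {normalize(label): label for label in wikidata_labels}
    matches = {}
    for name in lgd_names:
        close = difflib.get_close_matches(
            normalize(name), list(by_norm), n=1, cutoff=cutoff
        )
        matches[name] = by_norm[close[0]] if close else None
    return matches


if __name__ == "__main__":
    print(match_names(["NARAYANPET", "Xyz"], ["Narayanapet", "Bajali"]))
```

Names that come back as `None`, or that match only weakly, are the ones that would still need to be checked by hand.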

Also, this is the current status of the Subdistrict/Tehsil/CD Block entity class hierarchy in Wikidata.  
[Attachment: Screenshot 2023-09-15 at 9.25.29 PM.png]

I think this hierarchy is incorrect and needs to be fixed. 

Out of curiosity, I also ran an analysis to check whether blocks are coterminous with tehsils in any of the states, based on data in LGD.
Here is the link to the gist if anyone is interested - https://gist.github.com/ramSeraph/66f4b4a8780e3e47932467776731416a

The results are still confusing mostly because block mapping in LGD is probably incomplete.
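
One way such a check could be sketched, which is not necessarily what the gist does: treat a tehsil and a block as coterminous when they cover exactly the same set of villages. The `(village_code, tehsil, block)` row layout and the sample values are assumptions for illustration.

```python
from collections import defaultdict


def coterminous_pairs(village_rows):
    """Return sorted (tehsil, block) pairs that cover exactly the same villages.

    village_rows: iterable of (village_code, tehsil, block); block may be None
    when the village has no block mapping.
    """
    by_tehsil = defaultdict(set)
    by_block = defaultdict(set)
    for village, tehsil, block in village_rows:
        by_tehsil[tehsil].add(village)
        if block is not None:
            by_block[block].add(village)
    return sorted(
        (tehsil, block)
        for tehsil, t_villages in by_tehsil.items()
        for block, b_villages in by_block.items()
        if t_villages == b_villages
    )


if __name__ == "__main__":
    rows = [
        (1, "T1", "B1"), (2, "T1", "B1"),   # T1 and B1 share all villages
        (3, "T2", "B2"), (4, "T2", "B3"),   # T2 is split across two blocks
        (5, "T3", "B4"), (6, "T3", None),   # village 6 has no block mapping
    ]
    print(coterminous_pairs(rows))
```

In this sketch a single village without a block mapping is enough to stop its tehsil from matching any block, which is one way incomplete block data could muddy the results.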



Arun Ganesh

Sep 22, 2023, 6:36:42 AM
to data...@googlegroups.com

> The results are still confusing mostly because block mapping in LGD is probably incomplete.


This is part of the problem. Generally it seems LGD is still a WIP from subdistricts onwards. Only sometime in the last year did they add the missing taluks for Mumbai Suburban and Chennai districts, even though those taluks had always existed. So LGD unfortunately cannot be trusted to be current, even though the creation of new entities seems to be quite prompt.

Coverage of subdistrict items in Wikidata is quite low. Most of the items that exist with the same name would be the item for the town. There are also cases where the Wikidata item may be missing the English label (example), making name matching a bit of a puzzle.

sreeram kandimalla

Jan 17, 2024, 9:55:15 AM
to data...@googlegroups.com
One more update here.

Wikidata syncing with LGD is done up to the subdistrict level.

You can query them at https://w.wiki/8sWm

Also, I tried to add all the alternate names I could find as aliases in Wikidata.

Automating this syncing and updating periodically is something I might take up at a future date.

But I suspect manual intervention and review are going to be required for this.

If someone wants to have a go at it, I will try to provide support.
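
As a starting point for anyone who picks this up, here is a minimal sketch of the comparison step in a periodic sync: diff two snapshots of LGD codes and names, and hand the changes to a human for review instead of applying them automatically. The snapshot layout (a dict of LGD code to name) and the sample values are assumptions.

```python
def diff_snapshots(old, new):
    """Compare two {lgd_code: name} snapshots and list what changed.

    Returns a dict with codes that were added, codes that were removed,
    and (code, old_name, new_name) tuples for codes whose name changed.
    """
    added = sorted(code for code in new if code not in old)
    removed = sorted(code for code in old if code not in new)
    renamed = sorted(
        (code, old[code], new[code])
        for code in old.keys() & new.keys()
        if old[code] != new[code]
    )
    return {"added": added, "removed": removed, "renamed": renamed}


if __name__ == "__main__":
    previous = {1: "A", 2: "B", 3: "C"}   # made-up codes and names
    current = {1: "A", 2: "Bee", 4: "D"}
    for kind, items in diff_snapshots(previous, current).items():
        print(kind, items)
```

Each bucket would then go into a review queue, since a removed code plus an added code may really be a split, a merge, or a rename that only a person can confirm.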




sreeram kandimalla

Jan 17, 2024, 10:05:02 AM
to datameet
Correction: query link is https://w.wiki/8sXT
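
If the results of a query like this are downloaded in the standard SPARQL JSON results format, they could be turned into a local lookup table roughly as follows. This is a sketch under assumptions: the variable names `item` and `lgdCode` are hypothetical and must be changed to whatever the query actually selects, and the LGD code in the sample is made up.

```python
import json

ENTITY_PREFIX = "http://www.wikidata.org/entity/"


def lgd_to_qid(sparql_json_text, item_var="item", code_var="lgdCode"):
    """Build {lgd_code: [QID, ...]} from a SPARQL JSON results document.

    item_var and code_var are assumed variable names, not taken from the
    actual query. A code that maps to several QIDs keeps all of them so
    that duplicates can be reviewed.
    """
    data = json.loads(sparql_json_text)
    mapping = {}
    for row in data["results"]["bindings"]:
        if item_var not in row or code_var not in row:
            continue  # unbound variables are simply absent from a binding
        uri = row[item_var]["value"]
        qid = uri[len(ENTITY_PREFIX):] if uri.startswith(ENTITY_PREFIX) else uri
        mapping.setdefault(row[code_var]["value"], []).append(qid)
    return mapping


if __name__ == "__main__":
    sample = json.dumps({
        "head": {"vars": ["item", "lgdCode"]},
        "results": {"bindings": [
            {"item": {"type": "uri", "value": ENTITY_PREFIX + "Q61439260"},
             "lgdCode": {"type": "literal", "value": "999"}},  # made-up code
        ]},
    })
    print(lgd_to_qid(sample))
```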

Arun Ganesh

Jan 17, 2024, 1:36:00 PM
to data...@googlegroups.com
This is the kind of painful work that can drive most people insane. Kudos to you Sreeram, next level stuff!

Sabarish, KSITM

Jan 18, 2024, 2:14:54 AM
to data...@googlegroups.com
UDISE uses totally different blocks in Kerala. We have educational districts, sub-districts, and blocks, and the educational grouping is totally different from the revenue grouping.
Regards
Sabarish

Thejesh GN

Jan 18, 2024, 2:31:42 AM
to data...@googlegroups.com
It's not just in Kerala; it's all over India. The educational (UDISE) geographic boundaries of districts and below don't match the revenue ones.

Thej
