Hi,
This is regarding Shivangi's question titled "State-->District-->Village level data".
The DTName column is the same as the district column "DID,C,254" in the census data.
The SDTName column is the same as the taluka column "TID,C,254" in the census data.
The state code is the same in both (e.g. 27 for Maharashtra).
Districts are numbered from 1 to 35 in exactly the order they appear in the census file, and those numbers are used in the shape file.
Talukas are also numbered from 1 to x within each district (x being the number of talukas in that district). The sequence is the same in both files.
Here is the Python code for districts and talukas:
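import numpy as np
import pandas as pd
# Read every column as text so that .str works and codes like "00000" keep their leading zeros.
census = pd.read_csv("census.CSV", dtype=str)
# Remove the stray single quotes from the values.
census = census.apply(lambda s: s.str.replace("'", "", regex=False))
# Drop the state-level rows.
census = census[census["DTName"] != "MAHARASHTRA"]
# District rows have an all-zero sub-district code; number them in file order.
ndf = census[census["SDTCode"] == "00000"].copy()
ndf["mapindex"] = np.arange(1, len(ndf) + 1)
ndf[["DTCode", "Name", "mapindex"]]
# Taluka rows have an all-zero town/village code but a non-zero sub-district code.
taluka = census[census["TVCode"] == "000000"]
taluka = taluka[taluka["SDTCode"] != "00000"].copy()
# Number the talukas 1..x within each district, in file order.
taluka["sno"] = taluka.groupby("DTName").cumcount() + 1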
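To check that the numbering really lines up, one option is to join the numbered talukas to the shape file's attribute table. The following is only a rough sketch on made-up toy data, not part of the full code: it assumes the attribute table has been loaded into a DataFrame with text columns DID and TID, and that each taluka row carries its district number (derived here with pd.factorize in order of first appearance, which is my assumption). The taluka names and the `shape` frame are hypothetical.

```python
import pandas as pd

# Toy stand-in for the taluka frame (district and taluka names are illustrative only).
taluka = pd.DataFrame({
    "DTName": ["Pune", "Pune", "Nashik", "Nashik", "Nashik"],
    "Name": ["Haveli", "Mulshi", "Igatpuri", "Niphad", "Sinnar"],
})
# Assumption: district number = order of first appearance in the file, starting at 1.
taluka["mapindex"] = pd.factorize(taluka["DTName"])[0] + 1
# Taluka number restarts at 1 inside every district.
taluka["sno"] = taluka.groupby("DTName").cumcount() + 1

# Hypothetical shape-file attribute table; DID and TID are stored as text (type C).
shape = pd.DataFrame({
    "DID": ["1", "1", "2", "2", "2"],
    "TID": ["1", "2", "1", "2", "3"],
})
shape["DID"] = shape["DID"].astype(int)
shape["TID"] = shape["TID"].astype(int)

# Left join keeps every shape row; validate raises if a key pair is duplicated.
merged = shape.merge(
    taluka,
    left_on=["DID", "TID"],
    right_on=["mapindex", "sno"],
    how="left",
    validate="one_to_one",
)
print(merged[["DID", "TID", "DTName", "Name"]])
```

Any shape row that comes back with an empty Name after the join would point to a mismatch in the numbering.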
The full code is available here...
I have tested this with the Maharashtra data, but the same logic can be applied to other states.
-- Shantanu