Hi everyone,
I am a researcher at Development Data Lab and I wanted to share a Python package I recently created for our data workflows: AdminLineageAI.
The package uses AI to help create crosswalks between administrative units across datasets and time periods, for example matching districts, subdistricts, states, or countries when names differ because of spelling variants, translations, renaming, splits, or mergers.
This problem comes up often in research and policy datasets. For example, one dataset may use Paschimi Singhbhum while another uses West Singhbhum, Allahabad may appear as Prayagraj after its renaming, or a newer district may have been split from an older one. These cases are difficult to solve with plain fuzzy matching.
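To illustrate the fuzzy matching point, here is a minimal sketch using only Python's standard-library difflib. It does not use AdminLineageAI or show its API; the similarity helper is a hypothetical name, and East Singhbhum is added here purely as a neighbouring district for comparison.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive character-level similarity between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Illustrative pairs only; the comments say whether each pair is the same unit.
pairs = [
    ("Paschimi Singhbhum", "West Singhbhum"),  # same district, translated name
    ("East Singhbhum", "West Singhbhum"),      # two different districts
    ("Allahabad", "Prayagraj"),                # same district, renamed
]

for left, right in pairs:
    print(f"{left!r} vs {right!r}: {similarity(left, right):.2f}")
```

On string similarity alone, the two different districts score higher than the translated pair, and the renamed pair scores low, so a simple threshold would get all three wrong in one direction or the other.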
Some possible use cases include matching government scheme data to Census or official administrative lists, creating district or subdistrict evolution crosswalks across years, handling renamed, split, or merged administrative units, and building reproducible data pipelines for social science, policy, or development research.
It is experimental, so important matches should still be cross-checked, but I would love to hear about more possible use cases for it. The main aim is to reduce the manual work spent on merging datasets.
You can install it with:
pip install adminlineage
Here is the GitHub repo with usage instructions and a few examples:
It is open source, so please feel free to fork it, use it, or raise an issue on GitHub if you have feedback or run into anything.
Best,
Taha
--
Datameet is a community of Data Science enthusiasts in India. Know more about us by visiting http://datameet.org