Claim: The upcoming Census data is being fudged

Shubham Kumar

Jun 16, 2026, 10:56:04 AM
to core-stack-dev
Dear friends,

The Indian Census is underway, and it asks basic questions that, when aggregated, could show how much development has happened in the country: for example, whether the home has an enclosed toilet facility, what the source of water for the house is, etc.

Yesterday, I got in touch with a Physics PhD student at IIT Delhi whom I have known for three or more years. He told me that one of his friends, a teacher, has been tasked with collecting census data from every household in a particular region. They are being told to manipulate or misrepresent the data; otherwise their data will not be submitted or accepted.

When asked why he was going along with this, since he should be maintaining his integrity, the teacher responded that everyone is doing it and there is pressure to comply.

Now consider that policy-level interventions will be based on this fudged data. How accurate and effective would the resulting policies be?

To counter this: 
Q1. Are there statistical tests or analyses that one can perform on the population data, or on a random sample of it, to detect fudged or fabricated data?

Q2. Can one analyse available satellite data to obtain a proxy for the true development on the ground, or at least to quantify different dimensions of development?
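For Q1, one commonly used family of checks is the digit-preference test: when people invent numbers, they tend to favour round endings such as 0 and 5, so the last digits of multi-digit counts (for example, persons or households reported per enumeration block) drift away from the roughly uniform spread that genuine counts of that size usually show. The sketch below is only illustrative. It assumes the counts are large enough for the uniform assumption to be reasonable, uses Pearson's chi-square statistic against a uniform 0-9 distribution, and takes the standard 5% critical value for 9 degrees of freedom (about 16.919) as the flagging threshold. The function names and sample numbers are hypothetical.

```python
from collections import Counter

# Chi-square critical value for 9 degrees of freedom at the 5% significance level.
CRITICAL_5PCT_DF9 = 16.919


def last_digit_chi_square(values):
    """Pearson chi-square statistic of the last digits of `values` vs. a uniform 0-9 distribution."""
    digits = [abs(int(v)) % 10 for v in values]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one value")
    expected = n / 10  # each digit should appear n/10 times under uniformity
    counts = Counter(digits)
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))


def looks_fudged(values, critical=CRITICAL_5PCT_DF9):
    """Flag a set of counts whose last digits deviate from uniform more than chance would suggest."""
    return last_digit_chi_square(values) > critical


if __name__ == "__main__":
    # Hypothetical block-level counts: the first set cycles evenly through all last digits,
    # the second is dominated by numbers ending in 0 or 5, a typical sign of made-up figures.
    even = [100 + i for i in range(200)]
    rounded = [150, 200, 245, 300, 310, 415, 500, 520, 600, 655] * 20
    print(last_digit_chi_square(even), looks_fudged(even))
    print(last_digit_chi_square(rounded), looks_fudged(rounded))
```

A flag from a test like this is a reason to look more closely, not proof of fabrication. Some fields (ages, small household sizes) are naturally non-uniform in their last digit, and running the test on many regions at once will produce some false positives purely by chance.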

Best Regards,
Shubham Kumar

P.S.: Today I learnt that the census is conducted by the Office of the Registrar General and Census Commissioner, India, under the Ministry of Home Affairs, and not by MoSPI (Ministry of Statistics and Programme Implementation).

Amit Kumar

Jun 16, 2026, 11:57:32 AM
to Shubham Kumar, core-stack-dev
Hi Shubham,

Yes, there have been several reports published by The Hindu, SabrangIndia and other portals.

They have delayed the census for several years, since it requires the cooperation of so many people, all of whom would have to follow government dictates. Surveys and other data-collection methods are far easier to capture, since the number of nodes that must act in a sketchy manner is manageable. However, now that they have experience with systematic voter deletion, in addition to a general shift in norms about what truthfulness is expected from this government, they have gone ahead with building a national demographic dataset in the image of their idea of India.

There is a portal, WorldPop, which has built a seemingly reliable population dataset estimated from several factors and geospatial imagery. It is also hosted on Google Earth Engine (GEE):
 
This can also be used to check their numbers. 
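As a rough sketch of such a check, assume the WorldPop estimates have already been aggregated to the same administrative units as the census (for example, by summing raster pixels inside district boundaries in GEE; that step is not shown here). One can then take the log ratio of census to WorldPop counts for each unit and flag units whose ratio is an outlier relative to the rest, using a robust median/MAD-based z-score so that a few bad units do not mask each other. The unit names, the counts, and the 3.5 cut-off (a common rule of thumb for modified z-scores) are all assumptions for illustration.

```python
import math
from statistics import median


def robust_z_scores(values):
    """Modified z-scores based on the median and the median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values equal the median: treat nothing as standing out.
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores for normal data.
    return [0.6745 * (v - med) / mad for v in values]


def flag_discrepant_units(census, worldpop, threshold=3.5):
    """Return (unit, score) pairs whose census/WorldPop log ratio is an outlier.

    `census` and `worldpop` map a (hypothetical) unit name to a population count
    for the same administrative unit; only units present in both are compared.
    """
    units = sorted(set(census) & set(worldpop))
    log_ratios = [math.log(census[u] / worldpop[u]) for u in units]
    scores = robust_z_scores(log_ratios)
    return [(u, round(z, 2)) for u, z in zip(units, scores) if abs(z) > threshold]


if __name__ == "__main__":
    # Made-up counts for five hypothetical districts, purely for illustration.
    census = {"A": 1_020_000, "B": 980_000, "C": 1_500_000, "D": 1_010_000, "E": 995_000}
    worldpop = {u: 1_000_000 for u in census}
    print(flag_discrepant_units(census, worldpop))
```

Comparing ratios rather than raw differences keeps large and small districts on the same scale. Since WorldPop is itself a modelled estimate with its own errors, a flagged unit is best read as a candidate for field verification rather than evidence of fudging.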

However, since these datasets only include numbers of people (and one of them also classifies them by gender), it would be hard to cross-check a Census that reports the number of people correctly while fudging the attributes of those people or localities. This time, caste is also part of the census, and it too is susceptible to fudging, in addition to what has become a norm: fudging against certain religions. To analyse these kinds of fudging, we could have used electoral roll data, which is supposed to cover almost everyone over 18 years of age. Electoral rolls also contain names, and for crude analysis some researchers have built ML-based methods to estimate religion from names ("It's All in the Name: A Character-Based Approach to Infer Religion", Cambridge University Press, 23 March 2023). This methodology could also be extended to caste inference if a reliable, large enough dataset of names with verified caste were available; however, it would always be prone to local variations, limitations from generic surnames, etc. So, based on my current understanding, fudging at this level would be hard to cross-check without going to the localities.

It would be an essential democratic tool if we could build a reliable way to identify patterns of data fudging and their spatial variation (where it is most likely to be implemented, e.g. West Bengal, J&K, Punjab, etc.), and then develop methods for detecting each of these fudging patterns.

- Amit

Prashant Kumar

Jun 17, 2026, 4:51:07 AM
to Amit Kumar, Shubham Kumar, core-stack-dev
Hello,

It will be interesting to see which variables they are trying to manipulate. I have heard (unofficially) that enumerators are reporting caste parameters incorrectly: they ask a few questions (around 4 to 5) and fill in the other details based on their intuition. Identifying the set of manipulated questions will be important. There are also practical pressures: the weather is too hot to conduct a field survey, the incentive provided to enumerators is 40,000 rupees for completing the task, and if you disagree with certain things, the administration applies pressure and sets a strict deadline. We hope things will be better in the actual census enumeration, which will be done in Feb 2027. I'm not sure about the quality of the houselisting, but asking the right questions in the right forum may help.
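The pattern described above (asking a handful of questions and filling in the rest by intuition) tends to leave a statistical trace: an enumerator who invents answers often records the same value for a question far more often than colleagues working in comparable areas. A minimal sketch, assuming access to record-level responses tagged with a hypothetical `enumerator_id` field, computes each enumerator's share of their most common answer for one question and flags those well above the median across enumerators. The field names, the sample records, and the 0.2 margin are assumptions, not part of any official census schema.

```python
from collections import Counter, defaultdict
from statistics import median


def modal_share_by_enumerator(records, question):
    """For each enumerator, the fraction of their records that carry their most common answer.

    `records` is an iterable of dicts with a hypothetical "enumerator_id" key plus question fields.
    """
    answers = defaultdict(list)
    for rec in records:
        answers[rec["enumerator_id"]].append(rec[question])
    shares = {}
    for enumerator, values in answers.items():
        most_common_count = Counter(values).most_common(1)[0][1]
        shares[enumerator] = most_common_count / len(values)
    return shares


def flag_homogeneous_enumerators(records, question, margin=0.2):
    """Flag enumerators whose modal-answer share exceeds the median across enumerators by more than `margin`."""
    shares = modal_share_by_enumerator(records, question)
    typical = median(shares.values())
    return sorted(e for e, s in shares.items() if s - typical > margin)


if __name__ == "__main__":
    # Hypothetical responses to a "water_source" question from three enumerators.
    raw = {
        "e1": ["tap", "well", "tap", "handpump", "tap"],
        "e2": ["well", "tap", "handpump", "well", "tap"],
        "e3": ["tap", "tap", "tap", "tap", "tap"],
    }
    records = [{"enumerator_id": e, "water_source": a} for e, ans in raw.items() for a in ans]
    print(modal_share_by_enumerator(records, "water_source"))
    print(flag_homogeneous_enumerators(records, "water_source"))
```

In practice the comparison should be made among enumerators covering similar localities, since genuine homogeneity (for example, a village where nearly every household uses the same water source) also produces a high modal share.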

Thanks
Prashant
