Extracting NSSO data


Jagriti Arora

Sep 4, 2016, 5:31:31 AM
to datameet
Hi,
Can anyone tell me how to make sense of the raw data that NSSO provides on its website?
I tried converting the XML to a data frame in R, to no avail. I now have an Excel sheet with references and variables that have not been previously declared.
Can anyone help? I'm looking for data from the 38th and 66th rounds.

Thanks and regards!

Devdatta Tengshe

Sep 4, 2016, 5:57:55 AM
to data...@googlegroups.com

Can you share the link where this data is available? That way we can have a look at it.

Regards,
Devdatta Tengshe
Ph: 735-358-0782


--
Datameet is a community of Data Science enthusiasts in India. Know more about us by visiting http://datameet.org
---
You received this message because you are subscribed to the Google Groups "datameet" group.
To unsubscribe from this group and stop receiving emails from it, send an email to datameet+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

sachin

Sep 5, 2016, 3:33:02 AM
to datameet
Hi, 
I have used the 68th round data for agri consumption and poverty estimation using Stata.
I am assuming that the raw data you are referring to is also available in .txt format. As far as I know, the NSSO data has a highly structured format: Schedule.Level > Block > Item No. The variables are not declared in the raw data. They have to be read off the "layout" file for that specific round, which is released along with the raw data.

Each record in the data is a long string of characters that has to be read in a specific manner. The layout file specifies which character positions must be read together to form each variable. So it could look like:
v11 1-3 v12 4-8 v13 9-10 v14 11-13 v15 14-14 v16 15-15 v17 16-18 v18 19-20 v19 21-22 v110 23-24 v111 25-25 v112 26-26 and so on. 

Your software then uses this layout to read the raw data file (.txt) and produce a table of the required variables for analysis. In a sense, the raw data is always excerpted for analysis, so one begins with the layout file to check the variables of interest and how they are encoded in the data.
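
For anyone doing this step in Python rather than Stata, here is a minimal sketch of the slicing described above. It assumes the layout gives 1-based, inclusive character positions, as in the example layout line. The function names `parse_record` and `parse_lines` are chosen for illustration, and the sample record is made up (letters instead of digits, so the slicing is easy to see).

```python
# Layout copied from the example above: variable name -> (first, last)
# character positions, 1-based and inclusive, as in a layout file.
LAYOUT = {
    "v11": (1, 3), "v12": (4, 8), "v13": (9, 10), "v14": (11, 13),
    "v15": (14, 14), "v16": (15, 15), "v17": (16, 18), "v18": (19, 20),
    "v19": (21, 22), "v110": (23, 24), "v111": (25, 25), "v112": (26, 26),
}


def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width record into a dict of raw string fields."""
    # Convert 1-based inclusive positions to Python's 0-based half-open slices.
    return {name: line[first - 1:last] for name, (first, last) in layout.items()}


def parse_lines(lines, layout=LAYOUT, keep=None):
    """Parse an iterable of records, optionally keeping only the variables in `keep`."""
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        row = parse_record(line, layout)
        if keep is not None:
            row = {name: row[name] for name in keep}
        rows.append(row)
    return rows


# A made-up 26-character record, used only to show the slicing.
sample = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
record = parse_record(sample)
print(record["v11"], record["v12"], record["v112"])
```

An open file handle can be passed to `parse_lines` in place of a list, since it only needs an iterable of lines. The fields come back as strings, so numeric variables still need converting according to the layout file.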

I am not sure this helps. With Stata this is fairly easy. With R, I do not know how to assemble the same data frame, although the analysis using the variables will be a breeze.

Best
Sachin 
 


Tarun Kateja

Aug 11, 2018, 7:58:57 PM
to datameet
Hi Sachin,

I also want to extract the 68th round household and consumer expenditure data. I am a little confused and have never worked with Stata. Can you explain what the multiplier is and how to use it? And can you share your code to extract data from the .txt file?

This will be a great help!

Thanks

GALEN PATRICK MURRAY

Aug 13, 2018, 7:06:14 PM
to datameet
Hi Tarun,

Sachin is correct: you use the layout file to identify which positions in the string of characters correspond to which variables. Even though I'm an R user, I think this extraction is more easily done in Stata. I've attached my Stata code for the 68th round extraction.

Since the NSSO data are samples, the multiplier acts as a survey weight, so you can get population-level estimates from the sampled survey responses. Look at the readme (attached) for more information on how these multipliers are used to calculate survey weights, especially this part:

 For generating subsample-wise estimates based on data of all 
    subrounds taken together, either Subsample-1 households or 
    Subsample-2 households are to be considered at one time.  
    Subsample code is available in the data file. 
    (Please see layout of data).   
      
     Apply final weight (or all-subround multipliers) as follows :
     
     final weight = MLT/100,   if NSS=NSC
                  = MLT/200    otherwise.
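
The weight rule in the readme excerpt above can be sketched in Python as follows. This assumes `MLT`, `NSS` and `NSC` have already been read from each record as numbers using the layout file. The function name `final_weight` and the sample values are made up for illustration.

```python
def final_weight(mlt, nss, nsc):
    """Final (all-subround) weight, following the readme rule quoted above."""
    # MLT/100 when NSS equals NSC, MLT/200 otherwise.
    return mlt / 100 if nss == nsc else mlt / 200


# Made-up values, only to show both branches of the rule.
w_equal = final_weight(25000, 1, 1)
w_other = final_weight(25000, 1, 2)
print(w_equal, w_other)
```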

Also, I found this blog very helpful for explaining NSSO data; the comments in particular may ask and answer common questions that you have. You can even write to the author, who seems generally quick to respond. Good luck!
Attachments: NSS_68th_Type2.do, README_68_1.0_type1.txt

Chandrasekhar S.

Aug 14, 2018, 1:02:45 AM
to data...@googlegroups.com
Greetings!

If you purchased the data from NSSO, it comes with a program (Nesstar) that extracts the data for you. Use this program and it will export to whichever format you would like, including Stata.

Hope this helps.

Chandrasekhar


Tarun Kateja

Aug 14, 2018, 4:31:29 AM
to data...@googlegroups.com
Hi,

Hearty thanks to all for your responses. 

Extracting fixed-length information is relatively easy, considering there is a separate data file for each level. My confusion about the multiplier has still not cleared up. Why is there a separate multiplier file for the complete data when the last 10 bytes of each row also hold a multiplier?

How do I use the multiplier to scale the data up to the population? Do we simply multiply every attribute in each row by the weight (calculated from the multiplier as given in the readme file)? And if yes, which weights: the multipliers from the separate multiplier file, or the multipliers given as the last 10 bytes in each level's data file?

I can extract the data using Python if we don't need to do any manipulation or calculation and can simply use byte positions to get each attribute value. Blogs do not explain the multiplier properly, and everyone focuses on software like Stata, but I need a conceptual understanding to use the information in the most accurate way.

Thanks
Tarun Kateja
IIT Madras

Chandrasekhar S.

Aug 14, 2018, 4:56:13 AM
to data...@googlegroups.com

Every household is given a weight. Adding up all the household weights gives the estimated number of households in India.
Assuming you are working with the household-level file, if you multiply the household weight by the number of household members, you get what can be called the person weight. If you add up all the person weights, you get the estimated number of people in India.
Both estimates are representative at the all-India, state, and NSS region levels, separately for rural and urban.
In Stata you would use the command 
tab State  Sector [fw=hh_weight]
tab State Sector [fw=person_weight]
A simple explanation, without getting into sampling methodology: the weights are nothing but frequency weights.
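
For those working in Python, the same idea can be sketched as below. The attributes are not multiplied by the weight; each row is counted "weight" times when totals are added up. The household records, state names and weights here are made up, and the tuple layout is only an assumption for illustration.

```python
from collections import defaultdict

# Made-up household records: (state, sector, household size, household weight).
households = [
    ("State A", "Rural", 5, 1200.0),
    ("State A", "Urban", 3, 800.0),
    ("State B", "Rural", 4, 1500.0),
    ("State B", "Rural", 6, 500.0),
]

# Sum of household weights = estimated number of households.
est_households = sum(w for _, _, _, w in households)

# Person weight = household weight * household size;
# the sum of person weights = estimated number of people.
est_persons = sum(size * w for _, _, size, w in households)

# Weighted State x Sector tabulation, in the spirit of the two tab commands above.
hh_by_cell = defaultdict(float)
persons_by_cell = defaultdict(float)
for state, sector, size, w in households:
    hh_by_cell[(state, sector)] += w
    persons_by_cell[(state, sector)] += size * w

print(est_households, est_persons)
print(dict(hh_by_cell))
```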
Hope this helps.


Cheers

C


