Hi! New guy here. @josefpktd, thanks for looking into this! It is extremely important. I'm an applied microeconometrician and have been working on survey analysis for development organizations (esp. the World Bank) for years. Though I'm trying to convert the research teams to Python, there's always the complex survey design issue that has everyone in the community coming back to Stata (yes, they could use R, but Stata's svy is so well integrated).
For survey design, we need *probability weights*, not the other types. Those other types are useful, but they're not what's relevant for survey analysis. I'm afraid that the GitHub conversation linked below is about frequency weights.
The real difference only kicks in with the variance-covariance matrix! "Analytical" weights and "probability" weights, for example, lead to the same estimated beta parameters in an OLS model, but the standard errors are different.
Other than weights, we'd need the error structure for stratification and for clustering.
A while ago (July 2016), I managed to match some statsmodels results to Stata's:
For adding weights:
func = lambda x: np.asarray(x) * np.asarray(np.sqrt(weights))
I applied that to every variable, i.e. to y and to each column of X, and then estimated an OLS model using the weighted y and X's.
That gives me beta parameters that match Stata's regress y X [pw = Weights]
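To match SEs (this matches the Stata results for regress y X [pw = Weights]):
import statsmodels.stats.sandwich_covariance as robust
np.sqrt(np.diag(robust.cov_white_simple(resultsW)))
(Here resultsW = model.fit(), where model is the OLS model on the weighted data. cov_white_simple returns the full covariance matrix, so the standard errors are the square roots of its diagonal.)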
To match clustered SEs (this matches Stata's results for regress y X [pw = Weights], vce(cluster Weights)):
np.sqrt(np.diag(robust.cov_cluster(resultsW, weights)))
Please let me know if that helps.
I have not coded for statsmodels or any other packages before (just scripting data analysis), but was thinking I'd start ... then found this thread!
Has any progress been made? Otherwise, if you have some patience, I'd love to help. I'm fluent in Python and R, but from the statistics and data science side, and I'd love to learn to develop the package by helping to add survey functionality
(for reference: I started studying complex survey analysis during my PhD, which I finished a decade ago, and have worked on it for most of my professional career since, mostly in the research group at the World Bank).
Please let me know!