That was the wrong machine, my bad. I am on statsmodels 0.7.0.dev0+5dd2502
Does it belong in the formula string, though? I think it is better this way, with both the data argument and the offset outside the string.
In R they are not in a string, but R does not use a string even for the formula itself.
Also note about halfway down this page, http://www.ats.ucla.edu/stat/r/dae/nbreg.htm, where they point out that R's parameterization for negative binomial is different from SAS, Stata, and SPSS (and us).
On Thursday, March 12, 2015 at 7:27:51 AM UTC-4, Kerby Shedden wrote:

Our alpha parameter is apparently the reciprocal of R's theta parameter. If I run it like this, I get perfect agreement with R in terms of the coefficients.

model = smf.glm("survive ~ age + sex + C(whichclass)", data=df, offset=df['lncases'],
                family=sm.families.NegativeBinomial(alpha=1/9.6122)).fit()

Apparently R is doing joint ML estimation of the theta parameter and the coefficients, so the standard errors don't line up if you do it this way. But if I run the R function like the following, I get perfect agreement for both estimates and SE's:

reslm <- glm(survive ~ age + sex + factor(whichclass) + offset(lncases),
             family=negative.binomial(theta=9.61), data = titanicgrp)
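To make the reciprocal relationship concrete, here is a minimal sketch in plain Python. It assumes both parameterizations describe the same NB2 variance function, written as mu + alpha*mu^2 in the alpha form and mu + mu^2/theta in the theta form. The function names are made up for illustration and are not part of statsmodels or R.

```python
# Illustrative helpers only; these names are hypothetical, not library APIs.

def nb2_variance_alpha(mu, alpha):
    """NB2 variance in the alpha parameterization: mu + alpha * mu^2."""
    return mu + alpha * mu ** 2


def nb2_variance_theta(mu, theta):
    """NB2 variance in the theta parameterization: mu + mu^2 / theta."""
    return mu + mu ** 2 / theta


theta_r = 9.6122          # theta value quoted in the message above
alpha_sm = 1.0 / theta_r  # the reciprocal, as passed to NegativeBinomial(alpha=...)

# With alpha = 1/theta the two forms give the same variance for any mean.
for mu in (0.5, 5.0, 50.0):
    v_alpha = nb2_variance_alpha(mu, alpha_sm)
    v_theta = nb2_variance_theta(mu, theta_r)
    print(mu, v_alpha, v_theta)
```

If that assumption holds, fixing alpha=1/9.6122 on the Python side and theta=9.6122 on the R side specifies the same model, which would explain the matching coefficients.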
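import numpy as np
import statsmodels.formula.api as smf

# `df` and `model` (the fitted GLM) come from the quoted message above.
mod_nb = smf.negativebinomial("survive ~ age + sex + C(whichclass)",
                              data=df, offset=df['lncases'])
# There is a bug in start_params with offset/exposure, so start values are
# passed explicitly: the GLM coefficients plus an initial alpha of 0.5.
#res_nb = mod_nb.fit(method='lbfgs', start_params=np.concatenate((model.params, [0.5])))
res_nb = mod_nb.fit(start_params=np.concatenate((model.params, [0.5])))
print(res_nb.summary())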
Optimization terminated successfully.
         Current function value: 3.643069
         Iterations: 12
         Function evaluations: 14
         Gradient evaluations: 14
                     NegativeBinomial Regression Results
==============================================================================
Dep. Variable:                survive   No. Observations:                   12
Model:               NegativeBinomial   Df Residuals:                        7
Method:                           MLE   Df Model:                            4
Date:                Thu, 12 Mar 2015   Pseudo R-squ.:                  0.1306
Time:                        08:39:08   Log-Likelihood:                -43.717
converged:                       True   LL-Null:                       -50.283
                                        LLR p-value:                   0.01065
======================================================================================
                         coef    std err          z      P>|z|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------------
Intercept              0.6134      0.328      1.868      0.062      -0.030       1.257
C(whichclass)[T.2]    -0.3746      0.307     -1.220      0.223      -0.977       0.227
C(whichclass)[T.3]    -0.9071      0.288     -3.155      0.002      -1.471      -0.343
age                   -0.6700      0.254     -2.643      0.008      -1.167      -0.173
sex                   -0.9802      0.246     -3.985      0.000      -1.462      -0.498
alpha                  0.1040      0.068      1.521      0.128      -0.030       0.238
======================================================================================
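As a quick sanity check on the alpha row, the jointly estimated alpha can be compared with the reciprocal of the theta value quoted from R earlier in the thread. This is only arithmetic on the rounded numbers printed above, so agreement beyond three or four digits should not be expected. The variable names are illustrative.

```python
# Values copied from the thread; both are rounded as printed.
alpha_hat = 0.1040   # alpha from the NegativeBinomial summary above
theta_r = 9.6122     # theta quoted from the R fit

alpha_from_theta = 1.0 / theta_r    # R's theta expressed on the alpha scale
theta_from_alpha = 1.0 / alpha_hat  # our alpha expressed on the theta scale

print("alpha implied by R's theta: %.4f" % alpha_from_theta)
print("theta implied by our alpha: %.4f" % theta_from_alpha)

# Relative gap between the two estimates on the theta scale.
rel_diff = abs(theta_from_alpha - theta_r) / theta_r
print("relative difference: %.5f" % rel_diff)
```

A relative difference this small is what rounding alone would produce, which is consistent with both packages reaching the same joint ML estimate under reciprocal parameterizations.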