As indicated in the subject line, I'm trying to test the poisson regression model by seeing if it can recover the correct parameters:
'''
from scipy.stats import poisson
from statsmodels.discrete.discrete_model import Poisson
def poisson_sample(x,coef=[2,.02,.06]):
return(poisson.rvs(coef[0] + coef[1]*x['A']+coef[2]*x['B']))
test_df = pd.DataFrame({"A":np.random.normal(size=10000),"B":np.random.normal(size=10000),"C":np.ones(10000)})
test_df['val'] = test_df.apply(poisson_sample,axis=1)
model = Poisson(test_df['val'],test_df[['C','A','B']])
result = model.fit()
test_df['val_comparison'] = test_df.apply(lambda x:exponential_function(x,result.params),axis=1)
'''
I expect result.params to be approximately equal to [2,.02,.06]; instead, I get [.68,.018,.05]. The values generated from a those parameters (val_comparison) don't resemble the original independent
variable at all. Using result.model.score on both sets of params indicates the gradient of the latter is near zero, as one would expect at the end of optimization,while the gradient at the actual
parameter values is enormous: [-181253.37971073 -10661.46990424 -1820.99983089].
Am I simulating the underlying distribution incorrectly, or misunderstanding something about the api? Any advice on what might be going wrong would be greatly appreciated.