statsmodels.discrete.discrete_model.Poisson not finding correct parameters from simulated toy problem data

51 views
Skip to first unread message

Timothy Elser

unread,
Mar 5, 2018, 5:03:52 PM3/5/18
to pystatsmodels
As indicated in the subject line, I'm trying to test the poisson regression model by seeing if it can recover the correct parameters:


'''
from scipy.stats import poisson
from statsmodels.discrete.discrete_model import Poisson
def poisson_sample(x,coef=[2,.02,.06]):
    return(poisson.rvs(coef[0] + coef[1]*x['A']+coef[2]*x['B']))
test_df = pd.DataFrame({"A":np.random.normal(size=10000),"B":np.random.normal(size=10000),"C":np.ones(10000)})
test_df['val'] = test_df.apply(poisson_sample,axis=1)
model = Poisson(test_df['val'],test_df[['C','A','B']])
result = model.fit()
test_df['val_comparison'] = test_df.apply(lambda x:exponential_function(x,result.params),axis=1)
'''

I expect result.params to be approximately equal to [2,.02,.06]; instead, I get [.68,.018,.05]. The values generated from a those parameters (val_comparison) don't resemble the original independent 
variable at all.  Using result.model.score on both sets of params indicates the gradient of the latter is near zero, as one would expect at the end of optimization,while the gradient at the actual 
parameter values is enormous: [-181253.37971073 -10661.46990424 -1820.99983089].

Am I simulating the underlying distribution incorrectly, or misunderstanding something about the api? Any advice on what might be going wrong would be greatly appreciated.


josef...@gmail.com

unread,
Mar 5, 2018, 5:15:25 PM3/5/18
to pystatsmodels
The problem is the parameterization, Poisson uses a exponential mean function, or log-link in GLM terminology.

poisson_mean = exp( X dot params)

if you simulate data with
poisson.rvs(np.exp(coef[0] + coef[1]*x['A']+coef[2]*x['B']))
then those are the coef that should be estimated accurately  with a large sample.

The main reason to use a log-link, exponential mean function instead of a linear mean function is that the exp results in predictions that are positive.

The exponential mean function is hard coded in discrete.Poisson, but GLM allows choosing the links.

It is possible to use am identity or linear link in GLM with Poisson. However, that might break if the start_params are not good and the optimization ends up with negative expected mean. It works in simple examples, but might often result in convergence failures.
(I don't remember right now if a link check raises an exception in this case or if we just get a DomainWarning.)

Josef



Timothy Elser

unread,
Mar 5, 2018, 6:16:51 PM3/5/18
to pystatsmodels
Thanks; I'm more than a bit embarrassed that I missed the omitted np.exp when proofreading.
Reply all
Reply to author
Forward
0 new messages