statsmodels.discrete.discrete_model.Poisson not finding correct parameters from simulated toy problem data

Timothy Elser

unread,

Mar 5, 2018, 5:03:52 PM3/5/18

to pystatsmodels

As indicated in the subject line, I'm trying to test the poisson regression model by seeing if it can recover the correct parameters:

'''

from scipy.stats import poisson

from statsmodels.discrete.discrete_model import Poisson

def poisson_sample(x,coef=[2,.02,.06]):

return(poisson.rvs(coef[0] + coef[1]*x['A']+coef[2]*x['B']))

test_df = pd.DataFrame({"A":np.random.normal(size=10000),"B":np.random.normal(size=10000),"C":np.ones(10000)})

test_df['val'] = test_df.apply(poisson_sample,axis=1)

model = Poisson(test_df['val'],test_df[['C','A','B']])

result = model.fit()

test_df['val_comparison'] = test_df.apply(lambda x:exponential_function(x,result.params),axis=1)

'''

I expect result.params to be approximately equal to [2,.02,.06]; instead, I get [.68,.018,.05]. The values generated from a those parameters (val_comparison) don't resemble the original independent

variable at all. Using result.model.score on both sets of params indicates the gradient of the latter is near zero, as one would expect at the end of optimization,while the gradient at the actual

parameter values is enormous: [-181253.37971073 -10661.46990424 -1820.99983089].

Am I simulating the underlying distribution incorrectly, or misunderstanding something about the api? Any advice on what might be going wrong would be greatly appreciated.

josef...@gmail.com

unread,

Mar 5, 2018, 5:15:25 PM3/5/18

to pystatsmodels

The problem is the parameterization, Poisson uses a exponential mean function, or log-link in GLM terminology.

poisson_mean = exp( X dot params)

if you simulate data with
poisson.rvs(np.exp(coef[0] + coef[1]*x['A']+coef[2]*x['B']))

then those are the coef that should be estimated accurately with a large sample.

The main reason to use a log-link, exponential mean function instead of a linear mean function is that the exp results in predictions that are positive.

The exponential mean function is hard coded in discrete.Poisson, but GLM allows choosing the links.

It is possible to use am identity or linear link in GLM with Poisson. However, that might break if the start_params are not good and the optimization ends up with negative expected mean. It works in simple examples, but might often result in convergence failures.

(I don't remember right now if a link check raises an exception in this case or if we just get a DomainWarning.)

Josef

Timothy Elser

unread,

Mar 5, 2018, 6:16:51 PM3/5/18

to pystatsmodels

Thanks; I'm more than a bit embarrassed that I missed the omitted np.exp when proofreading.

Reply all

Reply to author

Forward