open parameter spaces, edge cases and numerical compactification

josef...@gmail.com

Sep 13, 2017, 1:59:38 PM
to pystatsmodels
(random thoughts on corner cases)

open parameter spaces

- stationarity requires that the (inverse) roots of the AR polynomial are strictly inside the unit circle (modulus < 1; see the sketch below)

- Logit and similar assume that the probabilities are in the open interval (0, 1)

- Mixture distributions based on Logit or Multinomial Logit assume that the mixing probabilities are not degenerate (i.e. they lie in the interior of the simplex)

- Poisson is theoretically a special case of Negative Binomial, but the standard equations for Negative Binomial don't work at the Poisson corner case (alpha = 0 causes terms to become inf or 0/0)

In all cases we run into numerical, and possibly theoretical, problems as we approach an open boundary.
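For illustration, here is a minimal numpy-only sketch (not statsmodels code) of checking the stationarity condition from the first bullet; the strict inequality is exactly what makes the set of stationary AR parameters an open set:

import numpy as np

# AR(2): y_t = a1*y_{t-1} + a2*y_{t-2} + e_t is stationary iff the roots of
# the lag polynomial 1 - a1*z - a2*z**2 all lie outside the unit circle,
# i.e. the inverse roots have modulus strictly less than 1 (open condition)
a1, a2 = 1.4, -0.45
roots = np.polynomial.polynomial.polyroots([1, -a1, -a2])
inv_roots = 1 / roots
print(np.abs(inv_roots))                 # moduli 0.5 and 0.9 here
print(np.all(np.abs(inv_roots) < 1))     # True: strictly inside, stationary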

aside: the Vuong test for comparing non-nested likelihood models, e.g. Poisson versus ZeroInflatedPoisson.
Poisson is a corner case of ZeroInflatedPoisson with Logit or Probit mixing: the inflation probability is zero only if the linear prediction goes to -inf. For any finite parameters, Poisson and ZIP are not nested.
So is the Vuong test valid or not? It's popular but on shaky ground.
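A quick numerical illustration of that limit, using scipy's expit (the inverse of the Logit link): the inflation probability only becomes exactly zero through floating point underflow, never for finite parameters.

from scipy.special import expit

print(expit(-30.0))    # ~9.4e-14: tiny, but still strictly positive
print(expit(-800.0))   # 0.0: exp(-800) underflows, so the open bound is
                       # "reached" only by floating point, not in theory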


back to implementation details

Suppose we want to estimate a possibly misspecified model subject to these strict inequality constraints or open parameter spaces. We might want to do this for model comparison, or, for example, to evaluate predictive ability.

Then:

First, we need to fix the numerical problems. One requirement in many of these cases is that we stay away from the boundary. We can add thresholds to the numerical computation so that, in effect, we estimate on a numerically compactified parameter space.
We do this in GLM and some other places by clipping values to stay away from the bounds.
In some cases we clip only inside specific functions to avoid numerical problems even though the limiting answer is well defined (e.g. 0 * log(0) = 0).
Those adjustments also help in many cases with convergence problems, because intermediate iterations can wander near the boundary even when the optimal solution itself is finite and fine.
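A minimal sketch of what this clipping looks like (the helper names here are hypothetical, not the actual statsmodels functions, which do something similar inside the GLM families and links):

import numpy as np

def clip_open_unit_interval(p, eps=1e-10):
    # hypothetical helper: map probabilities into [eps, 1 - eps],
    # a numerically compactified version of the open interval (0, 1)
    return np.clip(p, eps, 1 - eps)

def bernoulli_loglike(y, p, eps=1e-10):
    # without clipping, log(0) = -inf and 0 * (-inf) = nan, even though
    # the mathematical limit 0 * log(0) = 0 is perfectly well defined
    p = clip_open_unit_interval(p, eps)
    return np.sum(y * np.log(p) + (1 - y) * np.log1p(-p))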

Second, don't shove it under the rug.
Currently, optimization problems might be the only hint that a model is misspecified. We explicitly raise or warn in a few cases, e.g. PerfectSeparationError in Logit and Probit.
If we paper over the problems with "artificial" bounds, then users still get numbers, but we should try to warn them.
E.g. currently we have no information about whether the clipping in GLM is in effect, nor whether it affects the final results.
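A sketch of what such a warning could look like (hypothetical helper; nothing like this exists in statsmodels right now):

import warnings
import numpy as np

def clip_and_warn(p, eps=1e-10):
    # hypothetical: clip as before, but record and report whether the
    # artificial bound was actually active for any observation
    clipped = np.clip(p, eps, 1 - eps)
    if np.any(clipped != p):
        warnings.warn("probabilities clipped to [%g, 1 - %g]; results "
                      "may depend on this threshold" % (eps, eps))
    return clipped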

Finally, we should have more models and more diagnostics, so we can identify and use a "less misspecified" model that works without cheating.
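One example of a cheap diagnostic along these lines, using results that GLM already provides (simulated data, just for illustration):

import numpy as np
import statsmodels.api as sm

np.random.seed(123)
x = sm.add_constant(np.random.uniform(size=200))
y = np.random.poisson(np.exp(x.dot([0.5, 1.0])))

res = sm.GLM(y, x, family=sm.families.Poisson()).fit()
# Pearson chi2 / df_resid is roughly 1 for a well specified Poisson,
# well above 1 under overdispersion, and below 1 under underdispersion
print(res.pearson_chi2 / res.df_resid)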


aside2:
I ran into a few problems like this in count models:

E.g. Poisson is numerically not a special case of NegativeBinomial, so we also cannot use it directly for a score test between the two.
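To see the problem concretely, a sketch evaluating the NegativeBinomial loglikelihood at and near the Poisson corner (the exact nan/inf outcome depends on implementation details):

import numpy as np
import statsmodels.api as sm

np.random.seed(456)
x = sm.add_constant(np.random.uniform(size=500))
y = np.random.poisson(np.exp(x.dot([1.0, 0.5])))

mod = sm.NegativeBinomial(y, x)   # default NB2; params are [coeffs..., alpha]
# at the Poisson corner alpha = 0 the loglikelihood terms involve 1/alpha,
# so the evaluation breaks down (nan/inf); a tiny positive alpha is fine
print(mod.loglike(np.array([1.0, 0.5, 0.0])))
print(mod.loglike(np.array([1.0, 0.5, 1e-8])))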

Similarly, Poisson is not reachable as a special case of ZeroInflatedPoisson because of the open bounds of Logit and Probit.

Under-dispersion and zero deflation are even worse: there we cannot jump over the boundary, on top of not being able to approach it.
How do we estimate the best overdispersed model if the true model (data generating process) has under-dispersion?
How do we estimate the best zero-inflated model if the true model has zero-deflation?
The current answer is, most likely, convergence failure.


Josef




