On Sun, Apr 21, 2013 at 4:34 PM, Sarah Mount <mount...@gmail.com> wrote:
> Hi there,
>
> this is a very elementary question about using statsmodels; I'm just getting
> to grips with the API and a bit confused about what is where.
>
> I have some discrete data and want to figure out which distribution it best
> fits. I understand the way to do this is to use MLE to fit various
> distributions and perform a Chi-squared test to see which has the best fit.
> scipy has the Chi-squared test covered, so it's the model fitting I'm
> interested in here.
>
> AIUI the fitting process goes like this using SM:
>
>>>> import statsmodels.api as sm
>>>> import numpy
>>>> data = numpy.random.poisson(0.35, 1000)
>>>> exog = [i for i in range(1000)]
>>>> poisson_model = sm.GLM(data, exog, family=sm.families.Poisson())
>>>> results = poisson_model.fit()
>>>> print('Params:', results.params, 'p-value:', results.pvalues)
> Params: [-0.00169184] p-value: [ 0.]
>>>>
>
> So my questions are:
>
> 1) is the above correct, have I understood exog and endog correctly and so
> on?
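Kind of. In your case exog is a linear trend without a constant: with the
default log link of the Poisson family your model is log(mu_t) = b * t for
t = 0, ..., 999, which is why you get a single parameter.
If you want to fit just a constant, then use exog = np.ones(1000).
If you want to fit a constant and a trend, then you could use
exog = sm.add_constant(np.arange(1000))
np.arange(1000) gives the same values as your list comprehension above, but
as a numpy array, and it's faster.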
>
> 2) which discrete distributions are available? In sm.families.family I see
> Binomial, Gamma, Gaussian, InverseGaussian, NegativeBinomial, Poisson. Is
> there a way to use other distributions like Pareto or Yule?
statsmodels.discrete also has Poisson, Logit, ... as pure MLE models
instead of generalized linear models.
Both the models in statsmodels.discrete and GLM are focused on regression,
that is, getting the distribution parameters as a function of some
explanatory variables.
If you just want to estimate the parameters of a distribution (instead
of fitting it to some explanatory variables), then you can also use
the fit method of the distributions in scipy.stats. I don't know how good
they are for discrete distributions.
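For the case without explanatory variables, note that the fit method in scipy.stats is defined on the continuous distributions; the discrete ones such as scipy.stats.poisson don't have one (recent scipy versions add a separate scipy.stats.fit function that also accepts discrete distributions). A minimal sketch that instead maximizes the Poisson log-likelihood directly with scipy.optimize, assuming simulated data as above; the search interval (1e-6, 10) for mu is an arbitrary choice for this example.

```python
import numpy as np
from scipy import optimize, stats

# Simulated counts (seed chosen arbitrarily for the example)
rng = np.random.default_rng(12345)
data = rng.poisson(0.35, 1000)

def neg_loglike(mu):
    """Negative Poisson log-likelihood of the data at rate mu."""
    return -stats.poisson.logpmf(data, mu).sum()

# Minimize over an assumed interval for mu with bounded Brent search
res = optimize.minimize_scalar(neg_loglike, bounds=(1e-6, 10.0), method="bounded")

# For the Poisson the MLE has a closed form: the sample mean
print(res.x, data.mean())
```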
MLE for Pareto works if you know the lower bound (location), but it's
trickier if you want to estimate the location.
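A sketch of the known-lower-bound case for the (continuous) Pareto distribution, with a made-up shape of 3 and lower bound x_m = 2. With x_m known, the MLE of the shape has the closed form n / sum(log(x_i / x_m)), and scipy's pareto.fit with loc fixed at 0 and scale fixed at x_m should give the same value (in scipy's parameterization the lower bound of pareto is loc + scale).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha_true, xm = 3.0, 2.0   # made-up shape and known lower bound x_m

# numpy's pareto draws the Lomax (Pareto II) distribution; adding 1 and
# multiplying by x_m gives the classical Pareto with lower bound x_m
x = (rng.pareto(alpha_true, 10000) + 1) * xm

# Closed-form MLE of the shape when x_m is known
alpha_hat = len(x) / np.log(x / xm).sum()

# scipy fit with loc and scale held fixed, so only the shape b is estimated
b_hat, loc, scale = stats.pareto.fit(x, floc=0, fscale=xm)
print(alpha_hat, b_hat)
```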
I never looked at estimation of Yule.
Do you have explanatory variables, or just constant parameters for the
distributions?
Josef