> Hi,
>
> At work I'm trying to predict the probability of a click on a banner. The
> exogenous variables are all categorical (sex, language, country, OS, etc.)
> and the response is binary (click: Yes or No). In the first step of the
> exploratory phase I planned to use statsmodels.api.Logit, but I'm getting a
> "LinAlgError: Singular matrix". Of course, that is because some rows in the
> exogenous data are duplicated, and I assume that internally the solver is
> doing some matrix inversion.
Since Logit is fit by nonlinear optimization (Newton's method by default),
which needs to invert the Hessian, it doesn't handle a singular design
matrix, in contrast to the linear models, which use a generalized inverse
(pinv).
GLM might be able to handle a singular exog, but it would still leave you
with a non-unique parameter representation.
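To make the singularity concrete: with a constant plus one dummy column for every level of a categorical variable, the dummies sum to the constant, so the design is rank deficient no matter how many rows there are. The sketch below uses a made-up two-level `sex` variable and plain NumPy (no statsmodels), and also shows the non-unique parameter representation: moving the parameters along the null vector leaves the linear predictor unchanged.

```python
import numpy as np

# Hypothetical categorical variable "sex" with two levels, coded 0/1.
sex = np.array([0, 1, 1, 0, 1, 0])

# Full dummy coding: one indicator column per level, plus a constant.
const = np.ones(len(sex))
male = (sex == 0).astype(float)
female = (sex == 1).astype(float)
X = np.column_stack([const, male, female])

# The two dummies always sum to the constant column, so X has 3 columns
# but only rank 2, and X.T @ X is singular.
rank = np.linalg.matrix_rank(X)
print(X.shape[1], rank)  # 3 2

# Non-unique parameters: adding any multiple of the null vector
# (1, -1, -1) leaves the linear predictor X @ beta unchanged.
beta = np.array([0.5, 0.2, -0.3])
null_vec = np.array([1.0, -1.0, -1.0])
eta1 = X @ beta
eta2 = X @ (beta + 2.0 * null_vec)
print(np.allclose(eta1, eta2))  # True
```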
> So the question is: how should I reformulate the problem in order to use
> logistic regression with categorical variables in the statsmodels package?
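I think the easiest approach might be to build the design matrix, exog,
with patsy (for example with dmatrix), or to use the model's formula
interface directly (this requires 0.5dev). With an intercept, patsy's
default treatment coding drops a reference level of each categorical
variable, so the dummy columns are no longer collinear.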
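The alternative would be to drop the columns that are linear combinations
of others "by hand", or to use a transform to a non-singular design matrix.
The singularity comes from such linearly dependent columns (for example a
constant plus a dummy for every level), not from duplicated rows:
repeating rows does not change the rank of the design matrix.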
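Dropping columns by hand can be automated. Below is a minimal NumPy sketch; the helper `drop_dependent_columns` is hypothetical, not a statsmodels function. It walks through the columns and keeps one only if it raises the rank of the columns kept so far. Because it calls `matrix_rank` once per column, it is only meant for a modest number of columns.

```python
import numpy as np

def drop_dependent_columns(X):
    """Greedily keep the columns that increase the rank of the design.

    Hypothetical helper: a column that is (numerically) a linear
    combination of the columns kept so far is dropped. Returns the
    reduced matrix and the indices of the kept columns.
    """
    keep = []
    for j in range(X.shape[1]):
        candidate = keep + [j]
        if np.linalg.matrix_rank(X[:, candidate]) == len(candidate):
            keep.append(j)
    return X[:, keep], keep

# Hypothetical data: constant + one dummy per level of a 2-level
# and a 3-level categorical variable (6 columns, rank 4).
sex = np.array([0, 1, 1, 0, 1, 0])
lang = np.array([0, 1, 2, 2, 1, 0])
X = np.column_stack(
    [np.ones(6)]
    + [(sex == k).astype(float) for k in range(2)]
    + [(lang == k).astype(float) for k in range(3)]
)
X_reduced, kept = drop_dependent_columns(X)
print(X.shape[1], kept)  # 6 [0, 1, 3, 4]
```

The result depends on column order: here the last level of each variable is dropped and becomes the reference category.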
Josef
>
> Regards.
> Toni.
>
> --
> You received this message because you are subscribed to the Google Groups
> "pystatsmodels" group.