design change: multiple inheritance and super is super


josef...@gmail.com

May 7, 2015, 1:16:53 PM5/7/15
to pystatsmodels
We make extensive use of `super` calls but have so far avoided multiple inheritance, so we don't get the additional code complexity of figuring out which class is used when in the super sequence.

We use some simple Mixins and multiple inheritance in the test suite.


I attended Hettinger's talk on super at PyCon, and I think we can safely extend our pattern to at least one level of multiple inheritance.

class PoissonMixed2(MixedMixin, Poisson):
    pass

>>> for i in mod.__class__.mro(): print(i)
... 
<class '__main__.PoissonMixed2'>
<class 'statsmodels.base.mixed.MixedMixin'>
<class 'statsmodels.discrete.discrete_model.Poisson'>
<class 'statsmodels.discrete.discrete_model.CountModel'>
<class 'statsmodels.discrete.discrete_model.DiscreteModel'>
<class 'statsmodels.base.model.LikelihoodModel'>
<class 'statsmodels.base.model.Model'>
<class 'object'>

                                ...
                                 |
                         LikelihoodModel
                                 |
                                ...
                                 |
MixedMixin      ->  Poisson
        |
PoissonMixed


That's similar to an example that Hettinger showed, where MixedMixin modifies methods from the other part of the chain, Poisson down to LikelihoodModel.

The method resolution order is still simple and easy to figure out.
This is full multiple inheritance, not just adding Mixins that define non-overlapping sets of methods, as we discussed before.


I used the same pattern for a PenalizedMixin

class PoissonPenalized2(PenalizedMixin, Poisson):

    pass


In both cases the Mixin is modifying the likelihood methods of the base likelihood model.

The penalized mixin is simpler: it just adds the penalization term to loglike, score and hessian, and will also need to do the same for extra methods, score_obs, loglikeobs and so on.


    def loglike(self, params):
        # evaluate the inherited (unpenalized) loglike, then subtract the penalty
        llf = super(PenalizedMixin, self).loglike(params)
        return llf - self.pen_weight * self.penal.func(params)
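Filled out as a self-contained sketch (the toy model, penalty class and numbers below are invented for illustration, not statsmodels code), the same delegation pattern extends to score:

```python
import numpy as np

class QuadraticPenalty:
    # toy penalty: sum of squared parameters
    def func(self, params):
        return np.sum(params**2)

    def deriv(self, params):
        return 2 * params

class DemoLikelihoodModel:
    # stand-in for a base likelihood model such as Poisson
    def loglike(self, params):
        return -np.sum((params - 1.0)**2)

    def score(self, params):
        return -2 * (params - 1.0)

class PenalizedMixin:
    pen_weight = 0.5
    penal = QuadraticPenalty()

    def loglike(self, params):
        # base loglike minus the penalization term
        llf = super(PenalizedMixin, self).loglike(params)
        return llf - self.pen_weight * self.penal.func(params)

    def score(self, params):
        # same pattern for the gradient
        sc = super(PenalizedMixin, self).score(params)
        return sc - self.pen_weight * self.penal.deriv(params)

class DemoPenalized(PenalizedMixin, DemoLikelihoodModel):
    pass

mod = DemoPenalized()
params = np.array([0.0, 2.0])
print(mod.loglike(params))  # -4.0
print(mod.score(params))
```

The hessian would follow the identical pattern with the penalty's second derivative.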


The MixedMixin is more complicated. It takes the super class's loglikeobs, e.g. Poisson's, integrates it over the random effects, and aggregates for each group in a panel, longitudinal or cluster setting.
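A minimal self-contained sketch of that integration step (toy classes; the quadrature setup, attribute names and data are assumptions for the example, not the statsmodels implementation):

```python
import numpy as np

class DemoPoisson:
    # stand-in for Poisson: per-observation loglike, dropping the log(y!) term
    def __init__(self, endog, exog):
        self.endog = np.asarray(endog, dtype=float)
        self.exog = np.asarray(exog, dtype=float)

    def loglikeobs(self, params, offset=0.0):
        linpred = self.exog @ np.asarray(params) + offset
        return self.endog * linpred - np.exp(linpred)

class MixedMixin:
    # the last entry of params is the random-intercept standard deviation;
    # Gauss-Hermite quadrature integrates each group's likelihood over it
    def loglike(self, params):
        beta, sd = params[:-1], params[-1]
        nodes, weights = np.polynomial.hermite_e.hermegauss(15)
        weights = weights / np.sqrt(2 * np.pi)  # normalize against N(0, 1)
        llf = 0.0
        for g in np.unique(self.groups):
            idx = self.groups == g
            # group loglike evaluated at each quadrature node of the random effect
            vals = np.array([
                super(MixedMixin, self).loglikeobs(beta, offset=sd * n)[idx].sum()
                for n in nodes
            ])
            # log of the weighted average: the integrated group likelihood
            m = vals.max()
            llf += m + np.log(weights @ np.exp(vals - m))
        return llf

class DemoPoissonMixed(MixedMixin, DemoPoisson):
    def __init__(self, endog, exog, groups):
        super(DemoPoissonMixed, self).__init__(endog, exog)
        self.groups = np.asarray(groups)

mod = DemoPoissonMixed([1, 2, 0, 3], np.ones((4, 1)), [0, 0, 1, 1])
# with sd = 0 this reduces to the plain Poisson log-likelihood
print(mod.loglike(np.array([0.1, 0.0])))
```

With the random-effect standard deviation at zero, the quadrature collapses and the result equals the plain Poisson log-likelihood, which is a convenient sanity check.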

This could be nested, although I haven't tried it yet:

class PoissonPenalizedMixed(PenalizedMixin, MixedMixin, Poisson):

    pass


and we have penalized maximum likelihood for a Poisson model with cluster-specific random effects.

- Poisson provides the underlying distribution, 

- MixedMixin integrates and aggregates and 

- PenalizedMixin adds a penalty term. 

`PoissonPenalizedMixed` has the submodels including Poisson as special cases, and would be the only one that we really need, except for complex signatures.


And, in two more lines, we can do the same for GLM:


class GLMMixedPenalized(PenalizedMixin, MixedMixin, GLM):

    pass


(base classes are listed in reverse order of the parts of the class name, following the postfix naming convention)

The only thing I didn't like so much about the pattern is that we need, and get, a large number of new classes instead of adding new methods and extensions to existing classes.

Josef
Did I mention recently, Python is fun. I started to pay more attention to Julia, but ...

josef...@gmail.com

May 7, 2015, 1:32:16 PM5/7/15
to pystatsmodels
Just to mention one alternative:

We could implement many of the things also with instance methods. But those cannot be pickled and, AFAIR, we dropped all use of them that we had for a while.
Also, they make classes a lot more "crowded" with conditional code.

Josef

Kerby Shedden

May 8, 2015, 12:03:39 AM5/8/15
to pystat...@googlegroups.com
I like it.  Will the penalized mixin override fit, or provide a separate fit_regularized method?

Kerby

josef...@gmail.com

May 8, 2015, 8:44:48 AM5/8/15
to pystatsmodels
On Fri, May 8, 2015 at 12:03 AM, Kerby Shedden <kshe...@umich.edu> wrote:
I like it.  Will the penalized mixin override fit, or provide a separate fit_regularized method?

Neither, either or both.

We can always add extra methods that are not in the inheritance chain for either internal use or user facing, like `fit_regularized` or `_fit_regularized`.

However, the user will always have the `fit` method available whether inherited or modified, so it needs to work and will have to be part of the user facing API.

Right now my PenalizedMixin does neither; I just use the inherited `fit` method, which calls the standard optimizer and creates the standard model-specific results class.
(To clean up my PenalizedMixin, I will have to override fit at least for GLM, because it will not work with the IRLS method, which is still our default.)

We need to override and modify the inherited fit method if we don't use the default optimizers, if we want to adjust the created results instance, or if the inherited methods don't work without changes.

For example, if you only want to provide a regularized fit with a special optimizer, then you could just name your fit_regularized as `fit` and it replaces the inherited fit, or add a switch between optimizers (similar to GLM.fit).

As an example of a fit that doesn't work: MixedMixin adds additional parameters to `params`, and some inherited methods won't work. I need to override the inherited `predict` to strip the extra parameters for the super().predict call, and I have to override `fit` because the default start_params don't have the extra parameters.
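A sketch of that predict override (the class names and the k_extra attribute are illustrative assumptions, not the actual implementation): strip the extra parameters, then delegate.

```python
import numpy as np

class DemoBase:
    # stand-in for a base model's predict
    def predict(self, params, exog=None):
        return np.asarray(exog) @ np.asarray(params)

class DemoMixedMixin:
    k_extra = 1  # number of extra (random-effect) parameters appended to params

    def predict(self, params, exog=None):
        # strip the extra parameters for the super().predict call
        params = np.asarray(params)[:-self.k_extra]
        return super(DemoMixedMixin, self).predict(params, exog=exog)

class DemoMixed(DemoMixedMixin, DemoBase):
    pass

mod = DemoMixed()
print(mod.predict([0.5, 2.0, 0.3], exog=[[1.0, 1.0]]))  # [2.5]
```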

I think that we should provide several user-API `fit_xxx` methods only if they provide clearly distinct functionality; for example, currently `fit` is unregularized, while `fit_regularized` has the penalized fit (and in discrete_models also returns a different results class).
Another possible reason to provide a second official fit function is if the signature is very different.

My guess for the specific case with elastic net:

If we provide special `XXXPenalized` classes, then it would be better to override fit and delegate to an internal `_fit_elasticnet` or `_fit_regularized` method.
My main worry is about what attributes we have to attach to the model and whether they could get out of sync if we have several fit methods. It might be easier to have one main fit method that is in "control" overall.

But, I don't have a very strong opinion about this yet. I have to go through the standard fit channel because I'm using the standard inherited optimizers. For elastic net, both ways can be made to work.


Josef

josef...@gmail.com

May 8, 2015, 9:28:35 AM5/8/15
to pystatsmodels
On Fri, May 8, 2015 at 8:44 AM, <josef...@gmail.com> wrote:


My guess for the specific case with elastic net:

If we provide special `XXXPenalized` classes, then it would be better to override fit and delegate to an internal `_fit_elasticnet` or `_fit_regularized` method.
My main worry is about what attributes we have to attach to the model and whether they could get out of sync if we have several fit methods. It might be easier to have one main fit method that is in "control" overall.

But, I don't have a very strong opinion about this yet. I have to go through the standard fit channel because I'm using the standard inherited optimizers. For elastic net, both ways can be made to work.


I'm getting more convinced of this (override fit and delegate to _fit_xxx).  

We will have to attach the penalization weight and penalty function to the model, at which loglike, score and similar are evaluated by default. Otherwise, we would have to adjust several of the default methods and cached attributes in the Results. If we allow two different user-facing methods to change this, then it might become easy to get conflicting behavior.
(If penalization parameters and penalization weight are mutable, i.e. not fixed in __init__, we still run into the possibility of "stale state", that we haven't removed from all models yet.)

On the positive side: `fit` can provide common things like input checking and result creation or modification, that doesn't need to be in every `_fit_xxx` method.
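The "one fit method in control" idea might look like this (the worker names _fit_mle and _fit_elasticnet are hypothetical): fit attaches the penalty state in one place and dispatches to internal workers, so several entry points cannot leave the attributes out of sync.

```python
class DemoPenalizedModel:
    # single user-facing fit: validates input, sets state, dispatches
    def fit(self, method="bfgs", pen_weight=1.0):
        self.pen_weight = pen_weight  # the only place penalty state is attached
        if method == "elasticnet":
            return self._fit_elasticnet()
        return self._fit_mle(method)

    def _fit_mle(self, method):
        # internal worker: standard optimizer path
        return ("mle", method, self.pen_weight)

    def _fit_elasticnet(self):
        # internal worker: coordinate-descent path
        return ("elasticnet", None, self.pen_weight)

mod = DemoPenalizedModel()
print(mod.fit())                                  # ('mle', 'bfgs', 1.0)
print(mod.fit(method="elasticnet", pen_weight=0.5))
```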

Josef

josef...@gmail.com

May 8, 2015, 10:06:52 AM5/8/15
to pystatsmodels
On Thu, May 7, 2015 at 1:16 PM, <josef...@gmail.com> wrote:
Just got a silly idea that won't work:

class GLMMixedNested3(MixedMixin, MixedMixin, MixedMixin, GLM):

    pass



What we would actually like to do is to chain models, which amounts to a recursive composition of methods.
In a nested multilevel model, we need to integrate the likelihood several times within a nested hierarchy of groups. Each Mixin class performs one level of aggregation and integration.

However, inheritance doesn't chain recursively; a class can appear only once in the method resolution order.

We need a ChainMixin or another way to chain methods, but then we don't get the chaining as cheap (in terms of code and automatic features) as just using `super`.
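Repeating the mixin in the bases list raises a duplicate-base-class error rather than chaining. One way to emulate chaining (a toy sketch, not a statsmodels API; all names are invented) is a factory that creates a fresh class with its own copy of the method for each nesting level:

```python
class DemoBase:
    def loglike(self):
        return ["base"]

def make_level_mixin(name):
    # each generated class carries its own copy of loglike, so every
    # level contributes one step in the super() chain
    def loglike(self):
        return ["level"] + super(cls, self).loglike()
    cls = type(name, (object,), {"loglike": loglike})
    return cls

def nest(base, depth):
    cls = base
    for i in range(depth):
        cls = type("Nested%d" % i, (make_level_mixin("Level%d" % i), cls), {})
    return cls

Nested3 = nest(DemoBase, 3)
print(Nested3().loglike())  # ['level', 'level', 'level', 'base']
```

This works, but it is exactly the extra machinery that plain `super` gives us for free in the single-level case.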

Josef


Kerby Shedden

May 8, 2015, 6:41:56 PM5/8/15
to pystat...@googlegroups.com
When you merge your PenalizedMixin I can rebase #2385 on it.

For non-smooth penalties and coordinate-descent type algorithms, we don't want to include the non-smooth part of the penalty in like/score/hess.   We use like/score/hess to obtain a quadratic approximation to the likelihood plus the smooth part of the penalty, then we use the structure of the non-smooth terms to do each one-dimensional optimization.  This can still be handled within your mixin, but the penal_func, score, etc. would not reflect the entire penalty.

Also for coordinate-descent, it might be useful to have a method that "spawns" a 1-dimensional restricted model along a given coordinate.  For PHReg this would have the ability to recycle a lot of the setup calculations that only depend on endog.

Kerby

josef...@gmail.com

May 8, 2015, 7:36:19 PM5/8/15
to pystatsmodels
BTW: 
I converted this thread to a SMEP (statsmodels enhancement proposal), which are just collections of notes, not like real PEPs.
https://github.com/statsmodels/statsmodels/wiki/SMEP-D:-Multiple-Inheritance-and-Modifier-Mixins-for-LikelihoodModels

Some of the discussion of implementation details we can then move to a github issue for easier future reference.


On Fri, May 8, 2015 at 6:41 PM, Kerby Shedden <kshe...@umich.edu> wrote:
When you merge your PenalizedMixin I can rebase #2385 on it.

For non-smooth penalties and coordinate-descent type algorithms, we don't want to include the non-smooth part of the penalty in like/score/hess.   We use like/score/hess to obtain a quadratic approximation to the likelihood plus the smooth part of the penalty, then we use the structure of the non-smooth terms to do each one-dimensional optimization.  This can still be handled within your mixin, but the penal_func, score, etc. would not reflect the entire penalty.

I had started to think about it. We might have to change the penalty methods internally to switch between smoothed and nonsmooth versions depending on the optimizer.

I'm starting to think that it will not have any effect on the results for the user if we use trimming of "zero" variables, or thresholding to set the coefficients to exactly zero, if the user chooses L1, SCAD or similar. That's independent of whether we use the smoothed version with the scipy optimizers.
The idea is that the smoothing part is local around zero and only affects parameters that are thresholded to zero. Other parameters are in the part where the smoothed penalty function is identical to the unsmoothed version.

This would mean that it doesn't matter whether we use a smoothed version or not for the final estimate of a reduced model.
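To illustrate the locality of the smoothing (the quadratic smoothing form below is an assumption for the example, not the statsmodels penalty): a smoothed absolute value that is exactly |x| outside a small interval around zero.

```python
import numpy as np

def smoothed_abs(x, c=1e-4):
    # quadratic on [-c, c], matching value and slope of |x| at +-c;
    # identical to |x| everywhere else
    x = np.asarray(x, dtype=float)
    out = np.abs(x)
    inside = out <= c
    out[inside] = x[inside]**2 / (2 * c) + c / 2
    return out

x = np.array([-1.0, -1e-5, 0.0, 1e-5, 1.0])
print(smoothed_abs(x))  # only the entries near zero differ from |x|
```

Parameters outside the interval see the unsmoothed penalty, so thresholded estimates of a reduced model should not depend on the smoothing.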

(Fan and Li use the locally quadratic, smoothed approximation as part of their optimization algorithm, but the local smoothing moves with the parameters towards zero. IIRC, they need other numerical tricks for the optimization problems this can create.
In my version of their local smoothing, the smoothing interval is an essentially fixed parameter, but in my experiments I set it so small that coefficients that should be zero are smaller than 1e-4. I don't have the theoretically correct thresholding yet.)

This would not hold when there is a user option for smooth(ed) penalization that does not threshold.


Related: I think, but haven't verified yet, that the covariance matrix of the parameters of Fan and Li can be obtained with cov_type="HC0" of the reduced model without any additional work.
It's a standard M-estimator sandwich cov_params AFAIU. Which means we also get other sandwiches like cluster robust standard errors for free.

Josef
 

Also for coordinate-descent, it might be useful to have a method that "spawns" a 1-dimensional restricted model along a given coordinate.  For PHReg this would have the ability to recycle a lot of the setup calculations that only depend on endog.

Yes, I think it's very useful to add private methods for model-specific computational shortcuts.
I don't have an overview or specific suggestions.

Josef