Overwrite `params` to get different predictions

VincentAB

Jun 22, 2023, 5:07:35 PM
to pystatsmodels
Hi all,

I'm not sure if this is the right place to ask this newbie question. Please point me in the right direction if it isn't.

I would like to overwrite the `params` of a fitted model object in such a way that calling `res.predict()` will make different predictions, based on the arbitrary parameter values that I supplied instead of the original (estimated) ones.

Background: I want to use numerical differentiation to get derivatives of predictions (and of functions of predictions) with respect to the parameters, for some Delta Method applications. I'm exploring the possibility of porting my `marginaleffects` package for R to Python and `statsmodels`: https://vincentarelbundock.github.io/marginaleffects/

Concretely, this is what I need:

# load and estimate
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
df = sm.datasets.get_rdataset("Guerry", "HistData").data
mod = smf.ols("Literacy ~ Pop1831 + Desertion", df)
res = mod.fit()

# overwrite the `params` attribute of the results object
# (note: `res2 = res` binds a second name to the same object; it does not make a copy)
res2 = res
res2.params = pd.Series([1., 2., 3.], index=res.params.index)

# These two commands should now make different predictions, based on their different `params`
res.predict(df.head())
res2.predict(df.head())


Thanks for your time!
   
Vincent

josef...@gmail.com

Jun 22, 2023, 5:49:56 PM
to pystat...@googlegroups.com
Hi Vincent,

You can use the model's `predict` method (`res.model.predict`), which takes `params` as its first argument.
The other difference is that `model.predict` expects `exog` to be a NumPy array, while the results' `predict` can take a pandas DataFrame, which is transformed with the formula in the same way as the training data.
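A minimal sketch of that suggestion, assuming a small synthetic data frame in place of the Guerry data (`get_rdataset` needs a network connection; the values below are made up). For rows of the estimation sample, with no rows dropped for missing values, `res.model.exog` already holds the formula-transformed design matrix, so it can be passed straight to `res.model.predict` together with any parameter vector.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the Guerry data (same column names, invented values)
rng = np.random.default_rng(0)
n = 50
df = pd.DataFrame({
    "Pop1831": rng.uniform(100, 1000, size=n),
    "Desertion": rng.uniform(1, 90, size=n),
})
df["Literacy"] = 20 + 0.01 * df["Pop1831"] + 0.2 * df["Desertion"] + rng.normal(0, 5, size=n)

res = smf.ols("Literacy ~ Pop1831 + Desertion", df).fit()

# Design-matrix rows for df.head(), already transformed by the formula
# (columns: Intercept, Pop1831, Desertion)
exog_head = res.model.exog[:5]

# Model-level predict takes the parameter vector as its first argument
pred_est = res.model.predict(res.params.to_numpy(), exog_head)
pred_alt = res.model.predict(np.array([1.0, 2.0, 3.0]), exog_head)
```

Because `res2 = res` only binds a second name to the same results object, the two `predict` calls in the original question could never differ; passing `params` explicitly to `res.model.predict` avoids mutating the results object at all.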

> Background: I want to use numerical differentiation to get derivatives of predictions (and functions of) w.r.t. parameters, for some Delta Method applications. I'm exploring the possibility of porting my `marginaleffects` package for R to Python and `statsmodels`: https://vincentarelbundock.github.io/marginaleffects/

That would be great. I looked at it and similar R packages in the last year.

I did most of the background implementation already; e.g., the delta method for predictions is available through `_test_wald_nonlinear`, which can take user-provided functions.
It's currently used in `get_prediction`.
I have notebooks that illustrate how to use it to compute predictive margins and marginal/partial effects (with some unit tests against `get_margeff`).
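The numerical-differentiation route Vincent describes can be sketched without any private statsmodels methods: take a central finite-difference Jacobian of the predictions with respect to the parameters and sandwich the parameter covariance, J V J'. This is a generic delta-method sketch on synthetic data (the helper `jacobian_central` is hypothetical), not the `_test_wald_nonlinear` implementation. For OLS the prediction is linear in the parameters, so the result should agree with the exact standard error sqrt(diag(X V X')).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data standing in for the Guerry example
rng = np.random.default_rng(1)
n = 50
df = pd.DataFrame({
    "Pop1831": rng.uniform(100, 1000, size=n),
    "Desertion": rng.uniform(1, 90, size=n),
})
df["Literacy"] = 20 + 0.01 * df["Pop1831"] + 0.2 * df["Desertion"] + rng.normal(0, 5, size=n)
res = smf.ols("Literacy ~ Pop1831 + Desertion", df).fit()


def jacobian_central(func, params, rel_step=1e-6):
    """Hypothetical helper: central finite-difference Jacobian of func(params) -> 1-d array."""
    params = np.asarray(params, dtype=float)
    f0 = np.asarray(func(params))
    jac = np.empty((f0.size, params.size))
    for k in range(params.size):
        h = rel_step * max(1.0, abs(params[k]))
        step = np.zeros_like(params)
        step[k] = h
        jac[:, k] = (np.asarray(func(params + step)) - np.asarray(func(params - step))) / (2 * h)
    return jac


exog = res.model.exog[:5]


def predict_at(p):
    """Quantity of interest: predictions for the first five rows, as a function of params."""
    return res.model.predict(p, exog)


beta = res.params.to_numpy()
J = jacobian_central(predict_at, beta)
cov_pred = J @ res.cov_params().to_numpy() @ J.T  # delta method: J V J'
se_pred = np.sqrt(np.diag(cov_pred))
```

Any function of the predictions (ratios, differences, averages) can be swapped in for `predict_at`; statsmodels also ships numerical-derivative helpers in `statsmodels.tools.numdiff` that could replace the hand-written Jacobian.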

The two main missing pieces are:
- creating "interesting" exog, i.e., sets of explanatory variables that can be used in `predict` (I showed some of my experiments with pandas in an earlier message on the mailing list);
- figuring out the terms in the formulas and their derivatives, e.g., interaction effects, polynomials, and similar.
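The first missing piece, building "interesting" exog, can be sketched with plain pandas. The `datagrid` helper below is hypothetical (loosely modeled on the data-grid idea in `marginaleffects`, not an existing statsmodels function): the variables named explicitly take the given values in all combinations, and every other column is held at its mean (numeric) or mode (otherwise). The small data frame, including its `Region` column, is invented for illustration.

```python
import itertools

import pandas as pd


def datagrid(data, **at):
    """Hypothetical helper: all combinations of the values in `at`,
    with every other column held at its mean (numeric) or mode (other)."""
    combos = list(itertools.product(*at.values()))
    grid = pd.DataFrame(combos, columns=list(at.keys()))
    for col in data.columns:
        if col in at:
            continue
        if pd.api.types.is_numeric_dtype(data[col]):
            grid[col] = data[col].mean()
        else:
            grid[col] = data[col].mode().iloc[0]
    # Keep the original column order of `data`
    return grid[[c for c in data.columns if c in grid.columns]]


df = pd.DataFrame({
    "Pop1831": [100.0, 200.0, 300.0, 400.0],
    "Desertion": [10.0, 20.0, 30.0, 40.0],
    "Region": ["N", "S", "S", "E"],
})
grid = datagrid(df, Desertion=[10, 40], Region=["N", "S"])
```

A grid like this can then be passed to the results' `predict`, which applies the formula transformation; categorical columns should only contain levels seen when the model was fitted.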


The main issue for discussing the implementation is https://github.com/statsmodels/statsmodels/issues/5387

My notebooks are not published, so I have to look for them.

Cheers,
Josef



--
You received this message because you are subscribed to the Google Groups "pystatsmodels" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pystatsmodel...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pystatsmodels/1409382b-6c30-4e20-89e1-e961b191b55cn%40googlegroups.com.

josef...@gmail.com

Jun 22, 2023, 6:01:32 PM
to pystat...@googlegroups.com
On Thu, Jun 22, 2023 at 5:49 PM <josef...@gmail.com> wrote:
> My notebooks are not published, so I have to look for them

Both notebooks are "dirty": they are just collections of experiments to see how margins for nonlinear terms and interaction terms can be implemented based on the delta-method covariance of nonlinear functions of the parameters.
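A rough sketch of that kind of experiment, not the notebook code: the average marginal effect (AME) of `Desertion` in a model with a `Pop1831 * Desertion` interaction, computed by central differences in the covariate, with a delta-method standard error from central differences in the parameters. Assumptions: synthetic data, and a hypothetical hand-written `design()` function that mirrors the formula's columns, so that shifted data and perturbed parameters can both go through `model.predict`. Because the model is linear, the AME has the closed form b_Desertion + b_interaction * mean(Pop1831), which serves as a check.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 200
df = pd.DataFrame({
    "Pop1831": rng.uniform(1, 10, size=n),
    "Desertion": rng.uniform(1, 90, size=n),
})
df["Literacy"] = (
    20 + 1.5 * df["Pop1831"] + 0.2 * df["Desertion"]
    + 0.05 * df["Pop1831"] * df["Desertion"] + rng.normal(0, 3, size=n)
)
res = smf.ols("Literacy ~ Pop1831 * Desertion", df).fit()
assert list(res.params.index) == ["Intercept", "Pop1831", "Desertion", "Pop1831:Desertion"]


def design(d):
    """Hypothetical helper: same columns, in the same order, as the formula's design matrix."""
    return np.column_stack([np.ones(len(d)), d["Pop1831"], d["Desertion"], d["Pop1831"] * d["Desertion"]])


def ame(params, var="Desertion", eps=1e-4):
    """Average over rows of the central finite-difference slope of the prediction in `var`."""
    hi, lo = df.copy(), df.copy()
    hi[var] = hi[var] + eps
    lo[var] = lo[var] - eps
    diff = res.model.predict(params, design(hi)) - res.model.predict(params, design(lo))
    return np.mean(diff / (2 * eps))


beta = res.params.to_numpy()
est = ame(beta)

# Delta method: gradient of the AME with respect to the parameters, by central differences
grad = np.empty_like(beta)
for k in range(beta.size):
    h = 1e-5 * max(1.0, abs(beta[k]))
    step = np.zeros_like(beta)
    step[k] = h
    grad[k] = (ame(beta + step) - ame(beta - step)) / (2 * h)
se = np.sqrt(grad @ res.cov_params().to_numpy() @ grad)
```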

VincentAB

Jun 23, 2023, 8:40:30 AM
to pystatsmodels
Thanks Josef, this is great.

I tried it this morning and things seem to work as expected. Excellent!

If you look at the `marginaleffects` website, you'll see that things have changed a lot in the last year, and that there are now *a ton* of features. It'll take me a while to get anywhere close to parity (and I'm leaving on vacation next week). But once I have a working Python prototype, I'll ping you. We can then chat to see if it makes sense to integrate it into `statsmodels` or if it would be best as a standalone product.

Cheers!

Vincent

josef...@gmail.com

Jun 23, 2023, 9:26:34 AM
to pystat...@googlegroups.com
The core computation will have to be integrated into the models.
We need the supporting model methods, e.g., derivatives (https://github.com/statsmodels/statsmodels/issues/8833; margeff ignores offset).
But marginal/partial/predictive effects are in high demand, and we will need to support them directly.
That's why I was working on it during the last year.
I extended `get_prediction` in 0.14 to already support some of the computation, but I was focused mainly on discrete models.

Related issue:
margeff follows the Stata implementation.
In the "causal" analysis tradition, computations similar to margins "overall" are used for the average treatment effect (ATE). However, the variance computation for the ATE differs from the delta method in margeff
(Greene versus Wooldridge in econometrics).
Essentially, `margins` assumes a parametric model, while the ATE allows for nonparametric identification with heterogeneity.
This might share some of the code with margeff.
The topic is relatively new to me and I don't have a clear overview yet of what we need to do.
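A minimal sketch of the point estimate that both approaches share, assuming a hypothetical binary treatment `treat`, a covariate `x`, and an outcome `y` in synthetic data: predict for the whole sample with everyone treated and with nobody treated, then average the unit-level differences (regression adjustment, or g-computation). Only the point estimate is shown; the variance is exactly where the margins-style delta method and the ATE literature differ, and neither variance formula is implemented here.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data with a treatment effect that varies with x
rng = np.random.default_rng(3)
n = 300
df = pd.DataFrame({"x": rng.normal(size=n), "treat": rng.integers(0, 2, size=n)})
df["y"] = 1.0 + 0.5 * df["x"] + 2.0 * df["treat"] + 0.8 * df["treat"] * df["x"] + rng.normal(size=n)

res = smf.ols("y ~ treat * x", df).fit()

# Counterfactual copies of the sample: everyone treated vs. nobody treated
df1 = df.assign(treat=1)
df0 = df.assign(treat=0)

# "Overall" average of the unit-level predicted differences
unit_effects = np.asarray(res.predict(df1)) - np.asarray(res.predict(df0))
ate = unit_effects.mean()
```

For this linear specification the average reduces to b_treat + b_treat:x * mean(x), which is a convenient sanity check; with a nonlinear link the averaging over the sample no longer collapses to a single coefficient.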

I will also be on vacation in July.

Josef


VincentAB

Jun 23, 2023, 9:35:44 AM
to pystatsmodels
Sounds great.

I'm not sure much needs to be changed in the models themselves. As you can see, the `marginaleffects` package for R supports 80+ different modeling packages, and I didn't have to make any changes to those model-fitting functions (maintained by different developers). The only thing that mattered was that their `predict()` methods supported the required options (e.g., offsets).

I've become pretty familiar with this area, and have dealt with a lot of user feedback, so I feel like I have a good sense of what (many) users are asking for. Will show you a prototype when it's ready.

Vincent