Interaction terms in statsmodels regression?


christoph...@ucsf.edu

Sep 10, 2014, 8:58:04 PM
to pystat...@googlegroups.com
Hi

I'm trying to build a regression model to tell me if two input variables are interacting.

Right now I'm doing this:

InteractionModelpVals = sm.OLS(response, covariateMatrix).fit().pvalues

It kind of works, but I created the interaction term as a new variable having the possible values of -3, -2, 2, or 3. This is maybe not the best. Or is that a legit way to create an interaction term?

In R you can literally multiply two terms together to get an interaction term and then stick it in your formula.
interactionTerm <- term1 * term2
response ~ term1 + term2 + interactionTerm
https://www.inkling.com/read/r-cookbook-paul-teetor-1st/chapter-11/recipe-11-6

Should I not be using OLS to fit such a model?
Bonus points for explaining to me a good way to incorporate categorical covariates into my models. (Is there a faster way than creating binary dummy variables for each category?)

Thanks!
Chris

josef...@gmail.com

Sep 10, 2014, 9:33:40 PM
to pystatsmodels
Hi,

Use our formulas, i.e. patsy's, especially if you are familiar with R.

See the patsy documentation for some differences between R and patsy; most formula definitions are very similar or the same.

Also use dmatrix or dmatrices to get a hold of the design matrix directly, to see if it's what you think it should be, and for reuse.

anova_lm is a function that provides ANOVA tables after regression.
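
One possible use of `anova_lm` for the interaction question, sketched on synthetic data with hypothetical column names: fit the model with and without the interaction and compare the two nested fits with an F-test.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Synthetic data with a real interaction effect (illustrative assumption).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({"term1": rng.normal(size=n), "term2": rng.normal(size=n)})
df["response"] = (
    1.0 + 2.0 * df["term1"] - 1.0 * df["term2"]
    + 0.5 * df["term1"] * df["term2"]
    + rng.normal(scale=0.1, size=n)
)

# Nested models: main effects only vs. main effects plus interaction.
reduced = smf.ols("response ~ term1 + term2", data=df).fit()
full = smf.ols("response ~ term1 * term2", data=df).fit()

# Passing the nested fits compares them; the second row tests the added term.
table = anova_lm(reduced, full)
print(table)
```

With a single added interaction term, this F-test should agree with the t-test p-value reported for `term1:term2` in the full model.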


A few more examples are spread throughout the notebooks here.

I hope that helps.

Josef


 


christoph...@ucsf.edu

Sep 17, 2014, 10:15:21 PM
to pystat...@googlegroups.com

Thanks, Josef.
Just an update for any newbies out there seeing this.
I'm using pandas DataFrames, which make the code more readable.
For categorical variables, just write "C(variable)" inside your model formula and the variable is treated as categorical (I think it automatically generates dummy variables).
Here's some code:

from statsmodels.formula.api import ols
...
# diet*votingHistory expands to diet + votingHistory + diet:votingHistory
ModelFormula = "ResultVariable ~ C(dataBatch) + diet * votingHistory"  # best interaction variable ever.
...collecting data in MergedDataFrame...
# here's the guts of my fit, extracting the p-values
InteractionModelpVals = ols(formula=ModelFormula, data=MergedDataFrame).fit().pvalues
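
For anyone who wants to run something end to end, here is a self-contained sketch of the same formula on made-up data. The contents of `MergedDataFrame` are not shown in the thread, so the batch labels, the numeric `diet` and `votingHistory` columns, and the effect sizes below are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Hypothetical stand-in for MergedDataFrame (the real data are not shown).
rng = np.random.default_rng(1)
n = 300
MergedDataFrame = pd.DataFrame({
    "dataBatch": rng.choice(["batch1", "batch2", "batch3"], size=n),
    "diet": rng.normal(size=n),
    "votingHistory": rng.normal(size=n),
})
MergedDataFrame["ResultVariable"] = (
    MergedDataFrame["diet"]
    + 0.8 * MergedDataFrame["diet"] * MergedDataFrame["votingHistory"]
    + rng.normal(scale=0.5, size=n)
)

ModelFormula = "ResultVariable ~ C(dataBatch) + diet * votingHistory"
fit = ols(formula=ModelFormula, data=MergedDataFrame).fit()

# C(dataBatch) becomes dummy columns, with the first level as the baseline.
print(fit.pvalues)
```

The printed index shows how the terms are named: one `C(dataBatch)[T.<level>]` entry per non-baseline batch, the two main effects, and `diet:votingHistory` for the interaction.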

peace,
Chris