Outreg and outreg2 are Stata modules which allow you to combine several regressions into a single table. The output looks something like this (if you choose the latex option):
In statsmodels, you can use print(res.summary().as_latex()) to output a single regression result as LaTeX. As far as I know, there is no equivalent in statsmodels for combining multiple regression results into one table like the outreg output above.
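To make the target layout concrete, here is a minimal pure-Python sketch of what I mean: one row per variable, one column per model, and a blank cell where a model omits a variable. The function name combine_columns and the cell values are made up for illustration; nothing here is statsmodels API.

```python
def combine_columns(models):
    """Align per-model {variable: cell} dicts into the rows of one table.

    `models` is a list of dicts mapping a variable name to an already
    formatted cell string. Variables keep first-seen order, and a model
    that lacks a variable gets an empty cell.
    """
    variables = []
    for model in models:
        for name in model:
            if name not in variables:
                variables.append(name)
    # Outreg-style column labels: (1), (2), ...
    header = ["Variable"] + ["({})".format(i + 1) for i in range(len(models))]
    rows = [header]
    for name in variables:
        rows.append([name] + [model.get(name, "") for model in models])
    return rows


# Hypothetical, made-up cells purely to show the shape of the output.
example = combine_columns([
    {"Intercept": "+1.000", "x": "+0.500"},
    {"Intercept": "+0.900", "x": "+0.400", "xsqr": "+0.010"},
])
for row in example:
    print("  ".join(cell.ljust(10) for cell in row))
```

The point of the sketch is only the alignment step: every model contributes a column, and the union of variable names determines the rows.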
Inspired by an Economics Stack Exchange question (Outputting Regressions as Table in Python (similar to outreg in stata)?), I would like to help solve this problem. I've never contributed to a project like this, or even to an application used by anyone other than my research collaborators, so I'm not sure my coding is quite up to the challenge. However, I do have some basic code which is functional. I'd like to share that code and get feedback on how I might make it robust, compliant with statsmodels standards, and ready for broader inclusion. Here is my code:
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
def makecoefftable(regresult):
    # Collect coefficients, HC0 (robust) standard errors, t-values, and p-values.
    # Note: HC0_se is heteroskedasticity-robust, while tvalues and pvalues use
    # the covariance the model was fit with (non-robust by default).
    df = pd.concat([regresult.params, regresult.HC0_se, regresult.tvalues, regresult.pvalues], axis=1)
    df.columns = ["Betas", "Std. Errors", "Z Scores", "P Values"]
    # Significance stars; include_lowest=True so a p-value of exactly 0 still gets "***".
    df['asterisks'] = pd.cut(df["P Values"], [0, 0.01, 0.05, 0.1, 1], include_lowest=True, labels=["***", "**", "*", ""])
    # tablefoot1 = "Standard errors in parentheses"
    # tablefoot2 = "*** p<0.01, ** p<0.05, * p<0.1"
    outputformatfncsign = lambda x: "{:+.3f}".format(x)
    outputformatfncnosign = lambda x: "{:.3f}".format(abs(x))
    outputformatfncnosignwparens = lambda x: "(" + outputformatfncnosign(x) + ")"
    df['BetaswStars'] = df['Betas'].apply(outputformatfncsign) + df["asterisks"].astype(str)
    # Pad with spaces so every coefficient string has the same width.
    df['StarPadding'] = (df['BetaswStars'].map(len).max() - df['BetaswStars'].map(len)).values
    df['StarPadding'] = df['StarPadding'].apply(lambda x: " " * int(x))
    df['Betas'] = df['BetaswStars'] + df['StarPadding']
    del df['StarPadding']
    del df['BetaswStars']
    del df['asterisks']
    df['Std. Errors'] = df['Std. Errors'].apply(outputformatfncnosignwparens)
    df["Z Scores"] = df["Z Scores"].apply(outputformatfncsign)
    df["P Values"] = df["P Values"].apply(outputformatfncnosign)
    # Long format: one row for each coefficient and one for its standard error.
    # A stable sort keeps each coefficient row ahead of its standard-error row.
    df2 = pd.melt(df.reset_index()[["index", "Betas", "Std. Errors"]], id_vars=["index"]).sort_values(by=["index"], kind="mergesort")
    df2["fieldtype"] = "coef"
    # Append summary statistics as extra rows.
    df2.loc[df2.index.max() + 1, :] = ["Observations", "Betas", str(int(regresult.nobs)), "stats"]
    df2.loc[df2.index.max() + 1, :] = ["R-Squared", "Betas", outputformatfncnosign(regresult.rsquared), "stats"]
    # Show the variable name on the coefficient row only; leave it blank on the standard-error row.
    df2["Variable"] = df2["index"].where(df2["variable"] == "Betas", "")
    del df2["variable"]
    df2 = df2[["index", "Variable", "fieldtype", "value"]]
    return df2
def makeoutregtable(listofregresults):
    # Give each model's value column a unique temporary name so that merging
    # more than two tables never produces clashing "value" columns.
    dfoutput = makecoefftable(listofregresults[0]).rename(columns={"value": "value0"})
    for i, result in enumerate(listofregresults[1:], start=1):
        dftmp = makecoefftable(result).rename(columns={"value": "value{}".format(i)})
        # The outer merge keeps variables that appear in only some of the models.
        dfoutput = dfoutput.merge(right=dftmp, how="outer", on=["index", "Variable", "fieldtype"])
    # Label the model columns (1), (2), ... in outreg style.
    regidxstrarray = (np.arange(len(listofregresults)) + 1).astype(str)
    outregstylelabel = ["(" + element + ")" for element in regidxstrarray]
    dfoutput.columns = np.concatenate((dfoutput.columns.values[0:3].astype(str), outregstylelabel))
    dfoutput = dfoutput.fillna("")
    # Coefficients first, then summary statistics. Sorting "Variable" in descending order
    # puts each named coefficient row ahead of its blank-named standard-error row.
    dfoutput = dfoutput.sort_values(["fieldtype", "index", "Variable"], ascending=[True, True, False])
    return dfoutput
def finishoutregtable(dfoutput):
    dfformatted = dfoutput.copy()
    del dfformatted["index"]
    del dfformatted["fieldtype"]
    tablefoot1 = "Standard errors in parentheses"
    tablefoot2 = '*** p$<$0.01, ** p$<$0.05, * p$<$0.1'
    dfformatted.loc[dfformatted.index.max() + 1, "Variable"] = tablefoot1
    dfformatted.loc[dfformatted.index.max() + 1, "Variable"] = tablefoot2
    dfformatted = dfformatted.fillna("")
    return dfformatted
def writelatexdocfromdf(df):
    # Wrap the table in a minimal LaTeX document and return it as a string.
    # The booktabs package provides the rules that to_latex emits.
    beginningtex = """\\documentclass{report}
\\usepackage{booktabs}
\\begin{document}"""
    endtex = "\\end{document}"
    # escape=False keeps the $<$ in the footnote as LaTeX math.
    textable = beginningtex + '\n' + df.set_index("Variable").to_latex(escape=False) + '\n' + endtex
    return textable
# Example: fit three nested models on toy data and write the combined table.
x = [1, 3, 5, 6, 8, 3, 4, 5, 1, 3, 5, 6, 8, 3, 4, 5, 0, 1, 0, 1, 1, 4, 5, 7]
y = [0, 1, 0, 1, 1, 4, 5, 7, 0, 1, 0, 1, 1, 4, 5, 7, 0, 1, 0, 1, 1, 4, 5, 7]
d = {"x": pd.Series(x), "y": pd.Series(y)}
df = pd.DataFrame(d)
df['xsqr'] = df['x']**2
mod = smf.ols('y ~ x', data=df)
res = mod.fit()
print(res.summary())
df['xcube'] = df['x']**3
mod2 = smf.ols('y ~ x + xsqr', data=df)
res2 = mod2.fit()
print(res2.summary())
mod3 = smf.ols('y ~ x + xsqr + xcube', data=df)
res3 = mod3.fit()
print(res3.summary())
reslistlong = [res, res2, res3]
print(makeoutregtable(reslistlong))
with open("myregs.tex", 'w') as f:
    f.write(writelatexdocfromdf(finishoutregtable(makeoutregtable(reslistlong))))
The resulting tex file compiles to the following table:
The heavy usage of outreg in the Stata community suggests this would be a much-used feature if it were included in statsmodels. Hopefully my code is useful, and with help and advice it could be expanded into something more fully featured, but at a minimum it can serve as a proof of concept. Please let me know how I can help.
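As a possible starting point for review, the cell formatting can be isolated from pandas entirely. The sketch below is pure Python with hypothetical helper names (significance_stars, format_coefficient, format_std_error). It assumes the thresholds are inclusive upper bounds, which matches the right-closed bins that pd.cut uses by default rather than the strict "p<0.01" wording of the footnote.

```python
def significance_stars(p_value):
    """Return outreg-style stars for a p-value (assumed thresholds: 0.01, 0.05, 0.1)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    # Thresholds are checked from strictest to loosest; bounds are inclusive.
    for threshold, stars in ((0.01, "***"), (0.05, "**"), (0.10, "*")):
        if p_value <= threshold:
            return stars
    return ""


def format_coefficient(beta, p_value):
    """Signed coefficient with three decimals, followed by its stars."""
    return "{:+.3f}".format(beta) + significance_stars(p_value)


def format_std_error(std_error):
    """Unsigned standard error with three decimals, in parentheses."""
    return "({:.3f})".format(abs(std_error))


print(format_coefficient(0.5, 0.03), format_std_error(0.1234))
```

Keeping these as small standalone functions would make the star convention easy to unit test and, if desired, easy to make configurable later.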
Thanks,