[pystatsmodels] design: instance flags


josef...@gmail.com

Apr 16, 2010, 3:48:42 PM
to pystat...@googlegroups.com
Given that we are having some design discussions:

I'm not a big fan of global flags, but using instance flags could
reduce the number of keyword arguments that we would otherwise need.

example:
Currently the sandwich estimators (robust covariances) are not used
for any result statistics. Pandas uses Newey-West everywhere (?)
by default.
At least ftest and ttest should allow the choice of parameter
covariance matrix, and it would be useful if the summary method also
could provide robust t- and p-values.

Instead of using keywords everywhere, we could use a flag that can be
changed at any time, e.g.
res = sm.OLS(endog, exog).fit()
res.usecov = 'HC1'
print res.summary() # or whatever it will be after the changes
print res.ttest(...)

By default usecov is None, i.e. the standard covariance.
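The instance-flag idea could look roughly like the following minimal sketch. All names here (`Results`, `usecov`, the `_covs` lookup) are hypothetical illustrations of the pattern, not the actual statsmodels API:

```python
import numpy as np

class Results:
    """Minimal sketch of the instance-flag idea; usecov and the
    _covs lookup are hypothetical, not the statsmodels API."""

    def __init__(self, params, cov_standard, cov_hc1):
        self.params = params
        self._covs = {None: cov_standard, 'HC1': cov_hc1}
        self.usecov = None  # instance flag, changeable at any time

    def cov_params(self):
        # methods such as summary/t_test would consult the flag here
        return self._covs[self.usecov]

    def bse(self):
        # standard errors from whichever covariance the flag selects
        return np.sqrt(np.diag(self.cov_params()))

res = Results(np.array([1.0, 2.0]), np.eye(2), 4 * np.eye(2))
print(res.bse())      # standard: [1. 1.]
res.usecov = 'HC1'    # flip the flag instead of passing a keyword
print(res.bse())      # robust:   [2. 2.]
```

The point is that every reporting method reads the same flag, so no method signature needs an extra keyword.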

I'm doing something similar when I need to override default settings,
e.g. the degrees of freedom after model.__init__. There are no
problems in Python with setting attributes unless it screws up the
calculations. The alternative would require additional keywords in
many places.

Josef



Skipper Seabold

Apr 16, 2010, 4:33:15 PM
to pystat...@googlegroups.com
On Fri, Apr 16, 2010 at 3:48 PM, <josef...@gmail.com> wrote:
>
> Given that we are having some design discussions:
>
> I'm not a big fan of global flags, but using instance flags could
> reduce the number of keyword arguments that we would otherwise need.
>
> example:
> currently the sandwich estimators, robust covariances, are not used
> for any result statistics. Pandas is using Newey-West everywhere (?)
> by default.
> At least ftest and ttest should allow the choice of parameter
> covariance matrix, and it would be useful if the summary method also
> could provide robust t- and p-values.
>
> Instead of using keywords everywhere, we could use a flag that can be
> changed at any time, e.g.
> res = sm.OLS(endog, exog).fit()
> res.usecov = 'HC1'

Generally, I try to stay away from stuff like this since it feels too
procedural for my tastes, but I don't have a strong opinion.

> print res.summary()  # or whatever it will be after the changes
> print res.ttest(...)
>
> by default usecov is None or standard cov
>
> I'm doing something similar when I need to override default settings,
> e.g. the degrees of freedom after model.__init__. There are no
> problems in Python with setting attributes unless it screws up the
> calculations. The alternative would require additional keywords in
> many places.
>

If it's easier this way, then that's fine. Also note that setting a
flag might need to trigger updating cached results, so we would need
to take advantage of the resettable cached results.
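The interaction between the flag and caching could be sketched with a property setter that invalidates the cache. The names (`usecov`, `_cache`) are hypothetical; statsmodels would use its own resettable-cache machinery rather than a plain dict:

```python
import numpy as np

class Results:
    """Sketch of a flag setter that invalidates cached statistics.
    Hypothetical names; not the actual statsmodels cache machinery."""

    def __init__(self, cov_standard, cov_hc1):
        self._covs = {None: cov_standard, 'HC1': cov_hc1}
        self._usecov = None
        self._cache = {}

    @property
    def usecov(self):
        return self._usecov

    @usecov.setter
    def usecov(self, value):
        self._usecov = value
        self._cache.clear()  # drop stale cached results on flag change

    @property
    def bse(self):
        # computed once and cached until the flag is reset
        if 'bse' not in self._cache:
            cov = self._covs[self._usecov]
            self._cache['bse'] = np.sqrt(np.diag(cov))
        return self._cache['bse']

res = Results(np.eye(2), 4 * np.eye(2))
print(res.bse)      # cached standard value: [1. 1.]
res.usecov = 'HC1'  # setter clears the cache
print(res.bse)      # recomputed robust value: [2. 2.]
```

Without the `_cache.clear()` in the setter, `bse` would keep returning the stale standard value after the flag changed, which is exactly the hazard being discussed.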

Skipper

josef...@gmail.com

Apr 16, 2010, 10:39:57 PM
to pystat...@googlegroups.com
I would use them only where they have no cache effects, at least at
the beginning (or, in my case, for changing some model parameters
before calling fit).

summary, ftest and ttest don't store anything in the cache; for them it
would be useful to set the flag (HC1, ...) for reporting and testing.

However, after checking the result classes: bse and pvalues are cached
(t is a function), so enabling a change in the flag for them would
require invalidating the cached values.
An alternative for those could be to add new attributes bse_robust,
pvalue_robust (and t_robust) instead of making the existing ones
resettable.
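That alternative could look like the following sketch, with parallel `*_robust` properties that leave the cached standard attributes untouched. `bse_robust` and `t_robust` are hypothetical names taken from the suggestion above, not existing statsmodels attributes:

```python
import numpy as np

class Results:
    """Sketch of the alternative: parallel *_robust attributes that
    leave the cached bse untouched (hypothetical names)."""

    def __init__(self, params, cov_standard, cov_robust):
        self.params = params
        self._cov = cov_standard
        self._cov_robust = cov_robust
        self._cache = {}

    @property
    def bse(self):
        # cached once, never reset -- safe because nothing mutates it
        if 'bse' not in self._cache:
            self._cache['bse'] = np.sqrt(np.diag(self._cov))
        return self._cache['bse']

    @property
    def bse_robust(self):
        # separate attribute, so the cached bse stays valid
        return np.sqrt(np.diag(self._cov_robust))

    @property
    def t_robust(self):
        return self.params / self.bse_robust

res = Results(np.array([1.0, 2.0]), np.eye(2), 4 * np.eye(2))
print(res.bse)         # [1. 1.]
print(res.bse_robust)  # [2. 2.]
print(res.t_robust)    # [0.5 1. ]
```

Robust p-values would follow from `t_robust` and the t distribution in the same way; the design point is that both sets of numbers coexist without any cache invalidation.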

Josef

Bruce Southey

Apr 17, 2010, 12:37:11 PM
to pystat...@googlegroups.com
Global-like values tend to get forgotten and cause confusion.

Probably the worst technical usage is when you want to do different
things at the same time, like a standard f-test but a t-test with
a different structure. Also, for certain models, you need to use
different error structures or terms. On the practical side, you need
two lines of code when it would be more explicit with one line (then
again, "complex vs. simple" and other lines of the Zen of Python may
conflict).

This is also a problem of design, because I do not understand "[t]he
alternative would require additional keywords in many places." So
perhaps you are trying to fit too many options into a single function.
For example, why is res.ttest(...) a function? Everything about it is
determined by the data, the model and the subsequent fit. All that you
can really do is change the degrees of freedom and error term,
perhaps the term being tested. I would suggest that the robust
estimators are probably a method of the results or summary class if
they use the same model fit. If not, then these belong in a separate
class.

Bruce

josef...@gmail.com

Apr 17, 2010, 2:00:32 PM
to pystat...@googlegroups.com
I'm not sure what you mean here.
In the example, I specifically mean to optionally replace the use of
Results.cov_params by HC or Newey-West like pandas. This wouldn't
change any other assumptions or definitions in the model.

> On the practical side, you need
> two lines of code when it would be more explicit with one line (then
> again, "complex vs. simple" and other lines of the Zen of Python may
> conflict).
>
> This is also a problem of design, because I do not understand "[t]he
> alternative would require additional keywords in many places." So
> perhaps you are trying to fit too many options into a single function.
> For example, why is res.ttest(...) a function? Everything about it is
> determined by the data, the model and the subsequent fit. All that you
> can really do is change the degrees of freedom and error term,
> perhaps the term being tested. I would suggest that the robust
> estimators are probably a method of the results or summary class if
> they use the same model fit. If not, then these belong in a separate
> class.

t_test and f_test are very flexible: they can test any kind of linear
restriction on the parameters, t_test for a single constraint and
f_test for e.g. a contrast matrix.
They are the main tools for running, for example, ANOVA-type tests on
the significance of the coefficients of several dummy variables.

for example from example_ols_tftest.py
R3 = np.eye(ncat)[:-1,:]
Ftest = res2.f_test(R3)
print repr((Ftest.fvalue, Ftest.pvalue))
R3 = np.atleast_2d([0, 1, -1, 2])
Ftest = res2.f_test(R3)
print repr((Ftest.fvalue, Ftest.pvalue))

print 'simultaneous t-test for zero effects'
R4 = np.eye(ncat)[:-1,:]
ttest = res2.t_test(R4)
print repr((ttest.tvalue, ttest.pvalue))


Currently these tests only use the standard parameter covariance from
the model, but in practice in econometrics, it is now more common to
use robust estimators of the covariance, variations on White or
Newey-West sandwich estimators.

we could add a keyword like t_test(...., usecov='HC1')

f_test has a keyword invcov=None but I'm not sure what this is used for.
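For reference, the HC1 covariance that a `usecov='HC1'` option would select is the standard White sandwich estimator with the small-sample scaling n/(n-k). The helper below is a textbook sketch (hypothetical function name), not the statsmodels implementation:

```python
import numpy as np

def cov_hc1(exog, resid):
    """White sandwich estimator with HC1 scaling:
    n/(n-k) * (X'X)^-1 X' diag(e**2) X (X'X)^-1.
    Textbook formula; a sketch, not the statsmodels implementation."""
    n, k = exog.shape
    xtx_inv = np.linalg.inv(exog.T @ exog)
    # "meat" of the sandwich: X' diag(e^2) X
    meat = exog.T @ (resid[:, None] ** 2 * exog)
    return n / (n - k) * xtx_inv @ meat @ xtx_inv

# usage against a simple simulated OLS fit
rng = np.random.default_rng(0)
x = np.column_stack([np.ones(100), rng.normal(size=100)])
y = x @ np.array([1.0, 2.0]) + rng.normal(size=100)
params, *_ = np.linalg.lstsq(x, y, rcond=None)
resid = y - x @ params
print(np.sqrt(np.diag(cov_hc1(x, resid))))  # robust standard errors
```

t_test and f_test would then simply use this matrix in place of the standard `sigma^2 (X'X)^-1` wherever cov_params appears.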

BTW:
I thought f_test also tests linear restrictions R*beta = r and not
just R*beta = 0, with R the restriction matrix, but I haven't looked
at this since the end of last summer.
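The general form is the textbook Wald statistic F = (R b - r)' [R V R']^{-1} (R b - r) / q with q restrictions, which covers both R*beta = 0 (r a zero vector) and a general r. A sketch with a hypothetical helper (not the actual f_test signature):

```python
import numpy as np

def f_stat(params, cov_params, R, r):
    """F statistic for the general restriction R*beta = r:
    (R b - r)' [R V R']^{-1} (R b - r) / q, the textbook Wald form.
    Hypothetical helper, not the actual f_test signature."""
    R = np.atleast_2d(R)
    diff = R @ params - np.atleast_1d(r)
    q = R.shape[0]  # number of restrictions
    return float(diff @ np.linalg.inv(R @ cov_params @ R.T) @ diff) / q

b = np.array([1.0, 2.0])
V = np.eye(2)
print(f_stat(b, V, [[1, 0]], [1.0]))  # restriction holds -> 0.0
print(f_stat(b, V, [[0, 1]], [0.0]))  # beta_2 = 0 -> 4.0
```

Setting r = 0 recovers the R*beta = 0 case, so supporting a general r is only a matter of exposing the extra argument.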

Josef