generic results that are theoretically invalid - start a blacklist?

josef...@gmail.com

28 May 2012, 10:56:57
to pystatsmodels
How much consumer protection do we need?

I'm still trying to figure out what Stata is doing (and it's a lot
easier than finding out what is available in R).

from xtgls postestimation help
'''
(1) AIC and BIC are available only if igls and corr(independent)
were specified at estimation.
(2) Likelihood-ratio tests are available only if igls and
corr(independent) were specified at estimation.
'''

The problem here is that some of the estimation methods are not even
asymptotically equivalent to maximum likelihood estimation, so the
likelihood-ratio test is theoretically not appropriate. In those cases
Stata does not provide the results. (xtgls is a single function that
allows for various options and model assumptions.)

Another example: after calling stcox, Stata has this

e(marginsnotok) : "CSNell DEViance DFBeta ESR LDisplace LMax
MGale SCAledsch SCHoenfel.."

marginsnotok sounds like it prohibits some results, although I haven't
looked at it in detail yet.


The same problem shows up in statsmodels when we use a generic results
class or inherit generic results in a subclass. It does not show up in
"single-purpose" classes, where the results are specifically targeted
at the model.
As an example, I'm looking at linear models that estimate a covariance
structure (similar to xtgls and others in Stata). The estimation
returns a RegressionResults instance, which might include, depending on
the estimation details and options, results that are theoretically not
appropriate (theoretically not justified or theoretically incorrect),
as illustrated below.
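
To make this concrete, a minimal illustration (I'm using GLSAR here as
a stand-in for a model with an estimated covariance structure; any
similar model would do): the results instance exposes the
likelihood-based attributes simply because they are inherited, without
any check whether the estimation method justifies them.

'''
import numpy as np
import statsmodels.api as sm

# simulate a linear model with AR(1) errors
np.random.seed(12345)
nobs = 200
exog = sm.add_constant(np.random.randn(nobs))
err = np.zeros(nobs)
shock = np.random.randn(nobs)
for t in range(1, nobs):
    err[t] = 0.8 * err[t - 1] + shock[t]
endog = np.dot(exog, [1.0, 0.5]) + err

# GLSAR estimates the AR(1) coefficient of the errors, so the
# covariance structure is estimated, not known
res = sm.GLSAR(endog, exog, rho=1).iterative_fit(maxiter=10)

# inherited from RegressionResults and returned without any check
# whether they are theoretically justified for this estimator
print(res.llf, res.aic, res.bic)
'''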

Question
-------------
Should we follow Stata's example and prohibit theoretically incorrect
results, or at least warn users against using them?

Currently, we sometimes just have a warning in the docs, like "this
inherits RegressionResults and not all results might be appropriate",
or something like that.

It will be quite a bit of work to actually check the theory for all
inherited results, but we could set up the infrastructure for this.

Possible Implementation
-----------------------------------
Models define a blacklist; the results instances check the blacklist
and raise a warning or an exception if the method or attribute is
blacklisted.
Completely deleting the method or attribute might work in some cases,
but it is more difficult to keep track of.
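
A minimal sketch of the idea (the names, InvalidResultWarning and
_blacklisted_results, are made up for illustration, not existing
statsmodels API):

'''
import warnings


class InvalidResultWarning(UserWarning):
    """warning for results that are theoretically not justified"""


class BlacklistMixin(object):
    # the model (or a results subclass) fills this with the names of
    # attributes/methods that are not valid for the estimation method
    _blacklisted_results = frozenset()

    def __getattribute__(self, name):
        # go through object.__getattribute__ to avoid infinite recursion
        blacklist = object.__getattribute__(self, "_blacklisted_results")
        if name in blacklist:
            warnings.warn("%s is theoretically not appropriate for "
                          "this model" % name, InvalidResultWarning,
                          stacklevel=2)
            # alternative: raise AttributeError(name), which effectively
            # deletes the result, but is harder to keep track of
        return object.__getattribute__(self, name)


class FGLSResults(BlacklistMixin):
    # e.g. likelihood-based results are only valid if the estimator is
    # (asymptotically) equivalent to MLE
    _blacklisted_results = frozenset(["aic", "bic", "compare_lr_test"])
    aic = -123.4  # placeholder standing in for an inherited result


res = FGLSResults()
res.aic  # warns with InvalidResultWarning, but still returns the value
'''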

Related:
This is similar to the z-values versus t-values discussion that we had
several times. For FGLS, Stata chooses one or the other depending on
the model details:
https://github.com/statsmodels/statsmodels/issues/285

Approximately: GLS with known weights or covariance uses t-values; GLS
with an estimated covariance matrix uses z-values.
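
As a rough sketch of that rule (the helper function is made up for
illustration; this is not how statsmodels implements it):

'''
import numpy as np
from scipy import stats


def param_pvalues(params, bse, df_resid, cov_is_known):
    """two-sided p-values for H0: beta_i = 0

    If the weights/covariance are known, the small-sample t
    distribution is justified; if the covariance was estimated (FGLS),
    only the asymptotic normal distribution is.
    """
    stat = np.asarray(params) / np.asarray(bse)
    if cov_is_known:
        return 2 * stats.t.sf(np.abs(stat), df_resid)
    return 2 * stats.norm.sf(np.abs(stat))
'''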

The main place where we can present "opinionated" results to users
right now is summary(), which should be expanded more (see the sketch
below). But for attributes and methods, conditional returns, warnings,
and exceptions are the only way.
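
For summary(), a note could be attached, for example (continuing the
GLSAR example above, and assuming the add_extra_txt method of the
Summary object; the wording of the note is just a suggestion):

'''
smry = res.summary()
smry.add_extra_txt(["Warning: llf, aic and bic are theoretically not "
                    "justified for this estimation method."])
print(smry)
'''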


----------------
aside: Misspecification

The issue here is different from users requesting inappropriate
results because their model is misspecified.
If a user relies on OLS standard errors when the residuals are
autocorrelated, then it is the user's problem that the results are
incorrect. All we can do is provide specification tests and correct
methods. If a user insists on fitting the wrong model, then no
statistical package can prevent it.

In the case above, by contrast, we have results that are theoretically
inconsistent with, or unjustified by, the model that the user requested.


<copied as SMEP-D:
https://github.com/statsmodels/statsmodels/wiki/SMEP-D:-Blacklisting-Results
>
D is for design; the other ones are SMEP-E, for enhancement.


Josef
"All models are wrong, but are they good enough."