knowledge encoded?


josef...@gmail.com

Oct 29, 2013, 1:53:51 PM
to pystatsmodels
(partial followup on the robust covariance discussion)

"The user is responsible to check whether a statistical analysis is valid."

In many cases, statsmodels will happily produce results even if the analysis doesn't make much sense, or is statistically not justified.

some older cases


As we add more options and models to statsmodels, there will be more combinations that are not statistically justified.
(Besides statistical "validity", there are also possible numerical problems that we warn about only partially.)

Users Beware! or More Warnings Please!

Encoding statistical knowledge requires that the developer also has it. But that's not always true.

Help Wanted:
If you know a case where we get an invalid result, then please help us and open an issue or a pull request.


In some cases, Stata's documentation contains a warning for calculations that are not valid but that the program has no way to verify.
In other cases, Stata's estimation results carry something like a blacklist of results that cannot or should not be calculated.
In still other cases, Stata simply refuses to calculate.
But we have neither the knowledge base nor the manpower of Stata's development.

(I have no idea what R does in these cases.)


Josef

example

Statsmodels

>>> res_olsg.compare_lr_test(res_ols2)
(4.6679440835894184, 0.030730693840286833, 1.0)
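(The returned tuple is the LR statistic, its p-value, and the degrees-of-freedom difference; the same numbers appear in the forced Stata output below.)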


Stata   (refuses to calculate without 'force')

. lrtest A B, stats
LR test likely invalid for models with robust vce
r(498);


. lrtest A B, stats force

Likelihood-ratio test                                 LR chi2(1)  =      4.67
(Assumption: B nested in A)                           Prob > chi2 =    0.0307

-----------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df          AIC         BIC
-------------+---------------------------------------------------------------
           B |    202           .   -766.3092      2     1536.618    1543.235
           A |    202           .   -763.9752      3      1533.95    1543.875
-----------------------------------------------------------------------------
               Note:  N=Obs used in calculating BIC; see [R] BIC note




Kevin Sheppard

Oct 30, 2013, 6:30:42 AM
to pystat...@googlegroups.com
This is a difficult task, and I think most of the burden has to fall on the person applying a method to understand what she is doing.

In the above example, there is no (necessary) contradiction in using a robust VCV and using an LR test, since the robust VCV is a consistent estimator of the covariance even when the model is homoskedastic and Gaussian.  The LR test statistic continues to be useful with heteroskedasticity, although it then has a different distribution, so the p-value is not usually correct; the distribution depends on the eigenvalues of A^-1 * B, where A is the Hessian and B is the covariance of the scores.
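A minimal sketch of that argument (a hypothetical helper, not statsmodels code and not Kevin's commit; A and B are assumed to already be reduced to the tested parameters): simulate the weighted sum of chi2(1) variables, with weights given by the eigenvalues of A^-1 * B, and read off the p-value.

import numpy as np

def robust_lr_pvalue(lr_stat, A, B, n_sim=100_000, seed=0):
    """Simulated p-value when the LR statistic's null distribution is
    sum_i lambda_i * chi2(1), lambda_i = eigenvalues of A^{-1} B."""
    lam = np.linalg.eigvals(np.linalg.solve(A, B)).real
    rng = np.random.default_rng(seed)
    # each row is one draw of the weighted sum of independent chi2(1) terms
    draws = rng.chisquare(1, size=(n_sim, lam.size)) @ lam
    return float(np.mean(draws >= lr_stat))

Under correct specification, the information-matrix equality makes all the eigenvalues equal to 1, and the statistic collapses back to the usual chi2 distribution.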

What may be more useful would be to supply testing frameworks which are valid under weak conditions.  For example, a large-sample version of the LR test can be constructed using a GMM-type argument.  Similarly LM tests are simple to implement in OLS/WLS models.
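For the OLS case, the textbook LM test is just n * R^2 from an auxiliary regression; a hedged sketch (the helper name is hypothetical):

import numpy as np
from scipy import stats

def lm_test_ols(resid_restricted, exog_full, k_restrictions):
    """Regress the restricted model's residuals on the full regressor set;
    under H0, nobs * R^2 is asymptotically chi2(k_restrictions)."""
    nobs = exog_full.shape[0]
    beta, *_ = np.linalg.lstsq(exog_full, resid_restricted, rcond=None)
    fitted = exog_full @ beta
    # with a constant in exog_full the restricted residuals have mean zero,
    # so this uncentered R^2 equals the usual centered one
    r2 = (fitted @ fitted) / (resid_restricted @ resid_restricted)
    lm_stat = nobs * r2
    return lm_stat, stats.chi2.sf(lm_stat, k_restrictions)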

I'll give these two a shot later this week.  

It may also be useful to verify that models are, in fact, nested when performing the test.  This is simply a case of regressing the regressors from one model on the other model and checking that the R^2 is 1.
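A sketch of that check (hypothetical helper): project the smaller design matrix onto the column space of the larger one and require numerically zero residuals, which is the same as an R^2 of 1 in each auxiliary regression.

import numpy as np

def is_nested(exog_small, exog_large, tol=1e-8):
    """True if the columns of exog_small lie in the column space of
    exog_large, i.e. the smaller model is nested in the larger one."""
    proj, *_ = np.linalg.lstsq(exog_large, exog_small, rcond=None)
    resid = exog_small - exog_large @ proj
    col_scale = np.sqrt((exog_small ** 2).sum(axis=0))
    return bool(np.all(np.linalg.norm(resid, axis=0) <= tol * col_scale))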

Kevin Sheppard

Oct 30, 2013, 8:32:38 AM
to pystat...@googlegroups.com
I have made an initial commit on this, no tests, examples or anything else.  


Not sure about the change to the signature of compare_lr_test though, since all of the work is done by compare_lm_test.

josef...@gmail.com

Oct 30, 2013, 10:18:44 AM
to pystatsmodels
On Wed, Oct 30, 2013 at 8:32 AM, Kevin Sheppard <kevin.k....@gmail.com> wrote:
I have made an initial commit on this, no tests, examples or anything else.  


Nice, I didn't know we could calculate the score of the unrestricted model at the restricted parameters so easily.
I thought we needed to know what the restrictions on the parameters are, which I guess we still do for other MLE models. (Until now I had only ever coded LM tests with the R^2 of auxiliary regressions.)
It took me a bit of time to figure out what was going on.

About S or Sinv: we only have the (robust) cov_params for the unrestricted and restricted models.
If we want the same type of robust covariance as the unrestricted model (or restricted? what if the cov_type differs?), then we would need to call the sandwich functions directly, which will need small adjustments to handle
`(exog_unrestricted * resid_restricted)` or `unrestricted.jac(restricted.params)` directly.

In discrete MLE we already have the scores per observation available as the `jac` method.
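A hedged sketch of such a score/LM test built from those per-observation scores (illustrative only, not the code in Kevin's commit): the outer product of the scores estimates their covariance S, which is also what opens the door to robust versions.

import numpy as np
from scipy import stats

def score_lm_test(scores_obs, k_restrictions):
    """scores_obs : (nobs, k) per-observation scores of the unrestricted
    log-likelihood, evaluated at the restricted parameter estimates."""
    g = scores_obs.sum(axis=0)       # total score; zero in expectation under H0
    S = scores_obs.T @ scores_obs    # OPG estimate of the score covariance
    lm_stat = g @ np.linalg.solve(S, g)
    return lm_stat, stats.chi2.sf(lm_stat, k_restrictions)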

 

Not sure about the change to the signature of compare_lr_test though, since all of the work is done by compare_lm_test.

If they are conceptually different (LM versus LR), then I think I prefer to keep them separate, with cross-linking in the docstrings or in warnings, even if they are asymptotically the same.
Especially if we can eventually get the correct distribution for the LR test, as you mentioned previously.

In cases like this, it's a lot easier for me to figure out the econometrics after seeing the code than by just looking at a few hundred pages of textbooks.

Thanks,

Josef 

josef...@gmail.com

Oct 30, 2013, 1:18:02 PM
to pystatsmodels
On Wed, Oct 30, 2013 at 6:30 AM, Kevin Sheppard <kevin.k....@gmail.com> wrote:
This is a difficult task, and I think most of the burden has to fall on the person applying a method to understand what she is doing.

In the end it is always the responsibility of the user to choose the methods. However, it would be very helpful **if** the software package could help out, either by warning about the use of some methods or by pointing to alternative methods.
I doubt many users have a good overview once it's outside their specialty or outside the common model usages.

another question at the end

 

In the above example, there is no (necessary) contradiction in using a robust VCV and using an LR test, since the robust VCV is a consistent estimator of the covariance even when the model is homoskedastic and Gaussian.  The LR test statistic continues to be useful with heteroskedasticity, although it then has a different distribution, so the p-value is not usually correct; the distribution depends on the eigenvalues of A^-1 * B, where A is the Hessian and B is the covariance of the scores.

I have already run into this linear combination of chi-square distributions often in other contexts.
It's on my "eventually we need to have this" list, but often the references were not very clear about which matrix we are supposed to calculate the eigenvalues from.
 

What may be more useful would be to supply testing frameworks which are valid under weak conditions.  For example, a large-sample version of the LR test can be constructed using a GMM-type argument.  Similarly LM tests are simple to implement in OLS/WLS models.

I have big gaps in my knowledge of these areas.

However, in the last month I worked another round on GMM, which also targets the GMM versions of some models (like Poisson/Exponential) with minimal assumptions, and the same list of cov_type sandwiches.
I don't have many associated hypothesis tests yet, but this should be ready in the next few months.

 

I'll give these two a shot later this week.  

It may also be useful to verify that models are, in fact, nested when performing the test.  This is simply a case of regressing the regressors from one model on the other model and checking that the R^2 is 1.

Good idea; I can add this, unless you want to open a pull request.


another case - partially a follow-up to your LM test that uses wexog (other thread)

We have been somewhat conservative in extending OLS based tests to WLS or GLS, because it is often difficult to find clear references (maybe because the answer should be obvious or nobody would be doing that.)

Can we use the LR or LM test if the two models differ in their whitening transformation exog -> wexog?
For example, OLS versus WLS with parametric heteroscedasticity, or WLS with two different parametric (nested) heteroscedasticity functions, or OLS versus GLSAR (GLS with autoregressive errors).

Maybe only the generic LR test is currently doing the right thing, or would currently apply.

Josef


Kevin Sheppard

Oct 30, 2013, 5:20:26 PM
to pystat...@googlegroups.com

another case - partially a follow-up to your LM test that uses wexog (other thread)

We have been somewhat conservative in extending OLS based tests to WLS or GLS, because it is often difficult to find clear references (maybe because the answer should be obvious or nobody would be doing that.)

Can we use the LR or LM test if the two models differ in their whitening transformation exog -> wexog?
For example, OLS versus WLS with parametric heteroscedasticity, or WLS with two different parametric (nested) heteroscedasticity functions, or OLS versus GLSAR (GLS with autoregressive errors).

Maybe only the generic LR test is currently doing the right thing, or would currently apply.

Josef



Pretty much never.  When using different weights, it is "as if" a different y is being used in each model, and likelihood ratio tests never make sense when using different dependent variables. WLS models with the same weights are fine for tests, since these are still nested and the exogenous regressors will still be nested (the R^2 from the multiple regression will still be 1).  If different weights are used, the tests will no longer be valid.

I suppose that tests from weighted regressions should check whether the weights are the same, in addition to the nestedness test I mentioned above?
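Something as simple as this would do for the weight check (hypothetical helper, not existing statsmodels code):

import numpy as np

def same_weights(w1, w2):
    """WLS results are only comparable when both models used identical
    weights, i.e. the same whitened dependent variable."""
    w1 = np.asarray(w1, dtype=float)
    w2 = np.asarray(w2, dtype=float)
    return w1.shape == w2.shape and bool(np.allclose(w1, w2))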

josef...@gmail.com

Oct 30, 2013, 5:45:42 PM
to pystatsmodels
On Wed, Oct 30, 2013 at 5:20 PM, Kevin Sheppard <kevin.k....@gmail.com> wrote:

another case - partially a follow-up to your LM test that uses wexog (other thread)

We have been somewhat conservative in extending OLS based tests to WLS or GLS, because it is often difficult to find clear references (maybe because the answer should be obvious or nobody would be doing that.)

Can we use the LR or LM test if the two models differ in their whitening transformation exog -> wexog?
For example, OLS versus WLS with parametric heteroscedasticity, or WLS with two different parametric (nested) heteroscedasticity functions, or OLS versus GLSAR (GLS with autoregressive errors).

Maybe only the generic LR test is currently doing the right thing, or would currently apply.

Josef



Pretty much never.  When using different weights, it is "as if" a different y is being used in each model, and likelihood ratio tests never make sense when using different dependent variables. WLS models with the same weights are fine for tests, since these are still nested and the exogenous regressors will still be nested (the R^2 from the multiple regression will still be 1).  If different weights are used, the tests will no longer be valid.

They can still be nested in terms of the full MLE parameter vector, though?
AR(1) with rho not equal to zero versus rho = 0.
Heteroscedasticity or ARCH effects: H0 is that the coefficients in the variance function are all zero except for the constant.

I don't know if there is a direct connection to the LM tests that we have for homoskedasticity and ARCH effects.
I wrote them with only OLS in mind.


Josef

josef...@gmail.com

Oct 30, 2013, 7:44:23 PM
to pystatsmodels
As motivation for my "weird" questions:

I recently started to look at GMM with additional moment conditions for the second moments to see when we get consistent estimators with or without misspecification.
I'd like to clean up and add more support for models with parametric heteroscedasticity or cluster- and auto-correlation, both when the parametric covariance matrix is assumed to be correctly specified and when it is not.
We will then also have the robust sandwich covariances available after GLSAR, GLSHet and similar.

In GMM, I have similar issues with `compare_jtest`, where I'm still not sure about all the conditions that are required for it to be valid.
(compare_jtest is not yet correct because I don't use a common weight matrix.)

Josef