not clear on the 'simple case'


shabbychef

Mar 3, 2009, 9:12:27 PM
to Testing for Superior Predictive Ability

I have implemented SPA in MATLAB to the best of my ability; however, I
am a bit confused about the 'simple case' where max(\bar{d}) < 0, that
is, when all of the models in question have worse average performance
than the benchmark. In section 2.2, the 2005 paper states that "In this
case there is no evidence against the null hypothesis, and
consequently the null should not be rejected." My feeling, however,
is that due to small sample size (small n), we might simply be unable
to reject the null at the nominal confidence level. I understand that
the paper focuses mostly on the asymptotic (in n) properties of the
bootstrap method, so the small-sample properties of the test are
ignored. That being the case, the test seems oddly discontinuous in
\bar{d} near zero. Either that, or I have totally misunderstood
something about the centering of the bootstrap sample.

To put it another way, I am asking whether the 'test' should proceed
as follows:
1. compute \bar{d};
2. if max(\bar{d}) < 0, terminate (with no p-value of any kind?)
without rejecting the null hypothesis;
3. otherwise, draw a bunch of bootstrap samples; the p-value for the
null is then the mean, over the bootstrap samples, of indicator
functions of the bootstrap statistic exceeding the observed one
(sketched in code below).

(I guess what I don't like about this is that for all epsilon > 0,
there could be a sample with 0 < max(\bar{d}) < epsilon for which the
test rejects the null, yet the test never rejects the null when
max(\bar{d}) <= 0.)
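
For concreteness, here is a rough sketch of the control flow as I
understand it. My real implementation is in MATLAB; this Python version
is only illustrative, and the details are my own simplifications: an
iid bootstrap instead of the paper's stationary bootstrap, a plain
sample-variance estimate of omega_k, and the 2*log(log(n)) recentering
threshold as I read section 2.

import numpy as np

def spa_pvalue_sketch(d, n_boot=1000, seed=0):
    # Illustrative sketch of the SPA control flow. d is an (n x k) matrix
    # of loss differentials, d[t, j] = loss(benchmark, t) - loss(model j, t).
    # Simplifications vs. the paper: iid resampling instead of the
    # stationary bootstrap; plain sample-variance estimate of omega_k.
    rng = np.random.default_rng(seed)
    n, k = d.shape
    d_bar = d.mean(axis=0)

    # the 'simple case': every model looks worse than the benchmark
    if d_bar.max() < 0:
        return None  # terminate without rejecting the null; no p-value

    omega = d.std(axis=0, ddof=1)               # crude estimate of omega_k
    t_stat = max(np.sqrt(n) * (d_bar / omega).max(), 0.0)

    # recenter: keep the (negative) mean only for models that look clearly
    # inferior, so they do not distort the null distribution
    mu_c = np.where(d_bar <= -omega * np.sqrt(2 * np.log(np.log(n)) / n),
                    d_bar, 0.0)

    exceed = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # iid resample of rows
        db_star = d[idx].mean(axis=0)
        t_star = max(np.sqrt(n) * ((db_star - d_bar + mu_c) / omega).max(),
                     0.0)
        exceed += (t_star >= t_stat)

    # p-value: mean of indicator functions over the bootstrap samples
    return exceed / n_boot

Is that roughly the intended procedure?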

thanks for any help...


--shab

reinballe

Mar 4, 2009, 4:13:02 AM
to Testing for Superior Predictive Ability
Good question.

Let me begin with the end of your post.

The SPA test will not have the discontinuity that you dislike,
provided we do not use a crazy level for the test, e.g. \alpha =
99.999%.
The asymptotic distribution of the test statistic is continuous and
has CDF(0) <= 50%, so there will always be positive realizations of
the test statistic for which the corresponding p-value is very close
to, or exceeds, 50%.
I am not sure whether it helps for understanding the issue, but
recall that we need to scale \bar{d} by root-n; otherwise we would
have a degenerate limit distribution.
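Concretely, the studentized statistic takes the form

  T^{SPA}_n = max[ max_k sqrt(n) \bar{d}_k / \hat{\omega}_k , 0 ],

where \hat{\omega}_k^2 is a consistent estimator of
var( sqrt(n) \bar{d}_k ), so it is this scaled quantity, not \bar{d}
itself, that has a well-behaved limit distribution.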

Your description of the SPA procedure is correct: "terminate if the
test statistic is negative."
I suppose one might be interested in a p-value even when the test
statistic is negative, but as I write this I cannot think of a reason
why. In terms of making a decision about the null hypothesis, the
p-value is irrelevant. My point is that max(\bar{d}) < 0 is fully
consistent with the null hypothesis, so the null should not be
rejected. In the context of SPA testing, you are testing the null that
the benchmark is "best", and in the sample you have, the benchmark was
indeed better than every alternative.
It is analogous to saying that we will not reject the null hypothesis
H_0: \theta = 0 if \hat\theta = 0, in a situation where we know that
\hat\theta ~ N(\theta, 1).

Perhaps your point is that with small sample sizes the bootstrap
estimate of the CDF need not be good, and that in some samples the
bootstrap estimate is such that \hat{CDF}(0) is close to one. Still,
the bootstrap may work well on average, and samples such as the one
discussed above fall into the cases where we will make a Type I
error.

Cheers,
-Peter

shabbychef

Mar 5, 2009, 1:57:33 PM
to Testing for Superior Predictive Ability

I guess that does make sense: the onus is on the researcher to 'prove'
that the alternative methods beat the benchmark; having
max(\bar{d}) < 0 *and* a small sample size indicates a failure to
provide both convincing evidence of superiority and enough of it.
Regarding the discontinuity question, I guess if max(\bar{d}_{pop}) is
positive but very small, where \bar{d}_{pop} is the true population
mean, then a failure to reject the null is a problem of the power of
the test and does not concern the nominal rate of Type I errors.
Moreover, any such test would have power difficulties when
max(\bar{d}_{pop}) is small but positive.

thanks for your help; my code appears to work as I had hoped it would.

by the way, in the original paper there are some hints about
estimating the off-diagonal values of the covariance matrix. Do you
think incorporating a full estimate of the covariance matrix would
suffer more from estimation error or from computational slowdown? I am
dealing with the case where the prediction methods are correlated, but
the number of them is modest (say < 100).
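
For a sense of scale, a naive full-matrix estimate looks cheap with
k < 100. Below is a hypothetical sketch (the function name and lag rule
of thumb are my own choices, not from the paper) of a Newey-West /
Bartlett-kernel long-run covariance; it is O(n k^2) and essentially
instantaneous at this size, so speed seems not to be the binding
constraint, and my worry is the estimation error:

import numpy as np

def hac_cov(d, lags=None):
    # Hypothetical sketch: Newey-West (Bartlett kernel) estimate of the
    # full k x k long-run covariance of the (n x k) loss differentials d.
    # Note: the SPA statistic itself only needs the diagonal, omega_k^2.
    n, k = d.shape
    if lags is None:
        lags = int(np.floor(4 * (n / 100.0) ** (2.0 / 9.0)))  # rule of thumb
    z = d - d.mean(axis=0)
    cov = z.T @ z / n                      # lag-0 term
    for j in range(1, lags + 1):
        gamma = z[j:].T @ z[:-j] / n       # lag-j autocovariance
        w = 1.0 - j / (lags + 1.0)         # Bartlett weight
        cov += w * (gamma + gamma.T)
    return cov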

thanks again,

--shab.