Good question.
Let me begin with the end of your post.
The SPA test will not have the discontinuity that you dislike,
assuming we do not use a crazy level for the test, e.g. \alpha =
99.999%.
The asymptotic distribution of the test statistic is continuous and
has CDF(0) <= 50%. So there will always be positive realizations of
the test statistic for which the corresponding p-value is very close
to, or exceeds, 50%.
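To illustrate the CDF(0) <= 50% claim, here is a rough Monte Carlo sketch. It assumes a simplified stand-in for the limit distribution: the maximum of k mean-zero, unit-variance normals with a common pairwise correlation rho. The function name, the equicorrelation setup, and the parameter values are my own illustrative choices, not part of the SPA procedure itself.

```python
import numpy as np

def cdf_at_zero_of_max(k, rho=0.0, n_draws=200_000, seed=0):
    """Monte Carlo estimate of P(max_j Z_j <= 0).

    Z_1..Z_k are mean-zero, unit-variance normals with common pairwise
    correlation rho in [0, 1). This is a hypothetical stand-in for the
    limit distribution, not the exact SPA limit.
    """
    rng = np.random.default_rng(seed)
    common = rng.standard_normal((n_draws, 1))   # shared factor
    idio = rng.standard_normal((n_draws, k))     # idiosyncratic parts
    z = np.sqrt(rho) * common + np.sqrt(1.0 - rho) * idio
    return float(np.mean(z.max(axis=1) <= 0.0))

for k in (1, 2, 5):
    print(k, cdf_at_zero_of_max(k), cdf_at_zero_of_max(k, rho=0.5))
```

With k = 1 the estimate should sit near 50%, and with more alternatives it can only fall, since the maximum is non-positive only if every component is.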
I am not sure that it is helpful for understanding the issue, but
recall that we need to scale \bar{d} by root-n, otherwise we have a
degenerate limit distribution.
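A minimal sketch of the scaling point, assuming i.i.d. standard normal loss differentials with mean zero (an illustrative assumption, as are the function name and sample sizes): the spread of \bar{d} collapses as n grows, while the spread of root-n times \bar{d} stays roughly constant.

```python
import numpy as np

def spread_of_mean(n, scale_by_root_n, n_rep=2000, seed=0):
    """Standard deviation across replications of d_bar, or of sqrt(n) * d_bar."""
    rng = np.random.default_rng(seed)
    d = rng.standard_normal((n_rep, n))   # hypothetical mean-zero differentials
    d_bar = d.mean(axis=1)
    stat = np.sqrt(n) * d_bar if scale_by_root_n else d_bar
    return float(stat.std())

for n in (100, 2500):
    print(n, spread_of_mean(n, False), spread_of_mean(n, True))
```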
Your description of the SPA procedure, "terminate if the test
statistic is negative", is correct.
I suppose that one might be interested in a p-value even if the test
statistic is negative, but as I write this I cannot think of a reason
why. In terms of making a decision about the null hypothesis, the
p-value is irrelevant. My point is that max(\bar{d}) < 0 is fully
consistent with the null hypothesis, and the null should not be
rejected. In the context of SPA testing, you are testing the null that
the benchmark is "best", and in the sample you have, the benchmark was
indeed better than any alternative.
It is analogous to saying that we will not reject the null hypothesis
H_0: \theta = 0 if \hat\theta = 0, in a situation where we know that
\hat\theta ~ N(\theta,1).
Perhaps your point is that with small sample sizes the bootstrap
estimate of the CDF need not be good, and in some samples the
bootstrap estimate of the CDF is such that \hat{CDF}(0) is close to
one. Still, the bootstrap may work well on average, and samples such
as the one discussed above fall into the cases where we will make a
Type I error.
Cheers,
-Peter