Confidence interval after Chi-squared test?

Stan Brown

Dec 1, 2010, 7:32:40 AM

Hi! The M&M Mars company has color proportions of M&Ms on its Web
site. Let's say we take a random sample of 500 or so. (I know that
actually getting a random sample in this case would be quite
difficult, but for discussion purposes let's assume one.)

Six colors, so five degrees of freedom. We compute a chi-squared of
68.90 and a p-value on the order of 10^-12. Obviously we can say
that the data refute the model. Let us say further that the chi-
squared contributions of three of the colors are each on the order of
20, while those of the other three are much smaller, so we suspect
that three of the colors are quite different from what the model says
and the other three are not.

As I understand things, if we want actually to test that, say, brown,
green, and yellow occur in different proportions from what the
company's Web site says, we would need to take new samples for that
purpose. I understand that we cannot use the existing sample data to
perform a test of proportions on any of the three because we'd in
effect be multiplying our alpha.

But here, with all that prelude, is my question: is it valid to take
confidence intervals on the proportions of brown, green, and yellow
from the original sample of 500 that was used in the chi-squared
test? My gut says no, but I'm having a hard time articulating the
reason since there's no significance level involved.

--
Stan Brown, Oak Road Systems, Tompkins County, New York, USA
http://OakRoadSystems.com
Shikata ga nai...

Ray Koopman

Dec 1, 2010, 8:45:46 PM

On Dec 1, 4:32 am, Stan Brown <the_stan_br...@fastmail.fm> wrote:
> Hi! The M&M Mars company has color proportions of M&Ms on its Web
> site. Let's say we take a random sample of 500 or so. (I know
> that actually getting a random sample in this case would be quite
> difficult, but for discussion purposes let's assume one.)
>
> Six colors, so five degrees of freedom. We compute a chi-squared of
> 68.90 and a p-value on the order of 10^-12. Obviously we can say
> that the data refute the model. Let us say further that the chi-
> squared contributions of three of the colors are each on the order
> of 20, while those of the other three are much smaller, so we
> suspect that three of the colors are quite different from what the
> model says and the other three are not.
>
> As I understand things, if we want actually to test that, say,
> brown, green, and yellow occur in different proportions from what
> the company's Web site says, we would need to take new samples for
> that purpose. I understand that we cannot use the existing sample
> data to perform a test of proportions on any of the three because
> we'd in effect be multiplying our alpha.

If you could specify precisely the rule by which you chose to test
brown, green, and yellow then you could simulate it and get a Monte
Carlo estimate of the sampling distribution of your test statistic.
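
A minimal sketch of such a simulation, using only the Python standard library. Everything specific in it is an assumption made for illustration: the null proportions in P0 are placeholders rather than the company's published figures, and the selection rule (take the three colors with the largest chi-squared contributions, then run an unadjusted one-proportion z test on each) is just one way to make "chose to test brown, green, and yellow" precise.

```python
import math
import random

# Hypothetical null proportions for six colors (placeholders, must sum to 1).
P0 = [0.13, 0.14, 0.24, 0.16, 0.20, 0.13]
N = 500             # sample size, as in the discussion
Z_CRIT = 1.959964   # two-sided 5% critical value for one z test


def simulate_once(rng):
    """Draw one sample under the null, select the 3 colors with the largest
    chi-squared contributions, and report whether any selected color is
    'significant' by an unadjusted one-proportion z test."""
    counts = [0] * len(P0)
    for j in rng.choices(range(len(P0)), weights=P0, k=N):
        counts[j] += 1
    contrib = [(f - N * p) ** 2 / (N * p) for f, p in zip(counts, P0)]
    chosen = sorted(range(len(P0)), key=lambda j: contrib[j], reverse=True)[:3]
    for j in chosen:
        se = math.sqrt(P0[j] * (1 - P0[j]) / N)
        z = (counts[j] / N - P0[j]) / se
        if abs(z) > Z_CRIT:
            return True
    return False


def false_positive_rate(n_sims=2000, seed=12345):
    """Monte Carlo estimate of P(at least one false rejection) under the null."""
    rng = random.Random(seed)
    hits = sum(simulate_once(rng) for _ in range(n_sims))
    return hits / n_sims


if __name__ == "__main__":
    print(f"estimated rate of at least one false rejection: {false_positive_rate():.3f}")
```

If the estimated rate comes out well above the nominal 0.05, that is the "multiplying our alpha" effect made visible; the same loop can record the selected z statistics to approximate their sampling distribution under the chosen rule.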

>
> But here, with all that prelude, is my question: is it valid to take
> confidence intervals on the proportions of brown, green, and yellow
> from the original sample of 500 that was used in the chi-squared
> test? My gut says no, but I'm having a hard time articulating the
> reason since there's no significance level involved.

There is a way to do it that's always valid if the expected
frequencies are not too small, but it's messy and computationally
intensive. The 100C% simultaneous confidence region for all the true
proportions in a k-category classification is defined as the set of all
k-vectors [p1,...,pk], with all pj >= 0 and sum pj = 1, for which

    sum_j (fj - n*pj)^2 / (n*pj) <= chisquare(k-1,C),

where chisquare(k-1,C) is the C quantile of the chi-squared
distribution with k-1 degrees of freedom.

The upper (lower) confidence limit for any one pj is the largest
(smallest) value of pj for which values of the k-1 other p's exist
that will satisfy the inequality.

For k = 3 you can draw a picture. Pick one pj to express in terms
of the other two: e.g., p3 = 1 - p1 - p2. Then plot the boundary of
the region in the (p1,p2) plane within which the inequality holds.
The extrema of the plot give the upper and lower confidence limits.
For p1 and p2, just read the extremes off the p1 and p2 axes. For p3,
project the region onto the line p1 = p2 to find the minimum and
maximum of p1 + p2; the limits for p3 are 1 minus those values.

For hypothesis testing, a hypothesis must be retained if any vector
p exists that satisfies both the inequality and the hypothesis. A
hypothesis can be rejected only if no vector p exists that satisfies
both the inequality and the hypothesis. (This is a conservative test,
in much the same way that Scheffe tests are conservative.)
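
The limits for a single pj can be sketched in Python under one extra fact that the thread does not state and that is assumed here: for a fixed value pj = t, the left side of the inequality is smallest when the remaining proportions are spread in proportion to their observed counts, and that minimum equals (fj - n*t)^2 / (n*t*(1-t)). The limits are then the two roots of a quadratic in t. The counts below are hypothetical, and the function names are illustrative only.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of chi-squared with integer df >= 1,
    using the finite-sum formulas for even and odd degrees of freedom."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def chi2_quantile(conf, df):
    """Value c with P(chi2_df <= c) = conf, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > 1 - conf:
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > 1 - conf:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def projection_limits(f, conf=0.95):
    """(lower, upper) for each p_j, from projecting the simultaneous region
    sum_j (fj - n*pj)^2/(n*pj) <= c onto each p_j axis."""
    n, k = sum(f), len(f)
    c = chi2_quantile(conf, k - 1)
    limits = []
    for fj in f:
        # Roots of (fj - n*t)^2 = c*n*t*(1 - t), written as a*t^2 + b*t + fj^2 = 0.
        a = n * (n + c)
        b = -n * (2 * fj + c)
        d = math.sqrt(b * b - 4 * a * fj * fj)
        limits.append(((-b - d) / (2 * a), (-b + d) / (2 * a)))
    return limits


if __name__ == "__main__":
    counts = [29, 90, 120, 100, 96, 70]   # hypothetical counts for six colors
    for fj, (lo, hi) in zip(counts, projection_limits(counts)):
        print(f"count {fj:3d}: {100 * lo:5.1f}% to {100 * hi:5.1f}%")
```

Because the cutoff is the chi-squared quantile with k-1 degrees of freedom rather than a squared normal quantile, these intervals are wider than one-at-a-time intervals, which is the conservatism mentioned in the last paragraph.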

Rich Ulrich

Dec 1, 2010, 11:39:22 PM

On Wed, 1 Dec 2010 07:32:40 -0500, Stan Brown
<the_sta...@fastmail.fm> wrote:

>Hi! The M&M Mars company has color proportions of M&Ms on its Web
>site. Let's say we take a random sample of 500 or so. (I know that
>actually getting a random sample in this case would be quite
>difficult, but for discussion purposes let's assume one.)
>
>Six colors, so five degrees of freedom. We compute a chi-squared of
>68.90 and a p-value on the order of 10^-12. Obviously we can say
>that the data refute the model. Let us say further that the chi-
>squared contributions of three of the colors are each on the order of
>20, while those of the other three are much smaller, so we suspect
>that three of the colors are quite different from what the model says
>and the other three are not.
>
>As I understand things, if we want actually to test that, say, brown,
>green, and yellow occur in different proportions from what the
>company's Web site says, we would need to take new samples for that
>purpose. I understand that we cannot use the existing sample data to
>perform a test of proportions on any of the three because we'd in
>effect be multiplying our alpha.

Ray writes nice responses which are more technically precise than
mine.

You raise a question here of "new samples" to confirm what has
been discovered. That is something that depends on the model
of how the question is being asked, which I think is something
that Ray has skipped past. He jumps right to giving the
conservative tests that would be formally correct to start with,
instead of the overall test.

- How formal do you want to be?
- I'd generally be satisfied if someone says that the overall test
demonstrates a difference, and the individual tests (here, 2x2
for each color) show where the differences are.

>
>But here, with all that prelude, is my question: is it valid to take
>confidence intervals on the proportions of brown, green, and yellow
>from the original sample of 500 that was used in the chi-squared
>test? My gut says no, but I'm having a hard time articulating the
>reason since there's no significance level involved.

I take the general approach that a CI is nothing but the inversion
of the respective test, so I could have the same problem.

You could do 6 separate tests in the *first* place, applying a
Bonferroni correction to the significance level (0.05 divided by 6).

Ray mentions Scheffe -- the principle for Scheffe is that any
contrast in an ANOVA has to account for the full SS that would
produce an overall test that is significant. Applying this idea to
the problem, you would use the test cutoff for the 5 d.f. test
to generate tests and CIs for each proportion.
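
To put numbers on the two options, here is a small Python sketch that compares the critical multipliers they imply for six colors at a familywise level of 0.05. It assumes the Scheffe-style rule means comparing each squared z statistic with the 5 d.f. chi-squared cutoff; the helper names are illustrative.

```python
import math
from statistics import NormalDist

K = 6          # number of colors
ALPHA = 0.05   # familywise significance level


def chi2_sf_5df(x):
    """Upper-tail probability of chi-squared with 5 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * (1 + x / 3))


def chi2_crit_5df(alpha):
    """Cutoff c with P(chi2_5 > c) = alpha, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_sf_5df(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Bonferroni: each of the 6 two-sided tests is run at level 0.05/6.
bonferroni_z = NormalDist().inv_cdf(1 - ALPHA / K / 2)

# Scheffe-style: each |z| is compared with the square root of the 5 d.f. cutoff.
scheffe_z = math.sqrt(chi2_crit_5df(ALPHA))

if __name__ == "__main__":
    print(f"Bonferroni multiplier:    {bonferroni_z:.3f}")
    print(f"Scheffe-style multiplier: {scheffe_z:.3f}")
```

The Scheffe-style multiplier is the larger of the two, so it gives wider intervals; in exchange it covers every statement that could be read off the joint region, not just six preselected ones.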

--
Rich Ulrich

Stan Brown

Dec 2, 2010, 6:36:04 AM

On Wed, 1 Dec 2010 07:32:40 -0500, Stan Brown wrote:
>
[confidence interval after rejecting null hypothesis in a chi-squared
for one category with six possibilities]

Thanks to Rich for your helpful (as usual) response; and to Ray: nice
to "meet" you and thanks also for your helpful response. The two
complemented each other well.

... responding to Ray

I also like very much the idea of a *vector* confidence interval,
which was completely new to me. In the equation, Ray, I assume n is
the sample size. Are the fj's the actual frequencies found in the
sample?

"Computationally intensive" -- I'm guessing iterative techniques are
required here? I'd like to try something with Excel Solver or with
some specialized optimizer software that I have available.

... responding to Rich

I think Rich stated my own attitude well: I look on a CI as the other
side of the coin with respect to the significance test. That's why I
was uneasy with doing standard one-proportion confidence intervals on
the individual colors. I like the analogy to ANOVA, where in a post
test there are ways to compare pairs of means. I guess a Bonferroni
correction would make equal sense here.

But why do you refer to "individual tests (here, 2x2 for each
color)"? I've got only one sample, with six colors. Let's say that
there were 505 in the sample with 29 brown, and the company's stated
proportion was 13%. If that's the only test I was doing, I'd compute
standard error = sqrt(.13*.87/505) = .0150
sample proportion = 29/505 = .057
z = (.057-.13)/.0150 = -4.85
p-value (2-tailed) = 1.24*10^-6
which would certainly show a difference even with a Bonferroni
adjustment. But I guess you're thinking of something else?
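
The arithmetic for this one-proportion z test can be checked with a few lines of standard-library Python. The inputs are the figures from the post (505 in the sample, 29 brown, stated proportion 13%); the comparison against 0.05/6 assumes a Bonferroni adjustment over six colors.

```python
import math

n, brown = 505, 29   # sample size and brown count from the post
p0 = 0.13            # stated proportion of brown

se = math.sqrt(p0 * (1 - p0) / n)            # standard error under the null
p_hat = brown / n                            # sample proportion
z = (p_hat - p0) / se                        # test statistic
p_value = math.erfc(abs(z) / math.sqrt(2))   # two-tailed normal p-value
bonferroni_alpha = 0.05 / 6                  # per-test level for six colors

if __name__ == "__main__":
    print(f"SE = {se:.4f}, p_hat = {p_hat:.3f}, z = {z:.2f}")
    print(f"p-value = {p_value:.3g}, Bonferroni alpha = {bonferroni_alpha:.5f}")
```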

Rich Ulrich

Dec 2, 2010, 5:14:35 PM

oops, sorry, not appropriate. Yes, what was going through my
head was something else, coming from some place else.

Some other M&Ms problem with equal proportions?

--

Rich Ulrich

Ray Koopman

Dec 3, 2010, 1:57:37 AM

On Dec 2, 3:36 am, Stan Brown <the_stan_br...@fastmail.fm> wrote:
> On Wed, 1 Dec 2010 07:32:40 -0500, Stan Brown wrote:
>

> [confidence interval after rejecting null hypothesis in
> a chi-squared for one category with six possibilities]
>
> Thanks to Rich for your helpful (as usual) response; and to Ray:
> nice to "meet" you and thanks also for your helpful response.
> The two complemented each other well.
>
> ... responding to Ray
>
> I also like very much the idea of a *vector* confidence interval,
> which was completely new to me. In the equation, Ray, I assume n
> is the sample size. Are the fj's the actual frequencies found in
> the sample?

Yes, n is the sample size, and the fj's are the observed frequencies.

>
> "Computationally intensive" -- I'm guessing iterative techniques are
> required here? I'd like to try something with Excel Solver or with
> some specialized optimizer software that I have available.

For numerical work it's easier (and probably more accurate) to work
with the expected frequencies, ej = n*pj, and divide by n at the end.
I use Mathematica and let it handle all the dirty work in between.

With k = 3 I plot the 100(1-alpha)% confidence region for (e1,e2) by
using ImplicitPlot to show the contour (f1-e1)^2/e1 + (f2-e2)^2/e2 +
(f3-n+e1+e2)^2/(n-e1-e2) = -2*Log[alpha] in the (e1,e2) plane. I've
also done some barycentric plots, but they're more work to produce,
and I find the pairwise plots easier to interpret.

For confidence limits on individual terms I use NMaximize or NMinimize
to maximize or minimize one of the ej's, subject to the constraints
sum (fj-ej)^2/ej = 2*InverseGammaRegularized[(k-1)/2,alpha],
sum ej = n, and all ej > 0. (For k = 3, the right hand side of the
first constraint simplifies to -2*Log[alpha].)
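
For readers without Mathematica, the k = 3 case can be approximated in plain Python. This is a swapped-in technique, not the NMaximize call described here: for each trial e1 it minimizes the statistic over e2 by ternary search (the statistic is convex in e2), then bisects on e1 to find where that minimum crosses the cutoff. The counts are hypothetical.

```python
import math

f = [50, 30, 20]              # hypothetical observed counts, k = 3
n = sum(f)
alpha = 0.05
crit = -2 * math.log(alpha)   # chi-squared cutoff for k - 1 = 2 d.f.


def stat(e1, e2):
    """Pearson statistic at expected frequencies (e1, e2, n - e1 - e2)."""
    e = [e1, e2, n - e1 - e2]
    return sum((fj - ej) ** 2 / ej for fj, ej in zip(f, e))


def min_stat_given_e1(e1):
    """Smallest statistic over e2 for a fixed e1, by ternary search."""
    lo, hi = 1e-9, n - e1 - 1e-9
    for _ in range(200):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if stat(e1, m1) < stat(e1, m2):
            hi = m2
        else:
            lo = m1
    return stat(e1, (lo + hi) / 2)


def limit_e1(upper):
    """Bisect between f1 (inside the region) and a point near 0 or n (outside)."""
    inside = float(f[0])
    outside = n - 1e-6 if upper else 1e-6
    for _ in range(100):
        mid = (inside + outside) / 2
        if min_stat_given_e1(mid) <= crit:
            inside = mid
        else:
            outside = mid
    return inside


# Work in expected frequencies, then divide by n at the end.
lower_p1 = limit_e1(upper=False) / n
upper_p1 = limit_e1(upper=True) / n

if __name__ == "__main__":
    print(f"p1 limits: {lower_p1:.4f} to {upper_p1:.4f}")
```

The same two functions handle e2 or e3 after permuting the counts; for k > 3 the inner search would need a multivariate optimizer, which is where a tool such as Solver or NMinimize earns its keep.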

Stan Brown

Dec 3, 2010, 7:14:15 AM

On Thu, 2 Dec 2010 22:57:37 -0800 (PST), Ray Koopman wrote:
>
> On Dec 2, 3:36 am, Stan Brown <the_stan_br...@fastmail.fm> wrote:
> > On Wed, 1 Dec 2010 07:32:40 -0500, Stan Brown wrote:
> > [confidence interval after rejecting null hypothesis in
> > a chi-squared for one category with six possibilities]
> >

> > ... responding to Ray
> >
> > I also like very much the idea of a *vector* confidence interval,
> > which was completely new to me. In the equation, Ray, I assume n
> > is the sample size. Are the fj's the actual frequencies found in
> > the sample?
>
> Yes, n is the sample size, and the fj's are the observed frequencies.
>

> For numerical work it's easier (and probably more accurate) to work
> with the expected frequencies, ej = n*pj, and divide by n at the end.
> I use Mathematica and let it handle all the dirty work in between.

[Mathematica advice]

Thanks, Ray! I don't actually have Mathematica. I'll look into
getting an academic discount through the school, or I believe there's
an online site too that has a lot of the same functions.

Stan Brown

Dec 3, 2010, 7:15:25 AM

On Thu, 02 Dec 2010 17:14:35 -0500, Rich Ulrich wrote:
>
> On Thu, 2 Dec 2010 06:36:04 -0500, Stan Brown
> <the_sta...@fastmail.fm> wrote:
>
> >On Wed, 1 Dec 2010 07:32:40 -0500, Stan Brown wrote:
> >[confidence interval after rejecting null hypothesis in a chi-
> >squared for one category with six possibilities]
> >

> >... responding to Rich

> >
> >But why do you refer to "individual tests (here, 2x2 for each
> >color)"?
>

> oops, sorry, not appropriate. Yes, what was going through my
> head was something else, coming from some place else.
>
> Some other M&Ms problem with equal proportions?

No problem, Rich. Thanks for clarifying!

Stan Brown

Jun 6, 2011, 11:22:31 AM

I asked some questions about this in December, and Ray Koopman and
Rich Ulrich were kind enough to respond. I said at the time that I
wanted to create an Excel workbook to do this with Solver.
Recovering from surgery, I've finally had enough "thinking time" to
do that, plus a (hopefully correct) explanation of both approaches to
confidence intervals.

No obligation of course, but if you're interested the Web page with
Excel workbook is here:

http://www.tc3.edu/instruct/sbrown/stat/gof_ci.htm

By the way, the original Usenet thread is archived here:

https://groups.google.com/group/sci.stat.edu/browse_thread/thread/5eef0a33244e080f

Ray Koopman

Jun 8, 2011, 1:41:36 AM

Looks good. Your numbers agree with what I get from Mathematica. I
hope this prompts some students to think about the larger question of
marginal vs joint inference, not only for proportions in frequency
distributions but also for means in anova.

Somewhat off-topic, why do you use the Wald method, instead of the
Wilson or adjusted Wald (e.g., Agresti & Caffo, American Statistician,
2000), to get your simple binomial CIs? n = 628 is not so big that
the differences disappear:

          Wilson          Adj. Wald       Wald
Blue      (16.3, 24.8)    (16.3, 24.8)    (16.0, 24.4)
Brown     ( 7.3, 13.6)    ( 7.3, 13.7)    ( 6.9, 13.2)
Green     (15.6, 23.9)    (15.6, 23.9)    (15.3, 23.6)
Orange    (19.3, 28.1)    (19.3, 28.1)    (19.0, 27.9)
Red       (11.5, 18.9)    (11.4, 18.9)    (11.1, 18.5)
Yellow    ( 9.1, 15.9)    ( 9.1, 16.0)    ( 8.7, 15.5)

Stan Brown

Jun 8, 2011, 9:50:33 PM

On Tue, 7 Jun 2011 22:41:36 -0700 (PDT), Ray Koopman wrote:
>
> On Jun 6, 8:22 am, Stan Brown <the_stan_br...@fastmail.fm> wrote:
> > No obligation of course, but if you're interested the Web page
> > with Excel workbook is here:
> > http://www.tc3.edu/instruct/sbrown/stat/gof_ci.htm

> Looks good. Your numbers agree with what I get from Mathematica. I

> hope this prompts some students to think about the larger question of
> marginal vs joint inference, not only for proportions in frequency
> distributions but also for means in anova.
>
> Somewhat off-topic, why do you use the Wald method, instead of the
> Wilson or adjusted Wald (e.g., Agresti & Caffo, American Statistician,
> 2000), to get your simple binomial CIs? n = 628 is not so big that
> the differences disappear:
>
>           Wilson          Adj. Wald       Wald
> Blue      (16.3, 24.8)    (16.3, 24.8)    (16.0, 24.4)
> Brown     ( 7.3, 13.6)    ( 7.3, 13.7)    ( 6.9, 13.2)
> Green     (15.6, 23.9)    (15.6, 23.9)    (15.3, 23.6)
> Orange    (19.3, 28.1)    (19.3, 28.1)    (19.0, 27.9)
> Red       (11.5, 18.9)    (11.4, 18.9)    (11.1, 18.5)
> Yellow    ( 9.1, 15.9)    ( 9.1, 16.0)    ( 8.7, 15.5)

To be honest, I use the Wald method out of inertia. That's a lousy
reason, I admit, so I should probably do something about that.

I do get a difference with adjusted Wald, but not so great a
difference as you show.

Blue: phat = (127+2)/(628+4) = .2041
S.E. = sqrt(.2041*.7959/(628+4)) = .0160
CI Lo = .2041 - 2.6121*.0160 = 16.2%
CI Hi = .2041 + 2.6121*.0160 = 24.6%

What am I missing?

Ray Koopman

Jun 9, 2011, 12:46:54 AM

A couple of different things are going on. First, we're using
different z-values. I told Mathematica to use alpha = 1 - .95^(1/6).
It then told me that alpha = .00851244 and that the corresponding z
= 2.63104. (alpha = .05/6 would have given z = 2.63826.) Second,
you added 2 pseudo-observations to each count, but I added z^2/2.
Both are correct, because there is no convention yet as to what
the "adjusted Wald" procedure is.
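
The Blue row can be reproduced with standard-library Python as a check on the figures quoted in this exchange. The count 127 and n = 628 come from the thread; reading "added z^2/2" as z^2/2 pseudo-successes plus z^2/2 pseudo-failures (so z^2 added to n) is an assumption, as are the function names.

```python
import math
from statistics import NormalDist

x, n = 127, 628                # Blue count and sample size from the thread
alpha = 1 - 0.95 ** (1 / 6)    # per-interval alpha for six intervals
z = NormalDist().inv_cdf(1 - alpha / 2)
z_bonf = NormalDist().inv_cdf(1 - 0.05 / 6 / 2)   # alternative: alpha = .05/6


def wald(x, n, z):
    """Simple Wald interval for a binomial proportion."""
    p = x / n
    h = z * math.sqrt(p * (1 - p) / n)
    return p - h, p + h


def wilson(x, n, z):
    """Wilson score interval."""
    centre = (x + z * z / 2) / (n + z * z)
    h = z * math.sqrt(x * (n - x) / n + z * z / 4) / (n + z * z)
    return centre - h, centre + h


def adjusted_wald(x, n, z, extra_successes, extra_total):
    """Wald interval computed after adding pseudo-observations."""
    return wald(x + extra_successes, n + extra_total, z)


def pct(ci):
    """Interval limits as percentages rounded to one decimal place."""
    return tuple(round(100 * v, 1) for v in ci)


if __name__ == "__main__":
    print("alpha =", round(alpha, 8), " z =", round(z, 5))
    print("Wilson            ", pct(wilson(x, n, z)))
    print("Adj. Wald (z^2/2) ", pct(adjusted_wald(x, n, z, z * z / 2, z * z)))
    print("Adj. Wald (+2, +4)", pct(adjusted_wald(x, n, z, 2, 4)))
    print("Wald              ", pct(wald(x, n, z)))
```

Running both adjusted_wald variants with the same z isolates the effect of the pseudo-observation convention from the effect of the z-value.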

Stan Brown

Jun 9, 2011, 8:07:03 AM

On Wed, 8 Jun 2011 21:46:54 -0700 (PDT), Ray Koopman wrote:
>
> On Jun 8, 6:50 pm, Stan Brown <the_stan_br...@fastmail.fm> wrote:
> > On Tue, 7 Jun 2011 22:41:36 -0700 (PDT), Ray Koopman wrote:
> >>
> >>           Wilson          Adj. Wald       Wald
> >> Blue      (16.3, 24.8)    (16.3, 24.8)    (16.0, 24.4)
> >

> > I do get a difference with adjusted Wald, but not so great a
> > difference as you show.
> >
> > Blue: phat = (127+2)/(628+4) = .2041
> > S.E. = sqrt(.2041*.7959/(628+4)) = .0160
> > CI Lo = .2041 - 2.6121*.0160 = 16.2%
> > CI Hi = .2041 + 2.6121*.0160 = 24.6%
> >
> > What am I missing?
>
> A couple of different things are going on. First, we're using
> different z-values. I told Mathematica to use alpha = 1 - .95^(1/6).
> It then told me that alpha = .00851244 and that the corresponding z
> = 2.63104. (alpha = .05/6 would have given z = 2.63826.) Second,
> you added 2 pseudo-observations to each count, but I added z^2/2.
> Both are correct, because there is no convention yet as to what
> the "adjusted Wald" procedure is.

Thanks, Ray! I don't have online access to the article you cited,
but I did google and found another article with the "other" adjusted
Wald, involving adding z².

I was using alpha = 1 - .95^(1/6), but rounded to 1-alpha = .991 or
alpha = .009, rather than .0085 as you had. When we're that far out
in the tails it's a really bad idea to round in the middle of the
calculation, so I'll look back at my workbook and if necessary change
it so that it doesn't do that.
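
The size of the rounding effect is easy to see in a few lines of Python. This is only an illustration of the two alpha values mentioned in this exchange, with variable names chosen for the sketch.

```python
from statistics import NormalDist

exact_alpha = 1 - 0.95 ** (1 / 6)   # about .0085
rounded_alpha = 0.009               # after rounding 1 - alpha to .991

z_exact = NormalDist().inv_cdf(1 - exact_alpha / 2)
z_rounded = NormalDist().inv_cdf(1 - rounded_alpha / 2)

# Joint confidence level implied by six intervals at each per-interval alpha,
# using the same independence-style product that defines exact_alpha.
family_exact = (1 - exact_alpha) ** 6
family_rounded = (1 - rounded_alpha) ** 6

if __name__ == "__main__":
    print(f"z with exact alpha:   {z_exact:.5f}  (joint level {family_exact:.4f})")
    print(f"z with rounded alpha: {z_rounded:.5f}  (joint level {family_rounded:.4f})")
```

Rounding alpha upward shrinks the critical z and leaves the implied joint level a little short of the intended 95%.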

As for your original question of Wald versus adjusted Wald, I'll give
that some thought too. The syllabus for my course (which I don't
control) uses Wald only, but then the syllabus for my course also
doesn't do anything with confidence intervals for goodness of fit.
So I have to decide just how many new concepts I want to put into the
one Web page, which will be read only by the occasional motivated
student.

Stan Brown

Jun 11, 2011, 7:17:10 AM

On Tue, 7 Jun 2011 22:41:36 -0700 (PDT), Ray Koopman wrote:
>

> On Jun 6, 8:22 am, Stan Brown <the_stan_br...@fastmail.fm> wrote:

> > http://www.tc3.edu/instruct/sbrown/stat/gof_ci.htm

>
> Somewhat off-topic, why do you use the Wald method, instead of the
> Wilson or adjusted Wald (e.g., Agresti & Caffo, American Statistician,
> 2000), to get your simple binomial CIs? n = 628 is not so big that
> the differences disappear:

I ended up straddling the fence on this one. I did stick with the
Wald interval in calculations, because that's the one on the TI-83/84
calculators that students use. But in the acknowledgments paragraph
I added a reference to using Wilson or adjusted Wald. I want to
think some more about how to let the brighter students know that the
Wald, though easy to compute, is not really the final or only answer.

I did fix the premature rounding of the confidence level and
therefore of the critical z.

Again, Ray, many thanks for your help both in the group and in email.