Alternative to chisquare when sample size is too large


phcj...@gmail.com

Mar 9, 2014, 3:05:41 PM
to piface-d...@googlegroups.com
I just ran a chi-square test on a two-way table of frequencies with a sample size so large that I am not surprised I got a really outlandish significance level.  I have frequencies from two sources of data. The frequencies are broken down within each data source by the 13 independent categories. One data source has a sample size of over 33 million, the other over 142 million. The issue is that when you inspect the frequencies for the 13 categories, it's pretty obvious that they (as percentages) are distributed the same way in the two data sources. The percentages differ between the two data sources by at most 1% for any given category. The ordering of the categories is the same in both data sources, except for two categories that swap places; the reversal is by about 1-2%.  I was hoping to reflect the similarity of the two distributions with a non-significant test.  As far as I know, the chi-square test is notorious for inflating significance with large sample sizes.

Any suggestions as to how else I could test this? ... or do I just rely on the good old "eye-ball".

Thanks very much

Lenth, Russell V

Mar 10, 2014, 2:23:55 PM
to phcj...@gmail.com, piface-d...@googlegroups.com
First of all, and this is important, a nonsignificant statistical test is never convincing evidence of anything. Not. Ever. Another way to say this is that absence of proof is not the same as proof of absence. Nonsignificance is absence of proof.

The way to address this is to do a formal test of equivalence -- a topic that, unfortunately, is missing from almost all statistics courses. In your situation, it could be stated in terms of a parameter theta, where N*theta is the noncentrality parameter of the chi-square statistic and N is the total sample size. The hypotheses for the equivalence test are:

    H0: theta >= theta0
    H1: theta < theta0

where theta0 is a specified equivalence threshold (i.e., if theta < theta0, we consider the groups close enough to be treated as equivalent). With this framework, if we reject H0, we then have strong evidence in favor of H1, that theta is small.

To do the test, proceed as follows:
  1. Create a 2 x 13 table of hypothetical proportions that reflects a pattern of proportions that you consider at the threshold between equivalence and nonequivalence. Each row should sum to 1. Do not use the observed data. Do this carefully in consideration of the scientific issues and opinions of experts, and without looking ahead to see if it yields the P value you want.
  2. Multiply the rows by the sample sizes (33 million and 142 million?)
  3. Compute the chi-square statistic for this hypothetical table. Call this value chi0^2. That will be the value of N*theta0.
  4. Compute the chi-square statistic for the observed data. Call this value chi^2.
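  5. The P value for the equivalence test is the cumulative (lower-tail) probability at chi^2, computed from the noncentral chi^2 distribution with (13-1)*(2-1)=12 d.f. and noncentrality parameter chi0^2. A small P value means the observed chi^2 is smaller than would be expected at the threshold, so H0 is rejected in favor of equivalence. (Programs like SAS, R, and Minitab have such a function.)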
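
For anyone who wants to try the five steps above, here is a minimal sketch in Python. It assumes NumPy and SciPy are available (the post itself mentions only SAS, R, and Minitab), and it uses scipy.stats.chi2_contingency for the Pearson statistic and scipy.stats.ncx2 for the noncentral chi-square distribution. The function names and the toy 2 x 3 numbers are made up purely for illustration; they are not the poster's data, and the threshold table in a real analysis has to come from subject-matter judgment as described in step 1.

```python
import numpy as np
from scipy.stats import chi2_contingency, ncx2


def pearson_chi2(table):
    """Pearson chi-square statistic for a two-way table (no continuity correction)."""
    stat, _, _, _ = chi2_contingency(table, correction=False)
    return stat


def equivalence_p_value(observed_counts, threshold_props):
    """Equivalence test of H0: theta >= theta0 against H1: theta < theta0.

    observed_counts: r x c array of observed frequencies.
    threshold_props: r x c array of hypothetical proportions (each row sums
        to 1), chosen in advance to sit on the equivalence boundary (step 1).
    Returns (p_value, chi_sq, chi0_sq, df).
    """
    observed = np.asarray(observed_counts, dtype=float)
    props = np.asarray(threshold_props, dtype=float)
    if props.shape != observed.shape:
        raise ValueError("threshold_props must have the same shape as observed_counts")
    if not np.allclose(props.sum(axis=1), 1.0):
        raise ValueError("each row of threshold_props must sum to 1")

    # Step 2: multiply each row of proportions by that row's sample size.
    row_n = observed.sum(axis=1, keepdims=True)
    threshold_counts = props * row_n

    # Step 3: chi-square of the hypothetical table, i.e. N * theta0.
    chi0_sq = pearson_chi2(threshold_counts)

    # Step 4: chi-square of the observed table.
    chi_sq = pearson_chi2(observed)

    # Step 5: lower-tail probability of chi_sq under a noncentral chi-square
    # with (r-1)(c-1) d.f. and noncentrality parameter chi0_sq.
    df = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    p_value = ncx2.cdf(chi_sq, df, chi0_sq)
    return p_value, chi_sq, chi0_sq, df


# Toy 2 x 3 illustration with made-up numbers (NOT the poster's data).
threshold = [[0.30, 0.30, 0.40],
             [0.35, 0.30, 0.35]]   # hypothetical equivalence boundary
observed = [[3050, 2980, 3970],
            [6100, 6020, 7880]]    # hypothetical observed counts
p, chi_sq, chi0_sq, df = equivalence_p_value(observed, threshold)
print(f"chi^2 = {chi_sq:.2f}, chi0^2 = {chi0_sq:.2f}, df = {df}, P = {p:.4g}")
```

With the real 2 x 13 table the call is the same and the d.f. come out as 12. With sample sizes in the tens of millions, chi0^2 can be very large, so it is worth checking that the noncentral chi-square routine in your software remains accurate in that range.
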
For more on equivalence tests, there is a paper by Schuirmann and a book by Wellek. See also the 2001 paper in The American Statistician by Hoenig and Heisey, "The Abuse of Power", that explains more about why nonsignificant tests and power calculations are inappropriate methods for assessing equivalence. There is also an R package named 'equivalence' that looks promising, but I have not tried it myself.

Russ
Russell V. Lenth  -  Professor Emeritus
Department of Statistics and Actuarial Science   
The University of Iowa  -  Iowa City, IA 52242  USA   
Voice (319)335-0712 (Dept. office)  -  FAX (319)335-3017