a combinatorial question

anal...@hotmail.com

Apr 19, 2012, 7:36:10 PM

Although it seems elementary, I am not aware that standard textbooks
treat this problem.

There is a universal set U of N distinct objects. A fixed subset S of
n distinct objects is chosen from it (0 < n < N).

Another subset T of m (0 < m < N) distinct objects is then chosen from
U. The question is: what is the probability distribution of the
cardinality of the intersection of S and T? N may be taken to be
infinite, although m/N and n/N are not vanishingly small.

Herman Rubin

Apr 20, 2012, 2:16:09 PM

For finite N, this is exactly the hypergeometric distribution.
It is usually described as taking a sample of size n from a population
of size N in which m elements are "marked".

Viewing it as the intersection of two random sets shows that the
distribution is symmetric in m and n. One can also verify this by
expanding the usual formula, but the set argument requires no
calculation and shows why the symmetry occurs.
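
As a rough numerical check of the symmetry described above, here is a small Python sketch. The function name hypergeom_pmf and the values of N, n and m are illustrative choices, not anything from the thread, and math.comb needs Python 3.8 or later.

```python
from math import comb

def hypergeom_pmf(k, N, n, m):
    """P(|S intersect T| = k) for |S| = n, |T| = m, with T a uniformly
    random m-subset of N objects (hypothetical helper)."""
    # Outside the support the probability is zero.
    if k < max(0, n + m - N) or k > min(n, m):
        return 0.0
    return comb(n, k) * comb(N - n, m - k) / comb(N, m)

# Illustrative sizes only.
N, n, m = 50, 12, 20
ks = range(min(n, m) + 1)

pmf_nm = [hypergeom_pmf(k, N, n, m) for k in ks]  # S has n elements, T has m
pmf_mn = [hypergeom_pmf(k, N, m, n) for k in ks]  # roles of n and m swapped
```

If the symmetry holds, pmf_nm and pmf_mn should agree to within rounding error.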

--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Department of Statistics, Purdue University
hru...@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558

Lynne Vickson

Apr 22, 2012, 2:42:25 AM

On Apr 19, 4:36 pm, "analys...@hotmail.com" <analys...@hotmail.com>
wrote:

If N is finite and the choice of the m objects comprising T is
"random", the cardinality of the intersection has a hypergeometric
distribution. (The hypergeometric distribution gives the probability
of k type 1 objects when m objects are chosen without replacement from
a population of N1 type 1 and N2 type 2 objects; in your problem,
N1 = n, N2 = N-n and you are asking how many objects in the random set
T are type 1.) If N is "infinite" but n/N is nonzero, and if you pick
a FINITE number m of objects, you now have the binomial limit of the
hypergeometric, so the cardinality of T intersect S has the binomial
distribution with parameters m and p = n/N.
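
The binomial limit can be illustrated numerically. The sketch below holds m fixed and p = n/N at roughly 0.3 while N grows, and records the largest pointwise gap between the two distributions. The helper names and all the numbers are made up for illustration.

```python
from math import comb

def hypergeom_pmf(k, N, n, m):
    """Probability of k type 1 objects in m draws without replacement
    from n type 1 and N - n type 2 objects (hypothetical helper)."""
    if k < max(0, n + m - N) or k > min(n, m):
        return 0.0
    return comb(n, k) * comb(N - n, m - k) / comb(N, m)

def binom_pmf(k, m, p):
    """Binomial probability of k successes in m trials."""
    return comb(m, k) * p**k * (1 - p) ** (m - k)

def max_abs_gap(N, p, m):
    """Largest pointwise difference between the hypergeometric and the
    binomial with the same m and p = n/N."""
    n = round(p * N)
    return max(
        abs(hypergeom_pmf(k, N, n, m) - binom_pmf(k, m, n / N))
        for k in range(m + 1)
    )

# m stays finite while N grows; the gap should shrink toward zero.
gaps = [max_abs_gap(N, 0.3, 10) for N in (100, 1_000, 10_000, 100_000)]
```

Note that this only covers the case of fixed m. When m/N is not small, the hypergeometric itself is the distribution to use.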

RGV

anal...@hotmail.com

Apr 28, 2012, 9:30:07 AM

Thanks. I am looking at a contingency tables problem. Let x(i,j) =
observed count in row i and column j, r(i) = row sum of row i,
c(j) = column sum of column j, and G = grand total count. Typically
r(i)*c(j)/G is compared to x(i,j) to test for interaction between rows
and columns. There seems to be an implied "binomial approximation"
here, and now it's clear exactly what's going on.
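
To make the connection concrete: with S taken as row i, T as column j and N = G, the hypergeometric mean n*m/N is r(i)*c(j)/G. Here is a minimal Python sketch that computes those expected counts. The function name and the table are hypothetical, invented for illustration.

```python
def expected_counts(table):
    """Expected cell counts r(i)*c(j)/G under no row-column interaction."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    grand = sum(row_sums)
    return [[r * c / grand for c in col_sums] for r in row_sums]

# Made-up 2 x 3 table of observed counts x(i,j).
observed = [[20, 30, 50],
            [30, 20, 50]]

expected = expected_counts(observed)
```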

I have another question: any particular cell may over-perform or
under-perform with respect to the expected value under the null
hypothesis of no interaction. Are there one-sided tests for particular
cells and groups of cells to test for over/under performance?

Rich Ulrich

May 3, 2012, 3:00:26 PM

I just noticed this post sitting without an answer.

I think you want to know if there is a test on the single cells of
a contingency table, as one question.

The usual chi-squared with k d.f. can be the result of the
sum of k independent 1 d.f. chi-squared variates. For a
contingency table, the cells are not independent, but they
are chi-squared-like. Under the Poisson derivation of a
contingency table test (that's just one way to derive it),
the variance of each cell is equal to the Expected Value.

The usual test is the sum of chi-squared-like contributions
from the individual cells, using Expected and Observed,
X^2 = sum [ (O-E)^2/E ] .

Especially for a table that is large, both across and down, where
the d.f. approaches the number of cells, each cell can be
regarded as a 1 d.f. chi-squared. But a 1 d.f. chi-squared is simply
a normal variate, z, squared. So you can take the individual
cell contribution as, approximately, z = (O-E)/sqrt(E), for a
one-tailed test.
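
A casual Python sketch of that per-cell z, with a one-sided normal tail probability, might look like the following. It is only the plain (O-E)/sqrt(E) version under the normal approximation. The table and the function names are invented for illustration.

```python
from math import erfc, sqrt

def cell_z_scores(table):
    """Per-cell z = (O - E)/sqrt(E), with E = r(i)*c(j)/G."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    grand = sum(rows)
    return [
        [(x - r * c / grand) / sqrt(r * c / grand) for x, c in zip(row, cols)]
        for row, r in zip(table, rows)
    ]

def upper_tail_p(z):
    """One-sided P(Z >= z) for a standard normal Z."""
    return 0.5 * erfc(z / sqrt(2))

# Made-up 2 x 3 table of observed counts.
observed = [[20, 30, 50],
            [30, 20, 50]]

z = cell_z_scores(observed)

# The squared z values add up to the usual Pearson X^2.
x2 = sum(v * v for row in z for v in row)

# One-tailed test that the cell in row 0, column 1 over-performs.
p_over = upper_tail_p(z[0][1])
```

For under-performance, the same function can be applied to -z. Because the cells are not independent, these per-cell p-values are rough guides rather than exact tests.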

I have only ever used that in a very casual way. I think that
there are slightly different versions available.

For "several cells" -- I wonder what you are going after.
It is possible to "partition" the contingency table into several
separate tests. When the tests are constructed as likelihood-ratio
tests, rather than Pearson tests, they can be devised so that
they are independent and additive. This can be
useful for testing "linear trend" and so on.
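
As one illustration of the additivity, here is a Python sketch that splits a 2 x 3 table into two 2 x 2 pieces: columns 1 and 2 against each other, then columns 1+2 merged against column 3. It uses the likelihood-ratio statistic G^2 = 2 * sum [ O * ln(O/E) ]. This particular split and the counts are assumptions chosen for the example, not something from the thread.

```python
from math import log

def g_squared(table):
    """Likelihood-ratio statistic G^2 = 2 * sum O*ln(O/E), E = r*c/G."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    grand = sum(rows)
    total = 0.0
    for row, r in zip(table, rows):
        for x, c in zip(row, cols):
            if x > 0:  # a zero cell contributes nothing to the sum
                total += x * log(x * grand / (r * c))
    return 2.0 * total

# Made-up 2 x 3 table of observed counts.
observed = [[20, 30, 50],
            [30, 20, 40]]

# Piece 1: the first two columns compared with each other.
first_two = [row[:2] for row in observed]
# Piece 2: the first two columns merged, compared with the third.
merged_vs_third = [[row[0] + row[1], row[2]] for row in observed]

g_total = g_squared(observed)                               # 2 d.f.
g_parts = g_squared(first_two) + g_squared(merged_vs_third)  # 1 d.f. + 1 d.f.
```

The two pieces should sum to the full-table G^2 up to rounding. The Pearson X^2 does not split exactly this way.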

--
Rich Ulrich