z score calc

AverageJoe

unread,

Aug 21, 2009, 10:51:26 AM8/21/09

to

Hi,
suppose the probability for a normal distributed event is 25%.
Someone achieves a hit rate of 34.46% (122 hits out of 354).
How significant is this result, ie. what is the z score for this result?
Is it z = 3.89 ?

adamk

unread,

Aug 21, 2009, 11:23:05 AM8/21/09

to

> Do you mean to say that the number of hits

sorry, that should be the % of hits.

is normally-distributed with a mean of 25% hits?.
> Do you know the population standard deviation
> sigma, or did you calculate s from a sample?. If you know now the pop. stand. dev. sigma, then the z-score is
> is automatic:
>
> z=(34.46-25)/sigma
>
> Otherwise, you may have gotten the statistic s
> ic s from sampling?. If so, how large is your sample of
> hits/attempts?. The problem becomes very different.

adamk

unread,

Aug 21, 2009, 11:21:29 AM8/21/09

to

Do you mean to say that the number of hits is

normally-distributed with a mean of 25% hits?.
Do you know the population standard deviation sigma,

or did you calculate s from a sample?. If you know the
pop. stand. dev. sigma, then the z-score is automatic:

z=(34.46-25)/sigma

Otherwise, you may have gotten the statistic s from

AverageJoe

unread,

Aug 21, 2009, 11:47:46 AM8/21/09

to

adamk wrote:
>> Hi,
>> suppose the probability for a normal distributed
>> event is 25%.
>> Someone achieves a hit rate of 34.46% (122 hits out
>> of 354).
>> How significant is this result, ie. what is the z
>> score for this result?
>> Is it z = 3.89 ?
>

> Do you mean to say that the % of hits is

> normally-distributed with a mean of 25% hits?.
> Do you know the population standard deviation sigma,
> or did you calculate s from a sample?. If you know the
> pop. stand. dev. sigma, then the z-score is automatic:
>
> z=(34.46-25)/sigma
>
> Otherwise, you may have gotten the statistic s from
> sampling?. If so, how large is your sample of
> hits/attempts?. The problem becomes very different.

I try to understand/verify the statistics below (which I tried to summarize above).
There is no sigma or stddev given, and maybe it is not neccessary in this case (?).
Here's the original wording:
"
Of the 354 trials, 122 produced direct hits. This is a 34% hit rate [...]
(25% is expected by chance). The 34% hit rate is statistically significant
with a z score of 3.89, meaning that there is a 1 in 45,000 chance that
a hit rate of at least 34% is observed in the experiment when the
true hit probability would really be 25%.
"
(Source: http://en.wikipedia.org/wiki/Ganzfeld_experiment#Analysis_of_results )

Is the given z score of 3.89 correct?

Thx

bert

unread,

Aug 21, 2009, 4:48:50 PM8/21/09

to

It seems from the description that the distribution
of successes should be Poisson. A 25% success rate
then implies an average score in 354 trials of 88.5
with a variance of 66.4, or standard deviation of
8.15. Assuming such a Poisson distribution closely
approximates a Gaussian, the difference (33.5) of the
actual score from the mean would be just over four
standard deviations. If the report of the experiment
calls it 3.89, I'll go along with that, as due to the
distribution being only approximately Gaussian.
--

AverageJoe

unread,

Aug 21, 2009, 5:47:35 PM8/21/09

to

Thx, but how did you compute the variance?

a 3

unread,

Aug 21, 2009, 5:56:54 PM8/21/09

to

In [[combinatorial mathematics]], the '''Catalan numbers''' form a
[[sequence]] of [[natural number]]s that occur in various [[counting
problem]]s, often involving
[[recursion|recursively]] defined objects. They are named for the
[[Belgium|Belgian]] [[mathematician]] [[Eugène Charles Catalan]] (1814–
1894).

The ''n''th Catalan number is given directly in terms of
[[binomial coefficient]]s by

:<math>C_n = \frac{1}{n+1}{2n\choose n} = \frac{(2n)!}{(n+1)!\,n!}
\qquad\mbox{ for }n\ge 0.</math>

The first Catalan numbers {{OEIS|id=A000108}} for ''n'' = 0, 1, 2,
3, ? are

==Properties==

An alternative expression for ''C''''n'' is
:<math>C_n = {2n\choose n} - {2n\choose n-1} \quad\mbox{ for }n\ge 1.</
math>
This shows that ''C''''n'' is a [[natural number]], which
is not ''a priori'' obvious from the first formula given. This
expression forms the basis for Andr�'s proof of the correctness of the
formula (see below under [[#Second proof|second proof]]).

The Catalan numbers satisfy the [[recurrence relation]]
:<math>C_0 = 1 \quad \mbox{and} \quad C_{n+1}=\sum_{i=0}^{n}C_i\,C_{n-
i}\quad\mbox{for }n\ge 0.</math>

They also satisfy:
:<math>C_0 = 1 \quad \mbox{and} \quad C_{n+1}=\frac{2(2n+1)}{n+2}C_n,</
math>
which can be a more efficient way to calculate them.

Asymptotically, the Catalan numbers grow as

:<math>C_n \sim \frac{4^n}{n^{3/2}\sqrt{\pi}}</math>

in the sense that the quotient of the ''n''th Catalan
number and the expression on the right [[limit (mathematics)|tends
towards]] 1 for ''n'' ? ?. (This can be proved by using [[Stirling's
approximation]] for ''n''!.)

== Applications in combinatorics ==

There are many counting problems in [[combinatorics]] whose solution
is given by the Catalan numbers. The book ''Enumerative Combinatorics:
Volume 2'' by combinatorialist [[Richard P. Stanley]] contains a set
of exercises which describe 66 different interpretations of the
Catalan numbers. Following are some examples, with illustrations of
the case ''C''3 = 5.

*''C''''n'' is the number of '''Dyck words''' of length
2''n''. A Dyck word is a [[string (computer science)|string]]
consisting of ''n'' X's and ''n'' Y's such that no initial segment of
the string has more Y's than X's (see also [[Dyck language]]). For
example, the following are the Dyck words of length 6:
<center><big> XXXYYY XYXXYY XYXYXY XXYYXY XXYXYY.</
big></center>

*Re-interpreting the symbol X as an open [[parenthesis]] and Y as a
close parenthesis, ''C''''n'' counts the number of
expressions containing ''n'' pairs of parentheses which are correctly
matched:
<center><big> ((())) ()(()) ()()() (())() (()()) </
big></center>

* ''C''''n'' is the number of different ways ''n'' + 1
factors can be completely [[bracket|parenthesized]] (or the number of
ways of [[associativity|associating]] ''n'' applications of a [[binary
operator]]). For ''n'' = 3 for example, we have the following five
different parenthesizations of four factors:
<center><math>((ab)c)d \quad (a(bc))d \quad(ab)(cd) \quad a((bc)d)
\quad a(b(cd))</math></center>

* Successive applications of a binary operator can be represented in
terms of a [[binary tree]]. It follows that ''C''''n'' is
the number of rooted ordered binary [[tree (graph theory)|tree]]s with
''n'' + 1 leaves:
[[Image:Catalan_number_binary_tree_example.png|center]]
If the leaves are labelled, we have the [[#Quadruple factorial|
quadruple factorial]] numbers.

*''C''''n'' is the number of non-isomorphic full binary
trees with ''n'' vertices that have children, usually called internal
vertices or branches. (A rooted binary tree is ''full'' if every
vertex has either two children or no children.)

*''C''''n'' is the number of '''monotonic paths''' along
the edges of a grid with ''n'' × ''n'' square cells, which do not
cross the diagonal. A monotonic path is one which starts in the lower
left corner, finishes in the upper right corner, and consists entirely
of edges pointing rightwards or upwards. Counting such paths is
equivalent to counting Dyck words: X stands for "move right" and Y
stands for "move up". The following diagrams show the case ''n'' = 4:
[[Image:Catalan_number_4x4_grid_example.svg|450px|center]]

* ''C''''n'' is the number of different ways a [[convex
polygon]] with ''n'' + 2 sides can be cut into [[triangle (geometry)|
triangle]]s by connecting vertices with [[straight line]]s. The
following hexagons illustrate the case ''n'' = 4:
[[Image:Catalan-Hexagons-example.svg|400px|center]]

* ''C''''n'' is the number of [[stack]]-sortable
[[permutation]]s of {1, ..., ''n''}. A permutation ''w'' is called
'''stack-sortable''' if ''S''(''w'') = (1, ..., ''n''), where
''S''(''w'') is defined recursively as follows: write ''w'' = ''unv''
where ''n'' is the largest element in ''w'' and ''u'' and ''v'' are
shorter sequences, and set ''S''(''w'') =
''S''(''u'')''S''(''v'')''n'', with ''S'' being the identity for one-
element sequences.

* ''C''''n'' is the number of [[noncrossing partition]]s of
the set { 1, ..., ''n'' }. ''A fortiori'', ''C''''n'' never
exceeds the ''n''th [[Bell numbers|Bell number]]. ''C''''n'' is also the number of noncrossing partitions of the set { 1, ...,
2''n'' } in which every block is of size 2. The conjunction of these
two facts may be used in a proof by [[mathematical induction]] that
all of the ''free'' [[cumulant]]s of degree more than 2 of the
[[Wigner semicircle law]] are zero. This law is important in [[free
probability]] theory and the theory of [[random matrices]].

== Proof of the formula ==

There are several ways of explaining why the formula
:<math>C_n = \frac{1}{n+1}{2n\choose n}</math>
solves the combinatorial problems listed above. The first proof below
uses a [[generating function]]. The second and third proofs are
examples of [[bijective proof]]s; they involve literally counting a
collection of some kind of object to arrive at the correct formula.

=== First proof ===

We start with the observation that several of the combinatorial
problems listed above can easily be seen to satisfy the [[recurrence
relation]]

:<math>C_0 = 1 \quad \mbox{and} \quad C_{n+1}=\sum_{i=0}^{n}C_i\,C_{n-
i}\quad\mbox{for }n\ge 0.</math>

For example, every Dyck word ''w'' of length ? 2 can be written in a
unique way in the form
:''w'' = X''w''1Y''w''2
with (possibly empty) Dyck words ''w''1 and ''w''2.

The [[generating function]] for the Catalan numbers is defined by

:<math>c(x)=\sum_{n=0}^\infty C_n x^n.</math>

As explained in the article titled [[Cauchy product]], the sum on the
right side of the above recurrence relation is the coefficient of
''x''''n'' in the product

:<math>\left(\sum_{i=0}^\infty C_i x^i\right)^2.</math>

Therefore

:<math>\left(\sum_{i=0}^\infty C_i x^i\right)^2 = \sum_{n=0}^\infty C_
{n+1} x^n.</math>

Multiplying both sides by ''x'', we get

:<math>x\left(\sum_{i=0}^\infty C_i x^i\right)^2 = \sum_{n=0}^\infty C_
{n+1} x^{n+1}
= \sum_{n=1}^\infty C_n x^n = -1 + \sum_{n=0}^\infty C_n x^n.</math>

So we have

:<math>c(x)=1+xc(x)^2,\,</math>

and hence

:<math>c(x) = \frac{1-\sqrt{1-4x}}{2x}.</math>

The square root term can be expanded as a power series using the
identity

:<math>\sqrt{1+y} = 1 - 2\sum_{n=1}^\infty {2n-2 \choose n-1} \left
(\frac{-1}{4}\right)^n \frac{y^n}{n} ,</math>

which can be proved, for example, by the [[binomial theorem#Newton's
generalized binomial theorem|binomial theorem]], (or else directly by
considering repeated derivatives of <math>\sqrt{1+y}</math>) together
with judicious juggling of factorials. Substituting this into the
above expression for ''c''(''x'') produces, after further
manipulation,

:<math>c(x) = \sum_{n=0}^\infty {2n \choose n} \frac{x^n}{n+1.} </
math>

Equating coefficients yields the desired formula for ''C''''n''.

===Second proof===

This proof depends on a trick due to D. Andr�, which is now more
generally known as the [[reflection principle (combinatorics)|
reflection principle]] (not to be confused with the [[Schwarz
reflection theorem]] in [[complex analysis]]). It is most easily
expressed in terms of the
"monotonic paths which do not cross the diagonal" problem (see
[[#Applications in combinatorics|above]]).

[[Image:Catalan_number_reflection_example.png|frame|right|Figure 1.
The green portion of the path is being flipped.]]
Suppose we are given a monotonic path in an ''n'' × ''n'' grid that
''does'' cross the diagonal. Find the first edge in the path that lies
above the diagonal, and ''flip'' the portion of the path occurring
after that edge, along a line parallel to the diagonal. (In terms of
Dyck words, we are starting with a sequence of ''n'' X's and ''n'' Y's
which is ''not'' a Dyck word, and exchanging all X's with Y's after
the first Y that violates the Dyck condition.) The resulting path is a
monotonic path in an (''n'' − 1) × (''n'' + 1) grid. Figure 1
illustrates this procedure; the green portion of the path is the
portion being flipped.

Since every monotonic path in the (''n'' − 1) × (''n'' + 1) grid must
cross the diagonal at some point, every such path can be obtained in
this fashion in precisely one way. The number of these paths is equal
to
:<math>{2n\choose n-1}</math>.
Therefore, to calculate the number of monotonic ''n'' × ''n'' paths
which do ''not'' cross the diagonal, we need to subtract this from the
''total'' number of monotonic ''n'' × ''n'' paths, so we finally
obtain
:<math>{2n\choose n}-{2n\choose n-1}</math>
which is the ''n''th Catalan number ''C''''n''.

===Third proof===

The following bijective proof, while being more involved than the
previous one, provides a more natural explanation for the term ''n'' +
1 appearing in the denominator of the formula for ''C''''n''.

[[Image:Catalan_number_exceedance_example.png|frame|right|Figure 2. A
path with exceedance 5.]]
Suppose we are given a monotonic path, which may happen to cross the
diagonal. The '''exceedance''' of the path is defined to be the number
of pairs of edges which lie ''above'' the diagonal. For example, in
Figure 2, the edges lying above the diagonal are marked in red, so the
exceedance of the path is 5.

Now, if we are given a monotonic path whose exceedance is not zero,
then we may apply the following algorithm to construct a new path
whose exceedance is one less than the one we started with.
* Starting from the bottom left, follow the path until it first
travels above the diagonal.
* Continue to follow the path until it ''touches'' the diagonal again.
Denote by ''X'' the first such edge that is reached.
* Swap the portion of the path occurring before ''X'' with the portion
occurring after ''X''.
The following example should make this clearer. In Figure 3, the black
circle indicates the point where the path first crosses the diagonal.
The black edge is ''X'', and we swap the red portion with the green
portion to make a new path, shown in the second diagram.
[[Image:Catalan_number_swapping_example.png|frame|center|Figure 3. The
green and red portions are being exchanged.]]
Notice that the exceedance has dropped from three to two. In fact, the
algorithm will cause the exceedance to decrease by one, for any path
that we feed it.

[[Image:Catalan_number_algorithm_table.png|frame|right|Figure 4. All
monotonic paths in a 3×3 grid, illustrating the exceedance-decreasing
algorithm.]]
It is also not difficult to see that this process is ''reversible'':
given any path ''P'' whose exceedance is less than ''n'', there is
exactly one path which yields ''P'' when the algorithm is applied to
it.

This implies that the number of paths of exceedance ''n'' is equal to
the number of paths of exceedance ''n'' − 1, which is equal to the
number of paths of exceedance ''n'' − 2, and so on, down to zero. In
other words, we have split up the set of ''all'' monotonic paths into
''n'' + 1 equally sized classes, corresponding to the possible
exceedances between 0 and ''n''. Since there are

:<math>{2n\choose n}</math>

monotonic paths, we obtain the desired formula

:<math>C_n = \frac{1}{n+1}{2n\choose n}.</math>

Figure 4 illustrates the situation for ''n'' = 3. Each of the 20
possible monotonic paths appears somewhere in the table. The first
column shows all paths of exceedance three, which lie entirely above
the diagonal. The columns to the right show the result of successive
applications of the algorithm, with the exceedance decreasing one unit
at a time. Since there are five rows, ''C''3 = 5.

==Hankel matrix==

The ''n''×''n'' [[Hankel matrix]] whose (''i'', ''j'') entry is the
Catalan number ''C''''i''+''j'' has [[determinant]] 1,
regardless of the value of ''n''. For example, for ''n'' = 4 we have
:<math>\det\begin{bmatrix}1 & 1 & 2 & 5 \\ 1 & 2 & 5 & 14 \\ 2 & 5 &
14 & 42 \\ 5 & 14 & 42 & 132\end{bmatrix} = 1</math>

Note that if the entries are ``shifted", namely the Catalan numbers
''C''''i''+''j''+1, the determinant is still 1, regardless
of the size of ''n''.
For example, for ''n'' = 4 we have
:<math>\det\begin{bmatrix}1 & 2 & 5 & 14 \\ 2 & 5 & 14 & 42 \\ 5 & 14
& 42 & 132 \\ 14 & 42 & 132 & 429 \end{bmatrix} = 1</math>.

The Catalan numbers is the unique sequence with this property.

==Quadruple factorial==
The quadruple factorial is given by <math>\frac{2n!}{n!}</math>, or
<math>\left(n+1\right)! C_n</math>. This is the solution to labelled
variants of the above combinatorics problems. It is entirely distinct
from the [[Factorial#Multifactorials|multifactorials]].

AverageJoe

unread,

Aug 21, 2009, 6:08:44 PM8/21/09

to

<snip>

Hmm. I think your posting is off-topic, isn't it?

AverageJoe

unread,

Aug 21, 2009, 6:52:49 PM8/21/09

to

Hmm. I get a stddev of 60.418 (variance = 3650.337):
I somehow (via stdnormal) computed the x-value belonging to z=-1
giving me 28.082 (not sure whether this is the correct value), then
stddev = (x - u) / z = (28.082 - 88.5) / -1 = 60.418
variance = stddev^2 = 3650.3347

Hmm. there is a big difference between these results. Which one is correct?

Ray Vickson

unread,

Aug 22, 2009, 2:47:24 AM8/22/09

to

For a large sample (large n) the observed proportion p_obs (when the
true proportion is p) has mean p and standard deviation sigma_p = sqrt
[p(1-p)/n]; see, eg.,
http://science.kennesaw.edu/~jdemaio/1107/proportion%20estimation.htm
. In your case n = 354 and you are assuming p = 0.25, so sigma_p = sqrt
(.2*.75/354) = .02301436545. Thus, p_obs = 122/354 = .3446327684 gives
(p_obs - p)/sigma_p = 4.111899961. Thus, the observation would be 4.11
standard deviations above the mean (z-score = 4.11). For a one-sided
test this corresponds to a probability of about .2067e-4, which is
about 1 in 48,000.

I don't see how the authors arrived at a z-score of 3.89.

R.G. Vickson

>
> Thx

AverageJoe

unread,

Aug 22, 2009, 5:09:55 AM8/22/09

to

Thx R.G., this helped me nuch.
Just a final question: what if the sample size is small, for example n=10 or 15 ?

Ray Vickson

unread,

Aug 22, 2009, 10:49:42 AM8/22/09

to

It all depends on what "proportion" means. The formulas I cited were
for estimating the 'p' in a Binomial population; that is, each member
of the population is regarded as having a certain probability 'p' of
having some property of interest. For example, in an election, 'p'
might be the probability that a randomly-selected person supports
candidate A. In a quality-control context, 'p' might be the
probability that a randomly-selected item is faulty, etc. We are,
therefore, considering the population as large, but the sample size to
be just part of the population. For example, we take an opinion poll
of n = 100 people to try to predict the outcome of an upcoming
election involving 400,000 voters, or we might take a sample of n
items of production output and test whether the chosen items are
faulty. In this case, the observed proportion p_obs = N(A)/n is just a
re-scaled version of the binomial.

It sounds like the "ESP testing" context is of this type, so the
binomial model seems appropriate. In general, if each observation has
the same 'p' of possessing some interesting property A, and the
observations are "statistically independent", then out of n
observations, the number N(A) having property A has the binomial
distribution with parameters n and p. For example, if p = 1/4 and n =
10, N(A) is binomial with parameters 10 and 1/4. That means that the
expected value (mean) of N(A) is m = n*p = 10*1/4 = 2.5, while the
standard deviation is V = n*p*(1-p) = 30/16. The observed proportion
p_obs = N(A)/10 would have a "scaled binomial" distribution with mean
1/4 (exactly) and standard deviation = V/n^2 = 30/1600. The sample
size n = 10 is too small to allow safe use of the normal
approximation, so we would need to use the exact binomial
distribution. Most decent spreadsheets have a binomial distribution
calculator; EXCEL, for example, has a BINOMDIST function that allows
you to calculate things like Pr{N(A) = k} or P{N(A) <= k} for given k.

A different type of situation applies in "sampling without
replacement". For example, in a small village of population N, we
might be interested in knowing the proportion (or total number) who
support a particular policy A. If we take a small poll of n randomly-
selected villagers (but making sure to NOT poll any person more than
once) we have sampling 'without replacement'. If the sample size n
(say n = 10) is small compared with the population size N, then the
binomial distribution is a good approximation. However, if N is not
very large compared with n, then the exact distribution is quite
different from the binomial; it is called the _hypergeometric
distribution_. You can Google that term for more details. In the
village case we might have have n = 10 and N = 60, so we should use
the exact hypergeometric distribution (although in that case, the
binomial approximation might give results that are good to one decimal
place, perhaps---at least if we are not too far towards the ends of
the range).

R.G. Vickson

bert

unread,

Aug 22, 2009, 11:04:30 AM8/22/09

to

Well, I mis-remembered the terminology,
and yours is a BINOMIAL distribution, not
a POISSON one. If the probability of an
event is p on each trial, and we make n
trials, then the mean expected number of
events is np, and the variance is np(1-p).

The Poisson distribution is the limiting
case as n increases and p decreases, with
a variance equal to the mean - so it's a
very useful distribution when there are
good practical grounds for expecting it;
you get 2 parameters for the price of 1.
--