It has been a long time since university for me, so this may be a
simple problem:
I have two binomially distributed independent variables (say X & Y)
representing the number of sales of two salespeople in two different
areas and I want to test whether one has a higher success rate than
the other.
This is my progress so far:
Both have a large(ish) number of visits, so I approximate both X and Y
by normal distributions with mean np and variance np(1-p) (n and p
different for each), where n is the number of visits for that
salesperson and p is unknown.
I want to test whether p_x = p_y, so I define my test statistic as
T := p_x - p_y.
My hypotheses will be H_0: T /= 0; H_1: T = 0
I use the MLE (yes, it has been that long I couldn't remember the MLE
for a Norm Dist, so I did it from first principles!) to find estimates
for p_x and p_y, namely X/n_x and Y/n_y
Substituting in the test statistic, I get: T= X/n_x - Y/n_y
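For reference, here is a sketch of that first-principles step. It works from the binomial likelihood for one salesperson (X sales in n_x visits) rather than from the normal approximation. The same argument with Y and n_y gives Y/n_y:

$$
\ell(p_x) = X \log p_x + (n_x - X)\log(1 - p_x) + \text{const},
\qquad
\frac{d\ell}{dp_x} = \frac{X}{p_x} - \frac{n_x - X}{1 - p_x} = 0
\;\Longrightarrow\;
\hat p_x = \frac{X}{n_x}.
$$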
I want to standardise this operation to then calculate a 95%
confidence interval and see if there is a zero in the interval. If
there is, I would then reject H_0 and conclude that it is unlikely the
two salespeople have the same success rate.
*Here* is where I get stuck - I recall (perhaps wrongly?) that if X
and Y have a normal distribution, a linear operation of them should
have a normal distribution also, but I can't remember how to work out
the mean or variance.
As I said, it is probably a basic question, and my terminology is
probably all over the place, but if someone could help me out, I'd be
very grateful.
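Usually it's the other way around: H0: px = py, H1: px /= py. With
those hypotheses you reject H0 when zero lies *outside* the confidence
interval, not inside it.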
>
> I use the MLE (yes, it has been that long I couldn't remember the MLE
> for a Norm Dist, so I did it from first principles!) to find estimates
> for p_x and p_y, namely X/n_x and Y/n_y
>
> Substituting in the test statistic, I get: T= X/n_x - Y/n_y
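The step you are missing is the rule for linear combinations of independent variables. If U and V are independent, then E[aU + bV] = aE[U] + bE[V] and Var(aU + bV) = a^2 Var(U) + b^2 Var(V). A linear combination of independent normals is again normal. Applying this to T with a = 1/n_x and b = -1/n_y (the sign disappears when squared) gives:

$$
E[T] = p_x - p_y,
\qquad
\operatorname{Var}(T) = \frac{p_x(1 - p_x)}{n_x} + \frac{p_y(1 - p_y)}{n_y}.
$$

Under H0 both rates equal a common p, so Var(T) = p(1 - p)(1/n_x + 1/n_y). Estimating that common p from the pooled counts gives the denominator of the usual statistic.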
The usual test statistic is

$$z = \frac{X/n_x - Y/n_y}{\sqrt{\left(1/n_x + 1/n_y\right) p\,(1 - p)}},$$

with p = (X + Y)/(n_x + n_y), the pooled success rate.
Refer z to the standard normal distribution.
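The following is a minimal sketch of that pooled test in Python, using only the standard library (`statistics.NormalDist`, Python 3.8+). The function name `two_proportion_z_test` and the example counts are hypothetical, made up for illustration. It relies on the same normal approximation as the formula above.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


# Hypothetical helper name, for illustration only.
def two_proportion_z_test(x: int, nx: int, y: int, ny: int) -> Tuple[float, float]:
    """Pooled two-proportion z-test of H0: px = py against H1: px != py.

    Returns (z, two-sided p-value), relying on the normal approximation.
    """
    # Estimates of each salesperson's success rate: X/nx and Y/ny.
    px_hat = x / nx
    py_hat = y / ny
    # Under H0 both share one success rate, estimated from the pooled counts.
    p_pool = (x + y) / (nx + ny)
    if p_pool in (0.0, 1.0):
        raise ValueError("pooled proportion is 0 or 1; the statistic is undefined")
    se = sqrt((1 / nx + 1 / ny) * p_pool * (1 - p_pool))
    z = (px_hat - py_hat) / se
    # Two-sided p-value: P(|Z| >= |z|) for Z ~ N(0, 1).
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 45 sales in 200 visits vs. 30 sales in 180 visits.
z, p_value = two_proportion_z_test(45, 200, 30, 180)
print(f"z = {z:.3f}, two-sided p-value = {p_value:.3f}")
```

At the 5% level you would reject H0 when the p-value is below 0.05, which is the same as |z| > 1.96.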
>
> I want to standardise this operation to then calculate a 95%
> confidence interval and see if there is a zero in the interval. If
> there is, I would then reject H_0 and conclude that it is unlikely the
> two salespeople have the same success rate.
Getting a confidence interval for the difference between two independent
proportions is a little more complicated, and there are many different
methods spread along the simplicity-accuracy continuum. A nice
compromise is what is sometimes called a modified Wald interval (the
Agresti-Caffo interval), which adds one success and one failure to each
sample:

X' = X+1, Y' = Y+1, nx' = nx+2, ny' = ny+2,
px' = X'/nx', py' = Y'/ny'.

The interval is

px'-py' +- z*sqrt[px'(1-px')/nx' + py'(1-py')/ny'],

where z for a c% confidence interval is the (100+c)/2 percentile point
of the standard normal distribution (about 1.96 for c = 95). Unlike the
test statistic above, this uses separate (unpooled) variance estimates,
because the interval is not computed under the assumption px = py.
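The following is a minimal Python sketch of the interval px'-py' +- z*sqrt[px'(1-px')/nx' + py'(1-py')/ny'], which adds one success and one failure to each sample and uses the unpooled standard error. It uses only the standard library. The function name `modified_wald_interval` and the example counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


# Hypothetical helper name, for illustration only.
def modified_wald_interval(
    x: int, nx: int, y: int, ny: int, confidence: float = 0.95
) -> Tuple[float, float]:
    """Modified Wald confidence interval for px - py.

    Adds one success and one failure to each sample, then uses the
    unpooled normal-approximation standard error.
    """
    # Adjusted counts: X' = X+1, nx' = nx+2, Y' = Y+1, ny' = ny+2.
    px_adj = (x + 1) / (nx + 2)
    py_adj = (y + 1) / (ny + 2)
    # z is the (100+c)/2 percentile of N(0, 1), about 1.96 for c = 95.
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    se = sqrt(px_adj * (1 - px_adj) / (nx + 2) + py_adj * (1 - py_adj) / (ny + 2))
    diff = px_adj - py_adj
    return diff - z * se, diff + z * se


# Hypothetical counts: 45 sales in 200 visits vs. 30 sales in 180 visits.
low, high = modified_wald_interval(45, 200, 30, 180)
print(f"95% CI for px - py: ({low:.3f}, {high:.3f})")
# With H0: px = py, reject at the 5% level only if 0 lies outside the interval.
print("reject H0" if not (low <= 0 <= high) else "do not reject H0")
```

Because the interval and the pooled z-test use slightly different standard errors, they can occasionally disagree for results near the 5% boundary.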