Calculating parameters from the difference of random variables

monkey_man

Sep 22, 2007, 6:32:14 PM

Hi,

It has been a long time since university for me, so this may be a
simple problem:

I have two binomially distributed independent variables (say X & Y)
representing the number of sales of two salespeople in two different
areas and I want to test whether one has a higher success rate than
the other.

This is my progress so far:

Both have a larg(ish) number of visits, so I approximate both X and Y
to a normal distribution with (different for each) parameters np and
np(1-p) with n the number of visits for each salesperson and p
unknown.
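As a rough check on the "large(ish)" assumption, here is a small sketch that compares an exact binomial probability with its normal approximation N(np, np(1-p)). The helper names, the 200-visit example and the continuity correction (evaluating the normal CDF at k + 0.5) are illustrative assumptions, not part of the original post.

```python
import math
from statistics import NormalDist

def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_approx_cdf(k: int, n: int, p: float) -> float:
    """Approximate P(X <= k) with N(np, np(1-p)) plus a continuity correction."""
    # NormalDist takes the standard deviation, so take the square root of np(1-p).
    approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
    return approx.cdf(k + 0.5)

# Hypothetical example: 200 visits with a true success rate of 0.3.
n, p = 200, 0.3
for k in (50, 60, 70):
    print(k, round(binom_cdf(k, n, p), 4), round(normal_approx_cdf(k, n, p), 4))
```

With n in the hundreds and p not too close to 0 or 1, the two columns agree to roughly two decimal places, which is what makes the normal approximation reasonable here.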

I want to test whether p_x=p_y so I make my test statistic T:= p_x -
p_y

My hypotheses will be H_0: T /= 0; H_1: T = 0

I use the MLE (yes, it has been that long I couldn't remember the MLE
for a Norm Dist, so I did it from first principles!) to find estimates
for p_x and p_y, namely X/n_x and Y/n_y
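The same estimates also come straight from the binomial likelihood, without going through the normal approximation. A standard derivation for one salesperson with X sales in n visits:

```latex
\begin{aligned}
\ell(p) &= X \log p + (n - X)\log(1 - p) + \text{const} \\
\frac{d\ell}{dp} &= \frac{X}{p} - \frac{n - X}{1 - p} = 0
\;\Longrightarrow\; X(1 - p) = (n - X)\,p
\;\Longrightarrow\; \hat{p} = \frac{X}{n}
\end{aligned}
```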

Substituting in the test statistic, I get: T= X/n_x - Y/n_y

I want to standardise this operation to then calculate a 95%
confidence interval and see if there is a zero in the interval. If
there is, I would then reject H_0 and conclude that it is unlikely the
two salespeople have the same success rate.

*Here* is where I get stuck - I recall (perhaps wrongly?) that if X
and Y have a normal distribution, a linear operation of them should
have a normal distribution also, but I can't remember how to work out
the mean or variance.
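The recollection above is right: for independent normal X and Y, any linear combination aX + bY is again normal. Its mean and variance follow from the standard rules below (the mean rule always holds; the variance rule needs independence). T = X/n_x - Y/n_y is the case a = 1/n_x, b = -1/n_y, with E[X] = n_x p_x and Var(X) = n_x p_x(1 - p_x), and likewise for Y:

```latex
\begin{aligned}
E[aX + bY] &= a\,E[X] + b\,E[Y] \\
\operatorname{Var}(aX + bY) &= a^2 \operatorname{Var}(X) + b^2 \operatorname{Var}(Y) \\
E[T] &= p_x - p_y \\
\operatorname{Var}(T) &= \frac{p_x(1 - p_x)}{n_x} + \frac{p_y(1 - p_y)}{n_y}
\end{aligned}
```

Under H_0: p_x = p_y = p, the variance reduces to p(1-p)(1/n_x + 1/n_y), which is the denominator of the pooled z statistic given in the reply below.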

As I said, it is probably a basic question, and my terminology is
probably all over the place, but if someone could help me out, I'd be
very grateful.

Ray Koopman

Sep 25, 2007, 5:17:39 PM

On Sep 22, 3:32 pm, monkey_man <daehe...@gmail.com> wrote:
> Hi,
>
> It has been a long time since university for me, so this may be a
> simple problem:
>
> I have two binomially distributed independent variables (say X & Y)
> representing the number of sales of two salespeople in two different
> areas and I want to test whether one has a higher success rate than
> the other.
>
> This is my progress so far:
>
> Both have a larg(ish) number of visits, so I approximate both X and Y
> to a normal distribution with (different for each) parameters np and
> np(1-p) with n the number of visits for each salesperson and p
> unknown.
>
> I want to test whether p_x=p_y so I make my test statistic T:= p_x -
> p_y
>
> My hypotheses will be H_0: T /= 0; H_1: T = 0

Usually it's the other way around: H0: px = py, H1: px /= py.

>
> I use the MLE (yes, it has been that long I couldn't remember the MLE
> for a Norm Dist, so I did it from first principles!) to find estimates
> for p_x and p_y, namely X/n_x and Y/n_y
>
> Substituting in the test statistic, I get: T= X/n_x - Y/n_y

The usual test statistic is

z = (X/nx - Y/ny) / sqrt[(1/nx + 1/ny)*p*(1-p)],

with p = (X + Y)/(nx + ny). Refer z to the standard normal distribution.
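A minimal Python sketch of the pooled test just described. The function name and the sales figures are hypothetical. The two-sided p-value uses the identity 2(1 - Phi(|z|)) = erfc(|z|/sqrt(2)), and the sketch assumes the pooled p is strictly between 0 and 1 (otherwise the denominator is zero and z is undefined).

```python
import math

def two_proportion_z_test(x: int, nx: int, y: int, ny: int) -> tuple[float, float]:
    """Pooled two-proportion z-test of H0: px = py against H1: px != py.

    Returns (z, two-sided p-value) under the normal approximation.
    """
    p = (x + y) / (nx + ny)                        # pooled estimate of the common p under H0
    se = math.sqrt((1 / nx + 1 / ny) * p * (1 - p))
    z = (x / nx - y / ny) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))     # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Hypothetical figures: 45 sales in 150 visits vs. 30 sales in 140 visits.
z, p_value = two_proportion_z_test(45, 150, 30, 140)
print(f"z = {z:.3f}, two-sided p = {p_value:.4f}")
```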

>
> I want to standardise this operation to then calculate a 95%
> confidence interval and see if there is a zero in the interval. If
> there is, I would then reject H_0 and conclude that it is unlikely the
> two salespeople have the same success rate.
>
> *Here* is where I get stuck - I recall (perhaps wrongly?) that if X
> and Y have a normal distribution, a linear operation of them should
> have a normal distribution also, but I can't remember how to work out
> the mean or variance.
>
> As I said, it is probably a basic question, and my terminology is
> probably all over the place, but if someone could help me out, I'd be
> very grateful.

Getting a confidence interval for the difference between two
independent proportions is a little more complicated, and there are
many different methods distributed along the simplicity-accuracy
continuum. A nice compromise is what is sometimes called a modified
Wald interval:

px'-py' +- z*sqrt[(1/nx' + 1/ny')*p'*(1-p')], where

X' = X+1, Y' = Y+1, nx' = nx+2, ny' = ny+2,

px' = X'/nx', py' = Y'/ny', p' = (X'+Y')/(nx'+ny'),

and z for a c% confidence interval is the (100+c)/2 percentile point
in the standard normal distribution.
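The percentile rule above, sketched with the standard library's statistics.NormalDist (the helper name z_for_confidence is illustrative). For c = 90, 95 and 99 it gives the familiar 1.645, 1.960 and 2.576.

```python
from statistics import NormalDist

def z_for_confidence(c: float) -> float:
    """z for a c% two-sided interval: the (100 + c)/2 percentile of N(0, 1)."""
    return NormalDist().inv_cdf((100 + c) / 200)

for c in (90, 95, 99):
    print(c, round(z_for_confidence(c), 3))
```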

Ray Koopman

Sep 25, 2007, 8:03:51 PM

On Sep 25, 2:17 pm, Ray Koopman <koo...@sfu.ca> wrote:
> [...] A nice
> compromise is what is sometimes called a modified Wald interval:
>
> px'-py' +- z*sqrt[(1/nx' + 1/ny')*p'*(1-p')], [...]

No, that's wrong. The interval is

px'-py' +- z*sqrt[px'(1-px')/nx' + py'(1-py')/ny'].

Sorry about that.
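A minimal Python sketch of the corrected interval. The function name modified_wald_interval and the example counts are hypothetical; the adjustment itself (X' = X+1, nx' = nx+2, and likewise for Y, then the unpooled standard error) follows the corrected formula. This add-one-success-and-one-failure interval is often associated with Agresti and Caffo.

```python
import math
from statistics import NormalDist

def modified_wald_interval(x: int, nx: int, y: int, ny: int,
                           c: float = 95.0) -> tuple[float, float]:
    """c% interval for px - py: add one success and one failure to each
    sample, then use the unpooled Wald standard error."""
    px = (x + 1) / (nx + 2)                        # px' = X'/nx'
    py = (y + 1) / (ny + 2)                        # py' = Y'/ny'
    z = NormalDist().inv_cdf((100 + c) / 200)      # (100+c)/2 percentile point
    se = math.sqrt(px * (1 - px) / (nx + 2) + py * (1 - py) / (ny + 2))
    diff = px - py
    return diff - z * se, diff + z * se

# Hypothetical figures: 45 sales in 150 visits vs. 30 sales in 140 visits.
lo, hi = modified_wald_interval(45, 150, 30, 140)
print(f"95% interval for px - py: ({lo:.4f}, {hi:.4f})")
```

With these made-up counts the interval contains zero, so at the 95% level the data would not rule out equal success rates.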