Hypothesis testing using statsmodels

Dror Atariah

Feb 19, 2018, 2:26:25 PM

to pystatsmodels

I am still trying to get my head around hypothesis testing in the context of A/B tests. I posted another question on CV. I would appreciate some help nailing this topic. It seems like there are fairly clear answers when it comes to analyzing the results of an A/B test using R, but when moving to Python, for me at least, things are blurrier.

josef...@gmail.com

Feb 19, 2018, 4:58:50 PM

to pystatsmodels

On Mon, Feb 19, 2018 at 2:26 PM, Dror Atariah <dro...@gmail.com> wrote:

I am still trying to get my head around hypothesis testing in the context of A/B tests. I posted another question on CV. I would appreciate some help nailing this topic. It seems like there are fairly clear answers when it comes to analyzing the results of an A/B test using R, but when moving to Python, for me at least, things are blurrier.

If you have a specific example and test case for R, then for most methods it would not be difficult to replicate.

As I mentioned earlier, one of the difficult parts is to decide which versions to implement and how to set the options.

For example, for the case of comparing two Poisson rates, I implemented 7 methods, initially triggered by a Stack Exchange question.

https://github.com/statsmodels/statsmodels/pull/2723/files

Here is a chi-square test for two independent proportions, used as a helper function for the exact or Berger/Boos exact test for equality of two proportions:

https://github.com/statsmodels/statsmodels/pull/2608/files#diff-26a83a395f464c482b7651425f0fa4f3R84

Another difficulty is to match up the power and sample size computation to the actual test that is used. In large samples it won't make much difference, but the differences can be large in small samples or with proportions close to the boundaries 0 and 1.

Josef

josef...@gmail.com

Feb 19, 2018, 10:01:56 PM

to pystatsmodels

sorry, I think I was mostly off-topic

I didn't remember what I wrote years ago. The messy parts mainly come with confidence intervals for the difference of two proportions or testing whether two proportions differ by a specified amount, or when using exact distributions. Those cases were the focus of my last round for proportions and rates.

`proportions_chisquare` includes the 2 or k-sample test for equality of proportions

"If value is not given and count and nobs are not scalar, then the null hypothesis is that all samples have the same proportion."

the unit tests are against R's prop.test

Because the function is for k-sample tests, it only allows for two-sided alternatives, similar to prop.test

I separated out `proportions_ztest` for the two-sample case because there we can also have one-sided alternatives (and I didn't want to add options that only apply to this special case to `proportions_chisquare`).

(`proportions_ztest` allows for a specified difference in the proportions but uses the simplification that we have a common pooled variance in the two-sample case.)
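To make the one-sided alternative and the specified difference concrete, here is a minimal plain-Python sketch of a pooled-variance two-sample z-test as described above. It is not the statsmodels source: `pooled_ztest`, its argument names and the counts are hypothetical and only chosen to resemble the options discussed in this thread.

```python
import math
from statistics import NormalDist


def pooled_ztest(count, nobs, value=0.0, alternative="two-sided"):
    """Two-sample z-test for proportions with a common pooled variance.

    count, nobs: pairs of successes and trials for the two samples.
    value: hypothesized difference p1 - p2 under the null (0 means equality).
    alternative: "two-sided", "larger" (p1 - p2 > value) or "smaller".
    """
    p1 = count[0] / nobs[0]
    p2 = count[1] / nobs[1]
    # The simplification mentioned above: one pooled proportion for both samples.
    pooled = (count[0] + count[1]) / (nobs[0] + nobs[1])
    se = math.sqrt(pooled * (1 - pooled) * (1 / nobs[0] + 1 / nobs[1]))
    z = (p1 - p2 - value) / se

    norm = NormalDist()
    if alternative == "two-sided":
        pvalue = 2 * (1 - norm.cdf(abs(z)))
    elif alternative == "larger":
        pvalue = 1 - norm.cdf(z)
    elif alternative == "smaller":
        pvalue = norm.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'larger' or 'smaller'")
    return z, pvalue


# Made-up data: variant converts 150 of 1000, control converts 120 of 1000.
count, nobs = (150, 120), (1000, 1000)
z, p_two_sided = pooled_ztest(count, nobs)
_, p_larger = pooled_ztest(count, nobs, alternative="larger")
_, p_smaller = pooled_ztest(count, nobs, alternative="smaller")
print(z, p_two_sided, p_larger, p_smaller)
```

When the observed difference points in the direction of the alternative, the one-sided p-value is half the two-sided one, which is the main practical reason to want the option.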

`proportions_ztest` and `proportions_chisquare` give identical results in the two-sample test for equal proportions.
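The reason the two tests agree can be checked by hand: with a pooled variance and no continuity correction, the squared z statistic equals the Pearson chi-square statistic of the 2x2 table, and a chi-square with one degree of freedom is the square of a standard normal. The following is a sketch in plain Python under those assumptions; the function names and the counts are hypothetical and it does not call statsmodels.

```python
import math


def ztest_2sample(count, nobs):
    """Two-sided pooled z-test for equality of two proportions."""
    pooled = sum(count) / sum(nobs)
    se = math.sqrt(pooled * (1 - pooled) * (1 / nobs[0] + 1 / nobs[1]))
    z = (count[0] / nobs[0] - count[1] / nobs[1]) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return z, math.erfc(abs(z) / math.sqrt(2))


def chisquare_2sample(count, nobs):
    """Pearson chi-square test on the 2x2 table, no continuity correction."""
    pooled = sum(count) / sum(nobs)
    chi2 = 0.0
    for c, n in zip(count, nobs):
        # Successes and failures, each compared with the pooled expectation.
        chi2 += (c - n * pooled) ** 2 / (n * pooled)
        chi2 += ((n - c) - n * (1 - pooled)) ** 2 / (n * (1 - pooled))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Made-up data: 150 of 1000 versus 120 of 1000 conversions.
count, nobs = (150, 120), (1000, 1000)
z, p_z = ztest_2sample(count, nobs)
chi2, p_chi2 = chisquare_2sample(count, nobs)
print(z ** 2, chi2)   # the two statistics coincide up to rounding
print(p_z, p_chi2)    # and so do the p-values
```

Note that R's `prop.test` applies a continuity correction by default, so a comparison against R needs `correct = FALSE` to match this uncorrected version.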

AFAICS, we don't have a dedicated power function for `proportions_chisquare`. To use the generic power class for the chi-square distribution, both the effect size and the degrees of freedom need to be checked.

Josef

Dror Atariah

Feb 20, 2018, 2:32:18 AM

to pystatsmodels

Thanks again for the helpful Q&A :)

I will try to summarize the situation and your feedback would be appreciated.

Setting: A/B/C/... test where the metric of interest is some proportion (e.g. the conversion rate).

A/B test (2-sample) case.

Null hypothesis is that the proportions are equal.

In this case I could use either `proportions_chisquare` or `proportions_ztest`. Both will yield the same p-values.
For power computations only `NormalIndPower` can be used.

Other null hypothesis (e.g. proportion of control group is smaller than the one of the variant)

Only `proportions_ztest` can be used.
For power computations only `NormalIndPower` can be used.

A/B/C/... test (k-sample): Currently, only the case where the null hypothesis is that the proportions are equal can be worked out. In particular in this case, only `proportions_chisquare` should be used and there is no way to do power computations.
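For the two-sample power computation mentioned in the summary, the underlying normal approximation can be sketched in plain Python. This is the textbook calculation with Cohen's h as the effect size, written under the assumption that it is the kind of approximation `NormalIndPower` is built around; it is not a copy of the statsmodels code, and `cohens_h`, `approx_power`, `required_nobs1` and the conversion rates are hypothetical.

```python
import math
from statistics import NormalDist


def cohens_h(p1, p2):
    """Arcsine-transformed effect size for two proportions."""
    return 2 * (math.asin(math.sqrt(p1)) - math.asin(math.sqrt(p2)))


def approx_power(effect_size, nobs1, alpha=0.05, ratio=1.0):
    """Approximate power of a two-sided, two-sample z-test.

    nobs1 is the size of the first sample, the second has ratio * nobs1.
    """
    norm = NormalDist()
    crit = norm.inv_cdf(1 - alpha / 2)
    nobs2 = ratio * nobs1
    # Shift of the test statistic under the alternative.
    shift = effect_size * math.sqrt(nobs1 * nobs2 / (nobs1 + nobs2))
    # Probability of rejecting in either tail.
    return norm.cdf(shift - crit) + norm.cdf(-shift - crit)


def required_nobs1(effect_size, alpha=0.05, power=0.8, ratio=1.0):
    """Smallest first-sample size that reaches the requested power."""
    n = 2
    while approx_power(effect_size, n, alpha, ratio) < power:
        n += 1
    return n


# Made-up scenario: detect a lift from 10% to 12% conversion.
h = cohens_h(0.12, 0.10)
n_per_group = required_nobs1(h)
print(h, n_per_group, approx_power(h, n_per_group))
```

As noted earlier in the thread, such a calculation only matches the test actually used in large samples; with small samples or proportions near 0 or 1 the achieved power can differ noticeably.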

I hope I didn't miss anything or cause more confusion.

Dror Atariah

Feb 20, 2018, 2:36:40 AM

to pystatsmodels

On Monday, February 19, 2018 at 10:58:50 PM UTC+1, josefpktd wrote:

On Mon, Feb 19, 2018 at 2:26 PM, Dror Atariah <dro...@gmail.com> wrote:
I am still trying to get my head around hypothesis testing in the context of A/B tests. I posted another question on CV. I would appreciate some help nailing this topic. It seems like there are fairly clear answers when it comes to analyzing the results of an A/B test using R, but when moving to Python, for me at least, things are blurrier.

If you have a specific example and test case for R, then for most methods it would not be difficult to replicate.

I am not fluent in R, so it is going to be hard for me. Sorry.
