Multiple experiments on the same page?

Stavros

Aug 17, 2010, 9:36:30 PM
to django-lean
I have added two experiments on the same page. One is the addition of
a simple sentence, and the other is a blank experiment, to test what
happens. The former is showing a 10% improvement at 91% confidence,
which is fine, but the latter is showing a 42% improvement at 98%
confidence.

To reiterate, for two pages which are *exactly the same*, I get a 42%
improvement at 98% confidence.

I wanted to ask: is there a bug with having two experiments on the
same page, or should I just let the experiment run longer?

Thanks,
Stavros

Rory McCann

Aug 18, 2010, 10:30:39 AM
to djang...@googlegroups.com, Stavros

Some of my pages have lots of experiments running, and in theory you
could have 2 experiments per page.

What sounds strange (almost like a bug) is how something that does
nothing (the second experiment) is showing an improvement with a high
confidence value. I would have thought the only logical output would be
0% improvement.

Rory

Erik Wright

Aug 18, 2010, 9:27:05 AM
to djang...@googlegroups.com
There should not be any bugs with two experiments on the same page.

First of all, 98% is not actually that confident. It means that one time in 50 you could run a "placebo" experiment and get results like these. So it's perfectly plausible that you ran a placebo experiment and got those results.

I also seem to recall that reporting of confidence can be inaccurate when the numbers are very low. There are mathematical limits below which the numbers are not reliable. For conversion experiments, for example, there are four groups: control conversions, control non-conversions, test conversions, and test non-conversions. I believe you need at least 10 in each of the four groups before the "confidence" score starts to be meaningful. The code should really be modified not to report confidence below those counts.
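
As a rough sketch of that kind of guard (illustrative Python, not django-lean's actual code, and the 10-per-cell threshold is just my rule of thumb):

    MIN_CELL_COUNT = 10  # rule-of-thumb threshold, not an official django-lean constant

    def confidence_is_meaningful(control_conversions, control_participants,
                                 test_conversions, test_participants,
                                 minimum=MIN_CELL_COUNT):
        # All four cells (conversions and non-conversions in both groups)
        # should reach the minimum before a confidence score means much.
        cells = (
            control_conversions,
            control_participants - control_conversions,
            test_conversions,
            test_participants - test_conversions,
        )
        return all(cell >= minimum for cell in cells)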

I hope that helps.

-Erik

Poromenos

Aug 18, 2010, 11:27:20 AM
to djang...@googlegroups.com
Well, the numbers support the confidence, which can mean one of three
things:

* django-lean is miscounting (I did not set a control, just a test.
Could that affect things?).
* django-lean is biased in how it distributes visitors between the test
and control groups.
* I have hit the 1 in 70 case.

I'm continuing to run it today; if the numbers are the same again, it's
certainly a bug...

Stavros

er...@erikwright.com

Aug 18, 2010, 11:34:48 AM
to django-lean
Can you tell us how many users are in your test and control groups,
and what number of conversions you have in each?

Feel free to email me privately if you would like help investigating
this but prefer not to post those numbers to the board.

-e

Stavros

Aug 18, 2010, 2:19:35 PM
to django-lean
Hello,
sorry, yes.

Control participants: 198
Test participants: 209

I just thought of something. Does django-lean assign users to
"control" and "test" groups per experiment, or per user? If the
latter, then it is biased, because a test user in my real experiment
would always also be a test user in the placebo experiment, skewing
the numbers.

Is this what's going on?

Stavros

er...@erikwright.com

Aug 18, 2010, 2:42:19 PM
to django-lean
Each user who is exposed to a given experiment is enrolled in a group
for that experiment independently of any other experiments.

In other words, if you have two experiments, A and B, the control
group of B should be equally divided amongst control and test users in
A (assuming that B is equally exposed to both groups).

On the other hand, if, for example, B is only visible to users in
Control(A), or if it is harder to find B when a user is in Test(A),
the population of B could be exclusively (or heavily) in Control(A).

If one had enough traffic, one could reduce the impact of such things
by only running each test on a small subset of users. You would need
to build that feature, however.
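
As a rough illustration of what such a feature might look like (nothing like this exists in django-lean today; the names below are made up), you could hash the user and experiment together and only enroll a fixed fraction of users:

    import hashlib

    def should_enroll(user_id, experiment_name, coverage_percent=10):
        # Deterministically map each (user, experiment) pair to a bucket 0-99
        # and only enroll users whose bucket falls below the coverage percentage.
        key = ("%s:%s" % (experiment_name, user_id)).encode("utf-8")
        bucket = int(hashlib.md5(key).hexdigest(), 16) % 100
        return bucket < coverage_percent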

Assuming that your conversion rate is 5% (the minimum to be above the
10-per-group threshold I mentioned), you have 10 conversions in your
Control group and ~15 in your Test group. I think it's likely that
these numbers are correct, and that your users were indeed assigned to
(and exposed to) the appropriate groups. That said, my immediate
impression is that these numbers are too low to be significant with
98% confidence. If there is a possible bug, it would be in the
confidence calculation.

I did a quick check using this site:

http://abtester.com/calculator/

I used 10/198 and 16/209 as the conversion rates. It gave me a
confidence of 86%. I don't know why that differs from the figure
reported by Django-Lean. Perhaps it is a different formula, or perhaps
there is a bug in Django-Lean.
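
If you want to reproduce that number locally, a one-tailed two-proportion z-test is one common way such calculators compute "confidence" (django-lean's own formula may differ). A minimal sketch, assuming SciPy is available:

    from math import sqrt
    from scipy.stats import norm

    def ab_confidence(control_conversions, control_total,
                      test_conversions, test_total):
        # One-tailed two-proportion z-test, returned as a 0-100 "confidence".
        p_control = float(control_conversions) / control_total
        p_test = float(test_conversions) / test_total
        pooled = float(control_conversions + test_conversions) / (control_total + test_total)
        se = sqrt(pooled * (1 - pooled) * (1.0 / control_total + 1.0 / test_total))
        z = abs(p_test - p_control) / se
        return norm.cdf(z) * 100

    print(ab_confidence(10, 198, 16, 209))  # roughly 86, in line with the calculator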

-e

Stavros

Aug 18, 2010, 3:05:17 PM
to django-lean
Sorry, I thought I had quoted the rates. The numbers are:

Control: 198 participants, 36 conversions (18.2%)
Test:    209 participants, 22 conversions (10.5%)

Pasted from the django-lean table.

The formula seems to be correct. The experiment is literally an empty
tag, although from looking at the code I see that I didn't even need
to include it, since users will be assigned to groups either way (is
that correct?).

It looks like we might need to chalk this one up to chance. I'll wait
for tonight's report (a pity you can't generate reports mid-day and
have them update) and get back to you then; hopefully the confidence
will fall to 0 pretty soon.

Thanks,
Stavros

er...@erikwright.com

Aug 18, 2010, 3:22:26 PM
to django-lean
Yes, it looks like simple chance. Again, 1 in 50 is not that unlikely.
You will likely need to run to 1000 or more subjects in each group
before you see that even out. For example, even if you add 209 new
users to both Test and Control groups, and these new users convert
exactly equally (10.5%), you would have:

44/418 vs. 58/407, which is _still_ 95% confidence.
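
Plugging those numbers into the same one-tailed z-test sketch from my earlier message (again, not necessarily django-lean's exact formula) gives roughly these figures:

    # reusing the ab_confidence() sketch from earlier in the thread
    print(ab_confidence(58, 407, 44, 418))  # roughly 95
    print(ab_confidence(36, 198, 22, 209))  # roughly 98-99, close to what django-lean reported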

Unfortunately, it is a fact with A/B testing that you need high
volumes of users to get meaningful results.

No - you do need to "expose" the users to the experiment somehow in
order for them to be enrolled.

Someone submitted a patch to allow running reports mid-day, but it was
buggy. If you look back through the mailing list you should be able to
find their fork and my comments on why it's buggy. Feel free to create
your own fork, merge in their changes, and fix them.

-e

Stavros

Aug 18, 2010, 3:27:53 PM
to django-lean
I will, thanks. I'll also post the results here. I see what you mean
about needing large numbers; I just feel that the confidence measure
should reflect that, because a 99% that still isn't enough isn't very
useful to anyone... Perhaps I'll submit a patch to show two decimal
places.

Thanks again!

er...@erikwright.com

Aug 18, 2010, 3:35:38 PM
to django-lean
Yes, there is a lot that can be done to better reflect the confidence.

For example, the calculator that I pointed to shows you the number of
participants required to achieve various "confidence intervals". I'm
not sure exactly what they mean by that, but there are a number of
ways to indicate how much longer an experiment should run. Google
Website Optimizer might be a good place to look for inspiration.
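
For what it's worth, the usual back-of-the-envelope for "how many participants do I need" is the standard two-proportion sample-size approximation. A minimal sketch (my own, not the calculator's exact method), again assuming SciPy:

    from math import ceil
    from scipy.stats import norm

    def participants_per_group(p_control, p_test, confidence=0.95, power=0.80):
        # Approximate participants needed in EACH group to detect the difference
        # between two conversion rates with a one-sided test.
        z_alpha = norm.ppf(confidence)
        z_beta = norm.ppf(power)
        variance = p_control * (1 - p_control) + p_test * (1 - p_test)
        n = (z_alpha + z_beta) ** 2 * variance / (p_control - p_test) ** 2
        return int(ceil(n))

    # e.g. confirming a lift from 10.5% to 14% conversion needs on the order of
    # a thousand users per group, roughly in line with the "1000 or more" estimate above.
    print(participants_per_group(0.105, 0.14))  # ~1080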

-Erik

Stavros

Aug 18, 2010, 3:46:49 PM
to django-lean
Yep, I saw that calculator yesterday after I got the results from my
test. I think they just calculate the N you need to reach a z-test
confidence of 75%, 85%, and 95%... I'm not sure why it's per-row,
though. I had a look at the source, and it seems that to display two
decimal digits all one needs to change in the confidence.html template
is stringformat:"d" to stringformat:"0.2f", but I'm not sure how
useful that is, except if one wants to see > 99% confidence.

Stavros