A/B Testing with Cucumber

Roland Swingler

Nov 3, 2009, 12:46:32 PM

to Cukes

Hi,

I'm wondering if anyone has any experience with the combination of
cucumber & A/B testing.

I have a sign up form and am doing an A/B test to see if you get more
signups if there are fewer fields on the form. With a traditional
scenario like:

Scenario: Sign up page
Given I am on the sign up page
When I fill in "name" with "Roland"
And I fill in "password" with "password"
etc. for all form fields

I'll get a failing step on the missing field in one branch of the A/B
test.

My problems at the moment are:

1. I don't really want to create separate scenarios for each branch of
the A/B test, both because the A/B test will only run for a short
period of time and I think it is confusing from a business perspective
to see two conflicting scenarios, however...

2. I *do* still want to test both branches, but maybe this sort of
test belongs "lower down" the testing stack, in a controller/view
test or some such?

3. Even if I go for a more abstract step like "When I fill out the
sign up form", I'm still not sure how I'd ensure that a specific
branch was being executed without stubbing out the call to the A/B
test plugin (abingo, if relevant) somehow, which I thought was a big
no-no in integration tests?

Has anyone come up with any good solutions to this sort of problem?

Cheers,
Roland

Stephen Eley

Nov 3, 2009, 1:57:50 PM

to cu...@googlegroups.com

On Tue, Nov 3, 2009 at 12:46 PM, Roland Swingler
<roland....@gmail.com> wrote:
>
> I'm wondering if anyone has any experience with the combination of
> cucumber & A/B testing.

I think you may be overcomplicating the problem. You're not really
trying to describe one application workflow with indeterminate
behavior; you're trying to describe *two* totally separate workflows,
each of which has determinate behavior.

What I'd do, therefore, would be to make a separate branch in Git (or
other source control system) for the changes I wanted to try. I'd
develop the Cucumber features, write my code, etc., exactly as I would
for any other application change. The application itself isn't aware
that it's being A/B tested. Nothing in the code worries about that.

Then I'd deploy the branch separately, and do my A/B testing by
proxying to one application instance or the other. You might need to
do a custom proxy to do this, but that shouldn't be too crazy. Ilya
Grigorik gave a talk at the Golden Gate Ruby Conference on how to do
just that:

http://pivotallabs.com/gogaruco/talks/55-building-custom-web-proxies-in-ruby

Just make sure you set a session flag or a cookie or something so that
once a user's funneled into one branch or the other, they stay there.
Again, that's at the proxy level; your app doesn't need to know about
it.
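A minimal sketch of the routing decision such a proxy might make, in Python. The backend URLs and the `ab_branch` cookie name are hypothetical, and the actual forwarding of the request (the part Ilya's talk covers) is left out: the function only decides which deployment should serve a request and whether a cookie needs to be set so the visitor stays on that branch.

```python
import random

# Hypothetical addresses of the two separately deployed branches.
BACKENDS = {
    "A": "http://127.0.0.1:3000",  # deployment of master
    "B": "http://127.0.0.1:3001",  # deployment of the experimental "B" branch
}
COOKIE_NAME = "ab_branch"  # hypothetical cookie name


def route(cookies, rng=random):
    """Decide which backend should serve a request.

    `cookies` is a dict of the request's cookies. Returns a tuple
    (backend_url, cookie_to_set); cookie_to_set is None when the visitor
    already carries a valid assignment, so they stay on the same branch.
    """
    branch = cookies.get(COOKIE_NAME)
    if branch in BACKENDS:
        return BACKENDS[branch], None
    # First visit (or an unrecognised value): assign a branch at random
    # and ask the proxy to remember the choice in a cookie.
    branch = rng.choice(sorted(BACKENDS))
    return BACKENDS[branch], (COOKIE_NAME, branch)
```

Because the decision is a pure function of the request's cookies, it can be tested on its own without standing up either deployment, and neither application needs to know the experiment exists.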

Once you've run it long enough to determine the outcome of the
experiment, you can do the rest in Git. If the results show that the
change was worthwhile, just merge your "B" branch into master.
Otherwise, delete it. Because you never put any A/B "magic" into your
application, there's nothing to back out and no feature work to undo
in Cucumber.

Does this make sense?

--
Have Fun,
Steve Eley (sfe...@gmail.com)
ESCAPE POD - The Science Fiction Podcast Magazine
http://www.escapepod.org

Stephen Eley

Nov 3, 2009, 2:11:58 PM

to cu...@googlegroups.com

On Tue, Nov 3, 2009 at 1:57 PM, Stephen Eley <sfe...@gmail.com> wrote:
>
> Just make sure you set a session flag or a cookie or something so that
> once a user's funneled into one branch or the other, they stay there.

Replying to myself with another thought: that might be unnecessary.
Assuming you only care about the workflow within a single session, you
could just branch from the originating IP address. Say, if the sum of
all digits in the IP address is even, you go to branch A; otherwise
you go to branch B. This isn't truly random, and there might be
reasons why the distribution wouldn't quite be 50/50; but it's likely
good enough. And it would be easier *and* more robust than setting
any kind of flag.

(Just make sure, when you write tests for your proxy, that you account
for the test environment where you're hitting it mostly from
127.0.0.1.) >8->
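The digit-sum rule can be sketched as a pure function. This assumes IPv4 dotted-quad addresses, and the name `branch_for_ip` is hypothetical. As noted above, the split is deterministic but not guaranteed to be 50/50.

```python
def branch_for_ip(ip):
    """Bucket a visitor by the parity of the digit sum of their IPv4 address.

    Even sum -> "A", odd sum -> "B". The same address always lands on the
    same branch, so no cookie or session flag is needed.
    """
    digit_sum = sum(int(ch) for ch in ip if ch.isdigit())
    return "A" if digit_sum % 2 == 0 else "B"
```

Under this rule 127.0.0.1 sums to 11, which is odd, so local test traffic would always land on branch B; tests for the proxy need to pass in other addresses (10.0.0.1, for example, sums to 2) to exercise branch A.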

Paul Campbell

Nov 4, 2009, 5:56:40 AM

to cu...@googlegroups.com

It seems like a bit of a chore to introduce significant architecture
investments and git branches just to test two versions of an AB test.

In this case, I'd say that it's one of the exceptions for stubbing.
Trying to integration test randomness is ... destined to fail.

A step like:

Given the A/B test has chosen branch "A"

would be reasonably elegant, perhaps a little bit more brittle, but
miles simpler and much faster, and very clear in its intent.
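What such a step relies on is a single seam where the branch is chosen, which a test can override. The sketch below is in Python and is not abingo's API; `Experiment`, `force`, `choose`, `signup_fields` and the field lists are hypothetical. In a Cucumber suite, the step definition for `Given the A/B test has chosen branch "A"` would make the equivalent of the `force("A")` call before the sign-up page is visited.

```python
import random


class Experiment:
    """Hypothetical wrapper around whatever picks the A/B alternative.

    Application code asks the experiment for an alternative; tests can force one.
    """

    def __init__(self, name, alternatives, rng=None):
        self.name = name
        self.alternatives = list(alternatives)
        self._rng = rng or random.Random()
        self._forced = None

    def force(self, alternative):
        # The call a step like `Given the A/B test has chosen branch "A"` would make.
        if alternative not in self.alternatives:
            raise ValueError(f"unknown alternative {alternative!r}")
        self._forced = alternative

    def choose(self):
        # A forced alternative wins; otherwise pick at random as in production.
        if self._forced is not None:
            return self._forced
        return self._rng.choice(self.alternatives)


def signup_fields(experiment):
    """Hypothetical view logic: which fields the sign-up form shows."""
    if experiment.choose() == "A":
        return ["name", "email", "password"]
    return ["email", "password"]
```

Only the random choice is replaced; the rest of the sign-up flow still runs for real, which is the trade-off being weighed in this thread.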

Paul

--

Paul Campbell
pa...@rushedsunlight.com
- - - - - - - - - - - - - - - - - - -
blog http://www.pabcas.com
twitter http://www.twitter.com/paulca
github http://www.github.com/paulca
phone +353 87 914 8162
- - - - - - - - - - - - - - - - - - -

Stephen Eley

Nov 4, 2009, 10:33:28 AM

to cu...@googlegroups.com

On Wed, Nov 4, 2009 at 5:56 AM, Paul Campbell <pa...@rslw.com> wrote:
>
> It seems like a bit of a chore to introduce significant architecture
> investments and git branches just to test two versions of an AB test.

I see your point about the architecture. But I would argue that it
doesn't add much work. A/B testing is non-trivial work already: you
have to branch, you have to collect data, you have to analyze it. Any
code to support that work has to live *somewhere* -- either inside
your application or outside it. My argument is simply that "outside"
is a more logical and less risky choice. And if you think it's hard,
watch that video -- Ilya shows persuasively that writing a proxy is
not really a lot of work.

On your second list item: Git branching a chore? Really? One of the
key things I love about Git is that branching is finally *not* a
chore. Hopefully it's part of the development lifecycle already.
>8->

> A step like:
> Given the A/B test has chosen branch "A"
> would be reasonably elegant, perhaps a little bit more brittle, but
> miles simpler and much faster, and very clear in its intent.

I don't think it's clear in its intent at all. Is "the A/B test" a
user of the application? What's the feature description? Who are we
targeting in this scenario, and what's the business value to them?

We call them Cucumber 'features' because that's what Cucumber's good
at: describing application features. My principal objection to this
is that A/B testing isn't really a feature of the application. It's
more of a meta-feature, an activity that's _about_ the application but
conceptually lives outside of it, like deployment or monitoring. I
don't write Cucumber features for those either; but if I did, I'd
consider those features part of a separate project, not part of the
application.

Luke Melia

Nov 4, 2009, 10:52:32 AM

to cu...@googlegroups.com

On Nov 4, 2009, at 10:33 AM, Stephen Eley wrote:

>> A step like:
>> Given the A/B test has chosen branch "A"
>> would be reasonably elegant, perhaps a little bit more brittle, but
>> miles simpler and much faster, and very clear in its intent.
>
> I don't think it's clear in its intent at all. Is "the A/B test" a
> user of the application? What's the feature description? Who are we
> targeting in this scenario, and what's the business value to them?

The business value is to the product marketing person who is
incrementally improving the site by leveraging A/B testing features of
the application. This is no different than admin CMS functionality
that you might have integrated into the site. The user segment served
may not be the primary users of the application, but they certainly
are users who are deriving value from the feature.

I'd give a +1 to Paul's suggestion.

Cheers,
Luke
--
Luke Melia
lu...@lukemelia.com
http://www.lukemelia.com/

Paul Campbell

Nov 4, 2009, 10:54:19 AM

to cu...@googlegroups.com

>> It seems like a bit of a chore to introduce significant architecture
>> investments and git branches just to test two versions of an AB test.
>
> I see your point about the architecture. But I would argue that it
> doesn't add much work. A/B testing is non-trivial work already: you
> have to branch, you have to collect data, you have to analyze it. Any
> code to support that work has to live *somewhere* -- either inside
> your application or outside it. My argument is simply that "outside"
> is a more logical and less risky choice. And if you think it's hard,
> watch that video -- Ilya shows persuasively that writing a proxy is
> not really a lot of work.
> On your second list item: Git branching a chore? Really? One of the
> key things I love about Git is that branching is finally *not* a
> chore. Hopefully it's part of the development lifecycle already.
>>8->

Heh! I <3 git branches more than you!

The chore I'm referring to is introducing further architecture plus Git
branch maintenance, when the same goal could be reached by a far
easier path.

The question is: Is stubbing the functionality of abingo a cheat?

The answer is probably: yes. When I consider cheating, I just look at
the easiest path to getting things working.

>> Given the A/B test has chosen branch "A"
>> would be reasonably elegant, perhaps a little bit more brittle, but
>> miles simpler and much faster, and very clear in its intent.
>
> I don't think it's clear in its intent at all. Is "the A/B test" a
> user of the application? What's the feature description? Who are we
> targeting in this scenario, and what's the business value to them?

I think the answer here rests on how much benefit you give the
business owner. "The A/B test" is very much just "the A/B test".
Another, somewhat similar cheat where I feel one could get away with
stubbing is something like Thinking Sphinx. In that case, I'd write a
step like "Given the search engine returns 1 result".

It's a shadowy line whether the behaviour of an outside element should
be described as part of a feature, but for me it's always about
choosing the simplest option that works and making sure that at least
most of the use case is covered.

>
> We call them Cucumber 'features' because that's what Cucumber's good
> at: describing application features. My principal objection to this
> is that A/B testing isn't really a feature of the application. It's
> more of a meta-feature, an activity that's _about_ the application but
> conceptually lives outside of it, like deployment or monitoring. I
> don't write Cucumber features for those either; but if I did, I'd
> consider those features part of a separate project, not part of the
> application.

I agree. One should always check, or at least be able to assume, that
a third-party tool is well tested before using it. That's why I think
it's safe to stub its behaviour: you're stubbing behaviour that it's
relatively safe to assume works, particularly when the expected
behaviour is random, as in this case.

It's a really interesting discussion. I would worry that bringing in
relatively complicated architectural changes would be considered best
practice just for the sake of sticking to the "don't stub" rule.
Proxies may be simple to implement, and I've implemented many in my
day, but they increase the conceptual complexity of the system.

There comes a stage in every app's life where a proxy is probably going
to be implemented, but I'm not so certain that "I'm introducing a
proxy into my architecture so that I avoid an extra line in my
Cucumber feature" is a good enough reason to do it.

I guess it's not hard and fast either way; it's just not a decision I'd
be quick to jump to.

Paul


Paul Campbell

Nov 4, 2009, 10:58:18 AM

to cu...@googlegroups.com

> The business value is to the product marketing person who is
> incrementally improving the site by leveraging A/B testing features of
> the application. This is no different than admin CMS functionality
> that you might have integrated into the site. The user segment served
> may not be the primary users of the application, but they certainly
> are users who are deriving value from the feature.

You put it better than I did.

Stephen Eley

Nov 4, 2009, 11:02:35 AM

to cu...@googlegroups.com

On Wed, Nov 4, 2009 at 10:54 AM, Paul Campbell <pa...@rslw.com> wrote:
>
> It's a really interesting discussion. I would worry that bringing in
> relatively complicated architectural changes would be considered best
> practice just for the sake of sticking to the "don't stub" rule.

To be clear: I'm not advocating that the A/B test would be best
handled by a proxy outside the application for the sake of not
stubbing. I don't really care about that. I write so much stuff that
integrates with third-party services that not stubbing at all in my
Cucumber specs would get really complex.

I'm suggesting this approach because I think it's the better way to do
A/B testing. That it makes Cucumber BDD easier is part and parcel
of that.

Stephen Eley

Nov 4, 2009, 11:14:42 AM

to cu...@googlegroups.com

On Wed, Nov 4, 2009 at 10:52 AM, Luke Melia <lu...@lukemelia.com> wrote:
>
>> I don't think it's clear in its intent at all. Is "the A/B test" a
>> user of the application? What's the feature description? Who are we
>> targeting in this scenario, and what's the business value to them?
>
> The business value is to the product marketing person who is
> incrementally improving the site by leveraging A/B testing features of
> the application.

For what it's worth, I agree. I'm certainly not arguing against the
potential value of A/B testing. My specific point was orthogonal to
that -- it's that you aren't really communicating *intent* if you
simply provide a single "Given" line.

From a larger scale, it's the final clause of your sentence that I am
playing Devil's Advocate against. "Leveraging A/B testing features of
the application" makes an assumption that I'm not ready to buy into by
default. I don't think it needs to be a feature of the _application._
In the case initially described, I think it makes more sense for it
to be a feature of the environment.

Paul Campbell

Nov 4, 2009, 11:38:35 AM

to cu...@googlegroups.com

> From a larger scale, it's the final clause of your sentence that I am
> playing Devil's Advocate against. "Leveraging A/B testing features of
> the application" makes an assumption that I'm not ready to buy into by
> default. I don't think it needs to be a feature of the _application._
> In the case initially described, I think it makes more sense for it
> to be a feature of the environment.

From an architecture point of view, disregarding its effect on cukes,
it's certainly an interesting one. Most examples of A/B testing I've
seen are expressly outside of the app anyway, the main case being
Google Website Optimizer, which is simply a chunk of JavaScript.

There could be another case where you do it that way and actually
render both options to the user: Webrat sees both, so you can test
that both work, and CSS controls which one is displayed.
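A minimal sketch of that idea: both variants of the sign-up form are rendered into the page, and a CSS class hides whichever one is not active. The field lists, the `signup-A`/`signup-B` ids and the `ab-hidden` class are hypothetical. A driver that does not apply CSS, as described above for Webrat, would see both forms.

```python
# Hypothetical field sets for the two variants of the sign-up form.
VARIANTS = {
    "A": ["name", "email", "password"],
    "B": ["email", "password"],
}


def render_signup(active):
    """Render every variant into the page and hide all but the active one."""
    parts = ["<style>.ab-hidden { display: none; }</style>"]
    for variant, fields in VARIANTS.items():
        # Only the inactive variants get the hiding class.
        hidden = "" if variant == active else ' class="ab-hidden"'
        inputs = "".join(f'<input name="{field}">' for field in fields)
        parts.append(f'<form id="signup-{variant}"{hidden}>{inputs}</form>')
    return "\n".join(parts)
```

The markup for both branches is always present, so a feature can exercise each form by its id regardless of which one the experiment picked; only the browser's presentation differs.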

Paul
