Cuk way of using sensitive, developer-specific data in tests


JT Zemp

Jan 13, 2010, 3:04:46 PM
to Cukes
We have a PHP app that talks to a JRuby on Rails app via an API. I'm
responsible for the JRuby on Rails side, and I've been developing the
tests with Cucumber. I know it's somewhat off-label, but I
like it better than RSpec for integration testing.

Anyway, my question is, what is the "Cucumber/outside-in way" for
storing really sensitive data that is individual to a specific
developer? I test access to all of my checking and savings accounts
and I don't really want to check that information into the repo.

I'm using Scenario Outlines and go through a handful of accounts at
different institutions as I'm developing. Until now, I've just kept
dummy data in the feature file in the Examples table checked into the
repo. I just swap it with real data kept in a scratch file on my
computer (encrypted, of course) when I want to work on it.

So, is there a way to keep Scenario Outline data tables in another
file like a fixture file, or is it better just to have a step that
loads the data from a fixture file? Or is there some better option?

I still like the ability and notation of writing steps like this:
    When set_user_institution_data is called for "<name>" and "<id>" with "<credentials>"

But I'd like to have the name, id and credentials come from a
different file that can be ignored in my local working copy of the
code.

Thoughts?

Thanks!

aslak hellesoy

Jan 13, 2010, 7:15:30 PM
to cu...@googlegroups.com
> We have a PHP app that talks to a JRuby on Rails app via an API. I'm
> responsible for the JRuby on Rails side, and I've been developing the
> tests with Cucumber. I know it's kind of off-label somewhat, but I
> like it better than rspec for doing integration testing.
>
> Anyway, my question is, what is the "Cucumber/outside-in way" for
> storing really sensitive data that is individual to a specific
> developer?

The term "outside-in" in BDD refers to the sequencing of workflow
tasks in a particular order. Roughly:

1) Identify business need (the very outside)
2) Discuss what feature would solve this need (feature injection)
3) Write an acceptance test (Cucumber scenario) that interacts with
the system boundaries (that have yet to be written)
4) Run the test and watch it fail because the code being tested
doesn't exist yet.
5) Write a little piece of the system boundary code that was missing
and caused the test to fail.
6) Run the test and watch it fail again, this time because the code
doesn't do what it needs to yet.
7) Repeat 5-6, working your way "in" until you have enough code (and
no more than just enough) to make the test pass.

Your use of the term outside-in sounded like it was describing
something different than this, so I thought I'd clarify the meaning.

> I test access to all of my checking and savings accounts
> and I don't really want to check that information into the repo.
>

Since you don't want to check this information into the repo, it sounds
like your scenarios interact with Rails code that in turn interacts with
sensitive data that originates from a live system. Is this the case?
This is usually a highly discouraged practice in testing, and there
are several techniques to avoid it. What they all have in common is
that you test against invented data. Which techniques to use depends on
your architecture and functional requirements.

Can you explain what your scenarios try to verify? Are those saving
accounts in an external system beyond your control? With this
knowledge we can recommend some techniques that will let you stop
using sensitive data in your tests altogether.

Aslak



Ben Mabey

Jan 14, 2010, 10:33:22 PM
to cu...@googlegroups.com, JT Zemp
Hey JT,
Just to follow up on what Aslak said... Since you generally don't want
to be hitting live systems, for a number of reasons, I would recommend
looking into FakeWeb (if you are hitting the system over HTTP):
http://github.com/chrisk/fakeweb

If you are not hitting the system over HTTP then you will have to stub
one of your own classes to return an appropriate canned response. This
is one of the few cases where stubbing is the better option in
Cucumber. If you are using RSpec matchers in Cucumber then all you need
to do to allow stubbing is to require 'spec/stubs/cucumber' in your
env.rb file. For more info and pointers on stubbing in cucumber check
out:
http://wiki.github.com/aslakhellesoy/cucumber/mocking-and-stubbing-with-cucumber


(That page also has a link to a library that will allow you to stub across
processes if your situation necessitates it.)
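
To make the "stub one of your own classes" idea a bit more concrete, here is a minimal plain-Ruby sketch that needs no mocking library. The InstitutionGateway class, its fetch_accounts method, and the account data are all invented for illustration; they are not part of the app described in this thread or of Cucumber itself.

```ruby
# Hypothetical wrapper around the code that talks to the outside service.
class InstitutionGateway
  def fetch_accounts(institution_id, credentials)
    raise NotImplementedError, "would call the live aggregation service"
  end
end

# Canned response used in place of the live call (invented data).
CANNED_ACCOUNTS = [
  { :name => "Checking", :balance_cents => 12_345 },
  { :name => "Savings",  :balance_cents => 67_890 }
].freeze

# Replace the method on a single instance only, so other objects of the
# same class keep their real behaviour.
def stub_gateway(gateway)
  gateway.define_singleton_method(:fetch_accounts) do |_institution_id, _credentials|
    CANNED_ACCOUNTS
  end
  gateway
end

gateway  = stub_gateway(InstitutionGateway.new)
accounts = gateway.fetch_accounts("bank-001", "not-a-real-password")
puts accounts.map { |a| a[:name] }.join(", ")
```

In a real suite you would probably let RSpec's stubbing do this for you; the point is only that the scenario talks to your own class and never leaves the process.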

Stubbing has many benefits in this context. For one, your tests won't
fail when you can't reach the outside service. They are also not
coupled to any expectation that you may unintentionally be relying on
(like your checking account balance for example). These tests will also
be a lot faster and not be dependent on network speeds. Best of all you
can actually commit them to the repo and have the CI server and
coworkers run them since they won't have your personal data in them. :)

That said, I would not recommend throwing away your current set of
features that actually hit the live system. What I normally suggest is
that all external systems are stubbed out by default but are allowed to
hit the real system by changing an option or running a different test.
This will give you confidence that your system is still working with the
third-party and will alert you if they unexpectedly change the API.
The stubbed versions of the test will protect you against the more
common case of a bug being introduced in your codebase and these tests
should be run all the time. The third-party version test only needs to
be run every so often. In the past I've set up a CI server to have a
build specifically for that purpose which runs nightly. In your case it
sounds like *you* will be the CI server since it involves some personal
data.
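
As a rough sketch of "stubbed by default, live by option": the support code could pick a client based on an environment variable. The class names and the LIVE_AGGREGATION variable below are assumptions made up for this example, not anything Cucumber provides.

```ruby
# Hypothetical client that would talk to the real third-party service.
class LiveAggregationClient
  def balance_for(account_id)
    raise "would hit the real third-party service"
  end
end

# Hypothetical stand-in that returns invented data.
class StubbedAggregationClient
  def balance_for(account_id)
    10_000 # canned balance in cents
  end
end

# Stubbed unless the (assumed) LIVE_AGGREGATION variable is set to "true".
# The env argument defaults to the process environment but can be any
# hash-like object, which keeps the switch easy to exercise.
def aggregation_client(env = ENV)
  if env["LIVE_AGGREGATION"] == "true"
    LiveAggregationClient.new
  else
    StubbedAggregationClient.new
  end
end

client = aggregation_client({})
puts client.class
```

Everyday runs would then use the stub, and the occasional live run would be started with something like LIVE_AGGREGATION=true set in the shell.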

Now, to actually answer your question... Cucumber has no support for
external scenario tables being loaded into a given feature. I don't
foresee such a feature being added anytime soon (or ever). Your case is
very specific and unique so I think you'll have to get creative here.
Maybe you could create an ERB template of the feature which is used to
generate the stubbed/dummy data version as well as your personal data
one? Just a thought...
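
Here is a minimal sketch of what the ERB idea might look like, assuming the Examples rows live in a YAML document. The feature text, the YAML layout, and the render_feature helper are invented for illustration; it also uses the keyword form of ERB.new and result_with_hash, which need a reasonably recent Ruby.

```ruby
require "erb"
require "yaml"

# Feature template; only the Examples rows are generated.
FEATURE_TEMPLATE = <<-EOS
Feature: Institution data

  Scenario Outline: Set institution data
    When set_user_institution_data is called for "<name>" and "<id>" with "<credentials>"

    Examples:
      | name | id | credentials |
<% accounts.each do |a| -%>
      | <%= a["name"] %> | <%= a["id"] %> | <%= a["credentials"] %> |
<% end -%>
EOS

# Dummy data that is safe to commit. A personal, git-ignored file with
# the same layout could be read with File.read instead.
DUMMY_DATA = <<-EOS
- name: Example Bank
  id: "1001"
  credentials: dummy-secret
EOS

# Render the template with the given rows. The "-" trim mode drops the
# newline after tags that end in "-%>", so loop lines leave no blanks.
def render_feature(template, accounts)
  ERB.new(template, trim_mode: "-").result_with_hash(accounts: accounts)
end

feature = render_feature(FEATURE_TEMPLATE, YAML.safe_load(DUMMY_DATA))
puts feature
```

The generated dummy version could be written to a committed .feature file and the personal version to a path your version control ignores.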

Hope that helps,
Ben

JT Zemp

Jan 15, 2010, 12:57:06 AM
to cu...@googlegroups.com
Sorry I didn't reply sooner. 

Aslak & Ben, I appreciate your help on the issue, and the time you put into your responses. I'll respond inline.

Yeah, that's what I was talking about when I referred to outside-in. That's the methodology I followed. I left out most of the background, hoping to get straight to the real question I had about fixtures.

I'll go into a little more detail (probably more than is necessary). :)

We have a web-based Flex app that helps users get out of debt. It aggregates their financial accounts (checking, savings, retirement, etc.), helps them categorize transactions to track spending and also catalog debts. We show them their net worth, and how to get out of debt quickly. We help them set up scenarios to let them play with data (how would it affect my retirement if I applied an extra $200/mo to my mortgage? etc.). We remind them when to make extra payments, we warn them when they're close to hitting budget limits, etc. There is a lot more to the system, but this is probably enough info to give an idea.

The Flex app has little or no business logic; it talks to a back-end PHP app. The PHP app has the majority of the business logic and persistence. It relies on another web service to get users' financial data from their banks. The first such service was an in-house-developed scraper that had access to 2-3k financial institutions. An API was established between this aggregation/scraping application and the main PHP app.

We decided to integrate with a 3rd party provider to use their aggregation engine, which is written in Java and has access to 10k-ish institutions. It's the most complicated and over-engineered piece of crap I've ever seen in my life (I've done a lot of API integration work before, and this is easily a couple of orders of magnitude more asinine than anything I've ever seen before</rant>).

What we got delivered from the 3rd party was a bunch of jars we use to connect to their backend servers in their data center, where the scraping actually happens. We developed a JRuby on Rails layer to bridge their crazy API with our API. Yeah, I know, that's a lot of layers. (Flex -> PHP -> JRuby on Rails (jars) -> hosted service) We made it conform to the original API for the first scraping system so we can use them side-by-side and have a backup.

There is a full suite of RSpecs that stub out all the major java code internally just to test our API. I've got good internal coverage with all the backend processes stubbed out so no communication hits our provider's system. 

For integration tests, I intentionally chose to write them in Cucumber for a handful of reasons: it was a language (English) my PHP team could understand and sign off on without needing to have familiarity with Ruby or RSpec. It serves as great documentation that is readable by anyone on the other teams (and the management team for that matter). It's easy to DRY up the integration tests and re-use steps and still maintain readability for someone unfamiliar with Ruby and RSpec. The Scenario Outlines allow me to put in a list as long as I like of live, individual accounts to make sure that the 3rd party service isn't having problems talking to the institutions who are important to us. We haven't done it yet, but are looking to integrate this process with a CI server, or with Nagios to monitor and report if/when things go wrong. I've seen a cucumber-nagios integration that looks promising.

The integration testing aspect is important to us, and the Cucumber language and outside-in process was very helpful as I worked with the PHP team. I agree it is kind of an edge case and a little bit of a stretch, but I think it's a credit to the flexibility and power of Cucumber to communicate with domain experts (PHP developers who had an existing API that we needed to replicate). So, that being said, thanks for making Cucumber as awesome as it is.

I see that since my needs are such an edge case, I'll likely need to come up with an individual solution.

<snip />

> Hey JT,
> Just to follow up with what Aslak said... Since you don't generally want to be hitting live systems for a number of reasons I would recommend looking into FakeWeb (if you are hitting the system over HTTP): http://github.com/chrisk/fakeweb
> If you are not hitting the system over HTTP then you will have to stub one of your own classes to return an appropriate canned response.  This is one of the few cases where stubbing is the better option in Cucumber.  If you are using RSpec matchers in Cucumber then all you need to do to allow stubbing is to require 'spec/stubs/cucumber' in your env.rb file.  For more info and pointers on stubbing in cucumber check out: http://wiki.github.com/aslakhellesoy/cucumber/mocking-and-stubbing-with-cucumber
>
> (That page also has a link to a library that will allow you to stub across processes if your situation necessitates it.)

That sounds cool, I'll have to check it out, thanks.

> Stubbing has many benefits in this context.  For one, your tests won't fail when you can't reach the outside service.  They are also not coupled to any expectation that you may unintentionally be relying on (like your checking account balance for example).  These tests will also be a lot faster and not be dependent on network speeds.  Best of all you can actually commit them to the repo and have the CI server and coworkers run them since they won't have your personal data in them. :)
>
> That said, I would not recommend throwing away your current set of features that actually hit the live system.  What I normally suggest is that all external systems are stubbed out by default but are allowed to hit the real system by changing an option or running a different test.  This will give you confidence that your system is still working with the third-party and will alert you if they unexpectedly change the API.  The stubbed versions of the test will protect you against the more common case of a bug being introduced in your codebase and these tests should be run all the time.  The third-party version test only needs to be run every so often.  In the past I've set up a CI server to have a build specifically for that purpose which runs nightly.  In your case it sounds like *you* will be the CI server since it involves some personal data.

Yeah, like I mentioned above, I've got pretty good internal code coverage with RSpec on a granular level, I'm just using Cucumber as an integration testing engine.

> Now, to actually answer your question... Cucumber has no support for external scenario tables being loaded into a given feature.  I don't foresee such a feature being added anytime soon (or ever).  Your case is very specific and unique so I think you'll have to get creative here.  Maybe you could create an ERB template of the feature which is used to generate the stubbed/dummy data version as well as your personal data one?  Just a thought...

That's an interesting thought, Ben--definitely creative. :) I'll look into it. I'll also brainstorm a little... if it's something super cool, I'll post a follow-up in case anyone else tries to do something similar.

> Hope that helps,
> Ben

It did, thanks gentlemen.