Testing locker.git

Kristján Pétursson

Jan 27, 2012, 8:22:03 PM
to Singly Developers
[TLDR: TJ Holowaychuk is to testing Node as Apple is to buying electronics]

I'm looking into cleaning up the locker's test suite, and much feedback would be
appreciated. For context, I've been in Rails-land for five years, which will
certainly influence my approach. If I'm trying to apply anything that doesn't
mesh with Node, call me on it.

Currently, unit and functional tests are built in Vows and the integration tests
use Capybara. No one at Singly seems positive on Vows; I find its syntax
hard to parse and its thought process strange. In particular, letting batches
rely on state created by previous batches breaks test isolation. Capybara is a
great tool, but it sucks to make someone who wants to hack on locker write Ruby
to do any full-stack work.

Based on reading lots of frameworks and finding not a lot of conversation, if I
were to start fresh I'd opt for the half-dozen tools TJ Holowaychuk has written
for Node testing. Their ideals match Node's, they have active committers, and
because they're under the same umbrella, they should work well together and
improve in step. That path would boil down to:

--

Unit and Functional testing with Mocha (http://visionmedia.github.com/mocha/).
This is TJ's second take on testing Node, and presumably incorporates what he
learned writing Expresso. It's straightforward and clean (though admittedly
because I'm used to RSpec and they match), and comes with some nice bonuses in
the reporting department. Converting from Vows to Mocha would make this:

vows.describe("Module").addBatch({
  "First batch": {
    topic: function() { /* return the value under test */ },
    "it does foo": function(test_value) {
      // Assert yourself
    },
    "it does bar": function(test_value) { /* ... */ }
  }
}).addBatch({
  "Another batch": {
    // Things in here can see state created by the first batch
  }
});

look like:

describe("Module", function() {
  describe("First batch", function() {
    beforeEach(function() {
      // Set up testing state; this replaces `topic`
    });

    it("does foo", function() {
      // Assert yourself
    });

    it("does bar", function() {
      // ...
    });
  });

  describe("Another batch", function() {
    // This is wholly independent of the first batch
  });
});


It's more lines, but I find it far clearer what's going on. Again, it's
particularly valuable that the second batch shares no state with the first—you
are fully aware of and responsible for all initialization.

--

For functional testing (which I put at the level of verifying individual
controller behavior), there's Tobi (https://github.com/LearnBoost/tobi), which
has a representative example in the README that I won't replicate here. Tobi
spins up your app, makes requests, and uses jsdom to let you examine everything.
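
A sketch along the lines of its README (the app, routes, and form fields
here are all made up):

var tobi = require('tobi');
var app = require('../app'); // hypothetical: the Express app under test

var browser = tobi.createBrowser(app);

browser.get('/login', function(res, $){
  res.should.have.status(200);
  $('form')
    .fill({ username: 'kris', password: 'secret' })
    .submit(function(res, $){
      res.should.have.status(200);
      $('h1').should.have.text('Welcome');
    });
});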

--

For integration testing, I think we should ditch Capybara in favor of a Node
solution so that there's only one language to worry about. If you're capable of
coding the locker, you should be equally capable of testing it.

There are two interesting tools here: Soda (https://github.com/learnboost/soda)
and Zombie (https://github.com/assaf/zombie). Soda completes the top-to-bottom
Holowaychuk hat trick as a Node adapter for Selenium, while Zombie appears to
emulate the DOM, CSS, JS and browsing all on its own.

Points for Soda are that it wraps a common and much-used tool (Selenium), has
what looks to me like a more coherent DSL, and already integrates with Sauce
Labs if we ever want to offload a monotonically slower integration suite.
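
From Soda's README the flavor is roughly this (it assumes a Selenium server
on localhost:4444 and the app on :8042; the asserted page text is made up):

var soda = require('soda');

var browser = soda.createClient({
  host: 'localhost',
  port: 4444,
  url: 'http://localhost:8042',
  browser: 'firefox'
});

browser
  .chain
  .session()
  .open('/')
  .assertTextPresent('Locker')
  .testComplete()
  .end(function(err){
    if (err) throw err;
    console.log('done');
  });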

Zombie, on the other hand, looks like it may have fewer moving parts at the
expense of a trickier install process (it's got an npm package, but actually
compiles, where Soda just uses Selenium's .jar). Because it's not just a
wrapper, one can mess a lot more with the tool itself. However, because it's
younger than Selenium, one might /have/ to mess more with the tool itself.
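
Zombie's equivalent runs entirely in-process; a sketch in the style of its
README (the URL, fields, and expected text are made up):

var Browser = require('zombie');
var assert = require('assert');

Browser.visit('http://localhost:8042/', function(err, browser){
  if (err) throw err;
  browser
    .fill('email', 'kris@example.com')
    .fill('password', 'secret')
    .pressButton('Sign in', function(err){
      if (err) throw err;
      assert.equal(browser.text('h1'), 'Welcome');
    });
});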

It's unclear to me whether Soda or Zombie would be faster than the other, and
speed is a great feature when you want to run them locally.

--

Lastly, I happened upon some helper libraries that just look handy in general:
. NodeReplay (https://github.com/assaf/node-replay) will catch and save HTTP
  requests for later playback. This makes it easy to run tests that rely on
  third parties once for real, then use their responses for the rest of your
  runs. One flag turns on real requests again when you want to run final
  acceptance on the API or just get a new set of recordings.
. node-database-cleaner (https://github.com/emerleite/node-database-cleaner)
  gives you one line to obliterate your test data when you want to ensure
  isolation.
. Should (https://github.com/visionmedia/should.js) makes your tests read more
  like real sentences.
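
A quick taste of the Should style (the object and values are invented for
illustration):

var should = require('should');

var user = { name: 'tj', pets: ['tobi', 'loki', 'jane', 'bandit'] };

user.should.have.property('name', 'tj');
user.pets.should.have.length(4);

'test'.should.be.a('string');
(5).should.be.above(4);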

--

I haven't considered fixtures vs. factories yet, but in general prefer
factories. As code bases expand, I find the fixtures become difficult to
maintain, especially if they depend on each other. If someone feels strongly
and can advise on the choice, chime in.



Still reading? Did you do it in one sitting? Kudos. Whaddaya think?

Matt Zimmerman

Jan 28, 2012, 7:31:15 PM
to singl...@googlegroups.com
Thanks for looking into this! Comments inline.

On Fri, Jan 27, 2012 at 05:22:03PM -0800, Kristján Pétursson wrote:
> Based on reading lots of frameworks and finding not a lot of conversation,
> if I were to start fresh I'd opt for the half-dozen tools TJ Holowaychuk
> has written for Node testing. Their ideals match Node's, they have active
> committers, and because they're under the same umbrella, they should work
> well together and improve in step. That path would boil down to:

Sounds solid.

> Unit and Functional testing with Mocha
> (http://visionmedia.github.com/mocha/). This is TJ's second take on
> testing Node, and presumably incorporates what he learned writing
> Expresso. It's straightforward and clean (though admittedly because I'm
> used to RSpec and they match), and comes with some nice bonuses in the
> reporting department. Converting from Vows to Mocha would make this:

> [...]


> It's more lines, but I find it far clearer what's going on. Again, it's
> particularly valuable that the second batch shares no state with the
> first—you
> are fully aware of and responsible for all initialization.

Looks more readable to me, though it'll be easier to tell with some
real-world locker tests converted.

> For functional testing (which I put at the level of verifying individual
> controller behavior), there's Tobi (https://github.com/LearnBoost/tobi),
> which has a representative example in the README that I won't replicate
> here. Tobi spins up your app, makes requests, and uses jsdom to let you
> examine everything.

Looks nice!

> For integration testing, I think we should ditch Capybara in favor of a
> Node solution so that there's only one language to worry about. If you're
> capable of coding the locker, you should be equally capable of testing it.
>
> There are two interesting tools here: Soda (
> https://github.com/learnboost/soda) and Zombie
> (https://github.com/assaf/zombie). Soda completes the top-to-bottom
> Holowaychuk hat trick as a Node adapter for Selenium, while Zombie appears
> to emulate the DOM, CSS, JS and browsing all on its own.

I didn't realize there were alternatives here, or I probably wouldn't have
spent time getting the existing front end stack running under Jenkins. ;-)
No worries, though. We should definitely choose the best tool available,
and I think ease of test development and maintenance are primary
considerations.

> Points for Soda are that it wraps a common and much-used tool (Selenium),
> has what looks to me like a more coherent DSL, and already integrates with
> Sauce Labs if we ever want to offload a monotonically slower integration
> suite.
>
> Zombie, on the other hand, looks like it may have fewer moving parts at
> the expense of a trickier install process (it's got an npm package, but
> actually compiles, where Soda just uses Selenium's .jar). Because it's not
> just a wrapper, one can mess a lot more with the tool itself. However,
> because it's younger than Selenium, one might /have/ to mess more with the
> tool itself.
>
> It's unclear to me whether Soda or Zombie would be faster than the other,
> and speed is a great feature when you want to run them locally.

Which one is nicer to write tests for? Sounds like maybe Soda?

If we need to improve the install process, we can do that with one-time
work, while we'll be writing tests for a long time to come, so I'd
prioritize features over packaging.

> Lastly, I happened upon some helper libraries that just look handy in
> general:
> . NodeReplay (https://github.com/assaf/node-replay) will catch and save HTTP
> requests for later playback. This makes it easy to run tests that rely on
> third parties once for real, then use their responses for the rest of your
> runs. One flag turns on real requests again when you want to run final
> acceptance on the API or just get a new set of recordings.

Nice find!

> . node-database-cleaner (https://github.com/emerleite/node-database-cleaner)
> gives you one line to obliterate your test data when you want to ensure
> isolation.

This reminds me, we should launch mongo with --noprealloc when running the
test suite, otherwise it's S-L-O-W to create the database. Currently, the
database gets reused (except under Jenkins where we clean everything up),
but it sounds like that's going to change.
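
Something like this in the test harness would do it (a sketch; the dbpath
and port are illustrative):

var spawn = require('child_process').spawn;

// Start mongod without preallocating data files for the test run
var mongod = spawn('mongod', [
  '--noprealloc',
  '--dbpath', './tmp/test-db',
  '--port', '27018'
]);

// Make sure it goes away with the suite
process.on('exit', function(){ mongod.kill(); });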

> . Should (https://github.com/visionmedia/should.js) makes your tests read
> more like real sentences.

Looks Rubyish. :-)

> I haven't considered fixtures vs. factories yet, but in general prefer
> factories. As code bases expand, I find the fixtures become difficult to
> maintain, especially if they depend on each other. If someone feels
> strongly and can advise on the choice, chime in.

I'd lean toward factories, especially if it means we can keep the test data
closer to the test code. I'll yield to the folks writing tests, though.

> Still reading? Did you do it in one sitting? Kudos. Whaddaya think?

It sounds like there are better tools available than when we started, which
is great news.

Test coverage is not great at this point, but we should probably save that
for a follow-on project. It would be a success just to make the existing
tests (or those which are still valuable) a joy to use and extend.

--
- mdz

Jeremie Miller

Jan 29, 2012, 3:10:55 AM
to singl...@googlegroups.com
All my thoughts are almost identical to Matt's so, yep, great stuff!

I'm def a fan of the mocha stuff based on your example, and so many of our tests have evolved some awful state dependencies that I'm not sure it'll be possible, or worth it, to port them versus figuring out what needs to be tested from a fresh state.

One sore point I got a bit grumpy about recently was the fixtures for the synclet testing. I argued that they're essentially useless as a representation of the APIs in the wild: the real APIs are both error-prone and always a moving target, a fixture is a false sense of security, and anyone touching synclet code knows they have to try it against the real API to know if it actually works anyway.  It seems the only great way to get another layer of automated testing for these is probably a CI service using legit auth tokens to hit the real APIs :/

I've been super keen on the idea of each package having its own tests (in its own dir) as well, and probably a top-level batch tool that would test a given config and all the referenced packages' tests. It'd be great to be dev'ing a collection or app and just run those locally while coding, and step up and run all before committing, etc.

Thanks for the research+sharing!

Jer

Forrest Norvell

Jan 29, 2012, 6:40:45 PM
to singl...@googlegroups.com
On Fri, Jan 27, 2012 at 5:22 PM, Kristján Pétursson <kris...@singly.com> wrote:
I'm looking into cleaning up the locker's test suite, and much feedback would be
appreciated. For context, I've been in Rails-land for five years, which will
certainly influence my approach. If I'm trying to apply anything that doesn't
mesh with Node, call me on it.

... 

Capybara is a
great tool, but it sucks to make someone who wants to hack on locker write Ruby
to do any full-stack work.

...and it's yet another stack of dependencies to manage and yet another package-management system to tangle with. I know this has been a huge pain in the ass for several people. What I do like about Capybara is the DSL it lays on top of Selenium. Selenium is valuable. Writing native Selenium tests is awkward.

It's straightforward and clean (though admittedly
because I'm used to RSpec and they match)

This shouldn't be overlooked. I know there are folks on our team and in our community who don't have Rails experience, but I've written hundreds of RSpec test cases (maybe even thousands), and I'm thoroughly familiar with its patterns. Jasmine also follows a very RSpec-like pattern, and that made picking it up very easy for browser-side JS testing.
 
It's more lines, but I find it far clearer what's going on. Again, it's
particularly valuable that the second batch shares no state with the first—you
are fully aware of and responsible for all initialization.

This particular aspect of Vows has never made any sense to me – it seems to be part and parcel of their claims for Vows' speed of execution. For testing, I want the framework to be as straightforward and deterministic as possible - one of the big bummers, for example, of working with webrat and Authlogic with declarative-auth, was that we seemed to spend as much time figuring out what the test helpers were doing (and how they were doing it) as we did writing tests or code. Explicit flow of control and per-test isolation have been more valuable than shaving 30-45 seconds off running the test suite.

Which reminds me, have you run across anything like zentest while you were looking at frameworks? It can be a little finicky in the Ruby world, but when it was working, it was a very valuable utility for encouraging a BDD flow.
 
For functional testing (which I put at the level of verifying individual
controller behavior), there's Tobi (https://github.com/LearnBoost/tobi), which
has a representative example in the README that I won't replicate here. Tobi
spins up your app, makes requests, and uses jsdom to let you examine everything.

Would this mostly be for Integral? The locker core doesn't have a lot of controllers in the traditional sense, and it seems like Soda would work better with in-browser MVC-style "viewer" apps like the dashboard.
 
For integration testing, I think we should ditch Capybera in favor of a Node
solution so that there's only one language to worry about. If you're capable of
coding the locker, you should be equally capable of testing it.

+1
 
There are two interesting tools here: Soda (https://github.com/learnboost/soda)
and Zombie (https://github.com/assaf/zombie). Soda completes the top-to-bottom
Holowaychuk hat trick as a Node adapter for Selenium, while Zombie appears to
emulate the DOM, CSS, JS and browsing all on its own.

If trusting our test suite is an important part of moving towards continuous deployment (and I think the Singly group sentiment is that it is), then I would argue that biases us toward a testing environment that mimics the environment actual locker users will have as closely as possible. Chromedriver and Selenium feel like kluges because they are, but the results are worth it. As we move more towards things like dashboardv3's single-page MVC frameworks (and the current style of app development engendered by the locker architecture seems to encourage this), we're going to need faithful and accurate ways of testing what's going on in real browsers. Zombie does sound simpler and more internally orthogonal, but I favor Soda.

It's unclear to me whether one of Soda or Zombie would be faster than the other,
which is a great feature when you want to run them locally.

When the test suite gets large, performance does get to be a significant concern. I've dealt with that in the past by ensuring the test suite is broken out enough that I only need to run the tests for the bits of the app under development, periodically running the whole suite when I take a break / prepare to merge into master. I try not to worry about suite performance, because I think having lots of small, precise tests gives the most useful form of test coverage, and I don't want to discourage myself from writing tests.

I haven't considered fixtures vs. factories yet, but in general prefer
factories. As code bases expand, I find the fixtures become difficult to
maintain, especially if they depend on each other. If someone feels strongly and
can advise their development, chime in.

Fixtures end up turning into a maintenance quagmire all their own, in my experience. You end up having to shim in more data over time and refactor fixtures to match the needs of different codebases (or, worse, end up duplicating fixture data all over the place, which is its own form of hell). I eventually switched to Factory Girl and really liked the flexibility and composability it gave me. If something like that doesn't exist in the Node.js world, we could probably build it without too much trouble.

Thank you very much for pulling this all together, Kristján! What you've sketched out here would be a major improvement over where things are today, and while it'd take us a while to get decent coverage (because I agree with Matt, we shouldn't try to replicate our current test suite in the new one, as that would be a lot of work, some of it wasted due to some of the existing tests being specious), it'll put us in a much better place, especially with regard to continuous deployment.

Did you run across Janky (https://github.com/github/janky) while you were looking at things? It doesn't sound like we'd have to make that many changes to Jenkins to get up and running with it, and I think we were talking about looking more closely at Hubot anyway.

F

Forrest Norvell

Jan 29, 2012, 6:51:21 PM
to singl...@googlegroups.com
On Sun, Jan 29, 2012 at 12:10 AM, Jeremie Miller <j...@singly.com> wrote:
One sore point I got a bit grumpy about recently was the fixtures for the synclet testing. I argued that they're essentially useless as a representation of the APIs in the wild: the real APIs are both error-prone and always a moving target, a fixture is a false sense of security, and anyone touching synclet code knows they have to try it against the real API to know if it actually works anyway.  It seems the only great way to get another layer of automated testing for these is probably a CI service using legit auth tokens to hit the real APIs :/

...which points to the big issue I have with running unit and functional tests against live APIs -- we're not always connected to the network, the network isn't always reliable, nor are the services we run the synclets against. Yes, APIs change without notice, and having tests running against idealized flows can lead to a misleading feeling of confidence (we *do* need to know what happens when a synclet gets a response that it's been rate-limited), but running against live APIs introduces external dependencies that can result in nondeterministic behavior within the test suite. Nondeterministic tests are poison to having faith in the test framework / continuous deployment.

I do think that it's appropriate for integration tests (especially integration tests that aren't coupled to CI) to be running against live APIs. Both kinds of tests are useful. I also think NodeReplay could be very helpful / a good replacement for traditional fixtures with API calls as we figure out our strategy wrt factories.

I've been super keen on the idea of each package having its own tests (in its own dir) as well, and probably a top-level batch tool that would test a given config and all the referenced packages' tests. It'd be great to be dev'ing a collection or app and just run those locally while coding, and step up and run all before committing, etc.

I think that's part and parcel of breaking the bundled apps / collections / connectors out into their own repositories, but the (eminently surmountable) challenge there is figuring out how to build up a shared testing infrastructure that all the new apps can plug into.

F

Thomas Muldowney

Jan 30, 2012, 10:27:18 AM
to singl...@googlegroups.com
I still have a few tabs open to look at these projects in a bit more depth, but it's very interesting to me.  I'm curious if people could expand more on why they prefer such heavy, DSL-like testing frameworks over something like nodeunit with basic assertions.  I would venture a guess that it relates to coming from TDD & BDD backgrounds, largely starting in the Ruby world vs. say the C++ world, where usage leans more toward unit and regression testing.

On another front, has anyone looked at PhantomJS as another headless, but actually rendering, option for visual integration testing?  I've been using it heavily in a personal project and absolutely loving it.

--temas

Kristján Pétursson

Jan 30, 2012, 12:37:32 PM
to singl...@googlegroups.com
GMail needs a way to re-merge N replies into one quotes block that I can inline. Next time maybe I'll link a GDoc for comments.

I'm glad everyone agrees that fixtures are suck. Anyone know of a good factory?

My plan is to do this in stages, first swapping out Vows since that seems to be the prime irritation right now, unit tests are easier, and a good suite of unit tests makes for a functional testing foundation. Replacing Capybara will come last since that's the most divorced from the rest of the code - it'll be hardest to change, but should keep puttering on its own in the meantime.


On Sat, Jan 28, 2012 at 4:31 PM, Matt Zimmerman <m...@singly.com> wrote:
If we need to improve the install process, we can do that with one-time
work, while we'll be writing tests for a long time to come, so I'd
prioritize features over packaging.

Solid point

On Sun, Jan 29, 2012 at 3:40 PM, Forrest Norvell <for...@singly.com> wrote:
Which reminds me, have you run across anything like zentest while you were looking at frameworks? It can be a little finicky in the Ruby world, but when it was working, it was a very valuable utility for encouraging a BDD flow.

I didn't, but a quick search found this starting point:

If we pick a good set of conventions, ZenTest's file-to-spec matching is simple enough to replicate. At Causes we wrote a script that did that based on what was modified relative to Git.
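
Something in that spirit would be easy enough in Node (a hypothetical sketch; it assumes specs mirror lib/ under test/ with a .test.js suffix):

var exec = require('child_process').exec;

// Run only the specs matching files modified relative to HEAD
exec('git diff --name-only HEAD', function(err, stdout){
  if (err) throw err;
  var specs = stdout.split('\n')
    .filter(function(f){ return /^lib\/.*\.js$/.test(f); })
    .map(function(f){
      return f.replace(/^lib\//, 'test/').replace(/\.js$/, '.test.js');
    });
  if (specs.length) {
    exec('./node_modules/.bin/mocha ' + specs.join(' '),
         function(e, out){ process.stdout.write(out); });
  }
});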
 
For functional testing (which I put at the level of verifying individual
controller behavior), there's Tobi (https://github.com/LearnBoost/tobi), which
has a representative example in the README that I won't replicate here. Tobi
spins up your app, makes requests, and uses jsdom to let you examine everything.

Would this mostly be for Integral? The locker core doesn't have a lot of controllers in the traditional sense, and it seems like Soda would work better with in-browser MVC-style "viewer" apps like the dashboard.

We will have to use Soda/etc. for the viewer apps, though I think we should consider figuring out a framework to make those more independent and individually testable. Something like a mock-locker perhaps.

Core still has plenty of pieces that handle requests though. Tobi probably doesn't add much if all we're doing is verifying response codes and JSON blobs, but something should.


Did you run across Janky (https://github.com/github/janky) while you were looking at things? It doesn't sound like we'd have to make that many changes to Jenkins to get up and running with it, and I think we were talking about looking more closely at Hu-Bot anyway.


That looks sweet. If we're moving to Hubot, this'll be a must, and if we're not, it looks like a strong argument that we should. 

On Mon, Jan 30, 2012 at 7:27 AM, Thomas Muldowney <te...@singly.com> wrote:
I still have a few tabs open to look at these projects in a bit more depth, but it's very interesting to me.  I'm curious if people could expand more on why they prefer such heavy, DSL-like testing frameworks over something like nodeunit with basic assertions.  I would venture a guess that it relates to coming from TDD & BDD backgrounds, largely starting in the Ruby world vs. say the C++ world, where usage leans more toward unit and regression testing.


I don't feel strongly on the DSL style of assertions, since they just wrap assert* anyway, but I do find their stanza organization helpful. I like the way describe() contains a context independent of any other, and beforeEach() is the perfect single place to set up that context. The should-style libraries also tend to come with more powerful assertions than the base provides, especially when you get into verifying HTML bodies, etc.

I do think Cucumber and that genre go too far and just produce an unnecessary layer of translation that's prone to bugs itself. As the saying goes, someone has a problem and decides to use regular expressions...


On another front, has anyone looked at PhantomJS as another headless, but actually rendering, option for visual integration testing?  I've been using it heavily in a personal project and absolutely loving it.

That looks great, though for the biggest bang we'd probably have to use Jasmine over Mocha (https://github.com/visionmedia/mocha/issues/160). I do like the sound of it over Zombie because it embeds WebKit itself while Zombie seems to fake it.
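
For reference, a PhantomJS script is plain JS run by its own binary; a minimal sketch (the URL is made up; run with `phantomjs check.js`):

var page = require('webpage').create();

page.open('http://localhost:8042/', function(status){
  if (status !== 'success') {
    console.log('failed to load page');
    phantom.exit(1);
  }
  // evaluate() runs this function inside the page's own context
  var title = page.evaluate(function(){ return document.title; });
  console.log('title: ' + title);
  phantom.exit(0);
});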

Forrest Norvell

Jan 30, 2012, 6:06:09 PM
to singl...@googlegroups.com
On Mon, Jan 30, 2012 at 7:27 AM, Thomas Muldowney <te...@singly.com> wrote:
I'm curious if people could expand more on why they prefer such heavy, DSL-like testing frameworks over something like nodeunit with basic assertions.  I would venture a guess that it relates to coming from TDD & BDD backgrounds, largely starting in the Ruby world vs. say the C++ world, where usage leans more toward unit and regression testing.

For me it's a simple matter of clarity and style. In practice, when I'm following a BDD flow, each specification (a test case in the xUnit world) would have 2-3 lines, of which 1-2 are assertions. So: a description of the expected behavior, a line or two of setup specific to the test, a line or two of assertions. Keeping things finely-grained makes it really easy to narrow in on what's broken when there are regressions. Having a large vocabulary of matchers helps keep the tests easy to read and concise. I think we probably want the same things from testing frameworks, just codified in different ways.

What about the DSL feels heavyweight to you? I eventually parted ways with JUnit because it got so overburdened with conventions and layers of abstraction that extending the framework to handle new test types pretty much became a full-time job. By comparison, I find xSpec frameworks a lot easier to use, because I just ignore the chunks of the vocabulary I don't care about. Also, the only times I've found performance to be an issue in practice have been when I've been bending the rules for unit testing and doing lots of stuff that involves either dealing with production data or external services, so I have never felt RSpec or its derivatives (including Jasmine) to be particularly heavy (with the pretty huge caveat that I've always felt that conflating a testing framework with a mocking framework is a mistake).

F

Forrest Norvell

Jan 30, 2012, 6:17:50 PM
to singl...@googlegroups.com
On Mon, Jan 30, 2012 at 9:37 AM, Kristján Pétursson <kris...@singly.com> wrote:
I'm glad everyone agrees that fixtures are suck. Anyone know of a good factory?

I haven't used any in Node, but this looks promising: https://github.com/bkeepers/rosie
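
From its README, usage looks roughly like this (the factory name and attributes are invented):

var Factory = require('rosie').Factory;

Factory.define('profile')
  .sequence('id')
  .attr('service', 'twitter')
  .attr('synced_at', function(){ return new Date(); });

// Build a plain object, overriding whatever the test cares about
var profile = Factory.build('profile', { service: 'github' });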
 
Core still has plenty of pieces that handle requests though. Tobi probably doesn't add much if all we're doing is verifying response codes and JSON blobs, but something should.
 
Yeah, and there's a lot of traffic between collections and core, and Tobi could act as a starting point for capturing, in some form, an externalized representation of what the standard vocabulary should be for the various endpoints that each end is talking to. Hmmm...

I do think Cucumber and that genre go too far and just produce an unnecessary layer of translation that's prone to bugs itself. As the saying goes, someone has a problem and decides to use regular expressions...

+1 on this. There is a decent case to be made for cucumber-puppet, given what it does (which is to say, wraps around some weaknesses in Puppet's design (no dependency injection!) that make it challenging to test) and given that its target audience is mostly non-developers, but in general, cucumber has always struck me as a huuuge waste of time.

F