Nice approach. I’m doing the example-testing for py2js in a similar way.
I’m writing the documentation in Bitbucket’s wiki syntax (but plan to convert it to ReST so I can generate docs with Sphinx). The example-tester reads each Python example, runs it through py2js, and asserts that the output exactly matches the corresponding JavaScript example.
As Ondrej has noted, this approach is useful for “beauty”-testing and example testing, but Niall’s functionality-driven approach is more efficient for full regression testing. The reason is that my example-testing (and your hg testing) compares output character for character, so it would become a significant maintenance burden (updating tens or hundreds of tests) whenever we change whitespace, formatting, or the way the JavaScript is output (for instance, if we switch from vanilla for-loops to a foreach() function).
-- peter