[Note that replies are redirected to dev-apps-thunderbird]
As mentioned in previous post-Tb3 planning discussion centered around <https://wiki.mozilla.org/User:Dmose/Tb_Post-3.0_Scratchpad#Testing>, I believe Thunderbird needs to become significantly more agile by bringing our testing polices in line with those of the browser/ and toolkit/ directory in mozilla-central. Which is to say that I'd like to start requiring that all patches to mail/, mailnews/, directory/, and editor/ include automated tests starting Monday, January 4, 2010, except in cases where it's specifically impractical or not sensible.
I recognize that this will be a different style of development than some folks are used to and that the change may feel little a bumpy, so I want to be sure we make this policy change in a humane way. In particular, I'd like to us to be sensitive to the fact that there are a lot of patches in progress in Bugzilla today, and that we will suddenly be asking more of our contributors than we were at the time they started working on those patches.
Discussion subject changed to "State of testsuite (was: automated test policy for mail/, mailnews/, editor/, directory/ in comm-central)" by Ben Bucksch
> I recognize that this will be a different style of development than > some folks are used to and that the change may feel little a bumpy, so > I want to be sure we make this policy change in a humane way. In > particular, I'd like to us to be sensitive to the fact that there are > a lot of patches in progress in Bugzilla today, and that we will > suddenly be asking more of our contributors than we were at the time > they started working on those patches.
Can we first improve the test framework significantly, please?
I've followed your wishes of creating tests, spent the last 2 weeks with the testsuite, but it's been most painful.
I'll try to formulate my pain objectively. Main pain points:
setTimeout All I'll say is: do_timeout(500, "endTest()"); This is guaranteed undeterministic, and indeed the main problems were my tests (adapted from existing tests) just running astray or just stopping and I don't know where, why, or how to fix it. In my experience, almost every instance of setTimeout() is just a hidden bug that that's just papered over with setTimeout "because that's the only way it worked", but nobody cared to find out the real reason and fix properly.
No error output Sometimes, when I make a typo or error in tests or testsuite, it goes astray with no output whatsoever (bug 535310). This makes coding a severe pain, because I don't know what's wrong.
Copypasteritis Also, there's a *ton* of code cuplication, like prefSvc.setBoolPref("mail.biff.show_tray_icon", false); (and many others) (why isn't that in head_?) and var thread = gThreadManager.currentThread; while (thread.hasPendingEvents()) thread.processNextEvent(true); (why isn't that in tail_?).
No documentation There are currently 2-3 small documents, which don't even cover the API, much less how to write IMAP tests (without copy&paste). I could to create tests for POP3 and SMTP login, using copy&paste. But today, I tried to simply test login (connection) to an IMAP server, but couldn't make it work, the incomingServer.performExpand(null) continued further asyncronously, I don't know what to wait for and how to watch success. I gave up.
All over the place Many tests have globals which cross files. That makes it hard to see what's happening. I agree that tests need to be short and comfortable to write, so it's good to have lots of common code like createIMAPServer() or createLocalFolders() and friends, but these globals are not necessary, that's just not sane coding. Similarly, it's hard to find where stuff comes from, when it's comming from all of head_*, mailnews/test/fakeserver/, mailnews/test/resources/, mailnews/base/ and whatnot.
---
Sorry for this form, I wanted to write my experiences and main pain points in a more coherent form later, with possibly some ideas for improvement, but I'll be on vacation now, so I don't have much time to contribute in discussion or code and you're pressing the issue now... :-(
The whole step-by-step execution thing in our test suites leaves a lot to be desired: stepping (or letting the non-test environment do something) with setTimeout leads to intermittent orange in most cases, and IMHO using setTimeout in a test should make it fail right away - but doing event-triggered stepping needs a lot of overhead, makes it easy to not clean up correctly and makes us time out and fail the whole suite when one test fails, which is suboptimal as well.
I'd very much appreciate a simpler model there (and I won't go into having too many different test suites for the moment, as that affects Firefox and SeaMonkey more than Thunderbird).
> Can we first improve the test framework significantly, please?
I have been trying to avoid code duplication in the units tests for code I develop / work on and have done a lot of framework building that should hopefully be useful in many cases. My testing framework stack looks like this:
* logHelper.js - Logging abstraction that allows for 'rich' meta-data to be logged. Output is flattened for text output but also supports JSON-serialized output to disk and network. The current expected rich target is my 'logsploder' XULRunner application. One notable thing it does is cause script warnings/errors reported by the error console to become unit test failures. This covers a lot of the 'silent error' cases you note in your message.
You can run logsploder by doing "xulrunner application.ini". It will create a file in your temp dir that makes itself known to loghelper. Then running any unit tests that use logHelper will magically connect to logsploder. For example, the gloda unit tests. ('cd thunderbird-objdir/mailnews/db/gloda/test; make xpcshell-tests')
I am not going to claim excellent UI design, but it works surprisingly well for what it is and how much effort it has received.
* (currently lives in gloda resources space, I have a patch to move it out) folderEventLogHelper.js - Subscribes as an nsIFolderListener and nsIMsgFolderListener and logs these events (when logHelper has a rich consuder) with rich data to logHelper so that the log output includes useful information about what events are happening without the unit tests needing to provide this information directly. Incredibly useful in helping validation assumptions about what should be happening in a unit test.
* asyncTestUtils.js - An asynchronous test runner framework that maintains an async call stack and supports use of generators in tests and testing helpers. mozmill's strategy of spinning a nested event loop rather than yielding control back up to the containing event loop (which is how asyncTestUtils works) may be a better strategy for testing comprehension.
* messageGenerator.js - Synthetic message creation support. It's not a full-fledged message generator suitable for replacing Thunderbird's message generation (it grows as testing needs demand), but it can express multi-part MIME stuff. It has support for building threads of messages and other scenarios. This was brought into existence because simply copying in pre-mastered/existing .eml files from disk is insufficient for gloda testing; messages need to have distinct message-id's or things start getting unrealistic.
* messageModifier.js - Represents collections of synthetic messages created by messageGenerator and supports performing attribute modification on the messages. More significant operations (moving, copying) are performed by messageInjection.js and tend to operate in terms of the SyntheticMessageSet collection object.
* messageInjection.js - Abstracts away dealing with local message injection (via addMessage) and IMAP message injection (using the IMAP fake server). The POP code path is sortof there but not actually used/tested, so it likely does not work. POP may also have conceptual difficulties since both local and IMAP injection support targeting all folders, not just the Inbox. This file includes operations to copy/move/trash/delete messages.
While said files may not provide all you need, they should hopefully at least help you avoid needing to write code that does anything they do. I do imagine protocol testing might need to operate a level lower than the message*.js framework stuff, though.
> While said files may not provide all you need, they should hopefully > at least help you
A lot of this sounds like great work. Thanks!
Can we integrate that in the testsuite, esp. the log and async stuff? That was my main point: I think the testsuite has to be improved before seeing more widespread use. (Not catching error console errors in current testsuite is just *headbang*.)
> On 20.12.2009 10:40, Andrew Sutherland wrote: >> While said files may not provide all you need, they should hopefully >> at least help you
> A lot of this sounds like great work. Thanks!
> Can we integrate that in the testsuite, esp. the log and async stuff? > That was my main point: I think the testsuite has to be improved before > seeing more widespread use. (Not catching error console errors in > current testsuite is just *headbang*.)
Just to clarify: asyncTestUtils.js is already in the test suite.
>> I recognize that this will be a different style of development than >> some folks are used to and that the change may feel little a bumpy, so >> I want to be sure we make this policy change in a humane way. In >> particular, I'd like to us to be sensitive to the fact that there are >> a lot of patches in progress in Bugzilla today, and that we will >> suddenly be asking more of our contributors than we were at the time >> they started working on those patches.
> Can we first improve the test framework significantly, please?
> I've followed your wishes of creating tests, spent the last 2 weeks with > the testsuite, but it's been most painful.
Agreed this is still painful, however, I think that unless we require tests we're not going to find the pain points and we're not going to have any motivation to fix them.
> I'll try to formulate my pain objectively. Main pain points:
Excellent :-)
> setTimeout > All I'll say is: do_timeout(500, "endTest()"); > This is guaranteed undeterministic, and indeed the main problems were my > tests (adapted from existing tests) just running astray or just stopping > and I don't know where, why, or how to fix it. In my experience, almost > every instance of setTimeout() is just a hidden bug that that's just > papered over with setTimeout "because that's the only way it worked", > but nobody cared to find out the real reason and fix properly.
Agreed, we should be avoiding setTimeout wherever possible. I think that some of these were mainly because we didn't have async capabilities in the test suite, now we have it, we should find some time to go back and revisit some of the earlier tests.
I'm actually wondering if it is worth putting together a devmo page similar to "do's and don'ts" or a coding practice for unit tests (not just don't use setTimeout, but use tool foo instead.
> No error output > Sometimes, when I make a typo or error in tests or testsuite, it goes > astray with no output whatsoever (bug 535310). This makes coding a > severe pain, because I don't know what's wrong.
That's going to be a bit hard to track down. We've used long timeouts (!) in the past to try and ensure this doesn't happen, but I think it is a bit hard given the nature of the environment.
One problem with errors is that I've seen sometimes that mailnews "swallows" errors because they have been reported in listeners and aren't actually relevant to the bit of code calling the listeners. I think for them, we have to hunt them down on a case-by-case basis and be prepared to add in logging to the console (for example) so that extension authors (and tests) can track issues better.
> Copypasteritis > Also, there's a *ton* of code cuplication, like > prefSvc.setBoolPref("mail.biff.show_tray_icon", false); (and many others) > (why isn't that in head_?) and
Unfortunately it happens. Yes, we need to revisit some of these and make these into head_ or utility files.
> var thread = gThreadManager.currentThread; > while (thread.hasPendingEvents()) > thread.processNextEvent(true); > (why isn't that in tail_?).
Generally I think that should be a utility function not necessarily in tail - we need to call it at times that aren't just the end of the test.
> No documentation > There are currently 2-3 small documents, which don't even cover the API, > much less how to write IMAP tests (without copy&paste).
Which API are you talking about here?
> I could to create tests for POP3 and SMTP login, using copy&paste. But > today, I tried to simply test login (connection) to an IMAP server, but > couldn't make it work, the incomingServer.performExpand(null) continued > further asyncronously, I don't know what to wait for and how to watch > success. I gave up.
I think that's the point to ask and/or file a bug ;-) Documentation is something we need to get better at doing in an all-round sense, and I think that we should start requiring dev documentation on test suite changes and new additions.
> All over the place > Many tests have globals which cross files. That makes it hard to see > what's happening. I agree that tests need to be short and comfortable to > write, so it's good to have lots of common code like createIMAPServer() > or createLocalFolders() and friends, but these globals are not > necessary, that's just not sane coding.
This sounds like we just need a bug filing.
> Similarly, it's hard to find where stuff comes from, when it's comming > from all of head_*, mailnews/test/fakeserver/, mailnews/test/resources/, > mailnews/base/ and whatnot.
Do you mean globals or functions? Generally functions relevant to their area are kept in head files in mailnews/imap or mailnews/local for instance, whereas the fakeserver stuff should be fairly obvious, and everything else should be in mailnews/test/resources.
> Sorry for this form, I wanted to write my experiences and main pain > points in a more coherent form later, with possibly some ideas for > improvement...
It is certainly useful to get experiences of someone who hasn't written any tests before. I think getting some of these into bugs would be the best option and then we try and find some time in the new year to start improving them.
> Agreed this is still painful, however, I think that unless we require > tests we're not going to find the pain points
OK, I tried you use it, and made a list of concrete pain points. I think these all need to be fixed properly *before* tests can be required for everybody. Otherwise, people will suffer the same pain (with error output etc.), and learn the wrong test coding practices (like, thinking that setTimeout is fine).
> Agreed, we should be avoiding setTimeout wherever possible. I think > that some of these were mainly because we didn't have async > capabilities in the test suite, now we have it, we should find some > time to go back and revisit some of the earlier tests.
Good.
> I'm actually wondering if it is worth putting together a devmo page > similar to "do's and don'ts" or a coding practice for unit tests (not > just don't use setTimeout, but use tool foo instead.
Yes, please! Good idea.
>> No error output >> Sometimes, when I make a typo or error in tests or testsuite, it goes >> astray with no output whatsoever (bug 535310). This makes coding a >> severe pain, because I don't know what's wrong.
> That's going to be a bit hard to track down. We've used long timeouts > (!) in the past
Yes, I guess so. This is an absolute show-stopper for the testsuite, though.
> One problem with errors is that I've seen sometimes that mailnews > "swallows" errors because they have been reported in listeners and > aren't actually relevant to the bit of code calling the listeners.
That's then either a structural problem in the API (which should be fixed, because it will hurt other API users, too), or we just need to have utility functions (e.h. in head_) for the tests which hook up to the listeners which report errors and catch them.
>> while (thread.hasPendingEvents()) > Generally I think that should be a utility function not necessarily in > tail - we need to call it at times that aren't just the end of the test.
Yes, both utility function and calling it in tail_ . Note that there were just the most obvious examples. There's a lot more which is not as trivial as moving it to head_ or tail_, but should still be factored out into functions.
>> No documentation >> There are currently 2-3 small documents, which don't even cover the API, >> much less how to write IMAP tests (without copy&paste).
> Which API are you talking about here?
*snicker* "Which API?" (SCNR)
Any functions that test writers might use, either explicitly by calling it, or implicitly, because it's invoked by head_/tail_.
>> Similarly, it's hard to find where stuff comes from, when it's comming >> from all of head_*, mailnews/test/fakeserver/, mailnews/test/resources/, >> mailnews/base/ and whatnot.
> Do you mean globals or functions?
Both
> It is certainly useful to get experiences of someone who hasn't > written any tests before. I think getting some of these into bugs > would be the best option and then we try and find some time in the new > year to start improving them.
OK, thanks. My main point was, though, that this is so painful that this has to be solved *before* we can start requiring tests.