I have a question for those of you using test generation in industrial applications. Industrial projects typically have a test case management system that stores test cases, executes them, and tallies up metrics based on the test results.
My question: How do you fit test generation into that framework? Or, can you extend the framework to accommodate test generation?
For instance,
1) Traditional TCMs store test scripts. When generating tests, it is the test generation code that is important, not the generated scripts. So, what do you store?
2) Conventional metrics track test case counts. Test generation makes that measure irrelevant. So, what do you measure?
Here are approaches that don't work for me:
1) Generate test cases directly into the TCM and store them there. This approach allows you to pretend you have a static test suite, so the metrics fit better. But the approach feels awkward: it lacks the bug-finding power of continuously generating tests. And it doesn't work for online testing, where you generate tests and evaluate results in real time.
2) Store just the test generator code itself. Generate tests only when you need them. In this case, the TCM is more like source code control. But what should you be measuring?
3) Generate some test cases into the TCM, and generate additional, on-the-fly tests that run but aren't counted or stored. This approach makes some sense to me, but it doesn't provide a way to measure the contribution of the generated tests. I am concerned that tests that are not part of the official structure will not be valued and maintained.
Finally, I could see maintaining two official efforts: one group of tests gets stored in the TCM; the other tests get generated on the fly. You could count the number of tests in the first case and something like coverage in the second. But that is an awkward arrangement that requires extra work, and you would have difficulty reconciling the different metrics in reports.
I'd be very interested to hear people's thoughts.
thanks,
Harry
There were a whole lot of issues posed there, but I'm only going to
address one, because I had to do this recently.
I built a framework that would produce more-or-less arbitrary numbers
of tests. My validation measurements were all rates. For instance:
First run:
653 actions completed in 143 seconds with 7 failures:
Performance validation: 4.57 actions/second
Functional validation: 0.011 failures/action
Second run:
714 actions completed in 162 seconds with 12 failures:
Performance validation: 4.41 actions/second
Functional validation: 0.017 failures/action
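The rate calculations above are simple enough to capture in a few lines. Here is a minimal Python sketch of that idea (the function name and return keys are my own invention, not from the framework described):

```python
# Sketch: summarize a generated-test run as rates rather than raw counts,
# so the metrics stay comparable no matter how many tests were generated.
# Names here are hypothetical, not from the original framework.

def rate_metrics(actions: int, seconds: float, failures: int) -> dict:
    """Compute rate-based validation measurements for one test run."""
    return {
        "actions_per_second": actions / seconds,    # performance validation
        "failures_per_action": failures / actions,  # functional validation
    }

# The two runs from the post:
first = rate_metrics(653, 143, 7)    # ~4.57 actions/second, ~0.011 failures/action
second = rate_metrics(714, 162, 12)  # ~4.41 actions/second, ~0.017 failures/action
```

Because the measurements are ratios, runs of different sizes can be compared directly, which is exactly what a count-based TCM metric cannot do for generated suites.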