It's important to keep in mind that we can (and IMO should) differentiate between how the tests are filed during the evaluation process and how they are later presented. It's also important to keep in mind that speculating, at any of these stages, on why a test failed (because of a bug? because the functionality isn't supported? because of a misinterpretation by the evaluator? etc.) makes things brittle and messy, since much of the time this information will not be available to the evaluator.
So - assume the following (a rough code sketch follows the list):
1) In the test materials themselves:
* we carry info on whether each test pertains to optional or required ("ctest"/"otest") functionality
* we mark each test with one of "pass", "fail", or "not yet tested" (the last obviously being the initial state of every test's result field)
2) In the eventual presentation (and the same holds for db queries):
* there is info available on which tests have actually been run at any given point
* there is info on whether each test that has been run passed or failed
* there is info on optional vs required for each test
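Concretely, the assumptions above could be captured in a small data model along these lines. This is just a minimal sketch; the names TestRecord, Kind, and Result are mine, not anything the system prescribes:

    from dataclasses import dataclass
    from enum import Enum

    class Kind(Enum):
        # the suite's "ctest"/"otest" labels map onto this distinction
        REQUIRED = "required"
        OPTIONAL = "optional"

    class Result(Enum):
        PASS = "pass"
        FAIL = "fail"
        NOT_YET_TESTED = "not yet tested"  # initial state for every test

    @dataclass
    class TestRecord:
        name: str
        kind: Kind
        result: Result = Result.NOT_YET_TESTED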
This gives the presentation layer the ability to provide separate pass percentages for optional vs required functionality, and it lets the presentation layer use whatever verbiage it wants to describe failure states for both optional and required tests.
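For instance, the separate pass percentages could be derived like this (continuing the sketch above; pass_rate is a hypothetical helper, not an existing API):

    def pass_rate(tests, kind):
        """Pass percentage among tests of the given kind that have
        actually been run; None if none of them have been run yet."""
        run = [t for t in tests
               if t.kind is kind and t.result is not Result.NOT_YET_TESTED]
        if not run:
            return None
        passed = sum(t.result is Result.PASS for t in run)
        return 100.0 * passed / len(run)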
Since we would now also have data on which tests have actually been run, that data can be used to weight the measures. This matters not only for partial test runs but also directly for test suite updates: whenever a suite update is done, the system will automatically set tests that have changed or been added back to "not yet tested", which serves as a direct indicator that the results are not up to date.
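Again as a sketch - how changed tests are detected (diffing, content hashes, whatever) is left open here, and `changed` is just assumed to be the set of affected test names:

    def apply_suite_update(records, changed):
        """Reset changed tests to "not yet tested" so stale results are
        flagged automatically after a suite update; newly added tests
        start out as "not yet tested" anyway via the default above."""
        for rec in records:
            if rec.name in changed:
                rec.result = Result.NOT_YET_TESTED

    def coverage(records):
        """Fraction of tests that have been run at all; usable to
        weight or qualify the pass percentages."""
        run = sum(r.result is not Result.NOT_YET_TESTED for r in records)
        return run / len(records) if records else 0.0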
To me, this seems like the least brittle and most straightforward approach.