Due to the glitch in the liveness track (see the update on the results) it seems to be time to enforce witnesses next year, at least in a mild form. The most strict rule would be the same as in the SAT competition. If no witness is provided for a satisfiable instance, the instance is not scored. A wrong witness will mark the model checker as buggy and it will not take part in the official ranking.
While requiring a witness will help rule out tool errors, it may be very hard to rule out incorrect unsatisfiable answers since their solution may require a combination of transforms, BDD-based engines, and SAT-based engines. I think that attempting to obtain known correct results for the benchmarks would be valuable. I leave to Armin's discretion if witnesses are to be required. I agree that this is relatively easy way to rule out one class of tool problems, if the tools owners are willing to provide them.