R: Re: HWMCC'12 results announced - discrepancies and suspicious results.

Skip to first unread message

Gianpiero Cabodi

Nov 1, 2012, 3:45:17 AM11/1/12
to hw...@googlegroups.com

I'll start looking at logs next week.
What I can confirm is that pdtrav 2011 was buggy on beem*. 

This was due to a reduction that missed the very peculiar way beem reach failure: they seem like automata reaching a terminal state where the property holds forever. But the property can fail right on the transition to that state. When we reduced the terminal state we also removed transitions to it, which was unsound.


Inviato da Samsung Mobile

-------- Original message --------
Subject: Re: HWMCC'12 results announced - discrepancies and suspicious results.
From: brayton <bra...@eecs.berkeley.edu>
To: hw...@googlegroups.com

     There are discrepancies in the results for a few benchmarks for the solvers pdthrd11 and suprove11.
In most cases, I checked back to the beem data to see what was the expected result (i.e. proved by an explicit model checker). The expected result is shown as e.g. sokoban.1.prop1: reachable

On the following benchmarks, pdthrd11 which gets UNSAT  incorrectly.
  • beemskbn1f1 pdthrd11 gets UNSAT and suprove, simsat and suprove11 get SAT. Also, according to the expected beem results this example is reachable, i.e. SAT. - sokoban.1.prop1: reachable
  • similarly for beemskbn2f1 - sokoban.2.prop1: reachable
  • similarly for beemadd4b1 - adding.4.prop1: reachable
  • similarly for beembrdg3b1 - bridge.3.prop1: reachable
  • similarly for beemfrogs1b1 - frogs.1.prop1: reachable
  • similarly for beemfrogs1f1 - frogs.1.prop1: reachable
  • similarly for beemhanoi1b1 - hanoi.1.prop1: reachable

For the following, pdthrd11 is the only one that reports UNSAT which is a correct result but it looks like the same bug as above since it gets the result in 14 sec., close to the time that the above incorrect.results were obtained.

  • beembrptwo1b2 - brp2.1.prop2: safe

Also suprove11 has a wrong result and a suspicious result. It gets SAT on 

  • beemmsmie3b1 - It reports SAT but expected result is: msmie.3.prop1: safe. In addition beemmsmie3f1 is reported UNSAT by several engines. Since the 'f' and 'b' versions are the same property, and the expected resuslt is UNSAT, suprove11 is wrong here.
  • beemptrsn7b1 -I believe that the result SAT on this is suspect even though the correct result is not known. It looks like the same bug as above. The source of the bug was corrected several months ago.


On 10/31/2012 12:34 AM, Armin Biere wrote:
Yes, is this a problem?


On Tue, Oct 30, 2012 at 6:19 PM, brayton <bra...@eecs.berkeley.edu> wrote:


Looks like the .cvs files are separated by semi-colons


On 10/30/2012 1:44 AM, Armin Biere wrote:

The results are close to a final state.
See the competition home page for details.





Reply all
Reply to author
0 new messages