All,
I'll start looking at logs next week.
What I can confirm is that pdtrav 2011 was buggy on beem*.
This was due to a reduction that missed the very peculiar way beem reach failure: they seem like automata reaching a terminal state where the property holds forever. But the property can fail right on the transition to that state. When we reduced the terminal state we also removed transitions to it, which was unsound.
Gianpiero
Inviato da Samsung Mobile
-------- Original message --------
Subject: Re: HWMCC'12 results announced - discrepancies and suspicious results.
From: brayton <
bra...@eecs.berkeley.edu>
To:
hw...@googlegroups.com CC:
There are discrepancies in the results for a few benchmarks for the
solvers
pdthrd11 and
suprove11.
In most cases, I checked back to the beem data to see what was the
expected result (i.e. proved by an explicit model checker). The
expected result is shown as e.g. sokoban.1.prop1: reachable
On the following benchmarks,
pdthrd11 which gets UNSAT
incorrectly.
- beemskbn1f1 pdthrd11 gets UNSAT and suprove, simsat and
suprove11 get SAT. Also, according to the expected beem results
this example is reachable, i.e. SAT. - sokoban.1.prop1:
reachable
- similarly for beemskbn2f1 - sokoban.2.prop1: reachable
- similarly for beemadd4b1 - adding.4.prop1: reachable
- similarly for beembrdg3b1 - bridge.3.prop1: reachable
- similarly for beemfrogs1b1 - frogs.1.prop1: reachable
- similarly for beemfrogs1f1 - frogs.1.prop1: reachable
- similarly for beemhanoi1b1 - hanoi.1.prop1: reachable
For the following, pdthrd11 is the only one that reports
UNSAT which is a correct result but it looks like the same bug as
above since it gets the result in 14 sec., close to the time that
the above incorrect.results were obtained.
- beembrptwo1b2 - brp2.1.prop2: safe
Also suprove11 has a wrong result and a suspicious
result. It gets SAT on
- beemmsmie3b1 - It reports SAT but expected result is:
msmie.3.prop1: safe. In addition beemmsmie3f1 is reported UNSAT
by several engines. Since the 'f' and 'b' versions are the same
property, and the expected resuslt is UNSAT, suprove11 is wrong
here.
- beemptrsn7b1 -I believe that the result SAT on this is suspect
even though the correct result is not known. It looks like the
same bug as above. The source of the bug was corrected several
months ago.
Bob
On 10/31/2012 12:34 AM, Armin Biere
wrote:
Yes, is this a problem?
Armin
On Tue, Oct 30, 2012 at 6:19 PM, brayton <bra...@eecs.berkeley.edu> wrote:
Armin,
Looks like the .cvs files are separated by semi-colons
Bob
On 10/30/2012 1:44 AM, Armin Biere wrote:
The results are close to a final state.
See the competition home page for details.
http://fmv.jku.at/hwmcc12/**results.html<http://fmv.jku.at/hwmcc12/results.html>
Armin