HWMCC'12 results announced

91 views
Skip to first unread message

Armin Biere

unread,
Oct 30, 2012, 4:44:48 AM10/30/12
to hw...@googlegroups.com
The results are close to a final state.
See the competition home page for details.

http://fmv.jku.at/hwmcc12/results.html

Armin

Niklas Sörensson

unread,
Oct 30, 2012, 8:08:08 AM10/30/12
to hw...@googlegroups.com
Thanks Armin,

I want to generate subtables with more easily digested results. What
is the right way to extract the status of a single execution from the
logs? I.e. SAT/UNS/TIMEOUT/MEMOUT? If you have some script snippet for
this, can you send me a copy?

/Niklas

Armin Biere

unread,
Oct 30, 2012, 9:37:52 AM10/30/12
to hw...@googlegroups.com
This is a C file ... (scripts are too slow).  It is attached,
but you find the result of this program as the flat
CSV spread sheet(s) which I think is all you need.
These are on the competition result page.
Armin
analyze.c

Niklas Sörensson

unread,
Oct 30, 2012, 9:47:46 AM10/30/12
to hw...@googlegroups.com
Thanks,

I want to do things like print a table for only tip and superprove for
instance, perhaps taking away some other information as well. Also
want to get per benchmark information for the multi and deep bounds
tracks.

/Niklas

Armin Biere

unread,
Oct 30, 2012, 9:58:21 AM10/30/12
to hw...@googlegroups.com
On Tue, Oct 30, 2012 at 2:47 PM, Niklas Sörensson <nikl...@gmail.com> wrote:
I want to do things like print a table for only tip and superprove for
instance, perhaps taking away some other information as well. Also
want to get per benchmark information for the multi and deep bounds
tracks.

I do this by opening the CSV script in Openoffice/Excel and then remove the
columns you are not interested in.   Otherwise you could also tweak the attached
script.
 
For benchmark information for multi and deep you have to dig through
the log files or use 'analyze -v -v' (which will give discrepancies).  I do not know
how you want to present the data there.  For both the log file archives has
additional data though.

Armin
mktable.sh

Niklas Sörensson

unread,
Oct 30, 2012, 10:16:22 AM10/30/12
to hw...@googlegroups.com
Yeah, that's why I thought it better to just script it up myself. I do
need to know what is the official way to know the results from the log
for that though. I'll check out your c-code and figure it out. Thanks!

/Niklas

brayton

unread,
Oct 30, 2012, 1:19:05 PM10/30/12
to hw...@googlegroups.com
Armin,

Looks like the .cvs files are separated by semi-colons

Bob

Armin Biere

unread,
Oct 31, 2012, 3:34:17 AM10/31/12
to hw...@googlegroups.com
Yes, is this a problem?

Armin

brayton

unread,
Oct 31, 2012, 11:29:23 AM10/31/12
to hw...@googlegroups.com
No, I edited it and changed the all ; to , but in the last few years you
were using ,. It may cause others problems.
>>> http://fmv.jku.at/hwmcc12/**results.html<http://fmv.jku.at/hwmcc12/results.html>
>>>
>>> Armin
>>>
>>>

Armin Biere

unread,
Oct 31, 2012, 11:36:35 AM10/31/12
to hw...@googlegroups.com
Thanks!

Armin

brayton

unread,
Oct 31, 2012, 6:33:44 PM10/31/12
to hw...@googlegroups.com
There are discrepancies in the results for a few benchmarks for the solvers pdthrd11 and suprove11.
In most cases, I checked back to the beem data to see what was the expected result (i.e. proved by an explicit model checker). The expected result is shown as e.g. sokoban.1.prop1: reachable

On the following benchmarks, pdthrd11 which gets UNSAT  incorrectly.
  • beemskbn1f1 pdthrd11 gets UNSAT and suprove, simsat and suprove11 get SAT. Also, according to the expected beem results this example is reachable, i.e. SAT. - sokoban.1.prop1: reachable
  • similarly for beemskbn2f1 - sokoban.2.prop1: reachable
  • similarly for beemadd4b1 - adding.4.prop1: reachable
  • similarly for beembrdg3b1 - bridge.3.prop1: reachable
  • similarly for beemfrogs1b1 - frogs.1.prop1: reachable
  • similarly for beemfrogs1f1 - frogs.1.prop1: reachable
  • similarly for beemhanoi1b1 - hanoi.1.prop1: reachable

For the following, pdthrd11 is the only one that reports UNSAT which is a correct result but it looks like the same bug as above since it gets the result in 14 sec., close to the time that the above incorrect.results were obtained.

  • beembrptwo1b2 - brp2.1.prop2: safe

Also suprove11 has a wrong result and a suspicious result. It gets SAT on 

  • beemmsmie3b1 - It reports SAT but expected result is: msmie.3.prop1: safe. In addition beemmsmie3f1 is reported UNSAT by several engines. Since the 'f' and 'b' versions are the same property, and the expected resuslt is UNSAT, suprove11 is wrong here.
  • beemptrsn7b1 -I believe that the result SAT on this is suspect even though the correct result is not known. It looks like the same bug as above. The source of the bug was corrected several months ago.

Bob


On 10/31/2012 12:34 AM, Armin Biere wrote:

Armin Biere

unread,
Nov 5, 2012, 3:12:43 AM11/5/12
to hw...@googlegroups.com
The scripts for analyzing the results in the Multi track did not
work as expected (actually solvers with the same prefix
were counted multi times, which resulted in Tarmo and Tip
scoring too many properties).

After fixing this Tip and Mmpc change places in the Multi
track.  So congratulations go to the new comer Mmpc to
win the Multi track.  I have also updated the slides.

Armin

Armin Biere

unread,
Nov 12, 2012, 7:13:00 AM11/12/12
to hw...@googlegroups.com
Another update: the 'bob12m04' log files of Tip(BMC) were truncated
(from 3 GB to 2 GB, sigh).  After exploring a couple of alternatives
we figured that 'fflush' alone in Tip does not do the trick.  Either
an additional 'fsync' in Tip or adding a 'sync' command to the
launch script will make sure that the log file is not truncated.

Thus Tip and TipBMC actually score many more properties
(>18000 more) , but the ranking does not change.  Mmpc
remains the winner of the multiple property track.

This experience raises the question whether the current format
is not too verbose.  In general we do not want to have multi
GB witnesses.  So this probably needs to be addressed
for the next competition.

Armin
Reply all
Reply to author
Forward
0 new messages