HWMCC'12 results announced

Armin Biere

unread,

Oct 30, 2012, 4:44:48 AM10/30/12

to hw...@googlegroups.com

The results are close to a final state.
See the competition home page for details.

http://fmv.jku.at/hwmcc12/results.html

Armin

Niklas Sörensson

unread,

Oct 30, 2012, 8:08:08 AM10/30/12

to hw...@googlegroups.com

Thanks Armin,

I want to generate subtables with more easily digested results. What
is the right way to extract the status of a single execution from the
logs? I.e. SAT/UNS/TIMEOUT/MEMOUT? If you have some script snippet for
this, can you send me a copy?

/Niklas

Armin Biere

unread,

Oct 30, 2012, 9:37:52 AM10/30/12

to hw...@googlegroups.com

This is a C file ... (scripts are too slow). It is attached,
but you find the result of this program as the flat
CSV spread sheet(s) which I think is all you need.
These are on the competition result page.
Armin

analyze.c

Niklas Sörensson

unread,

Oct 30, 2012, 9:47:46 AM10/30/12

to hw...@googlegroups.com

Thanks,

I want to do things like print a table for only tip and superprove for
instance, perhaps taking away some other information as well. Also
want to get per benchmark information for the multi and deep bounds
tracks.

/Niklas

Armin Biere

unread,

Oct 30, 2012, 9:58:21 AM10/30/12

to hw...@googlegroups.com

On Tue, Oct 30, 2012 at 2:47 PM, Niklas Sörensson <nikl...@gmail.com> wrote:

I want to do things like print a table for only tip and superprove for
instance, perhaps taking away some other information as well. Also
want to get per benchmark information for the multi and deep bounds
tracks.

I do this by opening the CSV script in Openoffice/Excel and then remove the
columns you are not interested in. Otherwise you could also tweak the attached
script.

For benchmark information for multi and deep you have to dig through
the log files or use 'analyze -v -v' (which will give discrepancies). I do not know
how you want to present the data there. For both the log file archives has
additional data though.

Armin

mktable.sh

Niklas Sörensson

unread,

Oct 30, 2012, 10:16:22 AM10/30/12

to hw...@googlegroups.com

Yeah, that's why I thought it better to just script it up myself. I do
need to know what is the official way to know the results from the log
for that though. I'll check out your c-code and figure it out. Thanks!

/Niklas

brayton

unread,

Oct 30, 2012, 1:19:05 PM10/30/12

to hw...@googlegroups.com

Armin,

Looks like the .cvs files are separated by semi-colons

Bob

Armin Biere

unread,

Oct 31, 2012, 3:34:17 AM10/31/12

to hw...@googlegroups.com

Yes, is this a problem?

Armin

brayton

unread,

Oct 31, 2012, 11:29:23 AM10/31/12

to hw...@googlegroups.com

No, I edited it and changed the all ; to , but in the last few years you
were using ,. It may cause others problems.

>>> http://fmv.jku.at/hwmcc12/**results.html<http://fmv.jku.at/hwmcc12/results.html>
>>>
>>> Armin
>>>
>>>

Armin Biere

unread,

Oct 31, 2012, 11:36:35 AM10/31/12

to hw...@googlegroups.com

Thanks!

Armin

brayton

unread,

Oct 31, 2012, 6:33:44 PM10/31/12

to hw...@googlegroups.com

There are discrepancies in the results for a few benchmarks for the solvers pdthrd11 and suprove11.
In most cases, I checked back to the beem data to see what was the expected result (i.e. proved by an explicit model checker). The expected result is shown as e.g. sokoban.1.prop1: reachable

On the following benchmarks, pdthrd11 which gets UNSAT incorrectly.

beemskbn1f1 pdthrd11 gets UNSAT and suprove, simsat and suprove11 get SAT. Also, according to the expected beem results this example is reachable, i.e. SAT. - sokoban.1.prop1: reachable
similarly for beemskbn2f1 - sokoban.2.prop1: reachable
similarly for beemadd4b1 - adding.4.prop1: reachable
similarly for beembrdg3b1 - bridge.3.prop1: reachable
similarly for beemfrogs1b1 - frogs.1.prop1: reachable
similarly for beemfrogs1f1 - frogs.1.prop1: reachable
similarly for beemhanoi1b1 - hanoi.1.prop1: reachable

For the following, pdthrd11 is the only one that reports UNSAT which is a correct result but it looks like the same bug as above since it gets the result in 14 sec., close to the time that the above incorrect.results were obtained.

beembrptwo1b2 - brp2.1.prop2: safe

Also suprove11 has a wrong result and a suspicious result. It gets SAT on

beemmsmie3b1 - It reports SAT but expected result is: msmie.3.prop1: safe. In addition beemmsmie3f1 is reported UNSAT by several engines. Since the 'f' and 'b' versions are the same property, and the expected resuslt is UNSAT, suprove11 is wrong here.
beemptrsn7b1 -I believe that the result SAT on this is suspect even though the correct result is not known. It looks like the same bug as above. The source of the bug was corrected several months ago.

Bob

On 10/31/2012 12:34 AM, Armin Biere wrote:

http://fmv.jku.at/hwmcc12/**results.html<http://fmv.jku.at/hwmcc12/results.html>

Armin

Armin Biere

unread,

Nov 5, 2012, 3:12:43 AM11/5/12

to hw...@googlegroups.com

The scripts for analyzing the results in the Multi track did not
work as expected (actually solvers with the same prefix
were counted multi times, which resulted in Tarmo and Tip
scoring too many properties).

After fixing this Tip and Mmpc change places in the Multi
track. So congratulations go to the new comer Mmpc to
win the Multi track. I have also updated the slides.

Armin

Armin Biere

unread,

Nov 12, 2012, 7:13:00 AM11/12/12

to hw...@googlegroups.com

Another update: the 'bob12m04' log files of Tip(BMC) were truncated
(from 3 GB to 2 GB, sigh). After exploring a couple of alternatives
we figured that 'fflush' alone in Tip does not do the trick. Either
an additional 'fsync' in Tip or adding a 'sync' command to the
launch script will make sure that the log file is not truncated.

Thus Tip and TipBMC actually score many more properties
(>18000 more) , but the ranking does not change. Mmpc
remains the winner of the multiple property track.

This experience raises the question whether the current format
is not too verbose. In general we do not want to have multi
GB witnesses. So this probably needs to be addressed
for the next competition.

Armin

Reply all

Reply to author

Forward