Stockfish contempt testing


Leonardo Ljubičić aka Dragon Mist

Oct 29, 2019, 5:32:48 PM
to FishCooking
A week ago I decided to do some contempt testing. So far I have only 2 data points (one match takes about 3 days). The plan is to pit SF20102019 bmi2 against SF10 bmi2, H6.03 x64-pext, and K10, covering close, mid-range, and weaker opponents over a range of contempt values, if I don't lose interest first.

Conditions:
Games - 1,000 games per match
HW - i5-4670k @ 4.3 GHz
PB - Off
TC - 1m+1s
Hash - 1GB each
TB - full 6-man Syzygy on SSD
Book - ICCF database limited to 3 moves

Next up: SF c20 vs H6.03. I will report here.

[Attached images: Cont2.PNG, Cont1.PNG]



Stefan Geschwentner

Oct 30, 2019, 6:10:57 AM
to FishCooking
Thanks, very interesting and important.

We should know more about the effect of different contempt values against various opponents.
I'll explore contempt against SF itself a little more on fishtest.

puck

Oct 30, 2019, 7:41:37 AM
to FishCooking
Thank you so much for the contempt testing.
I hope someone with a good GPU will also test different contempt values against Leela, since Leela is our greatest rival right now.

Leonardo Ljubičić aka Dragon Mist

Oct 30, 2019, 8:19:54 AM
to FishCooking
I'll switch to testing against SF10 (default) as soon as I reach any conclusion about the optimum contempt vs H6.03.

Alayan

Oct 30, 2019, 10:47:53 AM
to FishCooking
With 1,000 games per contempt setting, the error bars are too big for reliable conclusions. On 1,000 games, your error bars must be something like ±15 Elo at 95% confidence.
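For anyone who wants to check that estimate, here is a minimal sketch of how such a margin can be computed from a win/draw/loss record. It assumes the usual logistic Elo model and a normal approximation for the score. The function names and the win/draw/loss counts are hypothetical and are not taken from the matches in this thread.

```python
import math

def elo_from_score(score):
    """Convert an expected score in (0, 1) to an Elo difference (logistic model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_margin(wins, draws, losses, z=1.96):
    """Return (elo, lower, upper) for a match, using a normal approximation.

    z=1.96 corresponds to a two-sided 95% interval.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    return (elo_from_score(score),
            elo_from_score(score - z * se),
            elo_from_score(score + z * se))

# Hypothetical 1,000-game result, for illustration only.
elo, lo, hi = elo_with_margin(wins=300, draws=580, losses=120)
print(f"Elo {elo:+.1f}, 95% interval [{lo:+.1f}, {hi:+.1f}]")
```

A high draw rate shrinks the per-game variance, so the exact margin depends on the draw ratio as well as on the number of games.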

lucasme...@gmail.com

Oct 30, 2019, 11:30:46 AM
to FishCooking
How does contempt work across the stages of the game? Is it different in the midgame compared to the endgame?

Leonardo Ljubičić aka Dragon Mist

Oct 30, 2019, 12:35:31 PM
to FishCooking
Fully aware of that, but more games would be just too time-consuming. I hope several data points against various classes of opponents will still give some insight into trends. Also, I am not aware that anyone has tried any serious contempt testing like this, so it is better than nothing.

joost.van...@gmail.com

Oct 30, 2019, 6:13:17 PM
to FishCooking
Yes, I think it is useful (the graph should presumably have the error bars on it). We can compare with


From that graph, one could expect that against an engine with a ±120 Elo difference the results from the Stockfish 8 LTC graph are most relevant, and that would suggest a contempt of 60 yields maximum Elo, roughly 10 Elo more than a contempt of 20.

Nordlandia

Oct 31, 2019, 2:48:23 PM
to FishCooking
Leonardo Ljubičić aka Dragon Mist: the default contempt for Houdini is clearly not optimal against SF Dev or SF10. A value of zero, if not negative, is capable of scoring better overall.

Leonardo Ljubičić aka Dragon Mist

Oct 31, 2019, 4:36:57 PM
to FishCooking
Be that as it may, I have no interest in experimenting with Houdini contempt, only Stockfish.

Leonardo Ljubičić aka Dragon Mist

Nov 2, 2019, 5:34:02 AM
to FishCooking
New data point: c20 vs H6.03. Next up: c24 vs SF10.

[Attached images: Cont2.PNG, Cont1.PNG]


 

Nordlandia

Nov 2, 2019, 11:32:28 AM
to FishCooking
That's not my point. Considering the relative age of Houdini, a contempt of 0 is appropriate. There is no need to test various Houdini values: per the manual, 0 gives the objectively highest level of play. So there is no good reason to use optimistic contempt for Houdini against SF now.

Leonardo Ljubičić aka Dragon Mist

Nov 2, 2019, 1:10:19 PM
to FishCooking
The point of this exercise is to measure how different contempt values for Stockfish fare against various opponents. It could be opponents A, B and C, or it could be D, E and F. Houdini at default settings is just one opponent. I do not care whether the Elo difference for a given SF contempt is X or Y against some opponent. What is interesting is how the Elo difference changes across STOCKFISH contempt values.

Leebot

Nov 2, 2019, 4:53:19 PM
to FishCooking
Interesting that increasing draw avoidance (C28) leads to more draws (578).


Nickolas

Nov 2, 2019, 9:36:04 PM
to FishCooking
Well, I'm not sure that it's ever been established that the success of Stockfish's contempt implementation really has much to do with draw avoidance, though draw avoidance is why it was implemented. Maybe drawing less is merely a side effect of preferring more complicated midgamey positions to less complicated endgamey positions. 

Maybe the size of the contempt bonus itself isn't as important as the relative difference between the midgame and endgame bonuses. Maybe we should be tuning the ½ coefficient in make_score(ct, ct / 2) and make_score(dct, dct / 2) rather than Options["Contempt"] itself? Maybe that coefficient should be dynamic based on some relevant features in the current position?

Anyway, it's not at all surprising that once you get to the point where extra contempt objectively weakens Stockfish against competent opponents, some wins turn into draws and losses.
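To make the midgame/endgame split concrete, here is a simplified sketch of how a score pair like make_score(ct, ct / 2) turns into a single bonus under a tapered evaluation. It is an illustration, not Stockfish code: the function name, the eg_ratio parameter (standing in for the ½ coefficient), and the plain linear interpolation over a 0..128 phase are assumptions, and unit conversion, scale factors, and dynamic contempt are ignored.

```python
PHASE_MIDGAME = 128  # assumed phase value for a full-material midgame position

def contempt_bonus(ct, phase, eg_ratio=0.5):
    """Blend a (midgame, endgame) contempt pair by game phase.

    ct       -- contempt value applied in the midgame
    phase    -- 0 (pure endgame) .. PHASE_MIDGAME (pure midgame)
    eg_ratio -- hypothetical tunable: endgame bonus as a fraction of ct
    """
    mg = ct
    eg = ct * eg_ratio
    # Linear interpolation between the midgame and endgame components.
    return (mg * phase + eg * (PHASE_MIDGAME - phase)) / PHASE_MIDGAME

# Effective bonus for contempt 24 at three stages of the game.
for phase in (128, 64, 0):
    print(phase, contempt_bonus(24, phase))
```

With eg_ratio at 0.5 the bonus halves as material comes off the board, which is one way to read the preference for midgame positions over endgame ones. Changing eg_ratio changes that slope without touching the midgame value.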

Leonardo Ljubičić aka Dragon Mist

Nov 3, 2019, 4:23:20 AM
to FishCooking
Yeah, it is probably that.

Leonardo Ljubičić aka Dragon Mist

Nov 5, 2019, 5:12:35 PM
to FishCooking
New data point: SF20102019 at default contempt (24) vs SF10 (default) gives a 54 Elo difference. This is 4-core testing, so 54 is pretty close to expectation (judging by the regression test, I expected 45-53). Next up: SF20102019 c=28 vs SF10 (default).

[Attached images: Cont2.PNG, Cont1.PNG]



Leonardo Ljubičić aka Dragon Mist

Nov 8, 2019, 2:23:17 PM
to FishCooking
Finished SF20102019 c=28 vs SF10 (default) and got +68 elo. I have to say I did not see that one coming. Next up SF20102019 c=20 vs SF10 (default).

[Attached images: Cont2.PNG, Cont1.PNG]



joost.van...@gmail.com

Nov 9, 2019, 11:53:41 AM
to FishCooking


On Friday, 8 November 2019 20:23:17 UTC+1, Leonardo Ljubičić aka Dragon Mist wrote:
Finished SF20102019 c=28 vs SF10 (default) and got +68 elo. I have to say I did not see that one coming. Next up SF20102019 c=20 vs SF10 (default).

This is within the ±15 Elo error bars of the C=24 result... unfortunately the error margin is large on this test. Still, tests that can't be done on the framework (i.e. vs other engines) are especially useful. I would be quite interested in seeing C=60 or so vs Houdini.
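As a rough sanity check on the comparison between the two results, the 95% margins of two separate matches can be combined in quadrature to get a margin for their difference. The sketch below assumes the two matches are independent and that both margins are at the same confidence level. The function names are hypothetical, and the ±15 figure is the approximate margin quoted earlier in the thread, not a computed value.

```python
import math

def difference_margin(margin_a, margin_b):
    """Margin of the difference of two independent estimates (same confidence level)."""
    return math.hypot(margin_a, margin_b)

def differs_significantly(elo_a, margin_a, elo_b, margin_b):
    """True if the gap between two results exceeds the combined margin."""
    return abs(elo_a - elo_b) > difference_margin(margin_a, margin_b)

# C=28 result (+68) vs C=24 result (+54), each with roughly +-15 Elo at 95%.
print(differs_significantly(68, 15, 54, 15))
```

The gap here is 14 Elo against a combined margin of about 21, so the check does not separate the two settings.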
 




Leonardo Ljubičić aka Dragon Mist

Nov 10, 2019, 12:58:23 PM
to FishCooking
I am fully aware that the error bars diminish the value of this test significantly. But I have no practical alternative. So the main goals here are 1) to have some fun testing, and 2) to obtain some trends that might give more insight into how contempt behaves versus various opponents.
After I finish vs SF10, I'll do 3 data points vs K10 (an even weaker opponent than H6.03). After that I will try to widen the test, maybe with C=16 and C=32 against all.