Strategic Test Suite results against our PicoChess Engines


Al

Aug 7, 2017, 1:18:55 PM
to PicoChess
Strategic Test Suite scores (all 1 core):

Hi all,

Thanks to Marc’s find, we now have a series of test positions for estimating our Engines’ Elo ratings. Here are the results of my tests against the current PicoChess Engines.

Stockfish 7: 3374 (3362 Marc)
Texel 1.06: 3050
Arasan v20.0.0: 2878
Rodent II 0.9.66: 2966
Zurichess graubuenden: 2379 *
Floyd x7da0e241: 2623
Cinnamon 2.0: 2107
Galjoen 0.36: 2031

*There is clearly something wrong with the Zurichess tests, as we know from running tournaments on SCID vs PC and Arena that its strength lies somewhere between Rodent II & Floyd. Most Engines ran at 1.3 secs per test. Zurichess also showed 1.3 secs per test, but its test number was incrementing faster than once per second, probably more like 0.7 secs per test, hence the poor results (I tried 3 times). Maybe the default is not the strongest?

Other Engine Tests:

Stockfish 8: 3407 + 3391

The complete set of results is available to anyone who wishes to see them (1500 tests per Engine).


Cheers,

Al.

Jürgen Precour

Aug 7, 2017, 2:42:03 PM
to PicoChess
That's great work, Al!

The results are similar to CCRL 40/40 (my current settings). I will change the PGN ratings from
stockfish 3300 => 3360
texel 3140 => 3050
because the difference is >50 points (my chosen offset, ha; anything below that is not worth the effort).
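
The update rule described here (only touch a PGN rating when the new figure differs by more than 50 points) can be sketched in a few lines of Python. The function name is made up for illustration, and the two engines are simply the ones quoted in this post.

```python
THRESHOLD = 50  # chosen offset from the post; smaller differences are ignored

def needs_update(current, new, threshold=THRESHOLD):
    """Return True when the new rating differs from the current one by more than the threshold."""
    return abs(new - current) > threshold

# (current PGN rating, new rating) as quoted in the post
changes = {"stockfish": (3300, 3360), "texel": (3140, 3050)}

for engine, (current, new) in changes.items():
    print(engine, current, "=>", new, "update" if needs_update(current, new) else "keep")
```

Both engines come out as "update", since their differences are 60 and 90 points.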

These rankings (for example inside the PGN file) are for INFO only anyway... If you choose >1 core, other UCI settings, ... the results change, so don't blame me :-)

Jürgen

Al

Aug 13, 2017, 10:42:14 AM
to PicoChess
Hi again,

We've updated 3 of our Engines, so here are their results plus a few more.
All the tests were run on RPi 3 using 1 core:

Zurichess Luzern: 2559*
Arasan v20.2.0: 2962
Floyd 0.9: 2617

Laser 1.5: 2877
Amoeba 2.4.1: 2765
Ethereal 8.23: 2657
Marvin 2.0.0: 2476
DanaSah 6.5: 2414
Pedone 1.6: 2804
Demolito 1.0: 2891

* Zurichess continues to race through the tests and, although it shows a big improvement
on the previous version, it is clearly stronger than this result suggests.

Cheers,

Al.

Al

Sep 13, 2017, 4:23:33 AM
to PicoChess
Here are a few more Engine tests, all 1 core:

Vajolet2: 1838
DanaSah 6.5 Limited Strength: 1493
Rodent III 0.210: 2903
Stockfish6: 3342

The following were run on a RPi3 running Raspberry Pi Stretch rather than Jessie:

Stockfish 8: 3472
Ethereal 8.25: 2674
Demolito 170826: 2995
Ethereal 8.27: 2692
Rodent III 0.213: 2982
Zurichess Neuchatel: 2680


Cheers,

Al.

Al

Sep 14, 2017, 7:36:13 AM
to PicoChess
A couple more run on Stretch, 1 core:

Arasan v20.2.0-34-g80202a1: 2917
Ethereal 8.27: 2724


Cheers,

Al.

Al

Sep 14, 2017, 10:22:03 AM
to PicoChess
The latest Rodent 3, again on Stretch 1 core:

Rodent III 0.214: 2981


Al.

Al

Sep 15, 2017, 6:53:47 AM
to PicoChess
2 more on Stretch 1 core:

Stockfish 8 140917: 3421
Rodent III 0.216: 2989

Al

Al

Sep 15, 2017, 7:31:28 AM
to PicoChess
Correction: Yesterday’s Ethereal 8.27 should have read Ethereal 8.28


Al.

Al

Sep 16, 2017, 5:52:14 AM
to PicoChess
Hi all,

Sctr (no levels): 2799

Al.

Marc

Sep 18, 2017, 12:26:35 AM
to PicoChess
My rating results for Stockfish 7 on the DGTPi.

Any comments on how well this fits your playing experience with those levels are welcome. I will try to gather results on other engines as well.

Engine: Stockfish 7
Level: 1
STS rating: 1145

Engine: Stockfish 7
Level: 2
STS rating: 1179

Engine: Stockfish 7
Level: 3
STS rating: 1311

Engine: Stockfish 7
Level: 4
STS rating: 1370

Engine: Stockfish 7
Level: 5
STS rating: 1561

Engine: Stockfish 7
Level: 6
STS rating: 1640

Engine: Stockfish 7
Level: 7
STS rating: 1657

Engine: Stockfish 7
Level: 8
STS rating: 1866

Engine: Stockfish 7
Level: 9
STS rating: 2038

Engine: Stockfish 7
Level: 10
STS rating: 2226

Engine: Stockfish 7
Level: 11
STS rating: 2463

Engine: Stockfish 7
Level: 12
STS rating: 2510

Engine: Stockfish 7
Level: 13
STS rating: 2505

Engine: Stockfish 7
Level: 14
STS rating: 2575

Engine: Stockfish 7
Level: 15
STS rating: 2587

Engine: Stockfish 7
Level: 16
STS rating: 2687

Engine: Stockfish 7
Level: 17
STS rating: 2650

Engine: Stockfish 7
Level: 18
STS rating: 2661

Engine: Stockfish 7
Level: 19
STS rating: 2715

Engine: Stockfish 7
Level: 20
STS rating: 3345

Al

Sep 18, 2017, 10:23:37 AM
to PicoChess
Hi Marc,

I’m interested in how you get the different levels. I couldn’t find the correct parameter. What do you use?

Cheers,

Al.

Marc

Sep 18, 2017, 12:50:39 PM
to PicoChess
Hi Al,

I changed the Python script and added two more options to it, just like the hash size option. I added those parameters and values where the UCI engine is initialized, and created a shell script to start the Python script several times with the right values.

e.g. for Arasan:

#!/bin/bash
# Run the STS rating script once per UCI Elo setting (1000 to 2600 in steps of 80).
# --ucielo is one of the options I added to the script.
for i in 1000 1080 1160 1240 1320 1400 1480 1560 1640 1720 1800 1880 1960 2040 2120 2200 2280 2360 2440 2520 2600
do
   # Show the command first, then run it.
   echo    sudo python sts_rating_v13.1.py --ucielo "$i" -f STS1-STS15_LAN.EPD -e /home/pi/engine_elo/c-arasan --proto uci -h 128 --getrating --log
   sudo python sts_rating_v13.1.py --ucielo "$i" -f STS1-STS15_LAN.EPD -e /home/pi/engine_elo/c-arasan --proto uci -h 128 --getrating --log
done
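
For anyone who would rather drive the same sweep from Python than from bash, here is a rough equivalent. It is only a sketch: it builds and prints the command lines without running anything, and it assumes the modified copy of the script described above, since `--ucielo` is an added option. The constant and function names are made up for illustration, and `shlex.join` needs Python 3.8 or later.

```python
import shlex

# Paths and file names copied from the shell script above.
SCRIPT = "sts_rating_v13.1.py"
EPD_FILE = "STS1-STS15_LAN.EPD"
ENGINE = "/home/pi/engine_elo/c-arasan"

def build_commands(elo_values):
    """Return one argument list per UCI Elo value, mirroring the shell loop."""
    return [
        ["sudo", "python", SCRIPT, "--ucielo", str(elo),
         "-f", EPD_FILE, "-e", ENGINE, "--proto", "uci",
         "-h", "128", "--getrating", "--log"]
        for elo in elo_values
    ]

# 1000 to 2600 inclusive in steps of 80, the same values as the bash list.
commands = build_commands(range(1000, 2601, 80))

for cmd in commands:
    print(shlex.join(cmd))  # dry run: print only
```

To actually run the tests, each argument list could be handed to `subprocess.run(cmd, check=True)` in place of the `print`.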

My results for Arasan:

Engine: Arasan v20.2.0
uci_elo: 1000
STS rating: 910

Engine: Arasan v20.2.0
uci_elo: 1000
STS rating: 868

Engine: Arasan v20.2.0
uci_elo: 1080
STS rating: 980

Engine: Arasan v20.2.0
uci_elo: 1160
STS rating: 922

Engine: Arasan v20.2.0
uci_elo: 1240
STS rating: 1257

Engine: Arasan v20.2.0
uci_elo: 1320
STS rating: 1362

Engine: Arasan v20.2.0
uci_elo: 1400
STS rating: 1423

Engine: Arasan v20.2.0
uci_elo: 1480
STS rating: 1711

Engine: Arasan v20.2.0
uci_elo: 1560
STS rating: 1749

Engine: Arasan v20.2.0
uci_elo: 1640
STS rating: 1792

Engine: Arasan v20.2.0
uci_elo: 1720
STS rating: 1873

Engine: Arasan v20.2.0
uci_elo: 1800
STS rating: 1937

Engine: Arasan v20.2.0
uci_elo: 1880
STS rating: 1955

Engine: Arasan v20.2.0
uci_elo: 1960
STS rating: 2116

Engine: Arasan v20.2.0
uci_elo: 2040
STS rating: 2152

Engine: Arasan v20.2.0
uci_elo: 2120
STS rating: 2289

Engine: Arasan v20.2.0
uci_elo: 2200
STS rating: 2440

Engine: Arasan v20.2.0
uci_elo: 2280
STS rating: 2646

Engine: Arasan v20.2.0
uci_elo: 2360
STS rating: 2804

Engine: Arasan v20.2.0
uci_elo: 2440
STS rating: 2833

Engine: Arasan v20.2.0
uci_elo: 2520
STS rating: 2868

Engine: Arasan v20.2.0
uci_elo: 2600
STS rating: 2954

Al

Sep 18, 2017, 1:17:59 PM
to PicoChess
Thanks Marc, nicely done.

Al.

Uwe Badermann

Sep 18, 2017, 4:21:05 PM
to PicoChess
Hi Marc,

Concerning your question about the SF7 ratings:
I think that from level 9 upwards they are too high compared to humans.
I am rated 1500 in Germany (DE), equivalent to approx. 1600 Elo, and can score well against level 9, but I must have a good day against level 12, although I am not far off being better.

Hope this helps you judge the results.

Uwe

Al

Sep 18, 2017, 5:13:12 PM
to PicoChess
Hi all,

The more I look at the STS code, the more I think it’s tailored to give the same results as on the top PC that the author did his original tests on. His tests were obviously done on a high-spec PC. His program works out the benchmark of our RPi, compares it against his own benchmark, then adjusts the length of the test to compensate for the lower benchmark, in effect mimicking the CCRL grade that would have been achieved on his PC.

Here’s the beginning of a test:

Engine: /opt/picochess/engines/armv7l/a-stockf
Hash: 128, Threads: 1, MoveTime: 1.0s
Number of positions in STS1-STS15_LAN_v3.epd: 1500

Your bench : 46.474321s
My bench : 2.553400s
Analysis Time to get CCRL 40/4 rating estimate : 3640ms
Starting engine /opt/picochess/engines/armv7l/a-stockf ...
id name: Stockfish 7

As you can see, an RPi3 on Stretch is 18 times slower, therefore the test takes much longer, normally around 90 mins.
Naturally the longer it has on each position, the better the results.
(By comparison, the benchmark of Raspbian on the same RPi is around 17 and the test takes around 30 mins, which is very strange.)

So I think that, although we get a good idea of the relative strength of each level, the results are a lot higher than the actual grade on an RPi.
Of course I could be totally wrong, it’s just my opinion, what do you guys think?
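
The numbers in the log above fit a simple proportional scaling: a base analysis time multiplied by the ratio of the two bench times. Here is a minimal sketch of that arithmetic. The function name is made up, and the 200 ms base time is an assumption rather than something shown in the log; it is used here because it reproduces the 3640 ms figure. The real script may compute this differently.

```python
def scaled_movetime_ms(base_ms, your_bench_s, ref_bench_s):
    """Scale a base analysis time by how much slower this machine benchmarks."""
    return base_ms * your_bench_s / ref_bench_s

# Bench times copied from the log above.
YOUR_BENCH_S = 46.474321  # "Your bench": RPi3 on Stretch
REF_BENCH_S = 2.553400    # "My bench": the script author's PC
BASE_MS = 200             # assumed base movetime, not shown in the log

ratio = YOUR_BENCH_S / REF_BENCH_S
movetime = scaled_movetime_ms(BASE_MS, YOUR_BENCH_S, REF_BENCH_S)
print(f"slowdown: {ratio:.1f}x, analysis time: {movetime:.0f} ms")
```

Under the same assumption, a bench of around 17 would give roughly 1.3 s per position, which would explain why that run finishes so much sooner.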


Cheers,

Al.

Marc

Sep 19, 2017, 5:10:01 PM
to PicoChess
I reran the STS test suite with the default 200ms, and the results for the Stockfish 7 engine are below. A nice side effect: the test does not run as long as before :-)

As always any comments on the calculated rating are welcome.


Engine: Stockfish 7
Level: 1
Movetime: 200
STS rating: 1160

Engine: Stockfish 7
Level: 2
Movetime: 200
STS rating: 1278

Engine: Stockfish 7
Level: 3
Movetime: 200
STS rating: 1288

Engine: Stockfish 7
Level: 4
Movetime: 200
STS rating: 1448

Engine: Stockfish 7
Level: 5
Movetime: 200
STS rating: 1540

Engine: Stockfish 7
Level: 6
Movetime: 200
STS rating: 1516

Engine: Stockfish 7
Level: 7
Movetime: 200
STS rating: 1703

Engine: Stockfish 7
Level: 8
Movetime: 200
STS rating: 1932

Engine: Stockfish 7
Level: 9
Movetime: 200
STS rating: 2113

Engine: Stockfish 7
Level: 10
Movetime: 200
STS rating: 2113

Engine: Stockfish 7
Level: 11
Movetime: 200
STS rating: 2165

Engine: Stockfish 7
Level: 12
Movetime: 200
STS rating: 2188

Engine: Stockfish 7
Level: 13
Movetime: 200
STS rating: 2213

Engine: Stockfish 7
Level: 14
Movetime: 200
STS rating: 2253

Engine: Stockfish 7
Level: 15
Movetime: 200
STS rating: 2251

Engine: Stockfish 7
Level: 16
Movetime: 200
STS rating: 2298

Engine: Stockfish 7
Level: 17
Movetime: 200
STS rating: 2311

Engine: Stockfish 7
Level: 18
Movetime: 200
STS rating: 2315

Engine: Stockfish 7
Level: 19
Movetime: 200
STS rating: 2393

Engine: Stockfish 7
Level: 20
Movetime: 200
STS rating: 2907

Al

Sep 20, 2017, 4:23:18 AM
to PicoChess
Hi Marc,

That’s a good idea, not taking the calculated time based on the benchmark. I think the results are a lot closer to reality now. I’d be happier if level 20 was closer to 3100. Perhaps you could try some different ms values with level 20 only until it’s closer to 3100, and then we could use the same formula for all other engines & levels?


Cheers,

Al.

Marc

Sep 21, 2017, 12:18:10 AM
to PicoChess
Hi Al,

After reading a little more I found the script is calculating the CCRL 40/4 rating.

--getrating, calculate CCRL 40/4 rating estimate for uci engines only


Time control: Equivalent to 40 moves in 4 minutes on Athlon 64 X2 4600+ (2.4 GHz), about 1.5 minutes on a modern Intel CPU.

Since the rating list is more or less tied to a certain CPU, the "Athlon 64 X2 4600+ (2.4 GHz)", and the time control is set to 40 moves in 4 minutes, the Python script tries to estimate the speed of the PC it runs on, so that it can evaluate the rating as if the test had been played at 40 moves in 4 minutes on the "Athlon 64 X2 4600+ (2.4 GHz)".

Given this, it makes sense to increase the "movetime" from the 200ms used on Ferdinand Mosca's PC to ~1.32s on the Raspberry Pi 3 to generate the CCRL 40/4 rating estimate.

So forget about the last rating values I posted, which I created with a movetime of 200ms.

The "truth", or what we consider to match our playing level, might be somewhere in between. I also found a picture with the formula used to calculate the rating from the test results, and the formula itself is just an estimate.



  

Marc

Sep 21, 2017, 12:37:54 AM
to PicoChess
Picture with the formula. 


And a Python script to create a picture with bar charts that highlight the different strengths and weaknesses of several engines, using the results gained from the sts_rating_v13.1.py script :-)

Al

Sep 21, 2017, 3:37:24 AM
to PicoChess
Hi Marc,

Yes, that's what I was trying to say the other day: the tests should give the same results regardless
of what hardware they are run on, by giving the weaker hardware more time to run them, hence checking the benchmark first.

To get the ‘true’ grades on our hardware we first need to know the grade of at least 1 of our Engines to begin with. Then we can either run a series of tournaments on SCID vs PC or Arena, or use the STS tests by bypassing the benchmark check and finding the correct ms to run them at. As you said, not by slowing it down, but by speeding it up and giving less time for each test.
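
One way to home in on the "correct ms" without many reruns is to interpolate between two measured points. This is a rough sketch under two assumptions that are not confirmed anywhere in this thread: that the rating rises linearly with the logarithm of the movetime, and that the earlier Stockfish 7 level 20 result of 3345 was obtained at about 1320 ms. The function name is made up for illustration.

```python
import math

def interpolate_movetime_ms(t1_ms, r1, t2_ms, r2, target):
    """Movetime expected to give `target`, assuming rating is linear in log(movetime)."""
    frac = (target - r1) / (r2 - r1)
    log_t = math.log(t1_ms) + frac * (math.log(t2_ms) - math.log(t1_ms))
    return math.exp(log_t)

# Stockfish 7, level 20, figures quoted in this thread.
LOW = (200, 2907)    # run with the default 200 ms
HIGH = (1320, 3345)  # benchmark-scaled run; the ~1320 ms is an assumption
TARGET = 3100        # level 20 rating suggested earlier in the thread

estimate = interpolate_movetime_ms(*LOW, *HIGH, TARGET)
print(f"try about {estimate:.0f} ms per position")
```

With these inputs the estimate is about 460 ms. It is only a starting point and would need confirming with an actual test run at that setting.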

Cheers,

Al

Al

Sep 23, 2017, 5:59:52 PM
to PicoChess
Hi all,

WyldChess (no levels): 2548
Marvin 2.1.0 (no levels): 2583

Cheers,

Al.

Al

Sep 24, 2017, 5:25:07 PM
to PicoChess
Hi all,

Fruit 2.1 (old engine, no levels): 2776
Arasan v20.2.0-35-g0fce2e6: 2911


Cheers,

Al.

Al

Sep 25, 2017, 6:37:34 AM
to PicoChess
Hi all,

Arasan v20.2.0-38-gc17bcdf: 2932


Cheers,

Al.

Al

Sep 26, 2017, 5:10:01 PM
to PicoChess
Hi all,

I decided to download the latest source for all the versions of Stockfish to date, compile them and run these tests. They show the advance in
strength with each version, with a couple of slight hiccups. For Stockfish 8 it was the latest source from 4 days ago. (I expect an update tomorrow.)

Stockfish 1.9.1: 3170 (no levels)
Stockfish 2.3.1: 3229
Stockfish 3: 3220
Stockfish 4: 3284
Stockfish DD (4.5): 3300
Stockfish 5: 3296
Stockfish 6: 3404
Stockfish 7: 3420
Stockfish 8 220917: 3424
— — — — — —
I also finally managed to compile Sayuri:

Sayuri 260917: 1959 (no levels)


Cheers,

Al.

Al

Sep 28, 2017, 8:16:00 AM
to PicoChess
Hi all,

3 updates today, compiled and tested:

WyldChess: 2608
Arasan v20.2.0-40-gfd17cc2: 2910
Rodent III 0.216: 2981


Cheers,

Al.

Al

Sep 30, 2017, 11:46:43 AM
to PicoChess
Hi all,

A flurry of Updated Engine compiles:

WyldChess 09292017: 2621 (no levels)
Sayuri 2017.09.29: 1961 (no levels)
Zevra v1.6.1 r512: (freezes at Test 7)
Stockfish8 300917: 3434
Marvin 2.2.0: 2781 (no levels)
Ethereal 8.28: 2756 (no levels)
WyldChess 09302017: 2624 (no levels)


Cheers,

Al.

Al

Oct 20, 2017, 9:54:23 AM
to PicoChess
Hi all,

I’ve been on a cruise, so here are a few Engine updates plus some new ones.

These were all compiled on Jessie. However, some Engines no longer compile on native Jessie (C++11), so with some tricks I installed C++14 to compile these:


---- Jessie ----
Arasan v20.2.0-52-g2a2e75a: 2894
CT800 v1.12: 2147 (no levels)
Ethereal 8.30: 2667 (fails on PicoChess)
K2 0.88 dev: fails to test
SCTR: 2714 (no levels)
Stockfish8 16/10/17: 3397
Vajolet2 2.3.1: 2836 (no levels)
Zevra 20171006: 1876 (no levels)

---- Jessie c++14 ----
Nemorino: 2839 (no levels)
Robocide: 2332 (no levels)
Rodent III 0.225 32-bit/GCC 6.3.0 20170516: 2906
Rodent III 0.227 32-bit/GCC 6.3.0 20170516: 2877
Sting SF 8.8 (Stockfish derived): 3167
SugaR S_XPrO 191017 x32 (Stockfish derived): 3366

If anyone can compile Amoeba, Demolito or DragonTooth please let me know how.
I have found pre-compiled binaries for these.


Cheers,

Al.

Al

Oct 22, 2017, 8:13:10 AM
to PicoChess
Hi all,

I made a mistake in my last post: only Nemorino and the 2 Rodent Engines were compiled with C++14, so these will only run on Stretch.
The rest will run on Jessie & Stretch.

Here’s another Engine update:

Arasan v20.2.0-53-gfc77c46: 2898


Cheers,

Al.

Al

Oct 24, 2017, 10:04:03 AM
to PicoChess
Hi all,

Here are some more Engine updates:

---- Jessie ----
Arasan v20.2.0-53-gfc77c46: 2898
Arasan v20.2.0-55-gfa6b1e0: 2919
Arasan v20.2.0-57-gd39becf: 2908
ChessPuter: 1093 (test took >8.5 hrs, others take 30 mins; slow play on PicoChess)
Ethereal 8.30: 2717
McBrain 231017 (Stockfish derived): 3399
Sting SF 8.9 (Stockfish derived): 3159
Stockfish8 221017: 3403
SugaR S_XPrO 221017 x32 (Stockfish derived): 3375
SugaR S_XPrO 231017 x32 (Stockfish derived): 3357
Vajolet2 2.3.6: 2908

---- Jessie c++14 ----
Nemorino: 2873
Rodent III 0.228 32-bit/GCC 6.3.0 20170516: 2843


Cheers,

Al.

Al

Oct 24, 2017, 2:09:47 PM
to PicoChess
Hi All,

The Arasan Developer is very active:

Arasan v20.2.0-59-g52353b9: 2893


Cheers,

Al.

Al

Oct 25, 2017, 12:55:42 PM
to PicoChess
Hi all,

A couple more Engine updates:

Arasan v20.2.0-60-g3e41704: 2896
Ethereal 8.31: 2711


Cheers,

Al.

Al

Oct 30, 2017, 1:18:45 PM
to PicoChess
A few Engine updates, now only on Stretch.
The STS benchmark for Stretch is 3 times worse than for Jessie, so the tests take 3 times longer. In my opinion this inflates the grades somewhat:

---- Stretch ----
Nemorino: 2987
Stockfish8 281017: 3455
Zevra 20171006: fails
Ethereal 8.32: 2757

This is my last update for around 4 weeks as my Transatlantic Cruise looms.


Cheers,

Al.