Strength of Stockfish 7 in DGT Pi

Darryl Buller

Jul 11, 2017, 5:27:09 PM

to PicoChess

Is the estimate of Stockfish 7's strength in the DGT Pi (vs. 0.75) accurate? The user's guide says 2570 at level 20, which is the highest level. But I played it in a 6-game match against Deep HIARCS 14 on my 4-core laptop, and the result was 5 draws and a win for Stockfish. Deep HIARCS is supposed to be over 3100. I know that 6 games is a small sample, but it doesn't seem like that result is very likely if it's 2570.
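
As a rough sanity check on that intuition, the standard Elo expected-score formula can be applied to the two ratings quoted above. This is only a sketch: it takes the 2570 and 3100 figures at face value and ignores that the two numbers may come from different rating lists.

```python
def expected_score(rating_a, rating_b):
    """Expected score per game for player A against player B (standard Elo formula)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Figures quoted in the post: user-guide rating vs. the claimed HIARCS rating.
per_game = expected_score(2570, 3100)
games = 6
observed = 5 * 0.5 + 1 * 1.0  # 5 draws and 1 win for Stockfish

print("expected per game: {:.3f}".format(per_game))
print("expected over {} games: {:.2f}".format(games, per_game * games))
print("observed over {} games: {}".format(games, observed))
```

Under these assumptions a 2570-rated engine would be expected to score well under one point in six games, against the 3.5 points actually scored.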

Shivkumar Shivaji

Jul 11, 2017, 5:42:04 PM

to pico...@googlegroups.com

The user guide is outdated. With the quad-core Pi 2/3, the strength should be 3100+. The 2570 figure is for the "ancient" Raspberry Pi 1 version: ARMv6 with a clock speed of just 700 MHz.

Thanks for testing! Shiv

--
You received this message because you are subscribed to the Google Groups "PicoChess" group.
To unsubscribe from this group and stop receiving emails from it, send an email to picochess+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

darryl.buller

Jul 11, 2017, 9:51:00 PM

to pico...@googlegroups.com

Shiv,

Thanks!

Jürgen Precour

Jul 12, 2017, 5:41:53 AM

to PicoChess

I would say something in between. I don't think all cores are turned on (in their/our ini file) out of the box.

Jürgen

Al

Jul 12, 2017, 7:27:05 AM

to PicoChess

I have 3 cores turned on. I found that the DGT Pi got very hot using all 4 cores (no heatsinks).

Cheers,

Al.

Shivkumar Shivaji

Jul 22, 2017, 1:06:27 PM

to pico...@googlegroups.com

I forgot that most people use just one core on the Pi. Has anyone tested the Elo? The manual clearly needs an update from the 2570 figure.

Shiv

Marc

Aug 4, 2017, 2:20:40 PM

to PicoChess

STS Rating v13.1
Number of cores: 4

Engine: Stockfish 7
Hash: 128, Threads: 1, time/pos: 1.341s

Number of positions in STS1-STS15_LAN.EPD: 1500
Max score = 1500 x 10 = 15000

Test duration: 00h:00m:08s
Expected time to finish: 00h:34m:16s
STS rating: 3362

STS ID     STS1  STS2  STS3  STS4  STS5  STS6  STS7  STS8  STS9 STS10 STS11 STS12 STS13 STS14 STS15    ALL
NumPos      100   100   100   100   100   100   100   100   100   100   100   100   100   100   100   1500
BestCnt      84    69    78    70    76    75    71    69    63    85    69    69    75    69    48   1070
Score       869   809   866   786   829   881   787   815   707   890   789   766   833   781   738  12146
Score(%)   86.9  80.9  86.6  78.6  82.9  88.1  78.7  81.5  70.7  89.0  78.9  76.6  83.3  78.1  73.8   81.0
Rating     3626  3359  3613  3257  3448  3680  3261  3386  2905  3720  3270  3168  3466  3234  3043   3362

:: STS ID and Titles ::
STS 01: Undermining
STS 02: Open Files and Diagonals
STS 03: Knight Outposts
STS 04: Square Vacancy
STS 05: Bishop vs Knight
STS 06: Re-Capturing
STS 07: Offer of Simplification
STS 08: Advancement of f/g/h Pawns
STS 09: Advancement of a/b/c Pawns
STS 10: Simplification
STS 11: Activity of the King
STS 12: Center Control
STS 13: Pawn Play in the Center
STS 14: Queens and Rooks to the 7th rank
STS 15: Avoid Pointless Exchange

:: Top 5 STS with high result ::
1. STS 10, 89.0%, "Simplification"
2. STS 06, 88.1%, "Re-Capturing"
3. STS 01, 86.9%, "Undermining"
4. STS 03, 86.6%, "Knight Outposts"
5. STS 13, 83.3%, "Pawn Play in the Center"

:: Top 5 STS with low result ::
1. STS 09, 70.7%, "Advancement of a/b/c Pawns"
2. STS 15, 73.8%, "Avoid Pointless Exchange"
3. STS 12, 76.6%, "Center Control"
4. STS 14, 78.1%, "Queens and Rooks to the 7th rank"
5. STS 04, 78.6%, "Square Vacancy"
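
The per-suite ratings in this output look like a straight-line function of Score(%). The sketch below is an inference from the posted numbers, not taken from the script's source: it fits a line through the lowest and highest rows (STS 09 and STS 10) and checks that the line roughly reproduces the overall rating. The function name `estimate_rating` is made up for illustration.

```python
# Two (Score %, Rating) rows copied from the output above.
LOW = (70.7, 2905)   # STS 09
HIGH = (89.0, 3720)  # STS 10

# Assumption: the rating is a linear function of Score(%).
slope = (HIGH[1] - LOW[1]) / (HIGH[0] - LOW[0])
intercept = LOW[1] - slope * LOW[0]


def estimate_rating(score_percent):
    """Estimate the STS rating from a score percentage using the fitted line."""
    return slope * score_percent + intercept


# Overall result: 12146 points out of a maximum of 1500 x 10 = 15000.
overall_percent = 100.0 * 12146 / 15000
print("overall score: {:.1f}%".format(overall_percent))
print("estimated overall rating: {:.1f}".format(estimate_rating(overall_percent)))
print("estimated STS 01 rating:  {:.1f}".format(estimate_rating(86.9)))
```

With this fit, one percentage point of STS score corresponds to roughly 45 rating points, which is why small differences between runs move the reported rating by only a handful of points.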

DJ Dekker

Aug 4, 2017, 3:47:12 PM

to PicoChess

Hi Darryl,

In February-March-April 2016 there was a discussion on this forum about measuring the strength of Stockfish 7 on Raspberry Pi. You can find it if you search for "Engine match on Raspberry Pi versus PC". It explains how the estimated strength of 3100 Elo was established.

Greetings,
DJ

Al

Aug 4, 2017, 4:09:51 PM

to PicoChess

Hi Marc,

That’s a lot of work you’ve done there. Let me just make sure I understand your results correctly.

Are you saying that each of the 15 Strategic Test Suites (100 tests in each) is run for exactly 8 seconds?

If so, I just ran all 100 positions of Strategic Test Suite 001 (Undermining) at 8 secs per position using 4 cores on the DGT Pi (RPi 3).

93 perfect 10s (930) plus 4, 6, 4, 0, 1, 1 and 4 (20) give a total of 950 out of 1000, i.e. 95%.

I’ll refrain from running the other 14 tests until I’m sure I’m doing the right thing.

Cheers,

Al.

Marc

Aug 5, 2017, 5:20:27 AM

to PicoChess

Hi Al

I used the Python script from the forum post below. Ferdinand Mosca did a great job creating it.

The script estimates the CPU speed relative to the developer's CPU and from that decides how long each position should run. Then it simply goes through all of the test suites.

http://www.talkchess.com/forum/viewtopic.php?topic_view=threads&p=700402&t=56653

The only work was getting it to run on Linux, because it didn't close a PIPE properly and stopped with an exception.

But just after fixing it myself I found this:

http://www.talkchess.com/forum/viewtopic.php?topic_view=threads&p=700495&t=56653&sid=034edfa82d337a250c8a2af67deb9333
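
The thread does not show what the fix looked like. Purely as a general illustration (this is not the patch to sts_rating_v13.1.py, and it targets Python 3 rather than the Python 2 the script needed at the time), one common way to avoid leaving pipes to a UCI engine open is to use `subprocess.Popen` as a context manager. The function name `query_uci_id` is hypothetical.

```python
import subprocess


def query_uci_id(engine_cmd):
    """Start a UCI engine, collect its 'id' lines, then shut it down cleanly."""
    # Leaving the 'with' block closes the pipes and waits for the process.
    with subprocess.Popen(
        engine_cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        bufsize=1,
    ) as proc:
        proc.stdin.write("uci\n")
        proc.stdin.flush()

        id_lines = []
        for line in proc.stdout:
            line = line.strip()
            if line.startswith("id "):
                id_lines.append(line)
            if line == "uciok":
                break

        # Ask the engine to exit before the pipes are closed.
        proc.stdin.write("quit\n")
        proc.stdin.flush()
    return id_lines
```

It would be called with the engine command as a list, for example `query_uci_id(["/opt/picochess/engines/armv7l/a-stockf"])`, using the engine path Marc gives for his own setup.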

Marc

Aug 5, 2017, 5:33:19 AM

to PicoChess

I started the script with:

python sts_rating_v13.1.py -f STS1-STS15_LAN.EPD -e /opt/picochess/engines/armv7l/a-stockf --proto uci -h 128 --getrating --log

Each position ran for 1.37 seconds on the Raspberry Pi 3, which is what the script estimated on its own.

The test suites were included in the first download in this post:

http://www.talkchess.com/forum/viewtopic.php?topic_view=threads&p=628006&t=56653

Al

Aug 6, 2017, 6:40:02 AM

to PicoChess

Hi Marc,

I clearly ran my program for way too long. I was using a modified version of the Bratko-Kopec program I have for Linux, as I could only find a Windows version of the Strategic Test Suites.

I have now downloaded the script and added the corrections you pointed me to. As I’m used to Python 3, I tried to get it to work on that: I changed all the print statements to use brackets but struggled with other strings, so I ended up running it under Python 2 only.

Here are my results from this morning’s run against my compile of Stockfish 8 from the 30th of July on 1 core:

STS Rating v13.1
Number of cores: 4

Engine: Stockfish8 300717
Hash: 128, Threads: 1, time/pos: 1.334s

Number of positions in STS1-STS15_LAN_v3.epd: 1500

Max score = 1500 x 10 = 15000

Test duration: 00h:00m:09s
Expected time to finish: 00h:34m:06s
STS rating: 3407

STS ID     STS1  STS2  STS3  STS4  STS5  STS6  STS7  STS8  STS9 STS10 STS11 STS12 STS13 STS14 STS15    ALL
NumPos      100   100   100   100   100   100   100   100   100   100   100   100   100   100   100   1500
BestCnt      84    71    75    76    75    78    74    68    70    82    71    68    68    62    54   1076
Score       886   787   859   849   826   909   798   828   803   875   808   784   798   769   717  12296
Score(%)   88.6  78.7  85.9  84.9  82.6  90.9  79.8  82.8  80.3  87.5  80.8  78.4  79.8  76.9  71.7   82.0
Rating     3702  3261  3582  3537  3435  3804  3310  3444  3332  3653  3355  3248  3310  3181  2949   3407

I have just pulled the latest Stockfish 8 code so will compile that and re-test. This time I’ll try all 4 cores.

Cheers,

Al.

Al

Aug 6, 2017, 7:49:39 AM

to PicoChess

Hi Marc,

Here are the results from today’s compile of the latest Stockfish 8 code using all 4 cores. Weirdly, it’s slightly weaker than the last 1-core test. I suspect my RPi 3, with heat sinks inside the DGT Pi clock, overheated and throttled back:

Engine: Stockfish8 060817
Hash: 128, Threads: 4, time/pos: 1.313s

Number of positions in STS1-STS15_LAN_v3.epd: 1500
Max score = 1500 x 10 = 15000
Test duration: 00h:00m:09s

Expected time to finish: 00h:33m:34s
STS rating: 3395

STS ID     STS1  STS2  STS3  STS4  STS5  STS6  STS7  STS8  STS9 STS10 STS11 STS12 STS13 STS14 STS15    ALL
NumPos      100   100   100   100   100   100   100   100   100   100   100   100   100   100   100   1500
BestCnt      79    68    76    77    76    74    74    67    63    82    72    76    74    63    54   1075
Score       845   790   836   859   846   886   814   834   717   865   819   838   836   751   719  12255
Score(%)   84.5  79.0  83.6  85.9  84.6  88.6  81.4  83.4  71.7  86.5  81.9  83.8  83.6  75.1  71.9   81.7
Rating     3519  3274  3479  3582  3524  3702  3381  3470  2949  3608  3404  3488  3479  3101  2958   3395

Cheers,

Al.

Al

Aug 6, 2017, 4:25:11 PM

to PicoChess

Hi Marc,

I re-tested the latest Stockfish 8 compile on 1 core. It scored 3391, close to the 3395 on 4 cores, so it seems as though the 300717 compile was slightly stronger. It’s for this reason that I always keep my latest 3 compiled Stockfish 8 engines on PicoChess.

As a matter of interest, I ran the STS test against Texel, which scored 3050. The detailed results of the last 2 tests are available on request.

Cheers,

Al.

Marc

Aug 7, 2017, 6:45:13 PM

to PicoChess

Since no one is expected to play at the strength of Stockfish, it would be interesting to see how well this test estimates the strength of, e.g., the Rodent characters, and how that compares with some real games played.

Would it allow us to estimate the relative strength of all the different characters, especially when changing the nps and blur values with the script I posted in another thread?

It would also be interesting to see whether the weaknesses the test reveals match the parameters set in rodent.uci for the corresponding characters.

This would let us characterize the playing characters even better and adjust the approximate rating displayed when playing against them in PicoChess.

Al

Aug 8, 2017, 4:12:56 AM

to PicoChess

Hi Marc,

Yes, that would be nice, but I doubt you could pass that many UCI parameters to the test before starting it. In Rodent II, each personality requires loads of parameter changes. For example, here are the parameters for the personality Mark:

[Mark]
; Mark personality for Rodent II
; author: Pawel Koziol
; 1500 Elo defensive player who likes own mobility
;
PawnValue = 100
KnightValue = 325
BishopValue = 335
RookValue = 500
QueenValue = 1000
KeepPawn = 0
KeepKnight = 0
KeepBishop = 0
KeepRook = 0
KeepQueen = 0
BishopPair = 50
KnightPair = -10
ExchangeImbalance = 25
KnightLikesClosed = 6
RookLikesOpen = 3
Material = 100
OwnAttack = 100
OppAttack = 120
OwnMobility = 120
OppMobility = 100
KingTropism = 20
PiecePlacement = 100
PiecePressure = 100
PassedPawns = 100
PawnStructure = 100
Lines = 100
Outposts = 100
PawnShield = 120
PawnStorm = 100
Forwardness = 0
DoubledPawnMg = -12
DoubledPawnEg = -24
IsolatedPawnMg = -10
IsolatedPawnEg = -20
IsolatedOnOpenMg = -10
BackwardPawnMg = -8
BackwardPawnEg = -10
BackwardOnOpenMg = -8
PstStyle = 0
MobilityStyle = 0
NpsLimit = 450
EvalBlur = 50
Contempt = 0
SlowMover = 100
Selectivity = 175
BookFilter = 20
; GuideBookFile = books/guide/mini.bin JP! doesnt exist
MainBookFile = books/mini.bin

I’d be happier if we could pass on the ‘limitstrength’ UCI parameter that our engines use at lower levels. For example, one of the lower Arasan levels that I use is:

[Elo@1640]
uci_elo = 1640
uci_limitstrength = true

I think it’s more realistic that we could pass these 2 parameters on to the test and see how close the Elo actually is. I’ll give it a try ....
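
As a minimal sketch of what passing these two parameters could look like, the snippet below reads the level section shown above with Python's `configparser` and turns each entry into a UCI `setoption` command. It assumes the engine accepts the option names as written (the UCI protocol treats option names as case-insensitive); the helper name `setoption_commands` is hypothetical, and how the STS script would forward the commands to the engine is not covered here.

```python
import configparser

# Level section copied from the post above.
LEVEL_INI = """
[Elo@1640]
uci_elo = 1640
uci_limitstrength = true
"""


def setoption_commands(ini_text, section):
    """Turn the key/value pairs of one ini section into UCI setoption commands."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return [
        "setoption name {} value {}".format(name, value)
        for name, value in parser[section].items()
    ]


for command in setoption_commands(LEVEL_INI, "Elo@1640"):
    print(command)
```

The resulting lines would be sent to the engine after the `uci` handshake and before the first position is analysed.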

Nice idea,

Al.

Uwe Badermann

Sep 5, 2017, 9:03:21 AM

to PicoChess

Hello Al,

I know you're testing a lot in RPi vs RPi matches, so I'd like to ask:

* do you also have Elo rating estimates for levels other than the maximum one?

* if not, are you planning to produce them, or does that perhaps make no sense?

I think it would be good to have at least some well-founded, realistic estimate of the engine's strength rather than just the 2900, 3000 ... ratings.

Thanks in advance for your information and expertise.

Cheers

Uwe

Al

Sep 5, 2017, 2:37:14 PM

to PicoChess

Hi Uwe,

I did try to get the STS tests to evaluate lower levels but never managed it. However, it will be possible to run a tournament between the levels and evaluate the results with Ordo, using the grade for level 20 (the top level) as a reference anchor to get the ratings for the other levels. I won’t use every Stockfish level, only those selected using the Black Queen: levels 3, 6, 9, 12, 15, 18 & 20.

This will take a while but we have done something similar before.

Watch this space.

Al.

Uwe Badermann

Sep 5, 2017, 3:15:32 PM

to PicoChess

Hello Al,

I appreciate very, very much the effort you are putting into this question, both in work and in electricity bills!

I will keep a curious eye on your postings here.

Thank you very much!

Have a nice evening

Uwe

Al

Sep 8, 2017, 10:34:37 AM

to PicoChess

Hi Uwe,

I ran a Stockfish 7 tournament over the last 2 days with levels 20 (top), 18, 15, 12, 09, 06 & 03.
7 Levels all play all, 8 rounds at 5 mins per game each. So 168 games taking just over 28 hours.
I double-checked all my parameters, as level 15 slightly outperformed level 18, which is strange.

Here’s the cross table:

Scid vs. PC
?, 2017.09.06 - 2017.09.07
                     Score      Stf7l20  Stf7l15  Stf7l18  Stf7l12  Stf7l09  Stf7l06  Stf7l03
----------------------------------------------------------------------------------------------------------
1: Stockfish lvl20   47.0 / 48  XXXXXXXX 111111=1 11=11111 11111111 11111111 11111111 11111111  (+46 -0 =2)
2: Stockfish lvl15   35.5 / 48  000000=0 XXXXXXXX 1==00101 11011111 11111111 11111111 11111111  (+34 -11 =3)
3: Stockfish lvl18   34.0 / 48  00=00000 0==11010 XXXXXXXX 111011== 11111=11 11111111 11111111  (+31 -11 =6)
4: Stockfish lvl12   27.0 / 48  00000000 00100000 000100== XXXXXXXX 11111111 11111111 11111111  (+26 -20 =2)
5: Stockfish lvl 09  14.5 / 48  00000000 00000000 00000=00 00000000 XXXXXXXX 01111011 11111111  (+14 -33 =1)
6: Stockfish lvl 06   9.5 / 48  00000000 00000000 00000000 00000000 10000100 XXXXXXXX 11=11111  (+9 -38 =1)
7: Stockfish lvl 03   0.5 / 48  00000000 00000000 00000000 00000000 00000000 00=00000 XXXXXXXX  (+0 -47 =1)
----------------------------------------------------------------------------------------------------------
168 games: +80 -80 =8
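
The game count and the running time can be cross-checked with a few lines of arithmetic. This is a sketch under one stated assumption: "5 mins per game each" is read as 5 minutes per side, with every game using its full time.

```python
from itertools import combinations

levels = [20, 18, 15, 12, 9, 6, 3]
rounds = 8

# All-play-all: every pair of levels meets once per round.
pairings = list(combinations(levels, 2))
total_games = len(pairings) * rounds

# Points per level, copied from the cross table.
points = {20: 47.0, 15: 35.5, 18: 34.0, 12: 27.0, 9: 14.5, 6: 9.5, 3: 0.5}

# Every game hands out exactly one point in total, so the sums must agree.
assert sum(points.values()) == total_games

# Assumption: 5 minutes per side, each game running to the full 10 minutes.
max_hours = total_games * 2 * 5 / 60.0
print(total_games, max_hours)  # 168 28.0
```

The 28-hour ceiling from clock time alone is consistent with the reported "just over 28 hours" once engine start-up and move overhead are added.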

Here’s the Ordo grade calculations:

Firstly assuming Stockfish 7 Level 20 is graded at 3100:

#  PLAYER            :  RATING  POINTS  PLAYED    (%)
1  Stockfish 7       :  3100.0    47.0      48  97.9%
2  Stockfish lvl15   :  2623.4    35.5      48  74.0%
3  Stockfish lvl18   :  2577.7    34.0      48  70.8%
4  Stockfish lvl12   :  2357.6    27.0      48  56.2%
5  Stockfish lvl 09  :  1756.2    14.5      48  30.2%
6  Stockfish lvl 06  :  1535.8     9.5      48  19.8%
7  Stockfish lvl 03  :  1011.3     0.5      48   1.0%

White advantage = 0.00
Draw rate (equal opponents) = 50.00 %

Secondly assuming the STS Ratings are correct with Level 20 nearer 3350:

#  PLAYER            :  RATING  POINTS  PLAYED    (%)
1  Stockfish 7       :  3350.0    47.0      48  97.9%
2  Stockfish lvl15   :  2873.4    35.5      48  74.0%
3  Stockfish lvl18   :  2827.7    34.0      48  70.8%
4  Stockfish lvl12   :  2607.6    27.0      48  56.2%
5  Stockfish lvl 09  :  2006.2    14.5      48  30.2%
6  Stockfish lvl 06  :  1785.8     9.5      48  19.8%
7  Stockfish lvl 03  :  1261.3     0.5      48   1.0%

I must admit I’m more inclined to believe the first set of figures as I’m around 1600 - 1650 ELO and am on par with level 7.
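
The two rating lists differ only in the anchor: every rating in the second list is the first list's value plus 250. The sketch below shows that shift and then makes a rough guess for the untested level 7 by interpolating linearly between levels 6 and 9. The interpolation is an assumption for illustration, not an Ordo result, and the helper names are made up.

```python
# Ratings from the first Ordo list (level 20 anchored at 3100).
anchored_3100 = {
    20: 3100.0, 18: 2577.7, 15: 2623.4, 12: 2357.6,
    9: 1756.2, 6: 1535.8, 3: 1011.3,
}


def reanchor(ratings, anchor_level, new_anchor_rating):
    """Shift every rating by the same offset so anchor_level gets new_anchor_rating."""
    offset = new_anchor_rating - ratings[anchor_level]
    return {level: round(rating + offset, 1) for level, rating in ratings.items()}


def interpolate(ratings, level, low, high):
    """Linear interpolation between two tested levels (an assumption)."""
    fraction = (level - low) / float(high - low)
    return ratings[low] + fraction * (ratings[high] - ratings[low])


anchored_3350 = reanchor(anchored_3100, 20, 3350.0)
print(anchored_3350[15])  # value for level 15 under the 3350 anchor
print(round(interpolate(anchored_3100, 7, 6, 9), 1))  # rough guess for level 7, 3100 anchor
print(round(interpolate(anchored_3350, 7, 6, 9), 1))  # same guess under the 3350 anchor
```

With the 3100 anchor, level 7 interpolates to roughly 1609, which sits inside the 1600 - 1650 range mentioned above; with the 3350 anchor the same level would come out about 250 points higher.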

What are your thoughts? What’s your Elo rating, and which level do you prefer playing?

Cheers,

Al.

Uwe Badermann

Sep 8, 2017, 4:11:01 PM

to PicoChess

Hello Al,

Many thanks for this thorough work!

I agree with your summary that the Ordo grades look more realistic.

My official rating is similar to yours. I have about a 75% chance against level 7 and a slight edge against level 9, but need to train more for level 12 :-(

A few days ago I noticed that lichess gives the following ratings for its SF8 levels:

level 1 @ 1350
level 2 @ 1420
level 3 @ 1500
level 4 @ 1600
level 5 @ 1700

But considering our human ratings, these figures also seem a little too high.

Kindest regards

Uwe

Marc

Nov 26, 2020, 10:10:32 AM

to PicoChess

I changed the most recent sts_rating.py (version 14.0) available on GitHub (https://github.com/fsmosca/STS-Rating.git) so that it can also read a --ucielo parameter and pass it to the engine during initialization.

In addition, to cover the new NNUE feature in Stockfish and other engines, I added an option --nnue to turn on or off a net called nn.nnue, which needs to be in the same folder as the engine.

I tested with the most current Stockfish compiled from https://github.com/official-stockfish/Stockfish.git and copied the nn-xxxxx.nnue from the src folder to nn.nnue to match the file name sts_rating v14.1 is looking for.

I named the modified script v14.1mb.

BTW, sts_rating.py is now compatible with Python 3.

sts_rating_v14.1mb.py
