Strength of Stockfish 7 in DGT Pi

Darryl Buller

Jul 11, 2017, 5:27:09 PM
to PicoChess
Is the estimate of Stockfish 7's strength on the DGT Pi (v0.75) accurate? The user's guide says 2570 at level 20, which is the highest level. But I played it in a 6-game match against Deep HIARCS 14 on my 4-core laptop, and the result was 5 draws and a win for Stockfish. Deep HIARCS is supposed to be over 3100. I know that 6 games is a small sample, but that result doesn't seem very likely if it's 2570.
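For a rough sense of how unlikely that result would be, here is a minimal Python sketch using the standard logistic Elo expectation formula. This is an assumption: neither post says which rating model the user guide's 2570 figure rests on. The sketch treats draws as half points and ignores the large statistical uncertainty of a 6-game match.

```python
import math

def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def rating_diff_from_score(fraction: float) -> float:
    """Elo difference implied by a score fraction (strictly between 0 and 1)."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / fraction - 1.0)

if __name__ == "__main__":
    # Stockfish's result in the match: 1 win + 5 draws = 3.5 points from 6 games.
    points, games = 1 + 5 * 0.5, 6
    per_game = expected_score(2570, 3100)
    print(f"expected at 2570 vs 3100: {per_game:.3f}/game, {per_game * games:.2f} in {games} games")
    print(f"performance vs a 3100 opponent: {3100 + rating_diff_from_score(points / games):.0f}")
```

Under this model a 2570 engine would be expected to score only about 0.27 points in 6 games against a 3100 opponent, whereas 3.5/6 corresponds to a performance roughly 58 points above the opponent.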

Shivkumar Shivaji

Jul 11, 2017, 5:42:04 PM
to pico...@googlegroups.com
The user guide is outdated. With the quad-core Pi 2/3, the strength should be 3100+. The 2570 figure is for the "ancient" Raspberry Pi 1, an ARMv6 with just a 700 MHz clock.

Thanks for testing!
Shiv

--
You received this message because you are subscribed to the Google Groups "PicoChess" group.
To unsubscribe from this group and stop receiving emails from it, send an email to picochess+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

darryl.buller

Jul 11, 2017, 9:51:00 PM
to pico...@googlegroups.com
Shiv,

Thanks!

Jürgen Precour

Jul 12, 2017, 5:41:53 AM
to PicoChess
I would say it's somewhere in between. I don't think all cores are turned on out of the box (in their/our ini file).
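Whether Stockfish uses more than one core is controlled by its "Threads" UCI option, which the GUI sets with a "setoption" command. The sketch below only builds the command text; how PicoChess stores the value in its engine ini file is not shown in this thread, so any mapping from the ini file to these commands is an assumption.

```python
from typing import List, Union

def uci_setoption(name: str, value: Union[str, int, bool]) -> str:
    """Build a UCI 'setoption' line: setoption name <id> value <x>."""
    if isinstance(value, bool):
        value = "true" if value else "false"  # UCI check options use true/false
    return f"setoption name {name} value {value}"

def engine_init_commands(threads: int, hash_mb: int) -> List[str]:
    """Lines a GUI might send after 'uci'/'uciok' to enable more cores.

    Waiting for 'uciok' and 'readyok' is left out; this only shows the text.
    """
    return [
        uci_setoption("Threads", threads),  # number of search threads (cores)
        uci_setoption("Hash", hash_mb),     # hash table size in MB
        "isready",
    ]

if __name__ == "__main__":
    for line in engine_init_commands(threads=3, hash_mb=128):
        print(line)
```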

Jürgen

Al

Jul 12, 2017, 7:27:05 AM
to PicoChess
I have 3 cores turned on; I found that the DGT Pi got very hot using all 4 cores (no heatsinks).


Cheers,

Al.

Shivkumar Shivaji

Jul 22, 2017, 1:06:27 PM
to pico...@googlegroups.com
I forgot that most people use just one core on the Pi. Has anyone tested the Elo? The manual clearly needs an update from the 2570 figure.

Shiv

Marc

Aug 4, 2017, 2:20:40 PM
to PicoChess
STS Rating v13.1
Number of cores: 4

Engine: Stockfish 7
Hash: 128, Threads: 1, time/pos: 1.341s

Number of positions in STS1-STS15_LAN.EPD: 1500
Max score = 1500 x 10 = 15000
Test duration: 00h:00m:08s
Expected time to finish: 00h:34m:16s
STS rating: 3362

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     84     69     78     70     76     75     71     69     63     85     69     69     75     69     48   1070
   Score    869    809    866    786    829    881    787    815    707    890    789    766    833    781    738  12146
Score(%)   86.9   80.9   86.6   78.6   82.9   88.1   78.7   81.5   70.7   89.0   78.9   76.6   83.3   78.1   73.8   81.0
  Rating   3626   3359   3613   3257   3448   3680   3261   3386   2905   3720   3270   3168   3466   3234   3043   3362

:: STS ID and Titles ::
STS 01: Undermining
STS 02: Open Files and Diagonals
STS 03: Knight Outposts
STS 04: Square Vacancy
STS 05: Bishop vs Knight
STS 06: Re-Capturing
STS 07: Offer of Simplification
STS 08: Advancement of f/g/h Pawns
STS 09: Advancement of a/b/c Pawns
STS 10: Simplification
STS 11: Activity of the King
STS 12: Center Control
STS 13: Pawn Play in the Center
STS 14: Queens and Rooks to the 7th rank
STS 15: Avoid Pointless Exchange

:: Top 5 STS with high result ::
1. STS 10, 89.0%, "Simplification"
2. STS 06, 88.1%, "Re-Capturing"
3. STS 01, 86.9%, "Undermining"
4. STS 03, 86.6%, "Knight Outposts"
5. STS 13, 83.3%, "Pawn Play in the Center"

:: Top 5 STS with low result ::
1. STS 09, 70.7%, "Advancement of a/b/c Pawns"
2. STS 15, 73.8%, "Avoid Pointless Exchange"
3. STS 12, 76.6%, "Center Control"
4. STS 14, 78.1%, "Queens and Rooks to the 7th rank"
5. STS 04, 78.6%, "Square Vacancy"
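The Score(%) row and the two top/low 5 lists follow directly from the per-suite scores: each suite has 100 positions worth at most 10 points each. A small Python sketch that reproduces them from the numbers above (the final STS rating of 3362 comes from the script's own conversion, which is not reproduced here):

```python
# Per-suite scores from the Stockfish 7 STS run above (STS1..STS15).
SCORES = [869, 809, 866, 786, 829, 881, 787, 815, 707, 890, 789, 766, 833, 781, 738]
POSITIONS_PER_SUITE = 100
MAX_POINTS_PER_POSITION = 10

def summarize(scores):
    """Return total score, overall %, and the top/bottom 5 suite numbers by %."""
    suite_max = POSITIONS_PER_SUITE * MAX_POINTS_PER_POSITION
    pct = {i + 1: 100.0 * s / suite_max for i, s in enumerate(scores)}
    total = sum(scores)
    total_pct = 100.0 * total / (suite_max * len(scores))
    ranked = sorted(pct, key=pct.get, reverse=True)  # suite numbers, best first
    top5 = ranked[:5]
    low5 = ranked[::-1][:5]  # worst first
    return total, total_pct, top5, low5

if __name__ == "__main__":
    total, total_pct, top5, low5 = summarize(SCORES)
    print(f"total {total} = {total_pct:.1f}%")
    print("top 5:", top5)
    print("low 5:", low5)
```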

DJ Dekker

Aug 4, 2017, 3:47:12 PM
to PicoChess
Hi Darryl,

In February-March-April 2016 there was a discussion on this forum about measuring the strength of Stockfish 7 on Raspberry Pi. You can find it if you search for "Engine match on Raspberry Pi versus PC". It explains how the estimated strength of 3100 Elo was established.

Greetings,
DJ

Al

Aug 4, 2017, 4:09:51 PM
to PicoChess
Hi Marc,

That’s a lot of work you’ve done there. Let me just make sure I understand your results correctly.

Are you saying that each of the 15 Strategic Test Suites (100 positions in each) is run at exactly 8 seconds per position?

If so, I just ran all 100 positions of Strategic Test Suite 001 (Undermining) at 8 seconds per position, using 4 cores on the DGT Pi (RPi 3).

93 perfect 10s plus 4, 6, 4, 0, 1, 1 and 4 (= 20) on the other 7 positions, so a total of 950, i.e. 95%.
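Each STS position awards 10 points for the best move and partial credit for listed alternatives, which is why a 100-position suite tops out at 1000. The sketch below assumes the partial-credit list is a comma-separated "move=points" string (my understanding of the c0 field in the STS EPD files); the example moves are made up. It also re-checks the arithmetic of the Undermining run.

```python
def parse_move_points(c0: str) -> dict:
    """Parse e.g. 'g1f3=10, f1e2=4' into {'g1f3': 10, 'f1e2': 4}.

    Assumes the comma-separated move=points layout of the STS c0 field.
    """
    points = {}
    for item in c0.split(","):
        move, _, value = item.strip().partition("=")
        if move and value:
            points[move] = int(value)
    return points

def score_move(c0: str, engine_move: str) -> int:
    """Points for the engine's chosen move; moves not listed score 0."""
    return parse_move_points(c0).get(engine_move, 0)

def suite_total(per_position_points):
    """Total points and percentage of the 10-points-per-position maximum."""
    total = sum(per_position_points)
    return total, 100.0 * total / (10 * len(per_position_points))

if __name__ == "__main__":
    print(score_move("g1f3=10, f1e2=4, h2h3=1", "f1e2"))   # hypothetical moves
    print(suite_total([10] * 93 + [4, 6, 4, 0, 1, 1, 4]))  # the STS1 run above
```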

I’ll refrain from running the other 14 tests until I’m sure I’m doing the right thing.


Cheers,

Al.

Marc

Aug 5, 2017, 5:20:27 AM
to PicoChess
Hi Al 

I used this Python script from the forum post below. Ferdinand Mosca did a great job creating it.

The script estimates the CPU speed relative to the developer's CPU and works out how long each position should run. Then it simply goes through all of the test suites.
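A minimal sketch of that idea, assuming the time per position simply scales with the ratio of the reference machine's speed to the local machine's speed. The function names and the linear scaling rule are hypothetical; this is not the actual logic of Ferdinand Mosca's script.

```python
def scaled_time_per_position(ref_time_s: float, ref_speed: float, local_speed: float) -> float:
    """Scale a reference time per position by the ratio of machine speeds.

    Hypothetical rule: a machine half as fast gets twice the time, so it can
    search roughly as many nodes per position as the reference machine.
    """
    if ref_speed <= 0 or local_speed <= 0:
        raise ValueError("speeds must be positive")
    return ref_time_s * ref_speed / local_speed

def expected_duration_s(time_per_pos_s: float, positions: int) -> float:
    """Pure search time for a whole run, ignoring start-up and I/O overhead."""
    return time_per_pos_s * positions

if __name__ == "__main__":
    print(scaled_time_per_position(0.2, ref_speed=4.0, local_speed=2.0))
    print(expected_duration_s(1.341, 1500) / 60, "minutes")
```

For example, the 1.341 s per position chosen on the Pi gives about 2011.5 s (roughly 33.5 minutes) of pure search time for 1500 positions, a little under the script's own estimate of 34m16s, presumably because of per-position overhead.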


The only work was getting it to run on Linux, because it didn't close a pipe properly and stopped with an exception.
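Marc doesn't say exactly which pipe was left open, so this is not his fix. It is a generic Python sketch of talking to a UCI engine through subprocess pipes and shutting it down cleanly (send "quit", close stdin, wait for the process, close stdout), which is the usual way to avoid leftover-pipe errors. FAKE_ENGINE_SRC is a hypothetical stand-in so the code can run without an engine installed.

```python
import subprocess
import sys

def uci_handshake(cmd, timeout_s=10.0):
    """Start a UCI engine, read until 'uciok', then shut it down cleanly.

    Returns the lines the engine printed up to and including 'uciok'.
    """
    proc = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text mode
        bufsize=1,                # line-buffered writes from our side
    )
    lines = []
    try:
        proc.stdin.write("uci\n")
        proc.stdin.flush()
        for raw in proc.stdout:
            line = raw.strip()
            lines.append(line)
            if line == "uciok":
                break
        proc.stdin.write("quit\n")  # ask the engine to exit on its own
        proc.stdin.flush()
    finally:
        # Close our end of stdin so the child sees EOF even if 'quit' was not sent.
        try:
            proc.stdin.close()
        except BrokenPipeError:
            pass  # the engine had already exited
        try:
            proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
        proc.stdout.close()  # release the read end of the pipe
    return lines

# Hypothetical stand-in "engine" so the sketch can be tried without Stockfish.
FAKE_ENGINE_SRC = (
    "import sys\n"
    "for l in sys.stdin:\n"
    "    c = l.strip()\n"
    "    if c == 'uci':\n"
    "        print('id name Fake', flush=True)\n"
    "        print('uciok', flush=True)\n"
    "    elif c == 'quit':\n"
    "        break\n"
)

if __name__ == "__main__":
    print(uci_handshake([sys.executable, "-c", FAKE_ENGINE_SRC]))
```

With a real engine, the command list would hold the engine binary's path instead of the Python stand-in.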

But just after fixing it myself I found this:


Marc

Marc

Aug 5, 2017, 5:33:19 AM
to PicoChess
I started the script with:

python sts_rating_v13.1.py -f STS1-STS15_LAN.EPD -e /opt/picochess/engines/armv7l/a-stockf --proto uci -h 128 --getrating --log

Each test position ran for 1.37 seconds on the Raspberry Pi 3, but that is what the script estimated on its own.

The test suites were included in the first download in the post.



Al

Aug 6, 2017, 6:40:02 AM
to PicoChess
Hi Marc,

I clearly ran my program for way too long. I was using a modified version of the Bratko-Kopec program I have for Linux, as I could only find a Windows version of the Strategic Test Suites.

I have now downloaded the script and applied the corrections you pointed me to. As I’m used to Python 3, I tried to get it working on that: I changed all the print statements to include brackets but struggled with other strings, so I ended up running it under Python 2 instead.

Here are my results from this morning’s run of my Stockfish 8 compile from 30 July, on 1 core:

STS Rating v13.1
Number of cores: 4

Engine: Stockfish8 300717
Hash: 128, Threads: 1, time/pos: 1.334s

Number of positions in STS1-STS15_LAN_v3.epd: 1500
Max score = 1500 x 10 = 15000
Test duration: 00h:00m:09s
Expected time to finish: 00h:34m:06s
STS rating: 3407

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     84     71     75     76     75     78     74     68     70     82     71     68     68     62     54   1076
   Score    886    787    859    849    826    909    798    828    803    875    808    784    798    769    717  12296
Score(%)   88.6   78.7   85.9   84.9   82.6   90.9   79.8   82.8   80.3   87.5   80.8   78.4   79.8   76.9   71.7   82.0
  Rating   3702   3261   3582   3537   3435   3804   3310   3444   3332   3653   3355   3248   3310   3181   2949   3407

I have just pulled the latest Stockfish 8 code, so I will compile that and re-test. This time I’ll try all 4 cores.


Cheers,

Al.

Al

Aug 6, 2017, 7:49:39 AM
to PicoChess
Hi Marc,

Here are the results from today’s compile of the latest Stockfish 8 code using all 4 cores. Weirdly, it’s slightly weaker than the last 1-core test; I suspect my RPi 3, with heat sinks, overheated inside the DGT Pi clock and throttled back:
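One way to check the throttling suspicion after a run is to look at the SoC temperature and at the flags reported by "vcgencmd get_throttled" on Raspberry Pi OS. The sketch below assumes the usual /sys thermal path and the bit meanings given in the Raspberry Pi documentation; verify both on your own image.

```python
from pathlib import Path

# Assumed path: on Raspberry Pi OS this holds the SoC temperature in millidegrees C.
THERMAL_ZONE = Path("/sys/class/thermal/thermal_zone0/temp")

# Bit meanings for `vcgencmd get_throttled` (per the Raspberry Pi documentation).
THROTTLE_BITS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}

def cpu_temp_c(path: Path = THERMAL_ZONE) -> float:
    """Read the current SoC temperature in degrees Celsius."""
    return int(path.read_text().strip()) / 1000.0

def decode_throttled(output: str) -> list:
    """Turn e.g. 'throttled=0x50005' into a list of human-readable flags."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [text for bit, text in sorted(THROTTLE_BITS.items()) if value & (1 << bit)]

if __name__ == "__main__":
    print(decode_throttled("throttled=0x50005"))
    if THERMAL_ZONE.exists():
        print(f"{cpu_temp_c():.1f} C")
```

If "throttling has occurred" shows up after the 4-core run, that would support the overheating explanation for the lower score.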


Engine: Stockfish8 060817
Hash: 128, Threads: 4, time/pos: 1.313s

Number of positions in STS1-STS15_LAN_v3.epd: 1500
Max score = 1500 x 10 = 15000
Test duration: 00h:00m:09s
Expected time to finish: 00h:33m:34s
STS rating: 3395

  STS ID   STS1   STS2   STS3   STS4   STS5   STS6   STS7   STS8   STS9  STS10  STS11  STS12  STS13  STS14  STS15    ALL
  NumPos    100    100    100    100    100    100    100    100    100    100    100    100    100    100    100   1500
 BestCnt     79     68     76     77     76     74     74     67     63     82     72     76     74     63     54   1075
   Score    845    790    836    859    846    886    814    834    717    865    819    838    836    751    719  12255
Score(%)   84.5   79.0   83.6   85.9   84.6   88.6   81.4   83.4   71.7   86.5   81.9   83.8   83.6   75.1   71.9   81.7
  Rating   3519   3274   3479   3582   3524   3702   3381   3470   2949   3608   3404   3488   3479   3101   2958   3395


Cheers,

Al.

Al

Aug 6, 2017, 4:25:11 PM
to PicoChess
Hi Marc,

I re-tested the latest Stockfish 8 compile on 1 core: it scored 3391, close to the 3395 on 4 cores, so it seems the 300717 compile was slightly stronger. It’s for this reason that I always keep my latest 3 Stockfish 8 compiles as engines on PicoChess.

As a matter of interest, I also ran the STS test on Texel, which scored 3050. Detailed results for both of the last 2 runs are available on request.


Cheers,

Al.

Marc

Aug 7, 2017, 6:45:13 PM
to PicoChess
Since no one is expected to play at the strength of Stockfish, it would be interesting to see how well this test estimates the strength of, e.g., the Rodent characters, compared with some real games played.

Would it allow us to estimate the relative strength of all the different characters, especially when changing the NPS and blur values with the script I had posted in another thread?

It would also be interesting to see whether the weaknesses the test finds match the parameters set in rodent.uci for the corresponding characters.

This would let us characterize the playing characters even better and adjust the approximate rating displayed when playing against them in PicoChess.

Al

Aug 8, 2017, 4:12:56 AM
to PicoChess
Hi Marc,

Yes, that would be nice, though I doubt you could pass that many UCI parameters to the test before starting it. In Rodent II, each personality requires loads of parameter changes; for example, here are the parameters for the Mark personality:

[Mark]
; Mark personality for Rodent II
; author: Pawel Koziol
; 1500 Elo defensive player who likes own mobility
;
PawnValue = 100
KnightValue = 325
BishopValue = 335
RookValue = 500
QueenValue = 1000
KeepPawn = 0
KeepKnight = 0
KeepBishop = 0
KeepRook = 0
KeepQueen = 0
BishopPair = 50
KnightPair = -10
ExchangeImbalance = 25
KnightLikesClosed = 6
RookLikesOpen = 3
Material = 100
OwnAttack = 100
OppAttack = 120
OwnMobility = 120
OppMobility = 100
KingTropism = 20
PiecePlacement = 100
PiecePressure = 100
PassedPawns = 100
PawnStructure = 100
Lines = 100
Outposts = 100
PawnShield = 120
PawnStorm = 100
Forwardness = 0
DoubledPawnMg = -12
DoubledPawnEg = -24
IsolatedPawnMg = -10
IsolatedPawnEg = -20
IsolatedOnOpenMg = -10
BackwardPawnMg = -8
BackwardPawnEg = -10
BackwardOnOpenMg = -8
PstStyle = 0
MobilityStyle = 0
NpsLimit = 450
EvalBlur = 50
Contempt = 0
SlowMover = 100
Selectivity = 175
BookFilter = 20
; GuideBookFile = books/guide/mini.bin JP! doesnt exist
MainBookFile = books/mini.bin

I’d be happier if we could pass on the ‘limitstrength’ UCI parameter that our engines use at lower levels. For example, one of the lower Arasan levels that I use is:

[Elo@1640]
uci_elo = 1640
uci_limitstrength = true
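Such a level section maps onto two standard UCI options, UCI_LimitStrength and UCI_Elo. A minimal Python sketch that turns the section into setoption commands; the lower-case key to UCI name mapping (UCI_NAMES) is a hypothetical helper, not PicoChess's actual code.

```python
import configparser

# The level section from the post above.
LEVEL_INI = """
[Elo@1640]
uci_elo = 1640
uci_limitstrength = true
"""

# Hypothetical mapping from the ini keys to the standard UCI option names.
UCI_NAMES = {"uci_elo": "UCI_Elo", "uci_limitstrength": "UCI_LimitStrength"}

def level_commands(ini_text: str, section: str) -> list:
    """Return one 'setoption' line per key in the given ini section."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return [
        f"setoption name {UCI_NAMES.get(key, key)} value {value}"
        for key, value in parser.items(section)
    ]

if __name__ == "__main__":
    for line in level_commands(LEVEL_INI, "Elo@1640"):
        print(line)
```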

I think it’s more realistic that we could pass these 2 parameters on to the test and see how close the Elo actually is. I’ll give it a try...

Nice idea,

Al.

Uwe Badermann

Sep 5, 2017, 9:03:21 AM
to PicoChess
Hello Al,

I know you're doing a lot of testing in RPi vs RPi matches, so I'd like to ask:

* Do you also have Elo rating estimates for levels other than the maximum one?
* If not, are you planning to make them (or maybe not, in case it makes no sense)?

I think it is a good idea to have at least some well-founded, realistic estimate of the engines' strength rather than the 2900, 3000 ... ratings.

Thanks in advance for your information and expertise.

Cheers

Uwe

Al

Sep 5, 2017, 2:37:14 PM
to PicoChess
Hi Uwe,

I did try to get the STS tests to evaluate lower levels but never managed it. However, it is possible to run a tournament between the levels and evaluate the results with Ordo, using the grade for level 20 (the top level) as a reference anchor to get the ratings for the other levels. I won’t use every Stockfish level, only those selected with the Black Queen, i.e. levels 3, 6, 9, 12, 15, 18 & 20.

This will take a while but we have done something similar before.

Watch this space.

Al.

Uwe Badermann

Sep 5, 2017, 3:15:32 PM
to PicoChess
Hello Al,

I very, very much appreciate the effort you are putting into this question, both in work and in electricity bills!
I'll keep a curious eye on your postings here.
Thank you very much!
Have a nice evening
Uwe

Al

Sep 8, 2017, 10:34:37 AM
to PicoChess
Hi Uwe,

I ran a Stockfish 7 tournament over the last 2 days with levels 20 (top), 18, 15, 12, 09, 06 & 03.
7 levels, all-play-all, 8 rounds, at 5 minutes per player per game: 168 games, taking just over 28 hours.
I double-checked all my parameters, as level 15 slightly outperformed level 18, which is strange.
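The game count and duration follow from the round-robin setup. A quick check, assuming "8 rounds" means every pair of levels meets 8 times and that "5 mins per game each" means 5 minutes per side with no increment:

```python
from math import comb

def round_robin(players: int, rounds: int, minutes_per_side: float):
    """Games, games per player, and the clock-time ceiling of an all-play-all event."""
    games = comb(players, 2) * rounds                    # every pairing, once per round
    games_per_player = (players - 1) * rounds
    max_clock_hours = games * 2 * minutes_per_side / 60  # both clocks fully used
    return games, games_per_player, max_clock_hours

if __name__ == "__main__":
    print(round_robin(players=7, rounds=8, minutes_per_side=5))
```

This gives 168 games, 48 per level and at most 28.0 hours of clock time, matching the cross table below; the time beyond 28 hours is presumably overhead between games.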

Here’s the cross table:

Scid vs. PC
?, 2017.09.06 - 2017.09.07
                        Score  Stf7l20  Stf7l15  Stf7l18  Stf7l12  Stf7l09  Stf7l06  Stf7l03
-------------------------------------------------------------------------------------------------
1: Stockfish lvl20  47.0 / 48  XXXXXXXX 111111=1 11=11111 11111111 11111111 11111111 11111111  (+46 -0 =2)
2: Stockfish lvl15  35.5 / 48  000000=0 XXXXXXXX 1==00101 11011111 11111111 11111111 11111111  (+34 -11 =3)
3: Stockfish lvl18  34.0 / 48  00=00000 0==11010 XXXXXXXX 111011== 11111=11 11111111 11111111  (+31 -11 =6)
4: Stockfish lvl12  27.0 / 48  00000000 00100000 000100== XXXXXXXX 11111111 11111111 11111111  (+26 -20 =2)
5: Stockfish lvl 09 14.5 / 48  00000000 00000000 00000=00 00000000 XXXXXXXX 01111011 11111111  (+14 -33 =1)
6: Stockfish lvl 06  9.5 / 48  00000000 00000000 00000000 00000000 10000100 XXXXXXXX 11=11111  (+9 -38 =1)
7: Stockfish lvl 03  0.5 / 48  00000000 00000000 00000000 00000000 00000000 00=00000 XXXXXXXX  (+0 -47 =1)
-------------------------------------------------------------------------------------------------
168 games: +80 -80 =8
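Each row's score and +/-/= record can be recomputed from its result strings ("1" win, "=" draw, "0" loss, "X" the level itself). A small Python check for the first two rows:

```python
def row_record(results):
    """Return (points, wins, losses, draws) for one cross-table row."""
    wins = draws = losses = 0
    for block in results:
        for ch in block:
            if ch == "1":
                wins += 1
            elif ch == "=":
                draws += 1
            elif ch == "0":
                losses += 1
            elif ch != "X":  # 'X' marks the level's own (empty) column
                raise ValueError(f"unexpected result symbol {ch!r}")
    return wins + 0.5 * draws, wins, losses, draws

LVL20 = "XXXXXXXX 111111=1 11=11111 11111111 11111111 11111111 11111111".split()
LVL15 = "000000=0 XXXXXXXX 1==00101 11011111 11111111 11111111 11111111".split()

if __name__ == "__main__":
    print(row_record(LVL20))
    print(row_record(LVL15))
```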

Here’s the Ordo grade calculations:

First, assuming Stockfish 7 level 20 is graded at 3100:

   # PLAYER           :  RATING  POINTS  PLAYED     (%)
   1 Stockfish 7      :  3100.0    47.0      48   97.9%
   2 Stockfish lvl15  :  2623.4    35.5      48   74.0%
   3 Stockfish lvl18  :  2577.7    34.0      48   70.8%
   4 Stockfish lvl12  :  2357.6    27.0      48   56.2%
   5 Stockfish lvl 09 :  1756.2    14.5      48   30.2%
   6 Stockfish lvl 06 :  1535.8     9.5      48   19.8%
   7 Stockfish lvl 03 :  1011.3     0.5      48    1.0%

White advantage = 0.00
Draw rate (equal opponents) = 50.00 %

Second, assuming the STS ratings are correct, with level 20 nearer 3350:

   # PLAYER           :  RATING  POINTS  PLAYED     (%)
   1 Stockfish 7      :  3350.0    47.0      48   97.9%
   2 Stockfish lvl15  :  2873.4    35.5      48   74.0%
   3 Stockfish lvl18  :  2827.7    34.0      48   70.8%
   4 Stockfish lvl12  :  2607.6    27.0      48   56.2%
   5 Stockfish lvl 09 :  2006.2    14.5      48   30.2%
   6 Stockfish lvl 06 :  1785.8     9.5      48   19.8%
   7 Stockfish lvl 03 :  1261.3     0.5      48    1.0%
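Every level in this second table is exactly 250 points higher than in the first, because ratings computed from a closed pool like this one only fix the differences between players; the anchor value just shifts the whole list. A Python sketch of that shift:

```python
# Ratings from the first table (level 20 anchored at 3100).
ANCHOR_3100 = {
    "Stockfish 7": 3100.0, "Stockfish lvl15": 2623.4, "Stockfish lvl18": 2577.7,
    "Stockfish lvl12": 2357.6, "Stockfish lvl 09": 1756.2, "Stockfish lvl 06": 1535.8,
    "Stockfish lvl 03": 1011.3,
}

def reanchor(ratings: dict, anchor: str, new_value: float) -> dict:
    """Move the anchor player to new_value and shift everyone else by the same amount."""
    shift = new_value - ratings[anchor]
    return {name: round(r + shift, 1) for name, r in ratings.items()}

if __name__ == "__main__":
    for name, rating in reanchor(ANCHOR_3100, "Stockfish 7", 3350.0).items():
        print(f"{name:17} {rating:7.1f}")
```

So the tournament alone cannot decide between 3100 and 3350 for level 20; that has to come from an outside reference, such as games against rated humans.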

I must admit I’m more inclined to believe the first set of figures, as I’m around 1600-1650 Elo and am on par with level 7.
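To see which table fits a 1600-1650 player who is on par with level 7, one can interpolate between the measured levels 6 and 9. This loosely assumes that strength changes roughly linearly between neighbouring measured levels:

```python
def interpolate_level(level, known):
    """Linearly interpolate a rating between the two nearest measured levels.

    Assumes rating changes roughly linearly with level between measured points.
    """
    points = sorted(known.items())
    for (lo, r_lo), (hi, r_hi) in zip(points, points[1:]):
        if lo <= level <= hi:
            return r_lo + (r_hi - r_lo) * (level - lo) / (hi - lo)
    raise ValueError("level outside the measured range")

# First Ordo table (level 20 anchored at 3100), keyed by Stockfish level.
FIRST_SET = {3: 1011.3, 6: 1535.8, 9: 1756.2, 12: 2357.6, 15: 2623.4, 18: 2577.7, 20: 3100.0}

if __name__ == "__main__":
    print(round(interpolate_level(7, FIRST_SET)))
```

This gives about 1609 for level 7 with the first set and about 1859 with the second (250 points higher), so the first set is the closer match.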

What are your thoughts? What’s your Elo rating, and which level do you prefer playing?


Cheers,

Al.



Uwe Badermann

Sep 8, 2017, 4:11:01 PM
to PicoChess
Hello Al,

Many thanks for this profound work!

I agree with your summary that the first set of Ordo grades looks more realistic.
My official rating is similar to yours: I score around 75% against level 7 and have a slight edge against level 9, but I need to train more for level 12 :-(

A few days ago I noticed that lichess gives the following ratings for its SF8 levels:
level 1 @ 1350
level 2 @ 1420
level 3 @ 1500
level 4 @ 1600
level 5 @ 1700

But considering our human ratings, these figures also seem a little too high.

Kindest regards

Uwe

Marc

Nov 26, 2020, 10:10:32 AM
to PicoChess
I changed the most recent sts_rating.py (version 14.0) available on GitHub (https://github.com/fsmosca/STS-Rating.git) so that it can also read a --ucielo parameter and pass it to the engine during initialization.

In addition, to cover the new NNUE feature in Stockfish and other engines, I added an option --nnue to turn on or off a net called nn.nnue, which needs to be in the same folder as the engine.
I tested with the most current Stockfish compiled from https://github.com/official-stockfish/Stockfish.git and copied the nn-xxxxx.nnue from the src folder to nn.nnue, to match the file name that sts_rating v14.1 looks for.
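A hedged Python sketch of how such options could be wired up. The option names --ucielo and --nnue come from the post, but their exact syntax in sts_rating_v14.1mb.py is not shown here, so the argparse definitions are hypothetical. The UCI option names "Use NNUE" and "EvalFile" are the ones Stockfish 12 exposed; later Stockfish versions removed "Use NNUE".

```python
import argparse
from pathlib import Path

def build_parser():
    """Hypothetical subset of options, for illustration only."""
    p = argparse.ArgumentParser(description="STS runner options (sketch)")
    p.add_argument("-e", "--engine", required=True, help="path to the UCI engine binary")
    p.add_argument("--ucielo", type=int, default=None,
                   help="limit the engine to this Elo via UCI_LimitStrength/UCI_Elo")
    p.add_argument("--nnue", choices=["on", "off"], default=None,
                   help="use (or not) the net nn.nnue stored next to the engine")
    return p

def init_commands(args):
    """UCI setoption lines to send after 'uciok' for the chosen options."""
    cmds = []
    if args.ucielo is not None:
        cmds.append("setoption name UCI_LimitStrength value true")
        cmds.append(f"setoption name UCI_Elo value {args.ucielo}")
    if args.nnue is not None:
        use = "true" if args.nnue == "on" else "false"
        # Option names as in Stockfish 12; later versions dropped "Use NNUE".
        cmds.append(f"setoption name Use NNUE value {use}")
        if args.nnue == "on":
            net = Path(args.engine).resolve().parent / "nn.nnue"
            cmds.append(f"setoption name EvalFile value {net}")
    return cmds

if __name__ == "__main__":
    # Hypothetical engine path, for illustration only.
    args = build_parser().parse_args(
        ["-e", "/opt/picochess/engines/armv7l/stockfish", "--ucielo", "1640", "--nnue", "on"])
    for line in init_commands(args):
        print(line)
```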

I named the modified script v14.1mb

BTW, sts_rating.py is now compatible with Python 3.

sts_rating_v14.1mb.py