Lazy SMP behaviour of Stockfish


Daniel

Jan 4, 2020, 6:31:38 PM1/4/20
to FishCooking

Could you please comment on the Lazy SMP behaviour of Stockfish that I detail here


I feel there is a huge unintended consequence, i.e. "widening" (maybe due to a bug), that is clouding the comparison of SMP algorithms.
Sure, it could be good to widen the search with more processors, but then do it intentionally. Without understanding what causes the widening,
other engines could implement the same algorithm but get no widening. For starters, my YBW, ABDADA and SHT implementations show zero widening effect.
I still have to test the "d/d+1" scheme of Lazy SMP in my engine, though.

Михаил Чалый

Jan 5, 2020, 2:36:03 AM1/5/20
to FishCooking
But it's natural for Lazy SMP to widen the search; that's basically what it is all about. In terms of time-to-depth it's the worst algorithm, but it's the best in terms of playing strength. I don't see any problem.

michel.va...@uhasselt.be

Jan 5, 2020, 9:57:04 AM1/5/20
to FishCooking
What is your evidence for these claims? Widening and worse in TTD?

joost.van...@gmail.com

Jan 5, 2020, 11:45:59 AM1/5/20
to FishCooking
Note that 'go depth N' first needs to be clearly defined in the context of SMP. There are various possible meanings.

- all threads finished depth N
- one thread finished depth N
- one specific thread ('master') finished depth N

SF uses the last definition, which means some other threads can have reached higher or lower depths, since the Lazy SMP implementation does not enforce any depth relationship between threads.

Similarly, one has to agree on a definition of parallel efficiency. Natural for SF would IMO be: 'In a match of a threaded code with N threads, a sequential code with factor-M time odds reaches the same score (0.5). In that case M/N is the parallel efficiency of that particular code, for that particular parallelization approach, under those testing conditions.'

Which algorithm is most efficient is probably program dependent. The optimal choice for a minimax program is probably different from that for an alpha-beta program, and the latter comes in many different flavors. Furthermore, one might have very different requirements on the parallelization scheme. Somebody could well want a search to be deterministic, and could be willing (by necessity) to sacrifice some 'efficiency' for that.

It is also common to see that algorithms that are winners in sequential implementations are not optimal in parallel, and this is likely true in chess search as well. In this sense, for example, a wider tree might be easier to parallelize, and so an algorithm that gradually moves from a narrow tree to a wider tree as the parallelism increases could well be a winner, provided it gets the balance right. This balance is where the 'breadcrumbs' algorithm comes into play: it slightly adjusts the LMR reductions if other threads are searching the same position close to the root.

Yes, I do think that lazySMP makes the search wider: each thread quite independently explores the tree, and given that they have their own history tables, a shared TT, etc., these searches most likely follow different variations (things like breadcrumbs will increase that effect, I assume). IMO, depth and width can be considered two parameters of the search (both related to pruning, of course), and one can maintain very similar Elo strength trading one for the other. This widening could be quantified by storing all visited and evaluated nodes for each thread... for a few billion nodes this is pretty feasible these days.

Finally, I like to think about the case where the hash (TT) is almost unlimited and eval is the most expensive part. In that case, lazySMP seems pretty efficient, since a thread will never compute something another thread has already computed, assuming that threads don't compute stuff which will never be needed as the search proceeds.

Lyudmil Antonov

Jan 5, 2020, 5:01:52 PM1/5/20
to FishCooking
Somewhat unrelated to the discussion, and because I see experienced programmers in this thread, I would like to recommend a book:
Maurice Herlihy, Nir Shavit. 2008. The Art of Multiprocessor Programming.
I haven't seen implemented in Stockfish even the simpler algorithms of the first chapters, such as the Peterson lock, the filter lock, Lamport's bakery algorithm, the bounded timestamps, etc., let alone the more complicated algorithms in the later chapters that account for hash concurrency, memory and I/O delays, etc. A small problem is that all examples are written in Java, but it shouldn't be difficult for an experienced programmer to translate them into C++, all the more so since the appendices give ways of doing this (libraries and other hints).

michel.va...@uhasselt.be

Jan 6, 2020, 3:11:07 AM1/6/20
to FishCooking


On Sunday, January 5, 2020 at 5:45:59 PM UTC+1, joost.va...@gmail.com wrote:
Note that 'go depth N' first needs to be clearly defined in the context of SMP. There are various possible meanings.

- all threads finished depth N
- one thread finished depth N
- one specific thread ('master') finished depth N

I think that the first definition is the only one that makes sense. 

Anyway, using the first definition there is still a substantial Elo gain at fixed depth N. So yes, it seems correct that SHT (Shared Hash Table) "widens" the search.

Still, I do not really understand this. I accept that the threads desynchronize and that the total tree is therefore wider. However, ultimately we pick a move from a single thread. So for there to be an Elo gain in fixed-depth search, the individual threads must widen. Do you see what might cause this?




SF uses the last definition. Which means some other threads can have reached higher or lower depths, since the lazySMP implementation does not enforce any depth relationship between threads.

Similarly, one has to agree on a definition of parallel efficiency. Natural for SF would be IMO: ' In a match of a threaded code with N threads, a sequential code with a factor M time-odds reaches the same score (0.5). In that case M/N is the parallel efficiency of that particular code, for that particular parallelization approach, under those testing conditions.'

Agreed.  

joost.van...@gmail.com

Jan 6, 2020, 4:10:46 AM1/6/20
to FishCooking


On Monday, 6 January 2020 09:11:07 UTC+1, michel.v...@uhasselt.be wrote:


On Sunday, January 5, 2020 at 5:45:59 PM UTC+1, joost.va...@gmail.com wrote:
Note that 'go depth N' first needs to be clearly defined in the context of SMP. There are various possible meanings.

- all threads finished depth N
- one thread finished depth N
- one specific thread ('master') finished depth N

I think that the first definition is the only one that makes sense. 

That really depends on the purpose and context. For example, would you like to keep threads idling until definition 1 is satisfied? What about 'go mate N'?
 

Anyway using the first definition there is still substantial Elo gain at fixed depth N. So yes it seems correct that SHT (Shared Hash Table) "widens" the search. 

Still I do not really understand this. I accept that the threads desynchronize and that the total tree is therefore wider. However ultimately we pick a move from a single thread. So for there to be an Elo gain in fixed depth search, the individual threads must widen. Do you see what might cause this?

Guess that depends on the definition of 'widen'. I would argue that a single thread within the team has a search that is as narrow as (or actually even narrower than) in the sequential case. However, through the shared hash table it benefits from the (overall) wider search: e.g. it will have a higher-quality ttMove available, or it may simply get a TT cutoff, as the position might already have been searched more deeply by another thread as part of its search.
 
 

joost.van...@gmail.com

Jan 6, 2020, 4:52:03 AM1/6/20
to FishCooking


- all threads finished depth N


I think that the first definition is the only one that makes sense. 

Really depends on the purpose, and context. For example, you would like to keep threads idling till you satisfy definition 1 ? What about 'go mate N' ?

People testing should note that just commenting out '&& mainThread' in the ID loop doesn't implement definition 1, btw. The main thread will still stop the search as soon as it reaches depth N, so no 'helper thread' will have exceeded depth N, but some might not have reached it yet.

Jörg Oster

Jan 6, 2020, 5:13:16 AM1/6/20
to FishCooking
It won't matter that much anyway.

bench 32 7 14
master

===========================
Total time (ms) : 3351
Nodes searched  : 29159358
Nodes/second    : 8701688


all threads up to limit depth

===========================
Total time (ms) : 3569
Nodes searched  : 28057912
Nodes/second    : 7861561

joost.van...@gmail.com

Jan 6, 2020, 9:41:31 AM1/6/20
to FishCooking


On Monday, 6 January 2020 11:13:16 UTC+1, Jörg Oster wrote:
It won't matter that much anyway.


It is actually quite different. I implemented 5 versions (drafts: https://github.com/vondele/Stockfish/commits/threadDepth):

v1: master version: all threads search irrespective of Limits.depth; mainThread stops the search when it reaches the limit depth.
v2: the first thread to reach Limits.depth stops the search.
v3: all threads search to at most Limits.depth; mainThread stops the search on reaching the limit (the simple '&& mainThread' removal).
v4: all threads search to exactly Limits.depth.
v5: all threads search to at least Limits.depth (the last one to reach it stops the search).

and had them play a depth=13 match with 8 threads. Also in the pool: sequential runs at depths 13 ... 16. Interestingly, even v2 is significantly stronger than sequential, and v1 ~ v4:

Rank Name                          Elo     +/-   Games   Score   Draws
   1 v5                            141      15    1006   69.2%   48.2%
   2 seq16                         117      15    1005   66.2%   49.7%
   3 v4                             37      15    1007   55.3%   52.5%
   4 v1                             35      15    1008   55.0%   52.8%
   5 seq15                          23      15    1008   53.3%   53.9%
   6 v3                             22      15    1007   53.2%   53.3%
   7 seq14                         -86      15    1005   37.8%   49.2%
   8 v2                            -92      15    1008   37.1%   47.8%
   9 seq13                        -211      18    1006   22.9%   36.1%




Jörg Oster

Jan 6, 2020, 12:41:23 PM1/6/20
to FishCooking
So it seems I was right.

v1 and v3 are still pretty close in strength.

michel.va...@uhasselt.be

Jan 6, 2020, 1:02:40 PM1/6/20
to FishCooking


On Monday, January 6, 2020 at 10:10:46 AM UTC+1, joost.va...@gmail.com wrote:


On Monday, 6 January 2020 09:11:07 UTC+1, michel.v...@uhasselt.be wrote:


On Sunday, January 5, 2020 at 5:45:59 PM UTC+1, joost.va...@gmail.com wrote:
Note that 'go depth N' first needs to be clearly defined in the context of SMP. There are various possible meanings.

- all threads finished depth N
- one thread finished depth N
- one specific thread ('master') finished depth N

I think that the first definition is the only one that makes sense. 

Really depends on the purpose, and context. For example, you would like to keep threads idling till you satisfy definition 1 ? What about 'go mate N' ?

Well, I would think that for "go depth N", N would be an upper bound on the search depth. "go depth N" and "go mate N" have different semantics (in the latter case you want a PV of length at most N leading to mate; how you got it is not important). But I see your point, and I do not really feel strongly about it.
 

Anyway using the first definition there is still substantial Elo gain at fixed depth N. So yes it seems correct that SHT (Shared Hash Table) "widens" the search. 

Still I do not really understand this. I accept that the threads desynchronize and that the total tree is therefore wider. However ultimately we pick a move from a single thread. So for there to be an Elo gain in fixed depth search, the individual threads must widen. Do you see what might cause this?

Guess that depends on the definition of 'widen'. I would argue that a single thread within the team has a search that is as narrow as (or actually even more narrow than)  in the sequential case. However, through the shared hash table, it benefits from the (overall) wider search, e.g. it will have available a higher quality ttMove, or just be able to do a TT cutoff as that position might already have been searched more deeply by another thread as part of its search.

Well, maybe. But again it is not obvious to me why the ttMoves would be of higher quality if they come from another thread executing a similar search algorithm. Of course, ttMoves from other threads will speed up the search (more cutoffs etc.), but that alone does not help with a fixed-depth search. And the Elo difference is really huge in the fixed-depth case.
 
 
 

joost.van...@gmail.com

Jan 6, 2020, 1:35:22 PM1/6/20
to FishCooking

Guess that depends on the definition of 'widen'. I would argue that a single thread within the team has a search that is as narrow as (or actually even more narrow than)  in the sequential case. However, through the shared hash table, it benefits from the (overall) wider search, e.g. it will have available a higher quality ttMove, or just be able to do a TT cutoff as that position might already have been searched more deeply by another thread as part of its search.

Well maybe. But again it is not obvious to me why the ttMoves would be higher quality if they come from another thread executing a similar search algorithm. Of course ttMoves from other threads will speed up the search (more cutoffs etc...) but that alone does not help with a fixed depth search. And the Elo difference is really huge in the fixed depth case

the algorithm might be the same, but the search tree traversal itself will be 'pretty different'... exploration at a depth that is 1-2 plies deeper is enough to improve the quality of a move by several tens of Elo, and a 1-2 ply depth difference is very easily the result of different move ordering. Even repeated searches at the same depth would increase move quality significantly... that's an interesting test to run, actually.

michel.va...@uhasselt.be

Jan 6, 2020, 2:07:42 PM1/6/20
to FishCooking
Ok maybe I understand what you want to say. 

While the actual tree searched by a thread is probably smaller, the virtual tree being searched (which includes grafted trees from other threads whose depth/eval information has been communicated via the hash table) could be much bigger. This is an explanation I can live with... Thanks!



  

joost.van...@gmail.com

Jan 6, 2020, 2:18:08 PM1/6/20
to FishCooking
So, very interesting... I've implemented this 'repeated search': https://github.com/vondele/Stockfish/tree/doubleSearch. This is just the sequential code calling the 'go depth N' search routine a repeated number of times before returning a bestmove. s2, s4 and s8 repeat this search 2, 4 or 8 times respectively. The result of a match (at depth 13 for s2, s4, s8, and depths 13, 14, 15 for the seqN codes):

Rank Name                          Elo     +/-   Games   Score   Draws
   1 s8                            102      14    1153   64.3%   51.3%
   2 seq15                          89      14    1168   62.5%   50.2%
   3 s4                             20      14    1161   52.9%   52.0%
   4 seq14                         -21      14    1170   47.1%   50.5%
   5 s2                            -44      14    1167   43.7%   51.5%
   6 seq13                        -148      15    1171   29.9%   44.6%

That is, a large Elo gain just by sequentially repeating the 'go depth 13' search. In fact, s8 is very similar in performance to the v4 @ 8 threads reported previously (s8: 250 Elo vs seq13; v4@8: 248 Elo vs seq13). Tempting ....

michel.va...@uhasselt.be

Jan 6, 2020, 2:59:14 PM1/6/20
to FishCooking
:) Iterative deepening without the deepening....

Daniel

Jan 6, 2020, 3:30:43 PM1/6/20
to FishCooking
Doesn't a second (or later) search just return the value from the hash table without searching further? It is the same depth, so there should be a TT cutoff.

joost.van...@gmail.com

Jan 6, 2020, 3:33:56 PM1/6/20
to FishCooking
So, with a bit more games for v4@8 vs s8:

Score of s8 vs v4: 1820 - 1993 - 4995  [0.490] 8808
Elo difference: -6.8 +/- 4.8, LOS: 0.3 %, DrawRatio: 56.7 %

So, really similar performance, with a slight edge for the multithreaded version.

joost.van...@gmail.com

Jan 6, 2020, 3:36:07 PM1/6/20
to FishCooking


On Monday, 6 January 2020 21:30:43 UTC+1, Daniel wrote:
Doesn't a second or more search return the value from the hashtable without searching further? It is the same depth so there should be a TT cutoff.

definitely not at PV nodes, and only if the ttValue is above beta.

    // At non-PV nodes we check for an early TT cutoff
    if (  !PvNode
        && ttHit
        && tte->depth() >= depth
        && ttValue != VALUE_NONE // Possible in case of TT access race
        && (ttValue >= beta ? (tte->bound() & BOUND_LOWER)
                            : (tte->bound() & BOUND_UPPER)))
    {

Daniel

Jan 6, 2020, 4:21:25 PM1/6/20
to FishCooking
Non-PV windows are zero-windows so either ttValue >= beta or ttValue <= alpha, so that condition is not restrictive.
I tried commenting out the !PvNode condition, but it gives horrible results. And the PVs are short even on the first search.

joost.van...@gmail.com

Jan 6, 2020, 4:38:36 PM1/6/20
to FishCooking
It is, however, the case that the number of nodes needed for the repeated searches decreases and eventually becomes very small... the 'go depth 13' search reaches a kind of fixed point (for startpos) after ~55 iterations:
     1    info depth 13 seldepth 23 multipv 1 score cp 65 nodes 221935 nps 1668684 tbhits 0 time 133 pv d2d4 d7d5 c2c4 e7e6 e2e3 g
     6    info depth 13 seldepth 24 multipv 1 score cp 73 nodes 26013 nps 83912 tbhits 0 time 310 pv d2d4 d7d5 e2e3 e7e6 c2c4 g8f6
    11    info depth 13 seldepth 24 multipv 1 score cp 67 nodes 47010 nps 110611 tbhits 0 time 425 pv g1f3 e7e6 e2e3 d7d5 c2c4 c7c
    16    info depth 13 seldepth 24 multipv 1 score cp 64 nodes 25319 nps 45131 tbhits 0 time 561 pv d2d4 d7d5 g1f3 e7e6 e2e3 c7c5
    21    info depth 13 seldepth 20 multipv 1 score cp 47 nodes 20879 nps 34116 tbhits 0 time 612 pv d2d4 e7e6 c2c4 g8f6 b1c3 d7d5
    26    info depth 13 seldepth 24 multipv 1 score cp 54 nodes 26205 nps 38707 tbhits 0 time 677 pv d2d4 e7e6 c2c4 g8f6 g1f3 d7d5
    31    info depth 13 seldepth 22 multipv 1 score cp 59 nodes 2992 nps 4132 tbhits 0 time 724 pv d2d4 e7e6 c2c4 g8f6 g1f3 d7d5 b
    36    info depth 13 seldepth 25 multipv 1 score cp 68 nodes 9720 nps 13188 tbhits 0 time 737 pv d2d4 e7e6 c2c4 g8f6 g1f3 d7d5
    41    info depth 13 seldepth 23 multipv 1 score cp 46 nodes 9151 nps 11510 tbhits 0 time 795 pv d2d4 e7e6 e2e3 g8f6 c2c4 d7d5
    46    info depth 13 seldepth 23 multipv 1 score cp 46 nodes 1546 nps 1934 tbhits 0 time 799 pv d2d4 e7e6 e2e3 g8f6 c2c4 d7d5 f
    51    info depth 13 seldepth 23 multipv 1 score cp 48 nodes 34788 nps 39803 tbhits 0 time 874 pv d2d4 e7e6 c2c4 d7d5 e2e3 c7c5
    56    info depth 13 seldepth 19 multipv 1 score cp 45 nodes 2417 nps 2721 tbhits 0 time 888 pv d2d4 e7e6 c2c4 d7d5 e2e3 c7c5 g
    61    info depth 13 seldepth 17 multipv 1 score cp 45 nodes 1189 nps 1332 tbhits 0 time 892 pv d2d4 e7e6 c2c4 d7d5 e2e3 c7c5 g
    66    info depth 13 seldepth 20 multipv 1 score cp 45 nodes 1341 nps 1498 tbhits 0 time 895 pv d2d4 e7e6 c2c4 d7d5 e2e3 c7c5 g
    71    info depth 13 seldepth 20 multipv 1 score cp 45 nodes 1285 nps 1430 tbhits 0 time 898 pv d2d4 e7e6 c2c4 d7d5 e2e3 c7c5 g
    76    info depth 13 seldepth 20 multipv 1 score cp 45 nodes 1289 nps 1430 tbhits 0 time 901 pv d2d4 e7e6 c2c4 d7d5 e2e3 c7c5 g
    81    info depth 13 seldepth 20 multipv 1 score cp 45 nodes 1281 nps 1417 tbhits 0 time 904 pv d2d4 e7e6 c2c4 d7d5 e2e3 c7c5 g

The number of iterations needed depends on the depth:
depth 14:
   331    info depth 14 seldepth 25 multipv 1 score cp 98 nodes 24119 nps 4009 hashfull 29 tbhits 0 time 6016 pv d2d4 d7d5 c2c4 e7
   336    info depth 14 seldepth 23 multipv 1 score cp 84 nodes 29779 nps 4831 hashfull 27 tbhits 0 time 6163 pv d2d4 d7d5 c2c4 c7
   341    info depth 14 seldepth 22 multipv 1 score cp 73 nodes 2196 nps 348 hashfull 18 tbhits 0 time 6298 pv d2d4 d7d5 c2c4 g8f6
   346    info depth 14 seldepth 18 multipv 1 score cp 73 nodes 1870 nps 296 hashfull 15 tbhits 0 time 6304 pv d2d4 d7d5 c2c4 g8f6
depth 16:
   106    info depth 15 seldepth 31 multipv 1 score cp 72 nodes 54383 nps 8061 hashfull 66 tbhits 0 time 6746 pv e2e4 e7e5 g1f3 d7
   111    info depth 15 seldepth 27 multipv 1 score cp 73 nodes 32333 nps 4650 hashfull 47 tbhits 0 time 6952 pv e2e4 e7e5 g1f3 d7
   116    info depth 15 seldepth 31 multipv 1 score cp 67 nodes 3220 nps 457 hashfull 3 tbhits 0 time 7043 pv e2e4 e7e5 g1f3 d7d6
   121    info depth 15 seldepth 31 multipv 1 score cp 64 nodes 4911 nps 695 hashfull 7 tbhits 0 time 7058 pv e2e4 e7e5 g1f3 d7d6
   126    info depth 15 seldepth 25 multipv 1 score cp 64 nodes 3207 nps 453 hashfull 16 tbhits 0 time 7069 pv e2e4 e7e5 g1f3 d7d6
depth 20:
 > 512

Daniel

Jan 6, 2020, 4:58:31 PM1/6/20
to FishCooking
Interesting! So in each iteration Stockfish gets a different move order due to history/refutation tables etc. and tries different LMR reductions for the same move.
It will then retain, through the hash table, the score where the move was searched to the largest depth, thus effectively getting
rid of LMR from iteration to iteration .... Does this make sense?

Daniel

Jan 6, 2020, 5:01:41 PM1/6/20
to FishCooking
Also, single-threaded Stockfish could potentially be improved by searching each depth twice or more ...

joost.van...@gmail.com

Jan 6, 2020, 5:15:55 PM1/6/20
to FishCooking


On Monday, 6 January 2020 22:58:31 UTC+1, Daniel wrote:
Interesting! So in each iteration stockfish is getting a different move order due to history/refutation etc and trying different LMR reductions for the same move.
It will then retain the score where the move is searched with the largest depth, through the hashtable, and thus effectively getting
rid of LMR from iteration to iteration .... Does this make sense ?


First part, definitely; second part, possibly. I do wonder whether this iteration procedure converges to the pure (i.e. no pruning) alpha-beta search result.

joost.van...@gmail.com

Jan 6, 2020, 5:16:33 PM1/6/20
to FishCooking


On Monday, 6 January 2020 23:01:41 UTC+1, Daniel wrote:
Also, single-threaded stockfish could potentially be improved by searching each depth twice or more ...


very unlikely, feel free to try a patch :-)

joost.van...@gmail.com

Jan 6, 2020, 5:51:37 PM1/6/20
to FishCooking
So, final data points for today... at depth 13, the Elo gain seems to stop somewhere between 64 and 128 iterations (but mind the large error bars).

Rank Name                          Elo     +/-   Games   Score   Draws
   1 seq17                         130      25     394   67.9%   46.4%
   2 s128                          104      26     358   64.5%   47.5%
   3 s64                            98      24     379   63.7%   53.6%
   4 seq16                          53      24     394   57.6%   51.3%
   5 s32                            32      24     386   54.7%   53.4%
   6 s16                            13      24     389   51.8%   50.6%
   7 s8                            -35      24     389   45.0%   50.4%
   8 seq15                         -37      24     395   44.7%   49.9%
   9 seq14                        -124      26     396   32.8%   44.9%
  10 seq13                        -249      31     394   19.3%   31.5%


joost.van...@gmail.com

Jan 8, 2020, 1:20:44 AM1/8/20
to FishCooking
Some more data:

Rank Name                          Elo     +/-   Games   Score   Draws
   1 v4-t32                        130      11    2282   67.9%   45.5%
   2 s32                           109      10    2280   65.1%   49.4%
   3 seq16                         102      10    2282   64.3%   49.1%
   4 v4-t16                         92      10    2282   62.9%   49.0%
   5 s16                            56      10    2281   58.0%   52.5%
   6 v4-t8                          38      10    2282   55.4%   51.2%
   7 seq15                          23      10    2282   53.2%   53.5%
   8 s8                             15      10    2282   52.2%   55.4%
   9 v4-t4                         -36      10    2281   44.8%   51.3%
  10 s4                            -43      10    2283   43.9%   50.4%
  11 seq14                         -76      10    2282   39.2%   48.1%
  12 v4-t2                        -110      11    2282   34.7%   45.3%
  13 s2                           -114      11    2281   34.2%   45.3%
  14 seq13                        -200      12    2282   24.0%   36.2%

So, this similarity between the iterated (sN) and threaded (v4-tN) versions holds up to quite a number of threads (all tests at fixed depth 13).

mal

Jan 8, 2020, 9:50:02 AM1/8/20
to FishCooking
I tried skipping depths not long ago, and the results made me try re-searching depths instead of skipping them. I think I did every 4th one again, so something like depth iterations: 1, 2, 3, 4, 4, 5, 6, 7, 7, 8, ...

I can't see it in my public repo so I guess it looked bad in local tests and I didn't try it on fishtest. I'll have a look for it ...

Jörg Oster

Jan 8, 2020, 10:00:47 AM1/8/20
to FishCooking
This will possibly work for fixed depth tests,
but it seems like a no-brainer for usual time-control tests/games ...

Daniel

Jan 8, 2020, 11:21:18 AM1/8/20
to FishCooking
Sequential search repeated 8 times gives more than a 2-ply advantage, and 2x repetition is also worth at least 200 Elo points ...
You don't have to do every depth twice; for instance, you could do only the last one twice.
Stockfish probably stops the search when it thinks it doesn't have enough time to finish the next depth; you could try to
do the same depth twice in that case. I don't know the Stockfish code well enough to write this patch, but it is definitely worth testing ..

Jörg Oster

Jan 8, 2020, 5:41:13 PM1/8/20
to FishCooking
A first quick test with a version that searches every iteration twice
was, much to my surprise, still on par with master after 200 games at tc 20+0.2.

Remarkable was the very high draw rate of over 80%, though.
Maybe there was something wrong in my setup, idk.

However, my first thought is that there must be something basically wrong in Stockfish
if something like repeating the same iteration is useful ...

joost.van...@gmail.com

Jan 9, 2020, 12:36:41 AM1/9/20
to FishCooking
seems like it can be turned into something useful under time pressure: http://tests.stockfishchess.org/tests/view/5e162ca061fe5f83a67dd96d

Jörg Oster

Jan 9, 2020, 5:55:21 AM1/9/20
to FishCooking
Yes, nice job.

ts.tom...@gmail.com

Jan 9, 2020, 9:14:06 AM1/9/20
to FishCooking


W dniu środa, 8 stycznia 2020 23:41:13 UTC+1 użytkownik Jörg Oster napisał:
A first quick test with a version that searches every iteration twice
was, much to my surprise, still on par with master after 200 games at tc 20+0.2.

Jörg Oster

Jan 9, 2020, 10:36:04 AM1/9/20
to FishCooking
This seems like a nice way to simulate SMP search, however.
Especially on small devices with only a low number of threads available,
one might get results comparable to a search with a higher number of threads.
All you have to do is spend some more time. :-)

Here is my first implementation, where you can set the number of repetitions for each iteration
as the UCI option "Virtual Threads".

joost.van...@gmail.com

Jan 9, 2020, 1:22:00 PM1/9/20
to FishCooking


On Thursday, 9 January 2020 16:36:04 UTC+1, Jörg Oster wrote:
This seems like a nice way to simulate SMP search, however.
Especially on small devices with only a low number of threads available,
one might get results comparable to a search with a higher number of threads.
All you have to do is spend some more time, though. :-)

Yes, I agree... could be a very nice model.

Mikael

Jan 10, 2020, 2:59:37 AM1/10/20
to FishCooking
Hello!

Interesting research going on here!

The discussions in this thread and on talkchess seem to point to LMR as the reason for the gains from the repeated search, but I wonder whether you, or anyone else, have done a test similar to the one below but with LMR disabled?

apospa...@gmail.com

Jan 10, 2020, 6:09:20 AM1/10/20
to FishCooking
Might these interesting results from repeating the same depth be related to artificially reducing the main thread's depth on fail highs?

Günther Demetz

Jan 10, 2020, 7:47:57 AM1/10/20
to FishCooking

Am Freitag, 10. Januar 2020 12:09:20 UTC+1 schrieb apospa...@gmail.com:
Might these interesting results of repeating same depth be related to artificially reducing main thread depth on fail highs?

I don't think so.

Mikael

Jan 10, 2020, 11:19:28 AM1/10/20
to FishCooking
I could not stay away from this. :) So I ran some quick tests. First I modified Thread::search() this way:

  int researches = 0;
  const int N = 2;

  // Iterative deepening loop until requested to stop or the target depth is reached
  while (   ++rootDepth < MAX_PLY
         && !Threads.stop
         && !(Limits.depth && mainThread && rootDepth > Limits.depth))
  {
      ... code removed ...

      if (researches < N && rootDepth == Limits.depth)
      {
          researches++;
          rootDepth--;  // undo the ++rootDepth in the loop condition: repeat this depth
      }
  }

which means the last iteration is re-searched N times.

In the table below, base is a non-repeating version, rpt repeats last iteration twice.

Latest SF:                 base vs rpt: 105 - 322 - 409  [0.370], 836 games, ELO: -92.30 +/- 16.81
Step 16 disabled (LMR):    base vs rpt:  48 - 120 - 143  [0.384], 311 games, ELO: -81.92 +/- 28.49
MovePicker.score disabled: base vs rpt: 121 - 382 - 431  [0.360], 934 games, ELO: -99.74 +/- 16.39
TTEntry.save disabled:     base vs rpt: 231 - 309 - 423  [0.460], 963 games, ELO: -28.20 +/- 16.43

Latest SF is a version which has the re-search code added; everything else is standard.
Disabled means I removed that code section from the source.
All tests were run to depth 13. With LMR disabled the test takes much longer, which is why there are fewer games for the LMR test.
I expected the re-search to be meaningless (zero Elo difference) once the boosting effect was removed.
It seems that information in the TT has a greater effect than LMR.