Leela's rating at zero (or very few) nodes


Duplicate

Oct 17, 2018, 3:56:58 PM
to LCZero
I am curious how strong Leela and other engines are if calculating very few nodes per move. I know that one can play Leela at very few nodes at http://play.lczero.org/.
Does anyone know of any reliable estimates about its playing strength at different levels? I am also aware of aloril's list comparing Leela and Stockfish at fixed nodes. But I'd like to know how this might translate to absolute playing strength - or rather, to rating in one of the known rating lists, or compared to FIDE Elo.
Any help would be greatly appreciated.

Jon Mike

Oct 17, 2018, 4:10:02 PM
to LCZero
The smaller networks, such as 1039, 4049, 4052, 9149, and 9152, play strongly at low node counts or fast time controls. It seems that multiplying Lc0's depth by 5-7 gives the equivalent A/B depth.

Jupiter

Oct 17, 2018, 4:36:54 PM
to LCZero
Lc0 at average 2000 nodes/move compared to CEGT 40/4 A/B engines. Lc0 2000 
On a fast GPU, 2000 nodes is only a very small search.

Duplicate

Oct 17, 2018, 5:21:36 PM
to LCZero
Thanks!
Do you know of something comparable for even fewer nodes per move, and possibly even for other engines?

By the way, a silly question: Do you know in which GUIs it is possible to have engines play with a fixed number of nodes per move, and how? (In Aquarium, I can only select a fixed search depth.)

Jupiter

Oct 18, 2018, 2:31:47 AM
to LCZero
"Do you know of something comparable for even fewer nodes per move, and possibly even for other engines?"

If you have detailed info on the number of nodes and which other engines you mean, I might be able to generate games and create a rating list from them. Can you tell me what your purpose is?

Cutechess GUI 1.1 is capable of fixed nodes/move games.
Maybe 1.0.0 can too.

Press Tournament->New->Time Control

Graham Jones

Oct 18, 2018, 4:25:52 AM
to LCZero

Duplicate

Oct 18, 2018, 10:47:33 AM
to LCZero
Thanks a lot!

Roughly put*, I am interested in the strength of engines' evaluation function. I have heard various claims about this (I remember, for instance, Don Dailey talking about the strength of the evaluation function of Rybka, and then of Komodo), but I haven't seen any tests, or calculations.
For instance, 1000 nodes for a move is at (or above) the upper end of what a human could calculate. So if we get a decent estimate of, say, Stockfish's strength at 1000 nodes per move, this would at least give us a sense of how it compares.
With Leela, it seems possible to me that it's actually better than top humans at equal nodes. I am curious about this, too.
Likewise, I am curious how strong Leela is without doing any search. AlphaGo's neural net, without any search function, has a playing strength of a professional Go player. I found it extremely surprising that this is possible (not a Go player, though). I am just curious how good one can be at chess without calculating - well, at least how good Leela is.

*Full disclosure: I wrote a paper in which I mention in passing that the evaluation function of standard alpha-beta engines is much weaker than the evaluation of strong human players, and a referee asked me to back this claim up.

Duplicate

Oct 18, 2018, 10:53:44 AM
to LCZero
Thanks for the hint! I've seen this spreadsheet before. Do you see any way to translate the tests with few nodes into absolute playing strength (by which I mean strength in comparison to other engines or humans at full strength)?

Duplicate

Oct 18, 2018, 10:54:00 AM
to LCZero
Thank you!


Jupiter

Oct 18, 2018, 12:34:11 PM
to LCZero
Let's do it for Stockfish 9 first.

Here is the situation; try to figure this out.

Hiarcs 8 rating performance of 2580

From CEGT 40/4 rating list:
a. Hiarcs 8, 2449
b. Hiarcs 14 1CPU, 2835

1. Match between Hiarcs 14 1CPU vs Stockfish 9.0 x64 1CPU

Hiarcs 14 1CPU (CEGT 40/4 condition on my computer, around TC 3m+1s) vs Stockfish 9.0 x64 1CPU (TC 4s+0.05s, average of around 1000 nodes/move)

Games: 12, start opening file : noomen_3move.pgn, each opening is played twice, side reversed.
Result:
   # PLAYER                    : RATING  ERROR   POINTS  PLAYED    (%)
   1 Stockfish 9.0 x64 1CPU    :   90.2  281.2      7.5      12   62.5%
   2 Hiarcs 14 1CPU            :    0.0   ----      4.5      12   37.5%
   
 Stockfish 9.0 x64 1CPU at an average of 1000 nodes/move is 90 + 2835 or 2925 CEGT 40/4 rating points.
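As a rough cross-check of the table above, the score percentage can be converted into a rating difference with the standard logistic Elo formula. This is only a minimal sketch: the function name `elo_diff_from_score` is made up for illustration, and the rating tool that produced the table may fit ratings differently, so a small gap to the reported 90.2 is to be expected.

```python
import math

def elo_diff_from_score(points: float, games: int) -> float:
    """Rating difference implied by a match score under the logistic Elo model."""
    score = points / games
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

# Stockfish scored 7.5 points out of 12 games in the match above.
diff = elo_diff_from_score(7.5, 12)
print(f"{diff:.1f}")  # about 88.7
```

With only 12 games the error margin in the table (about ±281) is far larger than the difference itself, so the figure is indicative at best.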
 
 Hiarcs 14 is 2835 - 2449 or 386 CEGT rating points ahead of Hiarcs 8.
 Hiarcs 8 performs at around 2580 against humans.
 
 So what is the human strength of Stockfish 9.0 x64 1CPU at an average of around 1000 nodes/move?
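One way to attempt the question above is to assume that the gap between Hiarcs 8's CEGT rating and its human performance rating carries over unchanged to stronger engines. That constant-offset assumption is mine and is not established anywhere in this thread, so the result below is illustrative only.

```python
# All input numbers are taken from the post above.
HIARCS8_CEGT = 2449        # CEGT 40/4 rating of Hiarcs 8
HIARCS8_HUMAN = 2580       # rating performance of Hiarcs 8 against humans
HIARCS14_CEGT = 2835       # CEGT 40/4 rating of Hiarcs 14 1CPU
SF9_OVER_HIARCS14 = 90     # rounded from the 90.2 match result

# Assumption (not from the thread): a constant offset between the two scales.
cegt_to_human_offset = HIARCS8_HUMAN - HIARCS8_CEGT

sf9_cegt = HIARCS14_CEGT + SF9_OVER_HIARCS14
sf9_human_estimate = sf9_cegt + cegt_to_human_offset
print(sf9_cegt, sf9_human_estimate)  # 2925 3056
```

Given the 12-game sample and its large error margin, this should be read as a demonstration of the chain of reasoning, not as a measured rating.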

Duplicate

Oct 18, 2018, 1:23:58 PM
to LCZero
Cool, thanks a lot for doing this!
That's a really impressive performance by Stockfish. How many nodes per second did Hiarcs calculate in your test? And do you have an estimate how your hardware compares to what was used in 2002?

Average time per move in a tournament game is 180 seconds. I think assuming that humans calculate 2 moves per second is probably generous. So my next question would be: How does Stockfish perform at 400 or so nodes per move?

Duplicate

Oct 18, 2018, 1:55:09 PM
to LCZero
I installed cutechess and started a game against a recent Stockfish version, giving it 1000 nodes per move. I won easily, although I was playing very quickly. How is this possible? (My FIDE rating is 2380.) The nodes shown during the game seem right.
Here is the game, I was white:

1. Nf3 e6 +0.03/6 0.001s

2. c4 0.91s Nc6 -0.12/5 0.001s

3. d4 1.0s d5 -0.58/5 0.001s

4. a3 1.2s Nf6 +0.64/5 0.001s

5. Nc3 2.6s Be7 +0.36/6 0.001s

6. Bg5 1.3s O-O +0.54/6 0.001s

7. e3 2.7s Ne4 +0.25/6 0.001s

8. Bxe7 1.2s Qxe7 0.00/6 0.001s

9. Qc2 3.5s Nxc3 -0.37/6 0.001s

10. Qxc3 0.77s g5 -0.07/6 0.001s

11. cxd5 2.2s exd5 +0.17/7 0.001s

12. Bb5 1.3s Bd7 +0.20/6 0.001s

13. Bxc6 2.2s Bxc6 -0.82/6 0.001s

14. h4 7.1s g4 -0.99/7 0.001s

15. Ne5 0.78s Qe6 -1.05/6 0.001s

16. Nxc6 7.1s bxc6 -1.08/6 0.001s

17. g3 19s Rab8 -0.86/5 0.001s

18. O-O 5.6s f5 -0.38/5 0.001s

19. Rac1 2.3s Rf6 -0.26/5 0.001s

20. b4 6.2s Rh6 -0.89/5 0.001s

21. Qd3 6.1s Rf8 -1.04/5 0.001s

22. Rc5 1.6s Rhf6 -1.26/5 0.001s

23. Rfc1 2.2s Kg7 -1.47/5 0.001s

24. Qa6 5.1s Qd7 -2.06/5 0.001s

25. Rxc6 2.0s R8f7 -2.15/6 0.001s

26. Rxf6 5.2s Rxf6 -3.79/7 0.001s

27. Qxa7 1.2s Rc6 -2.43/8 0.001s

28. Rxc6 3.1s Qxc6 -2.70/7 0.001s

29. Qc5 1.2s Qd7 -3.00/6 0.001s

30. b5 4.1s h6 -4.27/5 0.001s

31. a4 7.9s Kg6 -5.30/6 0.001s

32. a5 0.89s Kg7 -5.26/7 0.001s

33. a6 2.4s Qd8 -5.69/6 0.001s

34. Qc6 4.4s Qb8 -6.83/6 0.001s

35. Qb7 3.3s Qd8 -14.85/5 0.001s

36. a7 0.95s f4 -13.68/5 0.001s

37. a8=Q 6.5s Qxa8 -14.62/5 0s

38. Qxa8 1.1s fxe3 -14.99/5 0.001s

39. fxe3 1.3s c6 -15.51/5 0s

40. Qxc6 1.5s h5 -16.50/5 0s

41. b6 0.90s Kf8 -65.85/6 0s

42. b7 0.67s Kf7 -77.16/5 0.001s

43. b8=Q 2.4s Kg7 -M4/15 0.001s

44. Qbb7+ 0.94s Kh8 -M2/23 0.001s

45. Qcc8# 0.81s, White mates



Jupiter

Oct 18, 2018, 2:05:54 PM
to LCZero
"And do you have an estimate how your hardware compares to what was used in 2002?"

My hardware is an i7-2600 at 3.4 GHz.

The Hiarcs 8 that played was run on:
COMP Hiarcs 8.0, Athlon 1 GHz, 256 RAM

Hiarcs 14 nps on my computer is around 520,000 from the start position.

Here are two sample games to give an idea of the move times. If you want all 12 games, I will upload them.

[Event "?"]
[Site "?"]
[Date "2018.10.19"]
[Round "1"]
[White "Hiarcs 14 1CPU"]
[Black "Stockfish 9.0 x64 1CPU"]
[Result "1-0"]
[BlackTimeControl "4+0.05"]
[ECO "C42"]
[GameDuration "00:03:38"]
[GameEndTime "2018-10-19T00:15:06.532 China Standard Time"]
[GameStartTime "2018-10-19T00:11:28.011 China Standard Time"]
[Opening "Petrov"]
[PlyCount "139"]
[Variation "Nimzovich attack"]
[WhiteTimeControl "180+1"]

1. e4 {book} e5 {book} 2. Nf3 {book} Nf6 {book} 3. Nxe5 {book} d6 {book}
4. Nf3 {+0.38/17 6.7s} Nxe4 {-0.26/14 0.27s} 5. Nc3 {+0.32/17 9.9s}
Nxc3 {-0.17/14 0.067s} 6. dxc3 {+0.34/17 8.8s} Be7 {-0.04/15 0.050s}
7. Bd3 {+0.26/16 13s} Nd7 {0.00/15 0.31s} 8. Be3 {+0.28/16 4.7s}
O-O {0.00/16 0.072s} 9. O-O {+0.28/17 14s} Nc5 {0.00/17 0.11s}
10. Bxc5 {+0.34/16 6.3s} dxc5 {0.00/13 0.056s} 11. Qe2 {+0.34/15 4.7s}
Rb8 {0.00/15 0.27s} 12. Qe4 {+0.23/16 14s} g6 {+0.28/11 0.037s}
13. Qe3 {+0.19/15 4.2s} Kg7 {+0.37/13 0.20s} 14. Rfe1 {+0.14/14 4.6s}
Bd6 {+0.68/14 0.18s} 15. Ng5 {+0.10/14 5.0s} Bd7 {+0.58/14 0.15s}
16. f4 {+0.06/14 4.0s} b6 {+0.37/14 0.36s} 17. Ne4 {+0.07/15 4.9s}
Re8 {+0.25/12 0.062s} 18. Qf2 {0.00/15 2.9s} Bc6 {+0.35/13 0.30s}
19. Rad1 {-0.04/15 4.0s} Qd7 {+0.08/12 0.065s} 20. Nxd6 {-0.07/14 3.9s}
Rxe1+ {+0.27/12 0.058s} 21. Rxe1 {-0.13/15 2.2s} Qxd6 {+0.43/13 0.10s}
22. h4 {-0.18/16 5.9s} Re8 {+0.37/14 0.17s} 23. Rxe8 {-0.20/15 2.5s}
Bxe8 {+0.30/16 0.10s} 24. Qe3 {-0.21/15 3.0s} Bc6 {+0.37/16 0.26s}
25. a3 {-0.21/15 2.2s} Qd5 {+0.45/13 0.080s} 26. Qe2 {-0.24/18 3.0s}
h5 {+0.53/13 0.077s} 27. Kh2 {-0.23/17 6.5s} Kf6 {+0.42/15 0.18s}
28. Kg3 {-0.22/18 2.8s} a5 {+0.41/16 0.19s} 29. Kh2 {-0.19/18 2.4s}
Qd6 {+0.41/15 0.084s} 30. Kg3 {-0.19/17 2.5s} Bd5 {+0.41/14 0.030s}
31. Qe3 {-0.17/15 1.8s} Qe6 {+0.33/15 0.12s} 32. Qxe6+ {-0.14/21 1.9s}
Bxe6 {+0.23/19 0.22s} 33. Be4 {-0.03/19 2.6s} Ke7 {+0.43/15 0.10s}
34. f5 {+0.20/22 2.3s} Bxf5 {0.00/16 0.054s} 35. Bxf5 {+1.71/18 1.6s}
gxf5 {0.00/18 0.070s} 36. Kf4 {+1.85/19 2.1s} Kf6 {0.00/19 0.11s}
37. c4 {+2.82/20 2.4s} a4 {-0.96/20 0.15s} 38. b3 {+0.61/21 2.4s}
axb3 {-1.05/20 0.067s} 39. cxb3 {+0.50/22 0.61s} c6 {-1.58/21 0.21s}
40. a4 {+0.95/22 2.0s} Ke6 {-1.72/20 0.075s} 41. Kg5 {+1.95/22 1.9s}
Ke5 {-2.46/21 0.31s} 42. Kxh5 {+2.16/24 3.2s} Kf4 {-2.61/20 0.10s}
43. Kh6 {+2.25/24 2.0s} Kg4 {-2.78/20 0.094s} 44. Kg7 {+2.60/23 1.7s}
f4 {-2.90/19 0.081s} 45. h5 {+2.60/20 2.2s} Kxh5 {-2.93/18 0.023s}
46. Kxf7 {+7.32/22 1.8s} Kg4 {-2.77/21 0.085s} 47. Kg6 {+8.17/24 1.7s}
Kh4 {-5.47/21 0.17s} 48. Kf5 {+9.55/25 2.6s} Kg3 {-6.02/17 0.030s}
49. Kg5 {+10.06/24 2.7s} Kf2 {-10.25/19 0.13s} 50. Kxf4 {+11.79/19 1.3s}
Ke1 {-10.35/16 0.039s} 51. Ke5 {+22.76/20 3.0s} Kd2 {-10.35/16 0.060s}
52. Kd6 {+23.16/23 1.00s} Kc3 {-11.01/13 0.035s} 53. Kxc6 {+23.29/18 0.51s}
Kxb3 {-52.65/19 0.091s} 54. Kb5 {+23.78/22 1.0s} Kc3 {-52.69/18 0.015s}
55. g4 {+23.87/21 0.52s} Kd4 {-53.81/19 0.075s} 56. g5 {+24.73/22 2.0s}
Ke5 {-53.81/20 0.047s} 57. Kxb6 {+25.21/21 1.0s} Kf5 {-53.90/19 0.045s}
58. a5 {+M27/22 2.0s} Kxg5 {-M42/18 0.082s} 59. Kxc5 {+M23/22 1.0s}
Kf6 {-M26/18 0.054s} 60. Kd6 {+M21/22 1.8s} Kg6 {-M26/17 0.015s}
61. a6 {+M19/22 1.3s} Kf5 {-M18/18 0.034s} 62. c5 {+M17/22 1.9s}
Ke4 {-M16/20 0.029s} 63. a7 {+M15/22 0.95s} Kd3 {-M14/21 0.022s}
64. c6 {+M13/22 1.7s} Kc3 {-M12/22 0.022s} 65. c7 {+M11/27 1.4s}
Kc2 {-M10/21 0.028s} 66. a8=Q {+M9/41 1.2s} Kd2 {-M8/23 0.026s}
67. c8=Q {+M7/62 0.72s} Ke3 {-M6/40 0.024s} 68. Qc3+ {+M5/62 0.008s}
Kf4 {-M4/85 0.027s} 69. Qg2 {+M3/62 0.005s} Kf5 {-M2/1 0s}
70. Qcf3# {+M1/62 0.003s, White mates} 1-0

[Event "?"]
[Site "?"]
[Date "2018.10.19"]
[Round "1"]
[White "Stockfish 9.0 x64 1CPU"]
[Black "Hiarcs 14 1CPU"]
[Result "1-0"]
[BlackTimeControl "180+1"]
[ECO "C42"]
[GameDuration "00:04:04"]
[GameEndTime "2018-10-19T00:15:32.433 China Standard Time"]
[GameStartTime "2018-10-19T00:11:28.074 China Standard Time"]
[Opening "Petrov"]
[PlyCount "143"]
[Variation "Nimzovich attack"]
[WhiteTimeControl "4+0.05"]

1. e4 {book} e5 {book} 2. Nf3 {book} Nf6 {book} 3. Nxe5 {book} d6 {book}
4. Nf3 {+0.66/13 0.23s} Nxe4 {-0.48/16 8.8s} 5. Nc3 {+0.77/12 0.034s}
Nxc3 {-0.37/17 7.7s} 6. dxc3 {+0.37/15 0.46s} Be7 {-0.35/18 6.7s}
7. Bd3 {+0.33/15 0.054s} O-O {-0.33/17 5.4s} 8. O-O {+0.42/13 0.075s}
Nd7 {-0.34/16 3.6s} 9. Be3 {+0.37/14 0.14s} Nf6 {-0.32/16 5.6s}
10. Re1 {+0.46/13 0.088s} Re8 {-0.26/15 5.4s} 11. c4 {+0.55/14 0.10s}
Bg4 {-0.16/16 4.9s} 12. h3 {+0.89/11 0.028s} Bh5 {-0.17/18 5.2s}
13. Bd4 {+0.49/15 0.61s} Qd7 {-0.29/16 11s} 14. b3 {+0.45/13 0.17s}
c5 {0.00/14 3.2s} 15. Bb2 {+0.65/12 0.032s} Rad8 {-0.01/14 2.9s}
16. a4 {+0.80/13 0.11s} Kh8 {-0.05/14 6.9s} 17. Re3 {+0.75/13 0.12s}
d5 {-0.67/15 5.9s} 18. cxd5 {+0.98/12 0.097s} Qxd5 {-0.68/16 3.1s}
19. g4 {+1.35/15 0.35s} Bg6 {-1.28/17 9.4s} 20. Bxf6 {+1.52/13 0.030s}
Bxf6 {-1.41/18 3.0s} 21. Rxe8+ {+1.37/17 0.085s} Rxe8 {-1.46/18 0.94s}
22. Bxg6 {+1.41/17 0.059s} Qxd1+ {-1.48/19 4.3s} 23. Rxd1 {+1.40/18 0.070s}
fxg6 {-1.47/20 3.8s} 24. Rd7 {+1.49/13 0.051s} c4 {-1.56/19 4.2s}
25. bxc4 {+1.67/14 0.12s} h6 {-1.46/21 8.1s} 26. Rxb7 {+1.84/12 0.027s}
Rc8 {-1.43/19 0.88s} 27. Rxa7 {+1.66/17 0.18s} Rxc4 {-1.43/19 1.7s}
28. Ne1 {+1.75/17 0.092s} h5 {-1.64/18 2.8s} 29. gxh5 {+2.05/13 0.047s}
gxh5 {-1.64/17 0.43s} 30. a5 {+2.00/15 0.066s} Bc3 {-1.49/18 3.0s}
31. Nd3 {+2.29/15 0.090s} Bd2 {-1.78/19 6.3s} 32. a6 {+2.56/13 0.048s}
Rxc2 {-2.03/20 2.4s} 33. Rd7 {+2.81/15 0.22s} Kh7 {-2.05/20 3.9s}
34. a7 {+2.91/13 0.066s} Ra2 {-1.85/20 1.9s} 35. Kg2 {+2.95/15 0.096s}
Bc3 {-1.89/20 2.9s} 36. Kf3 {+2.95/12 0.065s} Ra3 {-1.85/19 2.5s}
37. Ke4 {+2.58/16 0.23s} Ra5 {-2.93/20 18s} 38. f4 {+3.34/12 0.11s}
Ra1 {-2.94/19 4.4s} 39. Rc7 {+3.34/14 0.21s} Bf6 {-2.54/19 3.3s}
40. Rb7 {+2.92/16 0.48s} Bc3 {-2.72/17 2.2s} 41. f5 {+3.61/14 0.092s}
Ra4+ {-3.85/17 4.9s} 42. Kd5 {+4.14/12 0.017s} Bf6 {-5.26/19 13s}
43. Nc5 {+4.90/13 0.071s} Ra5 {-5.67/20 2.7s} 44. Kc4 {+5.07/13 0.026s}
h4 {-5.67/18 1.9s} 45. Ne4 {+5.47/12 0.021s} Bb2 {-6.58/17 1.5s}
46. Ng5+ {+6.06/14 0.023s} Kh6 {-6.72/18 0.40s} 47. Nf7+ {+6.45/14 0.039s}
Kh7 {-7.14/19 1.5s} 48. Rb8 {+6.48/15 0.034s} g6 {-7.50/18 1.3s}
49. a8=Q {+6.48/14 0.037s} Rxa8 {-7.58/17 0.39s} 50. Rxa8 {+6.94/15 0.11s}
gxf5 {-10.70/23 5.2s} 51. Kd5 {+7.12/14 0.033s} f4 {-13.82/24 10s}
52. Ke4 {+7.41/14 0.065s} Bc1 {-14.11/21 2.9s} 53. Ne5 {+7.43/15 0.040s}
Kg7 {-14.25/21 3.4s} 54. Nf3 {+7.48/14 0.072s} Kf7 {-14.42/20 2.1s}
55. Ra6 {+8.73/14 0.14s} Kg7 {-14.60/18 1.1s} 56. Nxh4 {+8.86/13 0.018s}
Kf7 {-14.61/18 0.94s} 57. Nf3 {+9.65/13 0.022s} Kf8 {-14.61/18 0.94s}
58. h4 {+16.10/15 0.080s} Bb2 {-17.59/18 1.1s} 59. h5 {+18.43/13 0.029s}
Ke7 {-19.10/17 1.00s} 60. h6 {+50.65/12 0.022s} Bf6 {-22.92/16 0.99s}
61. Rxf6 {+52.98/16 0.066s} Kxf6 {-M28/18 0.82s} 62. Ne5 {+M35/16 0.031s}
f3 {-986.22/19 1.2s} 63. Kxf3 {+M25/18 0.038s} Kxe5 {-10.16/62 0s}
64. h7 {+M23/20 0.042s} Kf6 {-10.46/7 0s} 65. h8=Q+ {+M15/22 0.032s}
Ke6 {-M14/18 0.50s} 66. Qd8 {+M13/22 0.042s} Kf7 {-M12/19 0.31s}
67. Ke4 {+M11/25 0.033s} Ke6 {-M10/23 0.27s} 68. Kf4 {+M9/28 0.034s}
Kf7 {-M8/57 0.27s} 69. Kf5 {+M7/37 0.034s} Kg7 {-M6/62 0.004s}
70. Qe8 {+M5/96 0.033s} Kh7 {-M4/62 0.003s} 71. Kf6 {+M3/127 0.011s}
Kh6 {-M2/62 0.001s} 72. Qg6# {+M1/127 0.004s, White mates} 1-0

Jupiter

Oct 18, 2018, 2:10:40 PM
to LCZero
Fixed nodes/move is different from using TC.

I played Hiarcs 14 at TC 3m+1s vs Stockfish at a fixed 1000 nodes/move, and the result was 8-0 for Hiarcs.

Stockfish is fast, 1000 nodes is just around 1ms on my PC.

Jupiter

Oct 18, 2018, 2:13:20 PM
to LCZero
A/B engines like Stockfish have a different way of counting nodes compared to Lc0.

Duplicate

Oct 18, 2018, 2:15:19 PM
to LCZero
How do you get such high search depths for Stockfish with 1000 nodes per move? (Compare those from my games, above.)

I played another game with Stockfish on 1000 nodes per move. It's really really weak. (Below are the first 9 moves of the game.) There must be something wrong.

Duplicate

Oct 18, 2018, 2:18:52 PM
to LCZero
Can you confirm that the depths I am seeing in my games are roughly what you saw?

Jupiter

Oct 18, 2018, 2:51:04 PM
to LCZero
I use a TC such that the average nodes/move is 1000 from move 1 to move 500.
Sf is using TC 4s+0.05 in my test.

Duplicate

Oct 18, 2018, 3:29:24 PM
to LCZero
Wouldn't this mean that Stockfish calculates less than 20000 nodes per second on your computer?

Jupiter

Oct 18, 2018, 11:32:26 PM
to LCZero
"Wouldn't this mean that Stockfish calculates less than 20000 nodes per second on your computer?"
 
No.

Something like this.

Sf9 search info
info depth 5 seldepth 5 multipv 1 score cp 58 nodes 1036 nps 148000 tbhits 0 time 7 pv d2d4 d7d5 e2e3 e7e6 d1f3

If you let Sf9 search for longer, the nps will be higher, so I just take the nps from this line.
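The nps figure can be read off a UCI `info` line like the one above. A minimal sketch follows; `parse_uci_info` is a made-up helper name, and it handles only the integer fields that appear in this particular line.

```python
def parse_uci_info(line: str) -> dict:
    """Extract integer fields from a UCI 'info' line; tokens after 'pv' are moves."""
    int_keys = {"depth", "seldepth", "multipv", "nodes", "nps", "tbhits", "time"}
    tokens = line.split()
    fields = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "pv":
            # Everything after 'pv' is the principal variation.
            fields["pv"] = tokens[i + 1:]
            break
        if tok in int_keys and i + 1 < len(tokens):
            fields[tok] = int(tokens[i + 1])
            i += 2
        else:
            i += 1
    return fields

info = parse_uci_info(
    "info depth 5 seldepth 5 multipv 1 score cp 58 nodes 1036 nps 148000 "
    "tbhits 0 time 7 pv d2d4 d7d5 e2e3 e7e6 d1f3"
)
# 'time' is reported in milliseconds, so nodes / time should reproduce nps.
recomputed_nps = info["nodes"] * 1000 // info["time"]
print(info["nps"], recomputed_nps)  # 148000 148000
```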

target_nodes = 1000
given_nps = 148000

movetime = 1000/148000 = 0.007s

Assume a game will finish within 500 moves (you can use anything from 60 up); since the move time is only 0.007s, I just use the larger value of 500.
total_time = 0.007 x 500 = 3.5s, use 4s

Add a little increment
TC = 4s +  0.05s

That is how I calculated the Blitz TC of Sf9.

I am not aiming for precision here, only presenting a method for how this can be estimated.
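The arithmetic described above can be sketched as follows. The function name and its defaults are hypothetical; the 500-move horizon and the 0.05s increment are simply the values chosen in this post. Note that this only reproduces the calculation: under a real time control the engine decides for itself how much time to spend per move, so the actual node count per move can differ a lot from the target.

```python
import math

def blitz_tc_for_target_nodes(target_nodes: int, nps: float,
                              assumed_moves: int = 500,
                              increment_s: float = 0.05) -> tuple:
    """Return (base_seconds, increment_seconds) for a base+increment time control."""
    movetime_s = target_nodes / nps
    # Round the total up to whole seconds, as done in the post (3.5s -> 4s).
    base_s = math.ceil(movetime_s * assumed_moves)
    return base_s, increment_s

base, inc = blitz_tc_for_target_nodes(1000, 148_000)
print(f"{base}+{inc}")  # 4+0.05
```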


You can try this on your computer, this is more accurate.
Find an engine that is close in strength to Hiarcs 8 in CEGT rating list.

CEGT 40/4 rating list.



Take one that is close to Hiarcs 8 and play it against Sf9. You can open the CEGT link and find another engine.
Then set the chosen engine to CEGT 40/4 rating conditions: download the program https://drive.google.com/file/d/1zxnB3GqSqS_sZJ9PwGsFm4Rd65YJrJZc/view?usp=sharing,
run it on your computer, and look at tc_report.txt.

Sample.
40/4   :: 40 moves in X :: X = 1 minute
That X = 1 minute will be your time in 40 moves

engine TC is 40/1minute or 40/60s


Next find TC of Sf9 at 1000 nodes on your machine

Get nps of sf9 on your machine
movetime_s = 1000/nps 

TC = 40/movetime_s

Sf9 TC is 40/movetime_s

Use Cutechess-cli to run the match.

That is an old version, but it should be able to handle two TCs in a match.

Once you have the pgn file output, take the ordo rating program and run it on the pgn output.

You will then be able to see the difference between sf9 and the engine close in strength to Hiarcs 8 that you chose.

Greg Mattson

Oct 19, 2018, 1:34:35 AM
to LCZero
jupiter,

I think this is the perfect case for the 'nodes' parameter of cutechess. ie:

cutechess-cli -tournament gauntlet -rounds 1 -games 20 -pgnout <pgn> -recover -engine conf=stockfish tc=inf nodes=2000 option.Hash=1024 -engine conf=hiarcs tc=inf nodes=1000000 -each proto=uci timemargin=1000

not sure why the nodes parameter doesn't get more respect here. all the issues with hardware are resolved and there is no ambiguity.

greg
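For anyone scripting such a match, here is a small sketch that assembles the same kind of command as an argument list. It assumes that engine configurations named `stockfish` and `hiarcs` exist in your cutechess engine configuration (those names come from the post above), and it writes games with `-pgnout`; check the help output of your cutechess-cli version before relying on any flag. The function name `fixed_nodes_match_args` and the output file name are made up for illustration.

```python
import shlex

def fixed_nodes_match_args(pgn_out: str, sf_nodes: int, opp_nodes: int,
                           games: int = 20) -> list:
    """Build a cutechess-cli argument list for a fixed-nodes gauntlet."""
    return [
        "cutechess-cli",
        "-tournament", "gauntlet",
        "-rounds", "1",
        "-games", str(games),
        "-pgnout", pgn_out,
        "-recover",
        # Low fixed node count for the engine under test.
        "-engine", "conf=stockfish", "tc=inf", f"nodes={sf_nodes}", "option.Hash=1024",
        # Much larger node budget for the reference engine.
        "-engine", "conf=hiarcs", "tc=inf", f"nodes={opp_nodes}",
        "-each", "proto=uci", "timemargin=1000",
    ]

args = fixed_nodes_match_args("games.pgn", 2000, 1_000_000)
print(shlex.join(args))
```

The list form can be passed directly to `subprocess.run(args)` once the engine names match your setup.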

Jupiter

Oct 19, 2018, 8:23:54 AM
to LCZero
To measure the strength of engines, nodes/move is not the right test condition, as most competitions are conducted with a time control. You should use a TC.
I am not sure whether the Hiarcs 8 that played in that tournament was using nodes/move or a TC.

A lot of effort is spent on balancing the time needed to calculate the evaluation function against its accuracy. You can make the evaluation more accurate by calculating it in a more detailed way. An example is evaluating a passed pawn. In more detail: Is this pawn not blocked? Is it defended by another piece? Is there a friendly rook behind it? Is the queening square occupied by an opponent's piece? If there is a blocker, can I attack the blocker? If the blocker is a knight and I attack it with a queen, is the knight defended by an opponent's piece of lower value than the queen? In that case I cannot give a bigger bonus to the passer. Are there enemy rooks or a queen behind the passer, and can I attack them? There are many things to consider for passer bonuses and penalties, and this is only for the passer. But since the competition is timed, you cannot afford to spend too much time in the eval, because you also need a deeper search depth. If the contest is fixed nodes/move, the engine with the more accurate evaluation would most likely win, assuming other things are equal, because it can spend more time in the evaluation without being flagged for a time forfeit.
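To make the trade-off concrete, here is a toy sketch of a passed-pawn term that checks several of the conditions listed above. It is not taken from any real engine: the class, the field names, and every weight are invented for illustration, and a real evaluation would derive these flags from the board, which is where the extra time goes.

```python
from dataclasses import dataclass

@dataclass
class PassedPawnContext:
    rank: int                                  # 2..7 from the pawn owner's side
    is_blocked: bool
    is_defended: bool
    friendly_rook_behind: bool
    promotion_square_occupied_by_enemy: bool
    enemy_heavy_piece_behind: bool

def passed_pawn_bonus(ctx: PassedPawnContext) -> int:
    """Centipawn bonus for a passed pawn; all weights are made-up values."""
    bonus = 10 * (ctx.rank - 1)                # more advanced pawns are worth more
    if ctx.is_defended:
        bonus += 10
    if ctx.friendly_rook_behind:
        bonus += 15
    if ctx.is_blocked:
        bonus -= 20
    if ctx.promotion_square_occupied_by_enemy:
        bonus -= 10
    if ctx.enemy_heavy_piece_behind:
        bonus -= 15
    return bonus

supported = PassedPawnContext(6, False, True, True, False, False)
blockaded = PassedPawnContext(6, True, False, False, False, False)
print(passed_pawn_bonus(supported), passed_pawn_bonus(blockaded))  # 75 30
```

Each extra condition costs time per evaluated position, which is the cost a timed contest penalizes and a fixed-nodes contest does not.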

brian.p.r...@gmail.com

Oct 19, 2018, 9:47:33 AM
to LCZero
Engines count nodes in many different ways, so there is ambiguity.  Nodes are most consistent between different versions of the same engine.  Less so between different engines.  This is with searching using only one cpu.  Parallel search adds complexity (many "extra" and duplicate nodes are processed).  Moving from a/b engines to NN engines like Lc0 is a leap in complexity for metrics, which may have no equivalent (like depth).  Time is an option but that only works for engines running on the same or at least normalized h/w, which is problematic with NN engines that use both cpus and gpus.  

Jupiter

Oct 19, 2018, 10:07:21 AM
to LCZero
Let me correct this part.

Next find TC of Sf9 at 1000 nodes on your machine
Get nps of sf9 on your machine
movetime_s = 1000/nps 
TC = 40/movetime_s
Sf9 TC is 40/movetime_s

Correction:

movetime_s = 1000/nps
A period has 40 moves, so total time in this period is
total_time_s = movetime_s x 40

So TC for sf9 is
TC = 40/total_time_s
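The corrected calculation can be sketched in a few lines. The function name is hypothetical, and the nps value in the example is the one from the Sf9 search info quoted earlier in the thread; substitute the figure measured on your own machine.

```python
def tc_for_period(target_nodes: int, nps: float, moves_per_period: int = 40) -> str:
    """Return a 'moves/seconds' time control that averages target_nodes per move."""
    movetime_s = target_nodes / nps
    total_time_s = movetime_s * moves_per_period
    return f"{moves_per_period}/{total_time_s:.2f}"

print(tc_for_period(1000, 148_000))  # 40/0.27
```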

Duplicate

Oct 19, 2018, 10:29:33 AM
to LCZero
Why 500 moves? And why do you use the increment? Stockfish can calculate a lot of nodes in 50 ms.
I'm really not an expert in any of this, so I might be missing something obvious. But in the games I played, Stockfish usually reached a depth of 5 or 6, and the same seems to be true in the search info you give. So how is it possible that Stockfish reaches a depth of 15 in middlegames in the match you played? And how can it beat a strong engine in your test, if it plays almost like a beginner in the games I played, in which I was monitoring that it calculated 1000 nodes per move? Something has to be really off here; our results don't add up at all.

Greg Mattson

Oct 19, 2018, 1:57:08 PM
to LCZero
Jupiter,

I get your point but look at the OP's question. He is interested in assessing SF's evaluation function given a small number of nodes.

this is a thing tailor made for the nodes parameter. point sf at a small number of nodes, point the other engine at a benchmarkable metric with realistic time controls to get relative ELO and voila, you've got your measurement.

greg

Duplicate

Oct 19, 2018, 2:28:51 PM
to LCZero
Sorry to bother you with this, but I wasn't sure how to configure one of the engines in a match in cutechess. I see the "configure engine" option in the advanced options, but I don't see how I can set maximum nodes per move.

Jonathan Rosenthal

Oct 19, 2018, 4:47:28 PM
to LCZero
SF's search is really not designed to be used at such low node counts. 1'000 nodes will take less than 1ms on a single core on my laptop. It's important to understand that the number of unique positions searched in that time will be much smaller than 1'000, as many positions show up multiple times. Move ordering will also be completely messed up, as at these low node counts the dynamic move ordering features such as the history and countermove heuristics likely won't have reasonable values. Many of the pruning and extension techniques require a certain depth, which simply will not be reached in most positions with so few nodes.
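One reason positions repeat, assuming an iterative-deepening search (which Stockfish uses), is that each deeper iteration walks through the shallower part of the tree again. The toy count below uses a uniform tree with no pruning and no transposition table, so the numbers are illustrative only; the function names and the branching factor of 3 are my own choices.

```python
def full_tree_nodes(branching: int, depth: int) -> int:
    """Number of distinct nodes in a uniform tree down to the given depth."""
    return sum(branching ** d for d in range(depth + 1))

def iterative_deepening_visits(branching: int, max_depth: int) -> int:
    """Total node visits when the tree is re-searched at depths 1..max_depth."""
    return sum(full_tree_nodes(branching, d) for d in range(1, max_depth + 1))

unique = full_tree_nodes(3, 5)
visited = iterative_deepening_visits(3, 5)
print(unique, visited)  # 364 542
```

In this toy case about a third of the counted visits are repeats. Real engines prune heavily and also revisit positions for other reasons (transpositions, re-searches), so the actual ratio will differ.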

Unfortunately, I am not sure it is possible to find a fair way to compare evaluation function strength. I think it is also flawed reasoning to assume you are comparing human evaluation functions with engine evaluation functions if you simply limit engine node counts and let them play. You can evaluate a position correctly, but still play very badly and that is the case for the engine as well.

In general what do you feel an evaluation function should optimally return? Practical winning probabilities depend on who the players are. Theoretical winning probabilities are nice, but we only have those for endgame tablebase positions. 

Finally, it's important to understand that the goal of an evaluation function in an engine is not to be as good as possible at generating winning probabilities in a vacuum. The eval function is used to give the engine hints about which variations to look at more precisely, and there is a tradeoff between speed and precision. Many dynamic features of a position are not understood statically, but only through calculation of variations. This is true for humans as well, but even more so for classical engines. I can't stress enough that search and evaluation are heavily intertwined. If you take two top engines and swap their evaluation functions, I would expect them to both play worse.


Patrick Hill

Oct 19, 2018, 5:00:54 PM
to LCZero
Can LC0 play without MCTS at all? Just evaluate a position, and make a move. If so, how strongly? That’s what I would be interested to find out.

MindMeNot

Oct 19, 2018, 5:17:43 PM
to LCZero
Simply play with nodesPerMove=1, for example at http://play.lczero.org/ on easy difficulty.

Jon Mike

Oct 19, 2018, 5:41:17 PM
to LCZero
Cscuile shared tournaments at various fixed nodes. Here you can see the strength of SF at various ratios and low node counts. Lc0 is MUCH stronger than SF at these low numbers of nodes. All this can be seen firsthand here.
