Tests of multiple 32xxx networks against SF 10 at 15min+3s


Ingo Weidner

Dec 25, 2018, 11:57:42 PM

to LCZero

Hi, I have been testing (and still am testing) many new networks in a gauntlet against Stockfish 10 at a time control of 15min + 3s, which IMO is neither too short nor too long.

Several other networks were added and removed during the test because they performed worse than others; recently this also included e.g. ID 32246.

Here are the current standings of testing 7 new networks against Stockfish 10 at 15min +3s time control:

Engine Score St % Elo
1: Stockfish_10_x64_bmi2 29.0/49 ·········· 59.2 3466
2: Lc0 v0.20.0rc2_ID32253 2.5/4 =1== 62.5 +92 = 3558
3: Lc0 v0.20.0rc2_ID32246 2.0/4 ==== 50.0 ±0 = 3466
4: Lc0 v0.20.0rc2_ID32223 3.0/7 ====0== 42.8 -49 = 3417
5: Lc0 v0.20.0rc2_ID32207 4.0/10 ==0=0===== 40.0 -70 = 3396
6: Lc0 v0.20.0rc2_ID32194 2.5/7 0=====0 35.7 -100 = 3366
6: Lc0 v0.20.0rc2_ID32236 2.5/7 0===0== 35.7 -100 = 3366
8: Lc0 v0.20.0rc2_ID32195 3.5/10 0==0=0==== 35.0 -108 = 3358

49 of 70 games played

Level: Blitz 15/3

Hardware: RAM: 16 GB (1GB hash), CPU: mobile i7-7700HQ 4x2.8GHz (3 cores used for engines), GPU: mobile Nvidia GeForce GTX 1050 Ti 4GB (768 CUDA Cores), Leela ratio = 0.64

Operating system: Windows 10 Home Home Edition (Build 9200) 64 bit

Soon I will also remove networks from the current test to keep only the best ones. ID 32194 will be further tested as a reference.

Ingo Weidner

Dec 26, 2018, 12:00:54 AM

to LCZero

Correction: I meant that 32247 performed worse than 32246 and was removed...

Ingo Weidner

Dec 26, 2018, 1:52:36 AM

to LCZero

Here is an update with 2 more games each for IDs 32253 and 32246; after 6 games, both now have 1 win:

Engine Score St % Elo
1: Stockfish_10_x64_bmi2 29.0/49 ·········· 58.5 3466
2: Lc0 v0.20.0rc2_ID32253 3.5/6 =1==== 58.3 +56 = 3522
3: Lc0 v0.20.0rc2_ID32246 3.0/6 ====01 50.0 ±0 = 3466
4: Lc0 v0.20.0rc2_ID32223 3.0/7 ====0== 42.8 -49 = 3417
5: Lc0 v0.20.0rc2_ID32207 4.0/10 ==0=0===== 40.0 -70 = 3396
6: Lc0 v0.20.0rc2_ID32194 2.5/7 0=====0 35.7 -100 = 3366
6: Lc0 v0.20.0rc2_ID32236 2.5/7 0===0== 35.7 -100 = 3366
8: Lc0 v0.20.0rc2_ID32195 3.5/10 0==0=0==== 35.0 -108 = 3358

53 of 70 games played

Ingo Weidner

Dec 26, 2018, 1:56:16 AM

to LCZero

Correction: The current score of Stockfish 10 is 31.0/53 = 58.5%. I forgot to update the score for SF 10.

Ingo Weidner

Dec 26, 2018, 5:27:04 AM

to LCZero

It looks like the self-play Elo has been going down since ID 32253, with ID 32267 currently at around the level of ID 32246. It was to be expected that it could not always go up.

At the moment, after 8 rounds/games, ID 32253 has a score of 50.0% (with 1 win and 1 loss), IDs 32246 and 32223 both scored 43.8%, and IDs 32194 and 32236 now have 37.5%:

Engine Score St % Elo
1: Stockfish_10_x64_bmi2 35.5/60 ·········· 59.2 3466
2: Lc0 v0.20.0rc2_ID32253 4.0/8 =1=====0 50.0 ±0 = 3466
3: Lc0 v0.20.0rc2_ID32246 3.5/8 ====010= 43.8 -42 = 3422
3: Lc0 v0.20.0rc2_ID32223 3.5/8 ====0=== 43.8 -42 = 3422
5: Lc0 v0.20.0rc2_ID32207 4.0/10 ==0=0===== 40.0 -70 = 3396
6: Lc0 v0.20.0rc2_ID32194 3.0/8 0=====0= 37.5 -85 = 3381
6: Lc0 v0.20.0rc2_ID32236 3.0/8 0===0=== 37.5 -85 = 3381
8: Lc0 v0.20.0rc2_ID32195 3.5/10 0==0=0==== 35.0 -108 = 3358

60 of 70 games played

Ingo Weidner

Dec 26, 2018, 6:24:34 AM

to LCZero

While ID 32253 performed better than SF 10 after 7 games, it lost the last 2 games against SF 10. After 10 games it might happen that all networks except 32195 score within a 0.5-point range, so they would be very close.

Ingo Weidner

Dec 26, 2018, 11:04:32 AM

to LCZero

Results after 10 rounds will be posted very soon. I am waiting for 2 more games to finish.

I was asked about this by e-mail, so I am also posting the answer here:

I am not using any specific opening book or endgame book.

I am using Arena's default settings for this.

Colors are reversed with each round.

Ingo Weidner

Dec 26, 2018, 11:53:13 AM

to LCZero

Now that 10 games are finished for networks 32194 to 32253, as predicted above, the best 6 ended within a range of just 0.5 points (5.0%).

The best were IDs 32246 and 32223 with a score of 45.0%. ID 32194 finally scored a win and ended with a score of 40.0% after 10 games (like 3 other networks).

Here is the result for 7 32xxx networks after 10 games (colors reversed after each round/game):

Engine Score St % Elo
1: Stockfish_10_x64_bmi2 41.5/70 ·········· 59.3 3464 (from CCRL 40/40 list)
2: Lc0 v0.20.0rc2_ID32246 4.5/10 ====010=01 45.0 -35 = 3429
2: Lc0 v0.20.0rc2_ID32223 4.5/10 ====0===01 45.0 -35 = 3429
4: Lc0 v0.20.0rc2_ID32194 4.0/10 0=====0=01 40.0 -70 = 3394
4: Lc0 v0.20.0rc2_ID32253 4.0/10 =1=====000 40.0 -70 = 3394
4: Lc0 v0.20.0rc2_ID32207 4.0/10 ==0=0===== 40.0 -70 = 3394
4: Lc0 v0.20.0rc2_ID32236 4.0/10 0===0===== 40.0 -70 = 3394
8: Lc0 v0.20.0rc2_ID32195 3.5/10 0==0=0==== 35.0 -108 = 3356

70 of 70 games played

Level: Blitz 15/3

Hardware: RAM: 16 GB (1GB hash), CPU: mobile i7-7700HQ 4x2.8GHz (3 cores used for engines), GPU: mobile Nvidia GeForce GTX 1050 Ti 4GB (768 CUDA Cores), Leela ratio = 0.64

Operating system: Windows 10 Home Home Edition (Build 9200) 64 bit
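The Elo column is anchored to Stockfish 10's CCRL 40/40 rating (3464), with each network's offset derived from its score percentage. The thread does not say which formula Arena uses; a common choice, assumed in this sketch, is the logistic model D = -400 * log10(1/p - 1). It reproduces the final 10-game values above (45.0% gives -35, 40.0% gives -70, 35.0% gives -108), while a few values elsewhere in the thread (e.g. 62.5% shown as +92, where the logistic model gives about +89) differ by a few points, so Arena probably uses a slightly different table or rounding. The names `SF10_RATING`, `elo_diff` and `estimated_rating` are illustrative only.

```python
import math

# Anchor rating for Stockfish 10 taken from the table (CCRL 40/40 list).
SF10_RATING = 3464


def elo_diff(score: float) -> float:
    """Elo difference implied by a score fraction 0 < score < 1 (logistic model, an assumption)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def estimated_rating(points: float, games: int, anchor: int = SF10_RATING) -> int:
    """Hypothetical helper: anchor rating plus the Elo difference of a gauntlet result."""
    return round(anchor + elo_diff(points / games))


for name, points, games in [("ID32246", 4.5, 10), ("ID32194", 4.0, 10), ("ID32195", 3.5, 10)]:
    p = points / games
    print(f"{name}: {p:.1%} -> {elo_diff(p):+.0f} = {estimated_rating(points, games)}")
```

For these three networks the sketch yields 3429, 3394 and 3356, matching the table; with only 10 games per network, differences of a few Elo between formulas are far smaller than the statistical noise.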

NOTE:

Soon I will continue testing networks 32246 and 32223 further and will also keep ID 32194 (mostly because it is used at TCEC now).

I will also add the new network ID 32274 to this test now.

MathAndreas

Dec 26, 2018, 12:06:41 PM

to LCZero

Thank you for your effort!
Please can you post a PGN of the games? Thank you!

Ingo Weidner

Dec 26, 2018, 12:44:24 PM

to LCZero

Hello MathAndreas,

many networks not found in the final table posted above were added and removed during this test and the PGN file is a total mess due to that.

It would take a long time to clean that up and I do not really think it is worth the effort. For me, what counts in the end is the result shown in the list above.

I plan to do further tests with the best networks and am currently checking ID 32274. It already lost its first game, so it might not be worth testing this one further.

Ingo Weidner

Dec 26, 2018, 1:25:52 PM

to LCZero

I just started a new gauntlet with IDs 32246 and 32223 against the latest Stockfish 11 dev version (18122411) using a 30min+5s time control.

The previous tests posted above were against the official release version of Stockfish 10 with a 15min + 3s time control.

In both tests the colors are reversed after each round.


Ingo Weidner

Dec 27, 2018, 2:55:05 AM

to LCZero

After ID 32246 scored 4.0/8 = 50.0% against the latest Stockfish 11 dev at 30min+5s time control, I will now test some of the new networks that had a big jump in self-play Elo.

I will post the results of the gauntlet with multiple networks here.

For reference (as can also be seen in the table posted earlier):

Under the conditions posted in this thread, ID 32246 had reached a score of 4.5/10 = 45.0% against SF 10 with 15min+3s TC.


Ingo Weidner

Dec 27, 2018, 4:35:48 AM

to LCZero

The following new networks are currently being tested against SF10 at 15min+3s TC: IDs 32295, 32294, 32291, 32286, 32285, 32280 and 32273.

FWIW I had already tested ID 32274, which was not really good, but 32273 might be promising.

I will merge the results into the table with the older networks (32194 up to 32253) to allow a better comparison with them.

Ingo Weidner

Dec 27, 2018, 5:58:23 AM

to LCZero

IDs 32291 and 32290 performed really badly and were removed from further tests.

The best IDs from previous tests, e.g. 32246, were already performing really well, so it will get harder for new networks to be better than that.

Ingo Weidner

Dec 27, 2018, 6:09:40 AM

to LCZero

I just added ID 32301 to the tested networks. When networks stay at a more or less constant Elo for a while, as in this case, it usually seems to be a good sign.

Lukas S

Dec 27, 2018, 6:54:32 AM

to LCZero

Awesome! I agree, 32301 looks really promising

Ingo Weidner

Dec 27, 2018, 10:10:33 AM

to LCZero

Tests for new networks are running and I plan to post a current table after 4 games are finished for all 7 candidates.

I just had a look at MTGOStark's Elo table, which is also used for the "official" estimated Elo graph:

https://docs.google.com/spreadsheets/d/1XSJiCcQpCLv0fNwrUn7jXjdkZFU63YFEWpdXv6dSSg0/edit#gid=868347223

For 3xxxx nets, the highest Elo value there so far is 3422 for Lc0 ID 32250.

This is very close to the Elo value of 3429 that I found for ID 32246 in my tests against SF 10 at 15min+3s TC. With more games played, my result might even become identical to MTGOStark's.

Ingo Weidner

Dec 27, 2018, 2:35:19 PM

to LCZero

The current test of new networks is taking a bit longer because some badly performing networks were removed and new ones added instead.

At the moment these IDs are being tested: 32307, 32306, 32301, 32295, 32285, 32280, 32273

So far, of those, only 32280 has won a game against SF 10, while 32307 and 32306 were just added to the test.


Ingo Weidner

Dec 28, 2018, 4:19:32 AM

to LCZero

After several new networks have been added and removed for testing, it is time for the current standings.

The remaining new networks have now played 4 games/rounds, and colors were reversed after each round.

So far, new networks between IDs 32273 and 32312 have been tested, and IDs 32316 and 32315 will now be added.

Networks that were tested but do not appear in the table performed too badly to be worth further testing (this includes IDs 32311 and 32312, with a score of just 25.0% after 4 games).

Currently after 4 games the best networks are IDs 32307 with 3.0/4 = 75.0% and 32280 with 2.5/4 = 62.5%.

Here is a combined table with the new and the previously tested networks that already finished 10 games (table sorted by percent value):

Engine Score St % Elo
1: Lc0 v0.20.0rc2_ID32307 3.0/4 =1=1 75.0 +191 = 3657
2: Lc0 v0.20.0rc2_ID32280 2.5/4 1=== 62.5 +92 = 3556
3: Stockfish_10_x64_bmi2 50.5/90 ·········· 56.1 3464 (from CCRL 40/40 list)
4: Lc0 v0.20.0rc2_ID32295 2.0/4 ==== 50.0 ±0 = 3464
4: Lc0 v0.20.0rc2_ID32273 2.0/4 ==== 50.0 ±0 = 3464
6: Lc0 v0.20.0rc2_ID32246 4.5/10 ====010=01 45.0 -35 = 3429
6: Lc0 v0.20.0rc2_ID32223 4.5/10 ====0===01 45.0 -35 = 3429
8: Lc0 v0.20.0rc2_ID32194 4.0/10 0=====0=01 40.0 -70 = 3394
8: Lc0 v0.20.0rc2_ID32253 4.0/10 =1=====000 40.0 -70 = 3394
8: Lc0 v0.20.0rc2_ID32207 4.0/10 ==0=0===== 40.0 -70 = 3394
8: Lc0 v0.20.0rc2_ID32236 4.0/10 0===0===== 40.0 -70 = 3394
12: Lc0 v0.20.0rc2_ID32301 1.5/4 ===0 37.5 -85 = 3381
13: Lc0 v0.20.0rc2_ID32195 3.5/10 0==0=0==== 35.0 -108 = 3356

90 of 120 games played
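The combined table is sorted by percentage, and tied entries share a rank while the next rank skips ahead (the four networks on 40.0% all share rank 8, and the next entry is 12). This is usually called standard competition ranking. A small sketch, assuming each row is just a (name, percent) pair; `competition_ranks` is an illustrative name, not an Arena feature:

```python
from typing import List, Tuple


def competition_ranks(rows: List[Tuple[str, float]]) -> List[Tuple[int, str, float]]:
    """Sort rows by percent (descending) and assign standard competition ranks (1, 2, 2, 4, ...)."""
    # sorted() is stable, so tied rows keep their input order.
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    ranked: List[Tuple[int, str, float]] = []
    for position, (name, pct) in enumerate(ordered, start=1):
        # A tie with the previous row reuses its rank; otherwise the rank is the row position.
        if ranked and pct == ranked[-1][2]:
            rank = ranked[-1][0]
        else:
            rank = position
        ranked.append((rank, name, pct))
    return ranked


table = [
    ("ID32307", 75.0), ("ID32280", 62.5), ("Stockfish_10", 56.1),
    ("ID32295", 50.0), ("ID32273", 50.0), ("ID32246", 45.0), ("ID32223", 45.0),
    ("ID32194", 40.0), ("ID32253", 40.0), ("ID32207", 40.0), ("ID32236", 40.0),
    ("ID32301", 37.5), ("ID32195", 35.0),
]
for rank, name, pct in competition_ranks(table):
    print(f"{rank}: {name} {pct:.1f}")
```

Comparing the percentages with `==` is fine here because they are the rounded values printed in the table; comparing raw fractions such as 4/7 would call for a tolerance.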

Morant

Dec 28, 2018, 4:38:39 AM

to LCZero

Hi! Do you have some threshold ratio for dropping a network, and a minimum number of games before doing so? As I see it, ID32253 looked very promising in its first 4 games, but if you take only the last three it looks like a complete loser. So the sample size matters. What are your internal rules for this?

Thanks for your work


Ingo Weidner

Dec 28, 2018, 4:51:09 AM

to LCZero

Currently the networks have reached such a high level that every lost game within the first 4 games could be a reason to exclude them from further testing.

Usually I wait 3 to 4 games before making a decision. If 2 games were lost after 3 games, that is simply too bad.

I could not keep all tested networks for the full 10 games, as this would simply take forever.

After removing a network I usually also add one or two new ones, so the overall number of tested IDs mostly stays constant. Currently I added IDs 32315 and 32316.

FWIW, ID 32246, which was the winner of the previous tests, scored 4.0/8 = 50% against the latest SF11dev at a TC of 30min+5s, so my test procedure can't be that bad...

Ingo Weidner

Dec 28, 2018, 5:10:19 AM

to LCZero

So far I have found that a higher self-play Elo does not necessarily result in a higher real Elo.

For example, both 32280 and 32273 currently still perform really well, even though there are many newer networks after them with a higher self-play Elo.

It could still happen that after 10 games ID 32246 remains the best one, and if that is the case I do not really have a problem with that...

Morant

Dec 28, 2018, 5:14:10 AM

to LCZero

As self-play Elo looks at 32000 games, you simply cannot face enough situations/positions in your 10 games to prove that high Elo. So I believe self-play Elo does matter, but with a sample size of less than 100-200 games it's not obvious.
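The sample-size point can be made concrete. This sketch assumes independent games and a normal approximation (both simplifications) and computes an approximate 95% interval for a score from its win/draw/loss counts; the example uses ID 32246's final 10-game result (====010=01, i.e. 2 wins, 5 draws, 3 losses). The function names are illustrative:

```python
import math


def score_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Approximate confidence interval for the mean score per game (normal approximation)."""
    n = wins + draws + losses
    if n == 0:
        raise ValueError("no games played")
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of the results (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - mean) ** 2 + draws * (0.5 - mean) ** 2 + losses * mean ** 2) / n
    half_width = z * math.sqrt(var / n)
    return mean, max(0.0, mean - half_width), min(1.0, mean + half_width)


def elo_diff(score: float) -> float:
    """Logistic Elo difference; only defined for 0 < score < 1."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


mean, low, high = score_interval(wins=2, draws=5, losses=3)
print(f"score {mean:.1%}, approx. 95% interval {low:.1%} .. {high:.1%}")
print(f"Elo difference roughly {elo_diff(low):+.0f} .. {elo_diff(high):+.0f}")
```

For 4.5/10 the interval spans more than 40 percentage points (roughly -200 to +120 Elo against SF 10), which is why networks separated by 0.5 points after 10 games cannot really be told apart.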


Ingo Weidner

Dec 28, 2018, 5:39:07 AM

to LCZero

If you have a look at the real Elo values in the estimated Elo graphs, you see that the real Elo has not improved further after ID 32250:

https://docs.google.com/spreadsheets/d/1Mi2qwPCK4aVNt9B8aC-HecIyLJ0BEAPvp63L59DTrY4/edit#gid=952456918

The detailed results for those real Elo graphs can be found here:

https://docs.google.com/spreadsheets/d/1XSJiCcQpCLv0fNwrUn7jXjdkZFU63YFEWpdXv6dSSg0/edit#gid=868347223

https://docs.google.com/spreadsheets/d/19UL5a9I3M_TjIYwTpJQKdCi2RJHpxDD9YHnRogrwTnc/edit#gid=0

My own Elo value of 3429 for ID 32246 is very close to the Elo value of 3422 for ID 32250 in that graph.


Ingo Weidner

Dec 28, 2018, 5:55:37 AM

to LCZero

A small update concerning the current tests.

Besides the addition of new networks 32315 and 32316: ID 32307 has now lost a game and has a score of 3.0/5 = 60.0% after 5 games.

With that it currently falls behind ID 32280, which has 2.5/4 = 62.5%.

In the end it could happen that all new networks have a score similar to or even worse than ID 32246, but I still hope that the new networks will be better.

If I did not have that hope, the current tests would be a bit pointless and/or a waste of time...

Vassilis

Dec 28, 2018, 6:17:40 AM

to LCZero

Hi Ingo!

I believe your tests are valuable, and I'd like to thank you for all your effort.
Let me explain my reasoning for this.

1) Your hardware is good enough to give quality games.
2) The time control is fine. Not too fast, not too slow. Ideal for quick, reliable results.
3) You test against Stockfish 10. A reliable and strong opponent. Probably the strongest one.
4) Are 10 games few? Sure! What about 50 or 100 or 1000? The more the better. But we can't spend a whole life testing... Even from a statistical point of view, winning all 10 games of a 10-game match against Stockfish is anything but luck, provided the opening book is balanced, and it is of course better than winning 2 (of course we aren't at this point... yet). The uncertainty lies somewhere in the middle, with nets giving similar results.
5) The way you discard the last two nets and bring in two newer ones doesn't carry the danger of leaving out a strong net because of statistical uncertainty.

I'm not an expert in statistical analysis, but the way you think makes sense to me.
Overall a very efficient filtering procedure, even if the net that finally emerges won't be THE best, but one of the 2 or 3 best :)

Keep up all the good testing...
Vas


Ingo Weidner

Dec 28, 2018, 6:25:48 AM

to LCZero

Another note about my "criteria" for removing networks from the current test:

So far I have never removed any network that was able to win a game in the first 3 games, even if it later lost some games.

Those that were removed had only draws and losses.
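The drop rule described in this and the previous posts can be written down as a small filter. This is a hypothetical sketch of one reading of it, not Ingo's actual procedure: a network with a win in its first 3 games is always kept, and otherwise it is dropped once it has 2 losses within those 3 games. Result strings use the table notation (`1` win, `=` draw, `0` loss), and `should_drop`, `window` and `max_losses` are names chosen for illustration:

```python
from typing import Optional


def should_drop(results: str, window: int = 3, max_losses: int = 2) -> Optional[bool]:
    """Hypothetical drop rule for a gauntlet result string such as '=1==0=0'.

    Returns None while fewer than `window` games have been played.
    """
    if len(results) < window:
        return None  # too early to decide
    first = results[:window]
    if "1" in first:
        return False  # a network that won early is never removed
    return first.count("0") >= max_losses


for results in ["=1==0=0", "====000", "0=0", "0="]:
    print(results, should_drop(results))
```

Morant's suggestion of deciding after 4 games instead of 3 corresponds to calling `should_drop(results, window=4)`, which gives each network two games with each color before the decision.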

Ingo Weidner

Dec 28, 2018, 6:46:26 AM

to LCZero

Speaking of networks that win games: the new network 32316 now has a score of 1.5/2 = 75.0% after 2 games, so it is "safe" to continue over the full 10-game distance...

Morant

Dec 28, 2018, 6:59:03 AM

to LCZero

Looks reasonable. But maybe use the first 4 games instead of 3? With 3 games a network could get 2 of them with black, which decreases its chances of surviving.


Ingo Weidner

Dec 28, 2018, 7:25:06 AM

to LCZero

In the current test so far, the 3 best ones (32316, 32280 and 32307) each had a win in their first 2 games, so it was not really possible to "miss" them by removing them too early...

As already mentioned, I will not remove IDs that had a win.

Of course, as already done with ID 32246, only the best candidate will be used for the follow-up test against SF11dev at 30min+5s TC.

In the near future I also plan to run a small round-robin tournament that includes the best Lc0 ID and multiple other engines, e.g. Stockfish 10 (or the latest SF11dev), Fire 7.1, Komodo 9.02, Ethereal 11.11, Xiphos 0.4.14 and Andscacs 0.95.

Vassilis

Dec 28, 2018, 7:45:16 AM

to LCZero

[...In the near future I also plan to do a small round-robin tournament that includes the best Lc0 ID and multiple other engines like e.g. Stockfish 10 (or latest SF11dev), Fire 7.1, Komodo 9.02, Ethereal 11.11, Xiphos 0.4.14, Andscacs 0.95]

This is a very good idea. It will show how well this best net performs against different styles of play.

However, in this test I would also include the previously tested 11248. This would give you an estimate of the difference in strength between the new net and the old one.

With a total of 8 engines (2 nets and six AB engines) and two games for each engine (with white and black) against every other engine in the tournament, there will be 2x7 = 14 rounds.

Vas
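The 2x7 = 14 rounds can be checked with a standard double round-robin schedule. This sketch uses the circle method, which is an assumption: the thread does not say how Arena schedules a round robin. "Lc0 (best ID)" is a placeholder for whichever network wins the gauntlet, and `double_round_robin` is an illustrative name:

```python
from typing import List, Tuple

Pairing = Tuple[str, str]  # (white, black)


def double_round_robin(players: List[str]) -> List[List[Pairing]]:
    """Circle-method double round robin: every pair meets twice, once with each color."""
    n = len(players)
    if n % 2:
        raise ValueError("odd number of players: add a 'bye' entry first")
    rotation = list(players)
    first_half: List[List[Pairing]] = []
    for r in range(n - 1):
        # Pair the i-th player from the front with the i-th player from the back.
        pairs = [(rotation[i], rotation[n - 1 - i]) for i in range(n // 2)]
        # Flip colors on every other round (a simple heuristic, not a perfect color balance).
        if r % 2:
            pairs = [(b, a) for a, b in pairs]
        first_half.append(pairs)
        # Keep the first player fixed and rotate everyone else by one seat.
        rotation = [rotation[0], rotation[-1]] + rotation[1:-1]
    # The second half repeats every pairing with colors reversed.
    second_half = [[(b, a) for a, b in rnd] for rnd in first_half]
    return first_half + second_half


engines = [
    "Lc0 (best ID)", "Lc0 ID11248", "Stockfish 10", "Fire 7.1",
    "Komodo 9.02", "Ethereal 11.11", "Xiphos 0.4.14", "Andscacs 0.95",
]
schedule = double_round_robin(engines)
print(len(schedule), "rounds,", sum(len(rnd) for rnd in schedule), "games")
```

With 8 engines this gives 14 rounds of 4 games each (56 games in total), and every engine plays each opponent once with white and once with black.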

Ingo Weidner

Dec 28, 2018, 8:29:49 AM

to LCZero

Another hint about how unreliable self-play Elo is as an indicator of real Elo:

While ID 32316 performs great with 2.0/3 = 66.7% in 3 games, ID 32319, which has an 8-point higher self-play Elo, managed just 0.5/2 = 25% in 2 games and lost the first game.

Based on self-play Elo, one would expect them to have at least comparable performance.

FWIW, so far the 7 best tested 32xxx nets had a score of at least 50% in the first 4 games (which ID 32316 has already reached after 3 games...).

Vassilis

Dec 28, 2018, 8:48:59 AM

to LCZero

Self-elo is not to be taken too literally!
It is only for estimating the learning progress of the net.
There are statistical variations from one net to the next ones which do not allow us to draw safe conclusions. But one can assume, I guess, with minimal margin of error, that between two nets with 100 or more self-elo difference, the higher rated one plays better.
Only real tests with other engines will tell us how much better it plays.


Ingo Weidner

Dec 28, 2018, 11:48:42 AM

to LCZero

So far IDs 32273 to 32319 have been checked in the current test.

After 5 rounds, with ID 32316 added, the scores and estimated Elo values seem to be going a bit "back to normal" now.

After 5 games ID 32307 has a score of 3.0/5 = 60.0%, followed by 3 IDs with a score of 2.5/5 = 50.0%: 32316, 32280 and 32273.

ID 32273 stands out with only draws so far, while I would prefer the other 2 with the same score, as they already had a win.

Here is the current table after 5 rounds/games with scores sorted by percent (%):

Engine Score St % Elo
1: Lc0 v0.20.0rc2_ID32307 3.0/5 =1=10 60.0 +70 = 3534
2: Stockfish_10_x64_bmi2 57.0/100 ·········· 57.0 3464 (from CCRL 40/40 list)
3: Lc0 v0.20.0rc2_ID32316 2.0/5 =1==0 50.0 ±0 = 3464
3: Lc0 v0.20.0rc2_ID32280 2.5/5 1===0 50.0 ±0 = 3464
3: Lc0 v0.20.0rc2_ID32273 2.5/5 ===== 50.0 ±0 = 3464
6: Lc0 v0.20.0rc2_ID32246 4.5/10 ====010=01 45.0 -35 = 3429
6: Lc0 v0.20.0rc2_ID32223 4.5/10 ====0===01 45.0 -35 = 3429
8: Lc0 v0.20.0rc2_ID32301 2.0/5 ===0= 40.0 -70 = 3394
8: Lc0 v0.20.0rc2_ID32295 2.0/5 ====0 40.0 -70 = 3394
8: Lc0 v0.20.0rc2_ID32194 4.0/10 0=====0=01 40.0 -70 = 3394
8: Lc0 v0.20.0rc2_ID32253 4.0/10 =1=====000 40.0 -70 = 3394
8: Lc0 v0.20.0rc2_ID32207 4.0/10 ==0=0===== 40.0 -70 = 3394
8: Lc0 v0.20.0rc2_ID32236 4.0/10 0===0===== 40.0 -70 = 3394
14: Lc0 v0.20.0rc2_ID32195 3.5/10 0==0=0==== 35.0 -108 = 3356

100 of 130 games played

Owen W

Dec 28, 2018, 12:03:03 PM

to LCZero

Your opening book?

Ingo Weidner

Dec 28, 2018, 12:17:19 PM

to LCZero

Correction for the table above: ID 32316 has a score of 2.5/5. Now, after 6 games, it has a score of 3.0/6 = 50.0%.

@ Owen W:

As mentioned earlier in this thread, for openings and endgames I use the default settings of Arena 3.5.1 and the Lc0 engine.

Colors are reversed after each round.

The only values changed in the Lc0 engine are ponder = off and FPU strategy set to "absolute" (as used in network training starting with v0.19.1).

Owen W

Dec 28, 2018, 12:38:22 PM

to LCZero

So then each test run of a different NN could play a different opening than the others did?

Ingo Weidner

Dec 28, 2018, 12:48:03 PM

to LCZero

@Owen W:

Both Lc0 and Stockfish use the default settings of Arena and the engines except ponder = off for both and FPU strategy = absolute for Lc0.

Same conditions for all Lc0 networks, and all of them used the latest 0.20.0rc2 version of Lc0 (as can be seen in the table).

The Stockfish 10 version used here is the official release version.

The latest SF 11 dev version was only used for deeper testing of the best network at 30min+5s TC, which before the current tests was Lc0 ID 32246.

Owen W

Dec 28, 2018, 12:56:29 PM

to LCZero

I don't use Arena, so I am just asking: when you run the different NNs against SF, are they using the same opening lines as the previous NNs? So if I look at your table, which shows 5 games played for each of the NNs and their performance against SF, did they all play from the same starting positions?

Owen W

Dec 28, 2018, 12:59:51 PM

to LCZero

It will be interesting to see the games when you are done.


Rajen Gupta

Dec 28, 2018, 2:21:10 PM

to LCZero

Hi Ingo;

your work is very rewarding and gives all of us Leela lovers hope

1: Lc0 v0.20.0rc2_ID32307 3.0/5 =1=10 60.0 +70 = 3534
2: Stockfish_10_x64_bmi2 57.0/100 ·········· 57.0 3464 (from CCRL 40/40 list)
3: Lc0 v0.20.0rc2_ID32316 has a score of 3.0/6 = 50.0%

It looks like these are the strongest so far, and based on limited testing they are equal to or stronger than SF 10. Any thoughts of investigating further with longer matches using one or both of them?


Ingo Weidner

Dec 28, 2018, 3:35:17 PM

to LCZero

@Owen W:

Hi, I do not know exactly what you mean by "starting position", but I have already mentioned many times that after each round/game against SF 10 the colors are reversed, which Arena does automatically.

That means that each network will play 5 times as white and 5 times as black.

Ingo Weidner

Dec 28, 2018, 3:38:36 PM

to LCZero

After 6 games ID 32273 still has not lost a game but also no win yet.

ID 32307 leads with a score of 3.5/6 = 58.3% followed by 3 IDs with a score of 50.0% (IDs 32316, 32280 and 32273).

Here is the current table after 6 rounds/games with scores sorted by percent (%):

Engine Score St % Elo
1: Lc0 v0.20.0rc2_ID32307 3.5/6 =1=10= 58.3 +56 = 3520
2: Stockfish_10_x64_bmi2 60.5/106 ·········· 57.1 3464 (from CCRL 40/40 list)
3: Lc0 v0.20.0rc2_ID32316 3.0/6 =1==0= 50.0 ±0 = 3464
3: Lc0 v0.20.0rc2_ID32280 3.0/6 1===0= 50.0 ±0 = 3464
3: Lc0 v0.20.0rc2_ID32273 3.0/6 ====== 50.0 ±0 = 3464
6: Lc0 v0.20.0rc2_ID32246 4.5/10 ====010=01 45.0 -35 = 3429
6: Lc0 v0.20.0rc2_ID32223 4.5/10 ====0===01 45.0 -35 = 3429
8: Lc0 v0.20.0rc2_ID32301 2.5/6 ===0== 41.7 -56 = 3408
9: Lc0 v0.20.0rc2_ID32194 4.0/10 0=====0=01 40.0 -70 = 3394
9: Lc0 v0.20.0rc2_ID32253 4.0/10 =1=====000 40.0 -70 = 3394
9: Lc0 v0.20.0rc2_ID32207 4.0/10 ==0=0===== 40.0 -70 = 3394
9: Lc0 v0.20.0rc2_ID32236 4.0/10 0===0===== 40.0 -70 = 3394
13: Lc0 v0.20.0rc2_ID32195 3.5/10 0==0=0==== 35.0 -108 = 3356
14: Lc0 v0.20.0rc2_ID32295 2.0/6 ====00 33.3 -123 = 3341

106 of 130 games played

Ingo Weidner

Dec 28, 2018, 4:02:08 PM

to LCZero

Attached is a screenshot showing what the "gauntlet" tournament of SF 10 vs 6 different Lc0 networks looks like (Lc0 currently playing as black):

[Attachment: SF10 vs 6xLc0 gauntlet_1.png]

A table of the current results of the tournament is found at the upper right.

The tables I posted here are combined ones that include the results of a previous test with other networks.

Felix Zaslavskiy

Dec 28, 2018, 4:28:58 PM

to Ingo Weidner, LCZero

No single match appears to be statistically significant, but as a whole there is a glimmer of a trend that is starting to feel statistically significant.

The trend seems to be that higher IDs are stronger, and I think that will show a bit more prominently once all 10 games are played out in all the matches.

We would then be able to conclude that higher IDs are getting better, which is how it should be.


Ingo Weidner

Dec 28, 2018, 4:52:20 PM

to LCZero

I would be careful about such conclusions concerning the newer nets being better.

In round 7 both IDs 32316 and 32301 lost a game and are now ranked behind the "good old" ID 32246, which in another test had already scored 4.0/8 = 50% against the latest SF 11 dev version at 30min+5s TC.

ID 32307 lost too and now has a score of 3.5/7 = 50.0%.

John D

Dec 28, 2018, 5:52:15 PM

to LCZero

I don't think you can conclude anything from such a small sample, so if self-elo + other metrics indicate newer is stronger, that's the same assumption. As long as the nets remain markedly inferior in endgames (ideally, this won't be the case after the next LR drop) or contain tactical holes, losses are going to happen. E.g., was the loss caused by a blunder, and if so, does the 'good' net not make the same mistake? If not, does it play even that single game demonstrably better?

Ingo Weidner

Dec 28, 2018, 6:12:57 PM

to LCZero

In round 7 only ID 32280 scored a draw, while the other 5 new networks lost a game.

Quite a lot of changes in the table after this round and ID 32246 now moved up to 4th place.

Here is the current combined table (new + older nets) after 7 rounds/games (colors reversed after each round) with scores sorted by percent (%):

Engine Score St % Elo
1: Stockfish_10_x64_bmi2 66.0/112 ·········· 58.9 3464 (from CCRL 40/40 list)
2: Lc0 v0.20.0rc2_ID32307 3.5/7 =1=10=0 50.0 ±0 = 3464
2: Lc0 v0.20.0rc2_ID32280 3.5/7 1===0== 50.0 ±0 = 3464
4: Lc0 v0.20.0rc2_ID32246 4.5/10 ====010=01 45.0 -35 = 3429
4: Lc0 v0.20.0rc2_ID32223 4.5/10 ====0===01 45.0 -35 = 3429
6: Lc0 v0.20.0rc2_ID32316 3.0/7 =1==0=0 42.9 -49 = 3415
6: Lc0 v0.20.0rc2_ID32273 3.0/7 ======0 42.9 -49 = 3415
8: Lc0 v0.20.0rc2_ID32194 4.0/10 0=====0=01 40.0 -70 = 3394
8: Lc0 v0.20.0rc2_ID32253 4.0/10 =1=====000 40.0 -70 = 3394
8: Lc0 v0.20.0rc2_ID32207 4.0/10 ==0=0===== 40.0 -70 = 3394
8: Lc0 v0.20.0rc2_ID32236 4.0/10 0===0===== 40.0 -70 = 3394
12: Lc0 v0.20.0rc2_ID32301 2.5/7 ===0==0 35.7 -100 = 3364
13: Lc0 v0.20.0rc2_ID32195 3.5/10 0==0=0==== 35.0 -108 = 3356
14: Lc0 v0.20.0rc2_ID32295 2.0/7 ====000 28.6 -156 = 3308

112 of 130 games played

Level: Blitz 15/3

Hardware: RAM: 16 GB (1GB hash), CPU: mobile i7-7700HQ 4x2.8GHz (3 cores used for engines), GPU: mobile Nvidia GeForce GTX 1050 Ti 4GB (768 CUDA Cores), Leela ratio = 0.64

Operating system: Windows 10 Home Home Edition (Build 9200) 64 bit

Detailed results of the currently tested networks:

-----------------Lc0 v0.20.0rc2_ID32273-----------------

Lc0 v0.20.0rc2_ID32273 - Stockfish_10_x64_bmi2 : 3,0/7 0-1-6 (======0) 43% -49

-----------------Lc0 v0.20.0rc2_ID32280-----------------

Lc0 v0.20.0rc2_ID32280 - Stockfish_10_x64_bmi2 : 3,5/7 1-1-5 (1===0==) 50% ±0

-----------------Lc0 v0.20.0rc2_ID32295-----------------

Lc0 v0.20.0rc2_ID32295 - Stockfish_10_x64_bmi2 : 2,0/7 0-3-4 (====000) 29% -156

-----------------Lc0 v0.20.0rc2_ID32301-----------------

Lc0 v0.20.0rc2_ID32301 - Stockfish_10_x64_bmi2 : 2,5/7 0-2-5 (===0==0) 36% -100

-----------------Lc0 v0.20.0rc2_ID32307-----------------

Lc0 v0.20.0rc2_ID32307 - Stockfish_10_x64_bmi2 : 3,5/7 2-2-3 (=1=10=0) 50% ±0

-----------------Lc0 v0.20.0rc2_ID32316-----------------

Lc0 v0.20.0rc2_ID32316 - Stockfish_10_x64_bmi2 : 3,0/7 1-2-4 (=1==0=0) 43% -49

-----------------Stockfish_10_x64_bmi2-----------------

Stockfish_10_x64_bmi2 - Lc0 v0.20.0rc2_ID32273 : 4,0/7 1-0-6 (======1) 57% +49

Stockfish_10_x64_bmi2 - Lc0 v0.20.0rc2_ID32280 : 3,5/7 1-1-5 (0===1==) 50% ±0

Stockfish_10_x64_bmi2 - Lc0 v0.20.0rc2_ID32295 : 5,0/7 3-0-4 (====111) 71% +156

Stockfish_10_x64_bmi2 - Lc0 v0.20.0rc2_ID32301 : 4,5/7 2-0-5 (===1==1) 64% +100

Stockfish_10_x64_bmi2 - Lc0 v0.20.0rc2_ID32307 : 3,5/7 2-2-3 (=0=01=1) 50% ±0

Stockfish_10_x64_bmi2 - Lc0 v0.20.0rc2_ID32316 : 4,0/7 2-1-4 (=0==1=1) 57% +49
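The W-L-D counts in these lines (e.g. `2-2-3` for ID 32307) follow directly from the result strings, where `1` is a win, `=` a draw and `0` a loss from the listed engine's point of view, and each draw is worth half a point. A small sketch that rebuilds such a summary line from a result string; `GauntletResult`, `parse_results` and `summary` are illustrative names, and the output uses a decimal point instead of Arena's decimal comma:

```python
from typing import NamedTuple


class GauntletResult(NamedTuple):
    wins: int
    losses: int
    draws: int

    @property
    def games(self) -> int:
        return self.wins + self.losses + self.draws

    @property
    def points(self) -> float:
        # A draw is worth half a point.
        return self.wins + 0.5 * self.draws


def parse_results(results: str) -> GauntletResult:
    """Count a result string: '1' win, '=' draw, '0' loss."""
    if not results:
        raise ValueError("empty result string")
    unknown = set(results) - {"1", "=", "0"}
    if unknown:
        raise ValueError(f"unexpected result symbols: {sorted(unknown)}")
    return GauntletResult(results.count("1"), results.count("0"), results.count("="))


def summary(name: str, results: str) -> str:
    """Format a line similar to Arena's per-engine summary (W-L-D order)."""
    r = parse_results(results)
    return f"{name}: {r.points:.1f}/{r.games} {r.wins}-{r.losses}-{r.draws} ({results}) {r.points / r.games:.0%}"


print(summary("Lc0 v0.20.0rc2_ID32307", "=1=10=0"))
print(summary("Lc0 v0.20.0rc2_ID32295", "====000"))
```

The two printed lines correspond to the ID 32307 and ID 32295 entries above (3.5/7 with 2-2-3 at 50%, and 2.0/7 with 0-3-4 at 29%).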

Ingo Weidner

Dec 28, 2018, 6:46:04 PM

to LCZero

After it lost in round 7 ID 32316 just won in round 8 and is back to a score of 50.0% now.

Both IDs 32316 and 32307 have 2 wins now.

David B

Dec 28, 2018, 7:12:31 PM

to LCZero


I'm running a match of your #1 above, net 32307, against net 11248 of Test 10, at 5 min TC.

It's Lc0v0.20-32307 v. Lc0v0.19-11248 with score: +24 =15 -29, 46% for Lc0v0.20-32307.

I was hoping 32307 would get at least 60%. One of the nets following the last LR drop got about 62% vs net 11248 in 600 games, but I lost its ID number.

David Bernier
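For a head-to-head match like this, a common quick check alongside the score is the likelihood of superiority (LOS), which in its usual normal approximation uses only wins and losses. This is a sketch under that assumption, and `match_summary` is an illustrative name; applied to +24 =15 -29 it gives a score of about 46% and an LOS of roughly 25% for 32307, i.e. the match leans towards 11248 but is far from conclusive:

```python
import math


def match_summary(wins: int, draws: int, losses: int):
    """Score fraction and likelihood of superiority (LOS) for a head-to-head match."""
    games = wins + draws + losses
    if wins + losses == 0:
        raise ValueError("LOS is undefined without decisive games")
    score = (wins + 0.5 * draws) / games
    # Usual LOS approximation: draws are ignored, only the win/loss imbalance matters.
    los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))
    return score, los


score, los = match_summary(wins=24, draws=15, losses=29)
print(f"score {score:.1%}, LOS {los:.1%}")
```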

Ingo Weidner

Dec 28, 2018, 11:31:26 PM

to LCZero

@David B:

Many thanks for your effort. Maybe it would have been better to wait until the 10 games of my current test are done.

Currently, in round 9, ID 32316 leads with a score of 4.5/9 = 50.0% and an estimated Elo of 3464.

ID 32307 has a score of 4.0/9 = 44.4% after round 9, with an estimated Elo of 3422.

I have learned myself that testing two networks against each other does not always give proper information about how a network will perform against other engines like e.g. Stockfish 10.

Besides that, you used a shorter time control of 5min, which could make a difference too.

After the 10 rounds are finished i plan to add some references under the same conditions including Lc0 v0.19.1 ID 11248 and some other engines like e.g. Fire 7.1, Komodo 9.02 and Ethereal 11.11.

Owen W

Dec 29, 2018, 12:44:25 AM12/29/18

to LCZero

I mean that each NN plays the exact same positions against SF. So in your example, if 10 games are going to be played, presumably that is 5 different positions, and each new network that you pit against SF would play from those same 5 positions.

Owen W

Dec 29, 2018, 1:04:14 AM12/29/18

to LCZero

I will run 32316 vs SF 241218 64 BMI2 for 10 games at the same time control of 15m + 3s and see how it compares to what you say you are getting. I suspect this will vary greatly with so few games and starting positions. But you are saying that after 9 games this NN is dead equal to SF in terms of score.

With my NVIDIA QUADRO P5000, 5 threads for SF on my machine is close to a Leela ratio (LeR) of 1, give or take depending on the positions (it can be discovered and calculated from the PGN), but I have found in testing SF vs NNs that it does not make much difference how many threads are run above 5.

Owen W

Dec 29, 2018, 1:13:54 AM12/29/18

to LCZero

I will be streaming them at https://www.twitch.tv/bohemian65

Also, https://tcecbonus.club/? is running 32329 at 20m + 2s against the latest dev version of SF, but SF has crashed 4 times so far; something must be wrong with that from the other day.

Ingo Weidner

Dec 29, 2018, 3:03:36 AM12/29/18

to LCZero

I still do not really get it. I do not give any "fixed positions" to the engines. The engines just play "as is" from the start position, which should be the normal way of playing chess...

As already mentioned, the colors are reversed after each round, so each network plays 5 times as White and 5 times as Black.
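A minimal sketch of that color rule, assuming (hypothetically, since the post does not say) that the Lc0 network has White in round 1. With colors reversed every round, an even number of rounds always splits evenly:

```python
def lc0_colors(rounds: int) -> list[str]:
    """Color of the Lc0 network in rounds 1..rounds when colors are reversed
    after every round. Assumption: Lc0 has White ('W') in round 1."""
    return ["W" if r % 2 == 1 else "B" for r in range(1, rounds + 1)]

colors = lc0_colors(10)
print("".join(colors))                       # WBWBWBWBWB
print(colors.count("W"), colors.count("B"))  # 5 5
```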

Ingo Weidner

Dec 29, 2018, 3:10:08 AM12/29/18

to LCZero

@David B:

The first games of round 10 are finished, and ID 32307 lost its last two games. After 10 games it has a score of 4.0/10 = 40.0% with an estimated Elo of 3396.

This might explain why ID 11248 performed slightly better in your test, although the 5 min time control might still make a difference too.

ID 32316 finished with 5.0/10 = 50.0% and an estimated Elo of 3464. I will post the final table when all games are done.

After that I will add ID 11248 for comparison and maybe also some other engines.

Owen W

Dec 29, 2018, 3:10:39 AM12/29/18

to LCZero

From the Arena opening book, no?

Owen W

Dec 29, 2018, 3:11:26 AM12/29/18

to LCZero

So I guess Leela plays e4 every time she is White?

On Saturday, December 29, 2018 at 3:03:36 AM UTC-5, Ingo Weidner wrote:

Owen W

Dec 29, 2018, 3:15:24 AM12/29/18

to LCZero

So far, after 5 games at 15m + 3s, 32316 has 0 wins, 3 draws and 2 losses.

Ingo Weidner

Dec 29, 2018, 3:17:26 AM12/29/18

to LCZero

@Owen W:

After 10 rounds, ID 32316 indeed has a score of 5.0/10 = 50.0% vs SF 10.

This is the official release version of SF 10 and NOT the latest development version, so if you play against the latest dev version, the result might be different.

I had used the latest Stockfish 11 dev version against ID 32246 at a TC of 30min+5s, where it scored 4.0/8 = 50%.

Those games were posted here:

https://groups.google.com/forum/#!topic/lczero/kY6BQYa1t2U

Owen W

Dec 29, 2018, 3:19:05 AM12/29/18

to LCZero

lol, there is always a rub, isn't there?

Owen W

Dec 29, 2018, 3:20:27 AM12/29/18

to LCZero

I guarantee that if I set up the exact same scenario that seems to be evolving here, the results will be different. I have played under the exact same conditions on my machine a few times in a row, and the results were always different.

On Saturday, December 29, 2018 at 3:17:26 AM UTC-5, Ingo Weidner wrote:

Ingo Weidner

Dec 29, 2018, 3:32:58 AM12/29/18

to LCZero

Here, at a TC of 15min+3s against SF 10 (official release version), ID 32316 had 2 wins, 6 draws and 2 losses in 10 games.

Ingo Weidner

Dec 29, 2018, 3:43:12 AM12/29/18

to LCZero

Another note about the Stockfish version:

The official release version of Stockfish 10 used in this test is still the same as when it was released at the end of November, and it is available on the official website.

The development versions are actually early versions of an upcoming Stockfish 11. Because of that, there can of course be differences in the results between SF10 and SF11dev, even on the same system/hardware.

Owen W

Dec 29, 2018, 3:46:48 AM12/29/18

to LCZero

Intel(R) Core(TM) i9-7940X CPU @ 3.10GHz 3095 MHz
Using 5 Threads for SF
NVIDIA QUADRO P5000 for Leela
LeR varies, but in general SF on 5 threads runs at around 8-9 million nodes/s and Leela at around 8k nodes/s; obviously this is position-specific.
Both have access to 6-man TBs; Contempt is set to 0 for SF.
All other parameters are left at their default values for both engines.
The opening positions come from the Balsa Top 10, and neither engine has access to an opening book beyond that; they play from the position given.
https://sites.google.com/site/computerschess/balsa-opening-test-suite
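For reference, settings like these reach Stockfish as plain UCI `setoption` commands sent by the GUI before a game. A small sketch that builds them; `Threads`, `Contempt` and `SyzygyPath` are Stockfish 10 UCI option names, while the tablebase directory below is a hypothetical placeholder:

```python
def stockfish_setup_commands(threads: int, contempt: int, syzygy_path: str) -> list[str]:
    """Build the UCI commands a GUI would send to configure Stockfish 10.
    Every option not listed here keeps its default value."""
    options = {
        "Threads": threads,
        "Contempt": contempt,
        "SyzygyPath": syzygy_path,  # directory holding the 6-man Syzygy files
    }
    cmds = ["uci"]  # handshake; the engine answers with "uciok"
    cmds += [f"setoption name {name} value {value}" for name, value in options.items()]
    cmds.append("isready")  # engine answers "readyok" once the options are applied
    return cmds

# Hypothetical tablebase directory; adjust to the local installation.
for cmd in stockfish_setup_commands(threads=5, contempt=0, syzygy_path="C:/TB/syzygy"):
    print(cmd)
```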

Now apparently there is some caveat about individual testing, so there will be another test run under the same conditions, except that NO OPENING BOOK will be allowed for either engine, and it specifically needs to be the released version of SF 10 and not any of the dev versions, even though the dev versions are what Leela has to play in hosted tournaments. But the question is: should Leela or SF have access to TBs? If not, this would also seem contrary to any other test going on, especially if one is not allowing any kind of opening book or starting positions that the engines play from both sides. So in theory there should not be any access to TBs.

I will post the results of both conditions to see how they compare. In general, I have found that without starting positions Leela as White plays e4, and lately she seems to play c5 against e4 instead of e6 as before.

I have also run tons of tests at various time controls and under various conditions, and I have even rerun tests and found that there have to be quite a few games for any consistency of performance, because in short tests the results often vary when the same test is rerun. I have had Leela win, lose and draw from the exact same starting position using the same time control.

Owen W

Dec 29, 2018, 3:48:25 AM12/29/18

to LCZero

I have all of them as most people do

Owen W

Dec 29, 2018, 3:58:01 AM12/29/18

to LCZero

Any particular version of SF10?

SF10 x64
SF10 x64 BMI2
SF10 x64 POPCNT

On Saturday, December 29, 2018 at 3:43:12 AM UTC-5, Ingo Weidner wrote:

Ingo Weidner

Dec 29, 2018, 4:09:21 AM12/29/18

to LCZero

SF10 x64 BMI2, as you could see in my table and in the screenshot I posted in one of my previous posts... :)

The BMI2 version is for the latest Intel CPUs.

Owen W

Dec 29, 2018, 4:10:53 AM12/29/18

to LCZero

Your screenshot is way too small for my screen, so no, I could not see it.

Owen W

Dec 29, 2018, 4:12:07 AM12/29/18

to LCZero

Houdini uses POPCNT because it believes that is supposed to be better; interesting, no?

Owen W

Dec 29, 2018, 4:15:08 AM12/29/18

to LCZero

Also, TBs heavily favor SF and do nothing for Leela: even with king and rook vs. a lone king she will mill around for quite a while and only eventually mate, even with TBs, whereas SF will give the exact number of moves until mate.

Ingo Weidner

Dec 29, 2018, 4:31:53 AM12/29/18

to LCZero

In my browser (Chrome) I can use "open in new tab" from the right-click menu. Then it will show at full size. It is also possible to download it with the right-click menu.

Vassilis

Dec 29, 2018, 4:56:13 AM12/29/18

to LCZero

Hi Owen!

You just mentioned one case where Monte Carlo search is inefficient and Leela should revert to AB (alpha-beta) tree search:
Simple endgames with TBs. Totally agree with you...

Hi Ingo!

Based on your tests so far, which do you believe is the best 3xxxx network?
Judging not only from the results, but also from the quality of the games.
How about testing 32333? It looks very solid on my laptop.

Thx

Ingo Weidner

Dec 29, 2018, 5:12:16 AM12/29/18

to LCZero

Hi,

The last game of the current networks (tested up to 32319 so far) is still running, and if ID 32273 scores a draw, there will be 3 IDs with a score of 50% after 10 games; the others are 32316 and 32280.

There are always new networks, but if I keep adding them during the running tournament, I will never finish the 10 games for the earlier networks.

FWIW, I have already downloaded a few new networks.

After the current test, the next step for me is to add one or more "references" under the same conditions, e.g. Lc0 ID 11248 and other engines such as Fire 7.1, Komodo 9.02 and Ethereal 11.11.

The results will then again be sorted into the table, so it will become a growing overview of many new networks and some references.

Ingo Weidner

Dec 29, 2018, 5:29:56 AM12/29/18

to LCZero

After adding the "references" it might be necessary to play more games with the best networks to see which one is really the best.

The results of this test are just a kind of "qualification" for further tests in the future. With ID 32246 the deeper tests that followed were quite successful.

It would take ages to test all those new networks with 20 or more games at 15min+3s TC, and in the past I found that testing with very short TCs could lead to a disaster at longer time controls.
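To put "ages" into rough numbers: a sketch of an upper bound, assuming each side uses its full 15 minutes plus the 3 s increment over a hypothetical average of 60 moves per side, with games played one after another on a single machine:

```python
def max_game_seconds(base_s: int, inc_s: int, moves_per_side: int) -> int:
    """Upper bound on one game's duration if both sides use all their time."""
    return 2 * (base_s + inc_s * moves_per_side)

def total_hours(networks: int, games_per_network: int,
                base_s: int = 900, inc_s: int = 3, moves_per_side: int = 60) -> float:
    """Upper bound on total wall-clock hours when games run sequentially."""
    games = networks * games_per_network
    return games * max_game_seconds(base_s, inc_s, moves_per_side) / 3600

# 13 Lc0 networks (as in the current table), 20 games each at 15min+3s
print(max_game_seconds(900, 3, 60) / 60)  # 36.0 minutes per game at most
print(total_hours(13, 20))                # 156.0 hours
```

Engines rarely burn their whole clock, so the real total is lower than this bound; the point is only the order of magnitude.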

Ingo Weidner

Dec 29, 2018, 5:41:08 AM12/29/18

to LCZero

I just decided this:

After the current networks have finished their 10 games, I will add ID 11248 to play 10 games too, and after that the best networks will continue playing more games, at least 6 more each.

Once this is done, ID 11248 will serve as a "reference", and hopefully I will find a single best network up to 32319, which was the last one tested so far.

Ingo Weidner

Dec 29, 2018, 5:57:46 AM12/29/18

to LCZero

Here is the table of the currently tested networks (tested up to ID 32319) after 10 rounds/games are finished (scores sorted by percent):

Engine Score St % Elo
1: Stockfish_10_x64_bmi2 74.5/130 ·········· 57.3 3464 (from CCRL 40/40 list)
2: Lc0 v0.20.0rc2_ID32316 5.0/10 =1==0=01== 50.0 ±0 = 3464
2: Lc0 v0.20.0rc2_ID32280 5.0/10 1===0===== 50.0 ±0 = 3464
2: Lc0 v0.20.0rc2_ID32273 5.0/10 ======0=1= 50.0 ±0 = 3464
5: Lc0 v0.20.0rc2_ID32295 4.5/10 ====000=11 45.0 -35 = 3429
5: Lc0 v0.20.0rc2_ID32246 4.5/10 ====010=01 45.0 -35 = 3429
5: Lc0 v0.20.0rc2_ID32223 4.5/10 ====0===01 45.0 -35 = 3429
8: Lc0 v0.20.0rc2_ID32307 4.0/10 =1=10=0=00 40.0 -70 = 3396
8: Lc0 v0.20.0rc2_ID32194 4.0/10 0=====0=01 40.0 -70 = 3394
8: Lc0 v0.20.0rc2_ID32253 4.0/10 =1=====000 40.0 -70 = 3394
8: Lc0 v0.20.0rc2_ID32207 4.0/10 ==0=0===== 40.0 -70 = 3394
8: Lc0 v0.20.0rc2_ID32236 4.0/10 0===0===== 40.0 -70 = 3394
13: Lc0 v0.20.0rc2_ID32301 3.5/10 ===0==0==0 35.0 -108 = 3356
13: Lc0 v0.20.0rc2_ID32195 3.5/10 0==0=0==== 35.0 -108 = 3356

130 of 130 games played

Level: Blitz 15/3
Hardware: RAM: 16 GB (1GB hash), CPU: mobile i7-7700HQ 4x2.8GHz (3 cores used for engines), GPU: mobile Nvidia Gforce GTX 1050 TI 4GB (768 CUDA Cores), Leela ratio = 0.64
Operating system: Windows 10 Home Home Edition (Build 9200) 64 bit

NOTE: Colors are reversed after each round, so each network plays 5 times as White and 5 times as Black.

Detailed results of the currently tested networks (the table above is a combined table with older and new networks; the counts are wins-losses-draws):

-----------------Lc0 v0.20.0rc2_ID32273-----------------
Lc0 v0.20.0rc2_ID32273 - Stockfish_10_x64_bmi2 : 5.0/10 1-1-8 (======0=1=) 50% ±0

-----------------Lc0 v0.20.0rc2_ID32280-----------------
Lc0 v0.20.0rc2_ID32280 - Stockfish_10_x64_bmi2 : 5.0/10 1-1-8 (1===0=====) 50% ±0

-----------------Lc0 v0.20.0rc2_ID32295-----------------
Lc0 v0.20.0rc2_ID32295 - Stockfish_10_x64_bmi2 : 4.5/10 2-3-5 (====000=11) 45% -35

-----------------Lc0 v0.20.0rc2_ID32301-----------------
Lc0 v0.20.0rc2_ID32301 - Stockfish_10_x64_bmi2 : 3.5/10 0-3-7 (===0==0==0) 35% -108

-----------------Lc0 v0.20.0rc2_ID32307-----------------
Lc0 v0.20.0rc2_ID32307 - Stockfish_10_x64_bmi2 : 4.0/10 2-4-4 (=1=10=0=00) 40% -70

-----------------Lc0 v0.20.0rc2_ID32316-----------------
Lc0 v0.20.0rc2_ID32316 - Stockfish_10_x64_bmi2 : 5.0/10 2-2-6 (=1==0=01==) 50% ±0

-----------------Stockfish_10_x64_bmi2-----------------
Stockfish_10_x64_bmi2 - Lc0 v0.20.0rc2_ID32273 : 5.0/10 1-1-8 (======1=0=) 50% ±0
Stockfish_10_x64_bmi2 - Lc0 v0.20.0rc2_ID32280 : 5.0/10 1-1-8 (0===1=====) 50% ±0
Stockfish_10_x64_bmi2 - Lc0 v0.20.0rc2_ID32295 : 5.5/10 3-2-5 (====111=00) 55% +35
Stockfish_10_x64_bmi2 - Lc0 v0.20.0rc2_ID32301 : 6.5/10 3-0-7 (===1==1==1) 65% +108
Stockfish_10_x64_bmi2 - Lc0 v0.20.0rc2_ID32307 : 6.0/10 4-2-4 (=0=01=1=11) 60% +70
Stockfish_10_x64_bmi2 - Lc0 v0.20.0rc2_ID32316 : 5.0/10 2-2-6 (=0==1=10==) 50% ±0
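The result strings in the "St" column and the W-L-D counts are two views of the same data: `1` is a win, `=` a draw and `0` a loss for the engine named first. A small sketch that recomputes the counts and points from such a string (the function name `summarize` is illustrative only):

```python
def summarize(results: str) -> tuple[int, int, int, float]:
    """Return (wins, losses, draws, points) for a result string like '=1==0=01=='."""
    wins = results.count("1")
    losses = results.count("0")
    draws = results.count("=")
    if wins + losses + draws != len(results):
        raise ValueError(f"unexpected characters in {results!r}")
    return wins, losses, draws, wins + 0.5 * draws

for net, s in [("32316", "=1==0=01=="), ("32295", "====000=11"), ("32301", "===0==0==0")]:
    w, l, d, pts = summarize(s)
    print(f"ID{net}: {pts}/{len(s)} {w}-{l}-{d}")
# ID32316: 5.0/10 2-2-6
# ID32295: 4.5/10 2-3-5
# ID32301: 3.5/10 0-3-7
```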

Vassilis

Dec 29, 2018, 6:01:14 AM12/29/18

to LCZero

Good choice!

ID 11248 is a hard nut to crack and plays clever chess, so it should definitely be included (in every future test, I would say, until it is obvious that it is inferior to T30).

Owen W

Dec 29, 2018, 6:02:33 AM12/29/18

to LCZero

It would be hard to say which Test 30 net is the best, because past a certain point they are all about the same right now, slightly above or equal to 11248. The NNs still have a ways to go to catch the latest SF. I am running 32316 against SF right now at a 15m + 3s time control, and out of 9 games played so far she has not won a game, has drawn 7 times and lost twice, which would be around -79 Elo relative to SF in this example so far. But 10 games is really way too small a sample size; it should really be around 100 games with a variety of opening positions, but that would take quite a while to do.

Owen W

Dec 29, 2018, 6:05:08 AM12/29/18

to LCZero

Do you have the PGN of these games? I would like to see the opening moves of the different NNs vs SF.

Vassilis

Dec 29, 2018, 6:07:27 AM12/29/18

to LCZero

Yes, the latest nets have a higher draw tendency. Maybe they are becoming more mature...
Just remind me, how well did 11248 go against SF, on your hardware, under the same time control?

Ingo Weidner

Dec 29, 2018, 6:13:44 AM12/29/18

to LCZero

These 2 tests have now been started:

1.) 10 games of Lc0 v0.19.1.1 ID 11248 vs SF 10 at 15min+3s TC

2.) At least 6 more games for the 3 best networks 32316, 32280 and 32273

After that more new networks will be checked.

Ingo Weidner

Dec 29, 2018, 6:37:14 AM12/29/18

to LCZero

I had already tested ID 11248 in the past, with an average Elo of 3415.

Anyway, as a reference for my table I want to use some "fresh" results under the same conditions as the other networks.

FWIW, since I sort the table by percentage, I could also use more than 10 games per network there.

Owen W

Dec 29, 2018, 6:52:53 AM12/29/18

to LCZero

After 10 games the results are:
Lc0 v0.20.0-rc1 (32316) 0 wins, 7 draws, 3 losses
SF 241218 64 BMI2 3 wins, 7 draws, 0 losses

TP = -108 Elo, 68%->[-207,-70], 95%->[-352,-35]

[Event "15m+3s"]
[Site "Owen"]
[Date "2018.12.29"]
[Round "1"]
[White "Lc0 v0.20.0-rc1 (32316)"]
[Black "Stockfish 241218 64 BMI2"]
[Result "1/2-1/2"]
[ECO "E15"]
[PlyCount "129"]

1. d4 Nf6 2. c4 e6 3. Nf3 b6 4. g3 Ba6 5. b3 Bb4+ 6. Bd2 Be7 7. Bg2 O-O 8. Nc3 c6 9. e4 d5 10. exd5 cxd5 11. Ne5 Nfd7 12. O-O Nxe5 13. dxe5 Nd7 14. Re1 dxc4 15. Bxa8 Qxa8 16. Bh6 Rd8 17. Qg4 Bf8 18. Rad1 cxb3 19. Bg5 Nxe5 20. Rxe5 Rxd1+ 21. Qxd1 b2 22. Re1 Qc6 23. Bd2 Bb7 24. f3 b5 25. Nb1 b4 26. Bf4 a5 27. Be5 Qc4 28. Kg2 Qxa2 29. Re2 Qd5 30. Qxd5 Bxd5 31. Bxb2 a4 32. Bd4 f6 33. Nd2 Kf7 34. Ba1 g5 35. Kf1 e5 36. Re3 h5 37. h3 a3 38. Rd3 Ke6 39. g4 hxg4 40. hxg4 Bc5 41. Ke2 Bd4 42. Rxd4 exd4 43. Bxd4 f5 44. Bg7 fxg4 45. fxg4 b3 46. Nb1 a2 47. Nd2 Kd6 48. Bc3 Kc5 49. Kd3 Kb5 50. Ke3 Be6 51. Kf3 Ka4 52. Bb2 Bd5+ 53. Kf2 Bc6 54. Nc4 Kb4 55. Ne5 Bb5 56. Ke3 a1=Q 57. Bxa1 Ka3 58. Nd3 Bd7 59. Kf3 Bb5 60. Ke4 Bc6+ 61. Kf5 Bb5 62. Ke4 Bc6+ 63. Ke3 Bd7 64. Kf3 Bc6+ 65. Ke2 1/2-1/2

[Event "15m+3s"]
[Site "Owen"]
[Date "2018.12.29"]
[Round "2"]
[White "Stockfish 241218 64 BMI2"]
[Black "Lc0 v0.20.0-rc1 (32316)"]
[Result "1/2-1/2"]
[ECO "E15"]
[PlyCount "61"]

1. d4 Nf6 2. c4 e6 3. Nf3 b6 4. g3 Ba6 5. b3 Bb4+ 6. Bd2 Be7 7. Bg2 c6 8. Bc3 d5 9. Nbd2 Nbd7 10. O-O O-O 11. Re1 c5 12. e4 dxe4 13. Nxe4 Bb7 14. Ned2 Re8 15. dxc5 Nxc5 16. Ne5 Qc7 17. Bxb7 Qxb7 18. b4 Ncd7 19. Qf3 Qxf3 20. Ndxf3 Rec8 21. Rac1 Rc7 22. Kg2 Kf8 23. Re2 Nxe5 24. Nxe5 Nd7 25. Rec2 Rac8 26. Nxd7+ Rxd7 27. Bd2 Rcd8 28. Be1 Rc8 29. Bd2 Rcd8 30. Be1 Rc8 31. Bd2 1/2-1/2

[Event "15m+3s"]
[Site "Owen"]
[Date "2018.12.29"]
[Round "3"]
[White "Lc0 v0.20.0-rc1 (32316)"]
[Black "Stockfish 241218 64 BMI2"]
[Result "0-1"]
[ECO "C11"]
[PlyCount "92"]

1. e4 e6 2. d4 d5 3. Nc3 Nf6 4. e5 Nfd7 5. f4 c5 6. Nf3 Nc6 7. Be3 cxd4 8. Nxd4 Bc5 9. Qd2 O-O 10. O-O-O Nxd4 11. Bxd4 a6 12. h4 b5 13. a3 Bxd4 14. Qxd4 Qe7 15. h5 Rb8 16. Qb4 Qd8 17. h6 g6 18. Qd4 Qe7 19. Rh3 b4 20. Na2 bxa3 21. Rxa3 Bb7 22. g3 Rfc8 23. g4 Ra8 24. Nc3 Rcb8 25. Qe3 Bc6 26. Ne2 Nc5 27. f5 Rb6 28. Rc3 Na4 29. Rb3 Bb5 30. Nc3 Nc5 31. Rb4 Rab8 32. fxe6 fxe6 33. Nxb5 axb5 34. c3 Na6 35. Rf4 Nc5 36. Be2 b4 37. c4 Rc8 38. Rdf1 d4 39. Qf3 Rb7 40. Rxd4 Ra7 41. Kb1 b3 42. Qf6 Rca8 43. Kc1 Ne4 44. Rxe4 Rd8 45. Rd4 Rxd4 46. Kb1 Ra1+ 0-1

[Event "15m+3s"]
[Site "Owen"]
[Date "2018.12.29"]
[Round "4"]
[White "Stockfish 241218 64 BMI2"]
[Black "Lc0 v0.20.0-rc1 (32316)"]
[Result "1-0"]
[ECO "C11"]
[PlyCount "71"]

1. e4 e6 2. d4 d5 3. Nc3 Nf6 4. e5 Nfd7 5. f4 c5 6. Nf3 Nc6 7. Be3 cxd4 8. Nxd4 Qb6 9. Qd2 Qxb2 10. Rb1 Qa3 11. Bb5 Nxd4 12. Bxd4 a6 13. Bxd7+ Bxd7 14. Rb3 Qe7 15. Rxb7 Qh4+ 16. Qf2 Be7 17. g3 Qh3 18. Kd2 Bc8 19. Rxe7+ Kxe7 20. Bc5+ Ke8 21. Re1 f6 22. exf6 gxf6 23. Bd4 Kf7 24. Bxf6 Rg8 25. Bh4 Rf8 26. Re5 h6 27. Qe2 Kg7 28. Rh5 Kg6 29. Nd1 e5 30. Nf2 Qg2 31. Be7 Rf7 32. Bg5 Rh7 33. Bxh6 Rc7 34. Bf8 Raa7 35. Rh6+ Kf7 36. Bc5 1-0

[Event "15m+3s"]
[Site "Owen"]
[Date "2018.12.29"]
[Round "5"]
[White "Lc0 v0.20.0-rc1 (32316)"]
[Black "Stockfish 241218 64 BMI2"]
[Result "1/2-1/2"]
[ECO "D46"]
[PlyCount "136"]

1. d4 d5 2. c4 c6 3. e3 Nf6 4. Nf3 e6 5. Nc3 Nbd7 6. Qc2 Bd6 7. Bd3 O-O 8. O-O
dxc4 9. Bxc4 b5 10. Be2 a6 11. Ng5 Qc7 12. e4 h6 13. e5 hxg5 14. exd6 Qxd6 15.
Bxg5 Qxd4 16. Rfe1 Qc5 17. h4 b4 18. Be3 Qe5 19. Na4 Nd5 20. Bf3 Nxe3 21. Rxe3
Qf4 22. Re4 Qf5 23. Qd2 Ne5 24. Rf4 Qh7 25. Be2 Ng6 26. Rxb4 a5 27. Re4 Nxh4
28. Nc5 Ng6 29. Re1 e5 30. Nd7 Nf4 31. Rxf4 Bxd7 32. Bd3 Qh6 33. Rxe5 Rfe8 34.
Rxe8+ Rxe8 35. Bc4 Re1+ 36. Qxe1 Qxf4 37. b3 Qg5 38. Qc3 Kf8 39. a4 Qc5 40. Qd2
Be8 41. Qf4 f6 42. Qc7 Qe5 43. Qd8 Qa1+ 44. Bf1 Qe5 45. Bc4 Qa1+ 46. Bf1 Qe5
47. g3 g5 48. b4 axb4 49. a5 b3 50. a6 b2 51. Bd3 b1=Q+ 52. Bxb1 Qe1+ 53. Kg2
Qxb1 54. Qxf6+ Bf7 55. a7 Qe4+ 56. Kh2 Qh7+ 57. Kg1 Qb1+ 58. Kh2 Qh7+ 59. Kg1
Qb1+ 60. Kg2 Qe4+ 61. Kh3 Qh1+ 62. Kg4 Qd1+ 63. Kxg5 Qc1+ 64. Kh4 Qh1+ 65. Kg4
Qd1+ 66. Kg5 Qc1+ 67. Kh4 Qh1+ 68. Kg5 Qc1+ 1/2-1/2

[Event "15m+3s"]
[Site "Owen"]
[Date "2018.12.29"]
[Round "6"]
[White "Stockfish 241218 64 BMI2"]
[Black "Lc0 v0.20.0-rc1 (32316)"]
[Result "1/2-1/2"]
[ECO "D46"]
[PlyCount "103"]

1. d4 d5 2. c4 c6 3. e3 Nf6 4. Nf3 e6 5. Nc3 Nbd7 6. Qc2 Bd6 7. Bd3 dxc4 8.
Bxc4 O-O 9. O-O b5 10. Be2 Bb7 11. e4 e5 12. dxe5 Nxe5 13. h3 Re8 14. Rd1 Nxf3+
15. Bxf3 Qe7 16. Be3 a6 17. Ne2 Qc7 18. Rd2 Be5 19. Rc1 c5 20. Qxc5 Qxc5 21.
Rxc5 Bxe4 22. Kf1 Rac8 23. Rxc8 Rxc8 24. Nd4 h6 25. Ke2 Kh7 26. g3 Rd8 27. g4
g6 28. a3 Rd6 29. b3 Bd5 30. Bxd5 Nxd5 31. Kd3 Bxd4 32. Kxd4 Nxe3+ 33. Kxe3
Re6+ 34. Kd4 Rf6 35. Ke3 Rc6 36. Rd4 Kg7 37. Kd2 Kf6 38. Rd7 g5 39. Rb7 Kg7 40.
Ra7 Rf6 41. Ke3 Rc6 42. f3 Rc3+ 43. Kf2 Rxb3 44. Rxa6 f6 45. a4 bxa4 46. Rxa4
Rb2+ 47. Kg3 Kg6 48. h4 gxh4+ 49. Kxh4 Rb1 50. Ra6 Rb4 51. Kg3 Rb3 52. Ra5
1/2-1/2

[Event "15m+3s"]
[Site "Owen"]
[Date "2018.12.29"]
[Round "7"]
[White "Lc0 v0.20.0-rc1 (32316)"]
[Black "Stockfish 241218 64 BMI2"]
[Result "1/2-1/2"]
[ECO "B12"]
[PlyCount "312"]

1. e4 c6 2. d4 d5 3. e5 Bf5 4. c3 e6 5. Nf3 Nd7 6. Be2 Ne7 7. O-O h6 8. Nbd2 a5
9. a4 Bg6 10. Re1 c5 11. Nb3 Nc6 12. Be3 cxd4 13. Nfxd4 Be7 14. Bb5 Nxd4 15.
cxd4 O-O 16. Rc1 Nb8 17. Nc5 Qb6 18. Qg4 Bf5 19. Qg3 Kh8 20. h3 Bxc5 21. Rxc5
Na6 22. Rc3 Rac8 23. Rec1 Rc7 24. Qh4 Kg8 25. Bxh6 gxh6 26. Qxh6 Rxc3 27. Rxc3
Qxd4 28. Rg3+ Bg6 29. Bd3 Qxd3 30. Rxd3 Bxd3 31. h4 Re8 32. h5 Nc5 33. Qf6 Nd7
34. Qg5+ Kf8 35. Qh6+ Ke7 36. Qd2 Bf5 37. Qxa5 Rc8 38. Qb4+ Ke8 39. Qxb7 Rb8
40. Qa7 Rb4 41. a5 d4 42. Qa8+ Rb8 43. Qf3 Nxe5 44. Qf4 Rd8 45. a6 d3 46. a7 d2
47. Qxd2 Rxd2 48. a8=Q+ Ke7 49. f3 Rd3 50. b4 Rd1+ 51. Kf2 Rd2+ 52. Kf1 Nd7 53.
g4 Bb1 54. Kg1 Rd6 55. b5 Rb6 56. Qa3+ Ke8 57. Qb3 Bh7 58. f4 Rd6 59. Qb4 Rd1+
60. Kh2 Rb1 61. Qc4 Kf8 62. Kh3 Rh1+ 63. Kg3 Rg1+ 64. Kh3 Rb1 65. Kh4 Rh1+ 66.
Kg3 Rg1+ 67. Kh3 Rb1 68. Qa4 Nf6 69. Qd4 Ke7 70. Qc5+ Ke8 71. f5 Rh1+ 72. Kg2
Rh4 73. Qc6+ Ke7 74. Qb7+ Ke8 75. Qa8+ Ke7 76. Qa7+ Kf8 77. Qc5+ Ke8 78. Qc6+
Ke7 79. Qb7+ Ke8 80. Qb8+ Ke7 81. Qc7+ Ke8 82. Qc8+ Ke7 83. Qc7+ Ke8 84. Qb8+
Ke7 85. Qa7+ Kf8 86. Qa3+ Ke8 87. Qa8+ Ke7 88. Qa3+ Ke8 89. Kf2 Rxg4 90. Qa8+
Ke7 91. Qa3+ Ke8 92. Qa8+ Ke7 93. Qa7+ Kf8 94. b6 Bxf5 95. b7 Nd7 96. Qa3+ Kg8
97. Qe7 e5 98. Qe8+ Kh7 99. Qxf7+ Kh6 100. Ke2 Rf4 101. Kd2 Rf1 102. Kc3 Rf4
103. Qe8 Rf3+ 104. Kb2 Rh3 105. Qf7 Rxh5 106. Qg8 Rh1 107. Qh8+ Bh7 108. Qc8
Rb1+ 109. Ka2 Rd1 110. Qd8 Bb1+ 111. Ka3 Bf5 112. Ka2 Be6+ 113. Kb2 Rf1 114.
Qc8 Bf5 115. Ka2 Rf2+ 116. Ka3 e4 117. Qd8 Rf3+ 118. Kb2 e3 119. Kc3 e2+ 120.
Kd2 e1=B+ 121. Kxe1 Rb3 122. Qh8+ Bh7 123. Qc8 Rb1+ 124. Kf2 Bf5 125. Kg3 Kg6
126. Kf4 Rf1+ 127. Ke3 Re1+ 128. Kd2 Rb1 129. Qc6+ Kg5 130. Ke3 Rb3+ 131. Kd2
Rb2+ 132. Ke3 Rb1 133. Kd2 Nc5 134. Qg2+ Kf6 135. Qc6+ Be6 136. Qf3+ Ke7 137.
Qa3 Rxb7 138. Qxc5+ Ke8 139. Qc6+ Rd7+ 140. Ke3 Kf7 141. Kf4 Rd5 142. Qc7+ Kf6
143. Ke4 Rd7 144. Qf4+ Ke7 145. Qh4+ Kf7 146. Qf4+ Ke7 147. Qh4+ Kf7 148. Qf2+
Ke7 149. Qc5+ Kf7 150. Qh5+ Kf6 151. Qf3+ Ke7 152. Qa3+ Kf7 153. Qf3+ Ke7 154.
Qa3+ Kf7 155. Ke5 Rd5+ 156. Ke4 Rd7 1/2-1/2

[Event "15m+3s"]
[Site "Owen"]
[Date "2018.12.29"]
[Round "8"]
[White "Stockfish 241218 64 BMI2"]
[Black "Lc0 v0.20.0-rc1 (32316)"]
[Result "1/2-1/2"]
[ECO "B12"]
[PlyCount "517"]

1. e4 c6 2. d4 d5 3. e5 Bf5 4. c3 e6 5. Nf3 Nd7 6. Be2 h6 7. O-O a6 8. Be3 Ne7
9. a4 Bh7 10. Na3 Nf5 11. Bd2 c5 12. b4 c4 13. a5 Be7 14. Nc2 O-O 15. Nce1 g5
16. g4 Ng7 17. Qc1 f6 18. exf6 Rxf6 19. Be3 Be4 20. h4 gxh4 21. Nxh4 Rf7 22.
Nhg2 h5 23. f3 Bh7 24. g5 Nf5 25. f4 Ng3 26. Rf2 Nxe2+ 27. Rxe2 Nf8 28. Nf3 Bd6
29. Ngh4 Ng6 30. Nxg6 Bxg6 31. Rh2 Qd7 32. Rh4 Bf5 33. Ne5 Bxe5 34. dxe5 Rh7
35. Ra2 Qf7 36. Rah2 Qg6 37. Rd2 Kg7 38. Bc5 Bg4 39. Qc2 Bf5 40. Qa2 Qf7 41.
Kf2 Kg6 42. Ke1 Bg4 43. Qb1+ Bf5 44. Qb2 Rc8 45. Rd4 Bd3 46. Bb6 Rb8 47. Kd2
Qf5 48. Kc1 Qf7 49. Qa3 Qd7 50. Qa2 Rf8 51. Qa1 Rf5 52. Kb2 Rh8 53. Qd1 Rf7 54.
Bc5 Qc6 55. Ka3 Qc8 56. Kb2 Qd8 57. Qd2 Qe8 58. Qd1 Rf5 59. Ka3 Qd7 60. Ba7 b5
61. Bc5 Rh7 62. Kb2 Qd8 63. Bb6 Qg8 64. Bc5 Rff7 65. Ka3 Rh8 66. Bd6 Qh7 67.
Kb2 Bf5 68. Rd2 Bg4 69. Qf1 Kg7 70. Rd4 Qf5 71. Qc1 Kg6 72. Bc5 Rd7 73. Bb6 Rf8
74. Bc5 Rc8 75. Bb6 Rh8 76. Bc5 Ra8 77. Qg1 Rc8 78. Bb6 Rf8 79. Bc5 Rc8 80. Bb6
Kg7 81. Qf2 Kh7 82. Rh2 Kg6 83. Qg2 Rb7 84. Rh1 Rd7 85. Rh4 Rb8 86. Rh1 Rf7 87.
Bc5 Ra8 88. Bd6 Rh8 89. Rh4 Rd7 90. Qh2 Re8 91. Qd2 Rc8 92. Qc1 Re8 93. Qa1 Rh8
94. Bc5 Rg8 95. Qe1 Re8 96. Qe3 Rc8 97. Bb6 Rb8 98. Bc5 Rh8 99. Bd6 Rc8 100.
Qc1 Ra8 101. Bc5 Rg8 102. Qa1 Rgg7 103. Qh1 Rd8 104. Bb6 Rb8 105. Qg2 Rh7 106.
Bc5 Rc8 107. Qg1 Rxc5 108. bxc5 Rc7 109. Qa1 Rxc5 110. Rh2 Qf8 111. Qa3 b4 112.
cxb4 Rb5 113. Kc3 Bf5 114. Qa4 Be4 115. Rh4 Kg7 116. Qa3 Bg6 117. Qb2 Rb7 118.
Qa3 Qf5 119. Qc1 Qf8 120. Qb2 Qe8 121. Qa3 Rb8 122. Rh2 Qf8 123. Rh1 Be4 124.
Rh4 Bg6 125. Qb2 Qe8 126. Qa3 Rb7 127. Rh2 Bf5 128. Kd2 Kg6 129. Kc3 Qb5 130.
Ra2 Rb8 131. Qa4 Qb7 132. Rh2 Rh8 133. Rh4 Kg7 134. Qa3 Qb5 135. Kb2 Kg6 136.
Kc3 Rb8 137. Rh2 Qd7 138. Rh4 Qb5 139. Rh2 Ra8 140. Qa2 Rh8 141. Rh4 Rb8 142.
Qa3 Rb7 143. Rh1 Qc6 144. Rh2 Qe8 145. Kb2 Rh7 146. Rd1 h4 147. Qc3 Qa4 148.
Ra1 Qc6 149. Rc1 Qa4 150. Ra1 Qb5 151. Rc1 h3 152. Ka3 Rb7 153. Rd1 Rh7 154.
Qd4 Rb7 155. Qb2 Qc6 156. Qc3 Rh7 157. Rc1 Qb5 158. Rd1 Qc6 159. Rc1 Qc7 160.
Qd4 Qb8 161. Qc5 Rc7 162. Qe3 Rb7 163. Qc3 Qh8 164. Rch1 Rh7 165. Rg1 Qd8 166.
Rc1 Qc8 167. Rcc2 Qd7 168. Rc1 Qb7 169. Re1 Qa7 170. Rd1 Qf7 171. Qf3 Be4 172.
Qg4 Bf5 173. Qg3 Qc7 174. Qc3 Qc8 175. Rc1 Qc7 176. Qd4 Qa7 177. Qxa7 Rxa7 178.
Kb2 Rd7 179. Rd2 Be4 180. Kc3 Bg2 181. Ra1 Rb7 182. Rf2 Rb8 183. Rc1 Rf8 184.
Rd1 Kf5 185. Ra1 Ke4 186. Re1+ Kf5 187. Ra1 Rf7 188. Rg1 Kg6 189. b5 axb5 190.
a6 Ra7 191. Ra1 Be4 192. Kb4 d4 193. Kxb5 c3 194. Kc4 c2 195. Kxd4 Kf5 196.
Rff1 h2 197. Kc3 Rc7+ 198. Kb2 Rc4 199. Kb3 Rc7 200. a7 Rb7+ 201. Kc4 Rc7+ 202.
Kb3 Rc5 203. Rfc1 Rb5+ 204. Kc4 Rb1 205. Rcxb1 cxb1=Q 206. Rxb1 Bxb1 207. a8=Q
Be4 208. Qf8+ Kg4 209. g6 h1=Q 210. g7 Qc1+ 211. Kd4 Bh7 212. g8=Q+ Bxg8 213.
Qxg8+ Kxf4 214. Qf7+ Kg5 215. Qf6+ Kh5 216. Qxe6 Qf4+ 217. Kc5 Qf8+ 218. Kc6
Qd8 219. Kb7 Kg5 220. Qd6 Qh8 221. Kc7 Qh7+ 222. Kd8 Qf5 223. Ke7 Qh7+ 224. Ke8
Qh5+ 225. Kd7 Qf3 226. Ke7 Kh5 227. Qd7 Qe4 228. Qe6 Qc2 229. Kf7 Qf2+ 230. Kg8
Qg3+ 231. Kf8 Qg2 232. Ke8 Qe4 233. Qh3+ Kg6 234. Qe6+ Kg7 235. Qf6+ Kg8 236.
Qf7+ Kh8 237. Kf8 Qa8+ 238. Qe8 Qf3+ 239. Ke7+ Kg7 240. Qd7 Qf8+ 241. Ke6+ Kg8
242. Qd5 Qf4 243. Qd8+ Kh7 244. Qf6 Qc4+ 245. Ke7 Qc7+ 246. Ke8 Qb8+ 247. Kf7
Qf8+ 248. Ke6 Qc8+ 249. Kd5 Qb7+ 250. Kc4 Kg8 251. Kc3 Qe4 252. Kb3 Qd4 253.
Qe6+ Kf8 254. Qd6+ Qxd6 255. exd6 Ke8 256. Kc4 Kd8 257. Kc5 Kd7 258. Kd5 Kd8
259. Ke4 1/2-1/2

[Event "15m+3s"]
[Site "Owen"]
[Date "2018.12.29"]
[Round "9"]
[White "Lc0 v0.20.0-rc1 (32316)"]
[Black "Stockfish 241218 64 BMI2"]
[Result "1/2-1/2"]
[ECO "B66"]
[PlyCount "311"]

1. e4 c5 2. Nf3 Nc6 3. d4 cxd4 4. Nxd4 Nf6 5. Nc3 d6 6. Bg5 e6 7. Qd2 a6 8.
O-O-O h6 9. Nxc6 bxc6 10. Bf4 d5 11. Qe3 Bb4 12. a3 Ba5 13. exd5 cxd5 14. Be2
O-O 15. Qg3 Bxc3 16. Qxc3 Bd7 17. Be5 Ne4 18. Qe3 Bb5 19. Rhe1 Bxe2 20. Rxe2
Rc8 21. f3 Nd6 22. b3 Rc6 23. a4 Nf5 24. Qd3 Qb6 25. Kb1 Rfc8 26. Bb2 h5 27. g3
a5 28. Rc1 Qb4 29. Re5 R6c7 30. Re2 Rc6 31. Rg1 Nd6 32. Qe3 Nf5 33. Qd3 Nd6 34.
Rd1 Rb8 35. Qd4 Qxd4 36. Rxd4 Nf5 37. Rd3 Rbc8 38. g4 hxg4 39. fxg4 Nh4 40.
Rde3 Ra6 41. Rf2 f6 42. g5 f5 43. g6 Re8 44. Rg3 e5 45. Rh3 Nxg6 46. Rxf5 Rf6
47. Rg5 d4 48. Rhg3 Kf7 49. c3 e4 50. cxd4 e3 51. Bc3 e2 52. Bxa5 e1=Q+ 53.
Bxe1 Rxe1+ 54. Kb2 Ne7 55. Rxg7+ Kf8 56. Rg2 Rf4 57. R7g4 Rff1 58. Ka3 Nc6 59.
h4 Ra1+ 60. Ra2 Rad1 61. h5 Rd3 62. h6 Rff3 63. Rb2 Ke7 64. d5 Ne5 65. Rb4 Rh3
66. Rg2 Rxd5 67. Rb7+ Kd6 68. h7 Rd1 69. Kb4 Rdh1 70. Rgg7 Nc6+ 71. Kb5 Rxb3+
72. Ka6 Nb4+ 73. Ka7 Nc6+ 74. Ka6 Rxb7 75. Kxb7 Rb1+ 76. Ka6 Nb4+ 77. Kb6 Nc6+
78. Ka6 Nb4+ 79. Kb5 Nd5+ 80. Kc4 Rb4+ 81. Kd3 Rb3+ 82. Ke2 Rh3 83. a5 Nf4+ 84.
Kf2 Ne6 85. Kg2 Nf4+ 86. Kg1 Ne6 87. Kg2 Rh4 88. Kg3 Rh1 89. Rb7 Nc5 90. Kg2
Rh4 91. Kg3 Rh1 92. Rf7 Ke6 93. Rg7 Na6 94. Kg4 Nc5 95. Kg5 Rg1+ 96. Kh6 Rh1+
97. Kg5 Kd6 98. Kg6 Nd7 99. a6 Nf8+ 100. Kf5 Nxh7 101. a7 Rf1+ 102. Ke4 Nf6+
103. Kd3 Ra1 104. Rg6 Ke6 105. a8=R Rxa8 106. Rg5 Nd7 107. Rg1 Ne5+ 108. Ke2
Kf5 109. Rg7 Ke4 110. Re7 Ra2+ 111. Kd1 Kd3 112. Ke1 Kd4 113. Kd1 Rg2 114. Re8
Ra2 115. Re7 Rg2 116. Re8 Nc4 117. Ke1 Ra2 118. Rd8+ Kc3 119. Rc8 Kd3 120. Rd8+
Ke3 121. Re8+ Kf4 122. Kd1 Ne3+ 123. Kc1 Nd5 124. Re6 Kf5 125. Re8 Rh2 126. Rg8
Ke6 127. Rg3 Nf4 128. Rg8 Kf5 129. Rf8+ Ke4 130. Rd8 Ke5 131. Rd7 Nd5 132. Rd8
Rf2 133. Rh8 Rg2 134. Rh7 Rf2 135. Rd7 Ke4 136. Rd8 Rg2 137. Rd7 Rf2 138. Rd8
Rg2 139. Rd7 Rg1+ 140. Kc2 Kd4 141. Rd8 Rg2+ 142. Kb3 Rg1 143. Kc2 Rh1 144. Kd2
Rh2+ 145. Ke1 Rb2 146. Kf1 Rh2 147. Kg1 Ra2 148. Kf1 Ke4 149. Ke1 Nf4 150. Kd1
Rh2 151. Kc1 Nd3+ 152. Kb1 Rb2+ 153. Ka1 Rc2 154. Rb8 Rc1+ 155. Rb1 Rxb1+ 156.
Kxb1 1/2-1/2

[Event "15m+3s"]
[Site "Owen"]
[Date "2018.12.29"]
[Round "10"]
[White "Stockfish 241218 64 BMI2"]
[Black "Lc0 v0.20.0-rc1 (32316)"]
[Result "1-0"]
[ECO "B69"]
[PlyCount "71"]

1. e4 c5 2. Nf3 Nc6 3. d4 cxd4 4. Nxd4 Nf6 5. Nc3 d6 6. Bg5 e6 7. Qd2 a6 8.
O-O-O Bd7 9. f4 Be7 10. Nf3 b5 11. Bxf6 gxf6 12. Kb1 b4 13. Ne2 h5 14. Ned4 Qb6
15. Be2 Nxd4 16. Nxd4 a5 17. Bc4 Rc8 18. Qe2 a4 19. b3 a3 20. Rhe1 Qc5 21. f5
Qe5 22. Qd2 Rg8 23. g3 h4 24. gxh4 Bd8 25. fxe6 fxe6 26. Ba6 Rc5 27. Rg1 Rxg1
28. Rxg1 Bc6 29. Bc8 Bxe4 30. Rg8+ Kf7 31. Bxe6+ Ke7 32. Rg7+ Kf8 33. Ra7 Qh5
34. Bf7 Qxf7 35. Ne6+ Ke8 36. Rxf7 1-0

christian...@gmail.com

Dec 29, 2018, 7:37:36 AM12/29/18

to LCZero

10 games are way too few: the standard deviation of the result for two equally strong opponents after 10 games is about 1.6 points. Thus you can't conclude from your test that a net with a score of 3.5/10 is weaker than a net with a score of 5.0/10. It gives a hint in that direction, nothing more.
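The 1.6-point figure can be reproduced by treating games as independent. A sketch, assuming (as a simplification) two equally strong engines with a fixed draw probability d and equal win and loss chances: each game then has variance (1 - d)/4, so the score over n games has a standard deviation of sqrt(n(1 - d)/4). With no draws that is about 1.6 points for 10 games; a high draw rate lowers it, and the relative error only shrinks with sqrt(n):

```python
import math

def score_sd(games: int, draw_rate: float) -> float:
    """Standard deviation (in points) of one engine's total score against an
    equally strong opponent, assuming independent games with draw probability
    draw_rate and win/loss probabilities of (1 - draw_rate) / 2 each."""
    per_game_variance = (1.0 - draw_rate) / 4.0  # outcomes 1, 0.5, 0 around mean 0.5
    return math.sqrt(games * per_game_variance)

for games, d in [(10, 0.0), (10, 0.7), (100, 0.0)]:
    sd = score_sd(games, d)
    print(f"{games:>3} games, draw rate {d:.0%}: SD = {sd:.2f} points ({sd / games:.1%} of the games)")
```

When comparing two networks, the difference of two independent 10-game scores has a standard deviation about sqrt(2) times larger (roughly 2.2 points without draws), so a 1.5-point gap between 3.5/10 and 5.0/10 is not significant.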

Cyrix

On Wednesday, December 26, 2018 at 5:57:42 AM UTC+1, Ingo Weidner wrote:


Ingo Weidner

Dec 29, 2018, 7:45:45 AM12/29/18

to LCZero

@Owen W:

As shown in the table, after 10 games IDs 32280 and 32273 had the same score as 32316 (5.0/10 = 50.0%).

I already mentioned that I am testing ID 11248 again as a reference and that I am adding more games for the 3 best IDs mentioned above.

If the result stays the same, I cannot do anything about that. It is simply what I got on my system, and I did not "force" any of the networks to get such results.

About SF 10 Elo:

An Elo difference of -108 against the Stockfish 11 development version usually does not correspond to the same absolute Elo as against the official Stockfish 10 release.

Stockfish 10 4CPU currently has a rating of 3463 in the CCRL 40/40 list (I would have to slightly correct the value of 3464 I used for the table...). They mention that their 40 min TC corresponds to 15 min on modern CPUs, which is the TC I used here.

Others are using this CCRL 40/40 value as a reference too.

Ingo Weidner

Dec 29, 2018, 8:01:30 AM12/29/18

to LCZero

Here are two interesting sources for Elo estimates where the 2nd is also used in the "official" estimated Elo graph:

https://docs.google.com/spreadsheets/d/1QxAG6XVTvvTAGlZ-kpSSvv0VuMGh7RkJjgKqgT37vMU/edit?ts=5bf04640#gid=0

https://docs.google.com/spreadsheets/d/1XSJiCcQpCLv0fNwrUn7jXjdkZFU63YFEWpdXv6dSSg0/edit#gid=868347223

In the first one, against SF 10, ID 32194 reaches an Elo of 3408; in the second one (by MTGOStark), an Elo of 3409 is listed.

In my current table I got a value of 3393 for ID 32194, which is just around 14 to 15 points away.

MTGOStark's list gives a value of 3423 points for ID 32250, while in my table I found a value of 3429 points for ID 32246 (the one closest to ID 32250), which is just 7 points different.

Actually, if I correct my value due to the current correction in the CCRL 40/40 list, I get a value of 3428 points for ID 32246, which is just 6 points away.

Lothar Jung

Dec 29, 2018, 11:10:52 AM12/29/18

to LCZero

On my PC (Ryzen 1800X at 3.6 GHz, 2x GTX 1080), Lc0 0.20 with net 32337 won a match (5min/5sec) against SF10 on 8 cores with 3:0:7!

Owen W

unread,

Dec 29, 2018, 11:42:15 AM12/29/18

to LCZero

Awesome!
