I am not using a specific opening book or endgame book.
I am using Arena's default settings there.
Colors are reversed with each round.
I believe your tests are valuable, and I'd like to thank you for all your effort.
Let me explain my reasoning for this.
1) Your hardware is good enough to give quality games.
2) The time control is fine. Not too fast, not too slow. Ideal to give quick, reliable results.
3) You test against Stockfish 10. A reliable and strong opponent. Probably the strongest one.
4) Are 10 games too few? Sure! What about 50, 100 or 1000? The more the better. But we can't spend a whole life testing... Even from a statistical point of view, winning 10 games against Stockfish in a 10-game match is anything but luck, provided the opening book is balanced, and of course it is better than winning 2 (of course we aren't at this point... yet). The uncertainty is somewhere in the middle, with nets giving similar results.
5) The way you discard the last two nets and import two newer ones doesn't carry much danger of leaving a strong net out because of statistical uncertainty.
I'm not an expert in statistical analysis, but the way you think makes sense to me.
Overall a very efficient filtering procedure, even if the net which finally emerges won't be THE best, but one of the 2 or 3 best :)
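To put a rough number on the statistical uncertainty discussed in points 4 and 5, here is a minimal sketch. It assumes each game is an independent result worth 1, 0.5 or 0 points and uses a normal approximation for the 95% margin, which is crude for only 10 games. The function name `score_with_margin` and the example counts are illustrative, not part of the test setup.

```python
import math

def score_with_margin(wins, draws, losses, z=1.96):
    """Return (mean score per game, approximate 95% margin).

    Assumption: games are independent and the normal approximation is
    acceptable; with very few games this is only a rough guide.
    """
    n = wins + draws + losses
    if n == 0:
        raise ValueError("no games played")
    mean = (wins + 0.5 * draws) / n
    # Variance of the per-game result around the mean score.
    var = (wins * (1.0 - mean) ** 2
           + draws * (0.5 - mean) ** 2
           + losses * (0.0 - mean) ** 2) / n
    stderr = math.sqrt(var / n)  # standard error of the mean score
    return mean, z * stderr

# Illustrative 10-game match: 2 wins, 5 draws, 3 losses.
mean, margin = score_with_margin(wins=2, draws=5, losses=3)
print(f"score {mean:.1%} +/- {margin:.1%}")
```

With these example counts the margin comes out at roughly 22 percentage points around a 45% score, which is why nets with similar results cannot be separated after 10 games, while a clean sweep or a very poor score still says something.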
Keep up all the good testing...
Vas
Hi Ingo;
Attached is a screenshot of what the "gauntlet" tournament of SF 10 vs 6 different Lc0 networks looks like (Lc0 currently playing as Black):

A table of the current results of the tournament is found at the upper right.
The table I posted here is a combined one which includes the results of a previous test with other networks.
After 6 games ID 32273 still has not lost a game, but also has no win yet.
ID 32307 leads with a score of 3.5/6 = 58.3%, followed by 3 IDs with a score of 50.0% (IDs 32316, 32280 and 32273).
Here is the current table after 6 rounds/games with scores sorted by percent (%):

    Engine                       Score     St          %     Elo
 1: Lc0 v0.20.0rc2_ID32307       3.5/6     =1=10=      58.3  +56 = 3520
 2: Stockfish_10_x64_bmi2        60.5/106  ··········  57.1  3464 (from CCRL 40/40 list)
 3: Lc0 v0.20.0rc2_ID32316       3.0/6     =1==0=      50.0  ±0 = 3464
 3: Lc0 v0.20.0rc2_ID32280       3.0/6     1===0=      50.0  ±0 = 3464
 3: Lc0 v0.20.0rc2_ID32273       3.0/6     ======      50.0  ±0 = 3464
 6: Lc0 v0.20.0rc2_ID32246       4.5/10    ====010=01  45.0  -35 = 3429
 6: Lc0 v0.20.0rc2_ID32223       4.5/10    ====0===01  45.0  -35 = 3429
 8: Lc0 v0.20.0rc2_ID32301       2.5/6     ===0==      41.7  -56 = 3408
 9: Lc0 v0.20.0rc2_ID32194       4.0/10    0=====0=01  40.0  -70 = 3394
 9: Lc0 v0.20.0rc2_ID32253       4.0/10    =1=====000  40.0  -70 = 3394
 9: Lc0 v0.20.0rc2_ID32207       4.0/10    ==0=0=====  40.0  -70 = 3394
 9: Lc0 v0.20.0rc2_ID32236       4.0/10    0===0=====  40.0  -70 = 3394
13: Lc0 v0.20.0rc2_ID32195       3.5/10    0==0=0====  35.0  -108 = 3356
14: Lc0 v0.20.0rc2_ID32295       2.0/6     ====00      33.3  -123 = 3341
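For readers wondering how a score percentage relates to an Elo difference, here is a small sketch using the common logistic Elo formula. That formula is an assumption on my side: the testing tool may use a different conversion, so its Elo column can differ from these values by a few points on some rows. The function name `elo_diff_from_score` is hypothetical.

```python
import math

def elo_diff_from_score(score_fraction):
    """Elo difference implied by a score fraction (logistic model).

    Assumption: the standard logistic formula; not necessarily the
    conversion used to produce the Elo column in the table.
    """
    if not 0.0 < score_fraction < 1.0:
        raise ValueError("score fraction must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score_fraction - 1.0)

# A few score percentages as they appear in the % column.
for pct in (50.0, 45.0, 40.0):
    print(pct, round(elo_diff_from_score(pct / 100.0)))
```

Adding the result to the opponent's rating (3464 for Stockfish 10 in the table) gives the estimated rating of the network.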
You just mentioned one case where Monte Carlo is inefficient and Leela should revert to AB tree search.
Simple endgames with TBs. Totally agree with you...
Hi Ingo!
Based on your tests so far, which do you believe are the best 3xxxx networks?
Judging not only from the results, but also from the quality of the games.
How about testing 32333? It looks very solid on my laptop.
Thx
ID 11248 is a hard nut to crack and plays clever chess, so it definitely should be included (in every future test I would say, until it is obvious that it is inferior to t30).
Hi, I was and still am testing many new networks in a gauntlet against Stockfish 10 at a time control of 15 min + 3 s, which IMO is not too short and not too long.
Several other networks were added and removed during the test as they performed worse than others, which recently also included e.g. ID 32246.
Here are the current standings of testing 7 new networks against Stockfish 10 at 15 min + 3 s time control:
   Engine                       Score    St          %     Elo
1: Stockfish_10_x64_bmi2        29.0/49  ··········  59.2  3466
2: Lc0 v0.20.0rc2_ID32253       2.5/4    =1==        62.5  +92 = 3558
3: Lc0 v0.20.0rc2_ID32246       2.0/4    ====        50.0  ±0 = 3466
4: Lc0 v0.20.0rc2_ID32223       3.0/7    ====0==     42.8  -49 = 3417
5: Lc0 v0.20.0rc2_ID32207       4.0/10   ==0=0=====  40.0  -70 = 3396
6: Lc0 v0.20.0rc2_ID32194       2.5/7    0=====0     35.7  -100 = 3366
6: Lc0 v0.20.0rc2_ID32236       2.5/7    0===0==     35.7  -100 = 3366
7: Lc0 v0.20.0rc2_ID32195       3.5/10   0==0=0====  35.0  -108 = 3358

49 of 70 games played

Level: Blitz 15/3
Hardware: RAM: 16 GB (1 GB hash), CPU: mobile i7-7700HQ 4x2.8GHz (3 cores used for engines), GPU: mobile Nvidia GeForce GTX 1050 Ti 4GB (768 CUDA cores), Leela ratio = 0.64
Operating system: Windows 10 Home Edition (Build 9200) 64 bit
Soon I will also remove networks from the current test to keep only the best ones. ID 32194 will be further tested as a reference.
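For anyone who wants to re-check the standings by hand, here is a minimal sketch that turns a result string from the St column into a score. It assumes `1` is a win, `=` a draw and `0` a loss from the listed engine's point of view, which matches the Score column in the rows I checked. The function name `score_from_results` is hypothetical.

```python
def score_from_results(results):
    """Return (points, games) for a result string such as '====010=01'.

    Assumption: '1' = win, '=' = draw, '0' = loss for the listed engine.
    """
    points_per_symbol = {"1": 1.0, "=": 0.5, "0": 0.0}
    total = 0.0
    for symbol in results:
        if symbol not in points_per_symbol:
            raise ValueError(f"unexpected result symbol: {symbol!r}")
        total += points_per_symbol[symbol]
    return total, len(results)

points, games = score_from_results("==0=0=====")
print(f"{points}/{games} = {points / games:.1%}")
```

The example string is the one listed for ID 32207 and gives 4.0 points from 10 games, as in the table.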