How long will it take for leela to become the strongest chess engine?

1,315 views

Norton Freeman

unread,
Dec 29, 2018, 02:31:06
to LCZero
How long do you think it will take for leela to become the strongest chess engine?
1) On average PC,
2) Under TCEC condition,
3) Under CCCC condition.
A) Half a year or less,
B) One year,
C) Two years,
D) Never.
Personally, ABA.

Owen W

unread,
Dec 29, 2018, 02:42:48
to LCZero
Depending on the games contributed to the network, I would suspect around a year or so. Average PC is tough, because a lot of people may have fast CPUs compared to their GPU.

OmenhoteppIV

unread,
Dec 29, 2018, 02:44:36
to LCZero
Half a year or less

pwa128

unread,
Dec 29, 2018, 03:12:09
to LCZero
You may need to specify a time control for the average PC.   At very fast time controls maybe never (or at least several years); for anything else, maybe 6 months to a year.

But I would put a big caveat on all of these.   Leela is already the strongest in the openings.    Leela will probably become the best overall, but in its current form it will never be the best at endgames.   The endgame weakness is related to using Monte Carlo, as is the opening strength to a lesser degree (that being more strongly influenced by superior evaluation).

When the development of the nets is finished for Monte Carlo, I would love to see an AlphaBeta version developed, or perhaps one that combines Monte Carlo with AlphaBeta.  Then Leela could be the best at every phase of the game, and at very fast time controls too.

srikant dash

unread,
Dec 29, 2018, 04:27:08
to LCZero
Can we use a simple NN that takes the current board position and returns whether AB search or Monte Carlo search would be better for that position? I don't think it would cause a large performance issue.

123

unread,
Dec 29, 2018, 04:32:16
to LCZero
Norton Freeman:
LC0 must reach 13000 selfplay elo.

Vassilis

unread,
Dec 29, 2018, 04:41:37
to LCZero
Hi to all!

I believe that, when reliable NNs are evolved that are capable of very accurate positional estimation, Leela should drop Monte Carlo altogether and rely solely on AB. These NNs could be used for candidate move selection as well as positional evaluation. The problem with this approach is speed. NNs are heavy, so Leela won't reach high depths. If there were a way to run the NNs on the GPU (for lightning-fast evaluation) and implement the AB tree search so as to use multi-core CPUs, I guess Leela would be on top.

Vas

CavalierFou

unread,
Dec 29, 2018, 04:49:10
to LCZero
Hi all

Could the developers send us RTX 2080Ti?...

It's a joke!
:-)

Regards
CF

Vassilis

unread,
Dec 29, 2018, 05:16:19
to LCZero
Why joke? No joke at all!

Please devs. Send us an rtx2080 ti to do some tests...
We will return it to you later. When the tests are over.
You got our word for this :)

CavalierFou

unread,
Dec 29, 2018, 05:24:02
to LCZero
+1!

Dave Whipp

unread,
Dec 29, 2018, 06:28:16
to CavalierFou, LCZero
Can you clarify why a shift to AB would have any effect on GPU/CPU tradeoffs or concurrency? It seems to me that the difference between AB and UCT is that the former backprops minimax values (and prunes) while the latter uses counts (and biased selection). The computational effort should be approximately the same. We know that DeepMind tried both and chose UCT because it gave better results with an NN eval function. It would probably be good to have both options available, so as to confirm that aspect of their results.
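The distinction Dave describes can be seen in a few lines of toy code (illustrative only, nothing here is lc0's actual implementation): alpha-beta backs up exact minimax values and prunes, while UCT backs up visit counts and running averages under biased selection.

```python
import math
import random

# Toy 2-ply game tree: each sublist holds a move's replies; leaf values
# are from the root player's point of view.
TREE = [[0.5, 0.4], [0.9, -0.9], [0.1, 0.0]]

def alphabeta(node, alpha=-1.0, beta=1.0, maximizing=True):
    """Backs up exact minimax values, pruning hopeless branches."""
    if not isinstance(node, list):
        return node
    best = -1.0 if maximizing else 1.0
    for child in node:
        val = alphabeta(child, alpha, beta, not maximizing)
        if maximizing:
            best, alpha = max(best, val), max(alpha, val)
        else:
            best, beta = min(best, val), min(beta, val)
        if beta <= alpha:
            break  # remaining siblings cannot change the result
    return best

def uct_root(tree, n_sims=4000, c=1.4):
    """Backs up counts and running means; children picked by UCB."""
    stats = [[0, 0.0] for _ in tree]  # per move: [visits, summed value]
    for t in range(1, n_sims + 1):
        def ucb(i):
            n, w = stats[i]
            return float("inf") if n == 0 else w / n + c * math.sqrt(math.log(t) / n)
        i = max(range(len(tree)), key=ucb)
        value = random.choice(tree[i])  # crude "rollout": a random reply
        stats[i][0] += 1
        stats[i][1] += value
    return max(range(len(tree)), key=lambda i: stats[i][0])  # most-visited move

print(alphabeta(TREE))  # → 0.4 (move 0: the opponent's best reply still leaves 0.4)
random.seed(0)
print(uct_root(TREE))   # → 0 (move 1's 0.9 leaf looks tempting but averages worse)
```

The per-node computational effort is indeed similar; the two differ in what flows back up the tree.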

--
You received this message because you are subscribed to the Google Groups "LCZero" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lczero+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lczero/48c15338-174d-4d02-9752-9196e01e6927%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Vassilis

unread,
Dec 29, 2018, 06:50:50
to LCZero
I shall clarify this more, why not?

Isn't Stockfish one of the best engines available? No doubt, I guess. All tests confirm this.
Stockfish is a typical (pure) AB engine, with a fast move generation, a reliable evaluation function, and many AB extensions.

So if we use NNs as evaluation functions, provided that after sufficient training they evaluate better than Stockfish's eval, and are also faster than Stockfish's eval (because we run them on the GPU), we already have a better AB engine. To make things better, let's use the same NNs (or other, lighter ones) to generate the candidate moves, for better selection and pruning, and make AB multithreaded, and we have an even better AB engine. Does it sound weird?

It might be too difficult to implement, or unrealistic, technically speaking. I don't know.
But sounds quite logical...

Be good :)

123

unread,
Dec 29, 2018, 07:56:34
to LCZero
CavalierFou:
Good idea but I don't need it, cause I have bought an RTX 2080 Ti to train LC0:)
Maybe I should buy a second one? 

Dave Whipp

unread,
Dec 29, 2018, 08:04:58
to Vassilis, LCZero
My gut tells me that there are probably ecosystem effects that enhance Stockfish's performance: the sum is greater than the parts. That is, each component is optimized to work well with the others, so replacing one component would reduce the advantage of the others. So simply taking the Stockfish AB engine and using an NN eval would, at least initially, be less performant than either Stockfish or Leela. Sure, one could then work on optimizing, but that seems counter to the stated rationale.

What we know is that SF uses AB -- presumably because it works for them (I don't know what alternatives they tried) -- and that DeepMind used UCT because experiments showed it worked better for their NN. There's undoubtedly some interesting computer science in understanding the regimes in which one has an advantage over the other. With a perfect eval function, they converge to the same result. But there's no guarantee of a crossover prior to that limit.


pwa128

unread,
Dec 29, 2018, 08:31:54
to LCZero
There are a couple of arguments against this necessarily being right:
- even on a GPU, Leela's eval is much slower than SF's full eval on a CPU
- all modern AB engines use a stripped-down version of their full eval to guide the search to speed it up further, and only run their full eval at certain points.

As Houdart has pointed out on a number of occasions, search and eval cannot be cleanly separated in modern AB engines.   And the Komodo team has said that they tried to simply pick up parts of SF's search methods and drop them into their engine, only to find that it worked worse, despite knowing that overall SF's search must work better (the Komodo team believe that their eval is superior to SF's).  So apparently it is not as simple as plugging X's eval into Y's search and getting a better result.

It may be that MC really is better with the heavyweight eval of NNs at the start of the game.    Or it may be that DeepMind just did not write an appropriate search.   Indeed, writing a sophisticated search sort of breaks the principle of zero knowledge (you could argue that writing a search at all breaks that principle, but that is another matter).

IMHO the only way to find out the truth is to try it.   I would love to see a decent AB engine running with Leela's eval.

David Bigler

unread,
Dec 29, 2018, 08:36:46
to LCZero
In my case it is already stronger on a 2080 Ti than the latest SF dev on an AMD Threadripper 1950X.

Rgds

josé luis

unread,
Dec 29, 2018, 08:51:05
to LCZero
A year ago I thought that by now there would be some public NN engine better than AB engines, and that could have been true if Stockfish had developed less this year. Now I believe the answer is months, or days with a good GPU; CPU-only is maybe years away. It should be taken into consideration that hybrids and NNs will become more common: we already have AlphaZero, Leela's T10 and T30, Mark AF, Vieri's 40b, DeusX, Ender, distilled & boosted nets, and at least a couple more. AlphaZero may become public, or Facebook or another giant may get interested, and Komodo is experimenting too. I wouldn't bet on AB in 6 months.

Vassilis

unread,
Dec 29, 2018, 09:21:04
to LCZero
[ ... As Houdart has pointed out on a number of occasions search and eval cannot be cleanly separated in modern AB engines. ]

Very true!
Not only Houdart (one of the best AI programmers, actually) but anyone who has ever tried to write a (simple) chess engine (or an Othello engine, in my case) has come to the same conclusion!

[ ... And the Konodo team has said that they have tried to simply pick up parts of SF search methods and drop them into their engine only to find that it works worse, despite knowing that overall that SF's search must work better (the Komodo team believe that their eval is superior to SFs).  So apparently it is not as simple as plug X's eval into Y's search and you get a better result. ]

Apparently not!
Sorry if I gave you this impression, I want to be crystal clear about this:
eval function, search algorithm, board representation, plus other hardware and architecture factors are tied very tightly together!
Why do you think Komodo MCTS crashes so often on the monstrous hardware in CCC and TCEC? Aren't Komodo's programmers top-class?
These engines are not just simple software. They are life-devoted multithreaded projects.

[ ...It may be that MC really is better with the heavyweight eval of NNs at the start of the game.    Or it may be Deepmind just did not write an appropriate search.   Indeed writing a sophisticated search sort of breaks the principle of zero knowledge (you could argue that writing a search at all breaks that principle but that is another matter). ]

In my humble opinion MC is way better for training, e.g. during the exploration/exploitation procedure in reinforcement learning, and also works better in games with a huge branching factor, like Go.
In simpler games like chess, AB tree search is (and probably will remain) preferable.

[ ...IMHO the only way to find out the truth is try it.   I would love to see a decent AB engine running with Leela's eval. ]
Yeah, me too! Just wait for the near future...

Or... Let's write such an engine, now. The two of us. Are you in?
:):) Just kidding. I'm too old for this!


Regards...
Vas

pwa128

unread,
Dec 29, 2018, 09:27:25
to LCZero
You and me too!

Rudolf Posch

unread,
Dec 29, 2018, 11:58:18
to LCZero
AlphaGo and later AlphaZero were first designed for the game of Go.  I read somewhere that MCT search is better suited to Go (or, the other way round, AB search is not well suited to Go because of the deep tree and the difficulty of evaluating a Go position exactly).
After that, DeepMind used the generic AlphaZero for Chess and Shogi as well. The Chess version was so successful with NN and MCT search that DeepMind had no reason to develop and try an NN with AB search. (That's my personal reasoning.)
BTW does there even exist already a chess engine with NN and AB search?

Vassilis

unread,
Dec 29, 2018, 12:30:26
to LCZero
Hi Rudolf :)

[ ...BTW does there even exist already a chess engine with NN and AB search? ]
Probably not! Not yet...
NNs are very "heavy" to be used as evaluators in every possible leaf node of the AB search tree. (Especially for DCNNs, millions of weights are evaluated to estimate the quality of just one node.)
We must find a way to perform these calculations fast. Faster than a typical AB evaluation function.
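The standard answer to that speed problem (and how lc0 itself amortizes GPU cost) is batching: instead of one forward pass per leaf, collect many leaves and evaluate them in a single large matrix multiply. A toy sketch, with a random NumPy vector standing in for the network's millions of weights:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal(64)  # stand-in for a net's millions of weights

def eval_one(pos):
    """Naive: one forward pass per leaf -- one GPU round-trip each."""
    return float(np.tanh(pos @ W))

def eval_batch(positions):
    """Batched: stack the leaves and do one large multiply; on a GPU this
    amortizes kernel-launch and transfer overhead across the whole batch."""
    X = np.stack(positions)   # shape (batch, 64)
    return np.tanh(X @ W)     # shape (batch,)

leaves = [rng.standard_normal(64) for _ in range(256)]
single = np.array([eval_one(p) for p in leaves])
batched = eval_batch(leaves)
assert np.allclose(single, batched)  # same evaluations, far fewer "GPU" calls
```

An AB engine using an NN eval would need to gather leaves the same way, which is awkward for a strictly sequential alpha-beta search.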

Dietrich Kappe

unread,
Dec 29, 2018, 13:05:25
to LCZero
To be literal minded, on an average pc (without special gpu card), no nn/mcts engine will be strongest. If the average pc in the future sports a powerful gpu, that may change.

If you want to experiment with a simpler engine, where you can tinker in python, try Leela Lite: https://github.com/dkappe/leela_lite

It shouldn’t be too hard to splice the nn value head into a python ab search’s eval.
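As a sketch of what that splice might look like, here is a depth-limited negamax over the toy game of Nim (take 1-3 stones, taking the last stone wins), with a hand-written stand-in in the slot where a net's value head would plug in. Everything here is illustrative; `value_fn` is not a real Leela call:

```python
def legal_moves(stones):
    return list(range(1, min(3, stones) + 1))

def apply_move(stones, take):
    return stones - take

def value_fn(stones):
    # Stand-in for a net's value head: Nim positions with a multiple of
    # four stones are lost for the side to move, everything else is won.
    return -1.0 if stones % 4 == 0 else 1.0

def negamax(stones, depth, value_fn, alpha=-1.0, beta=1.0):
    """Depth-limited negamax with a pluggable leaf evaluator -- the slot
    where a net's value head would be spliced in."""
    moves = legal_moves(stones)
    if not moves:
        return -1.0  # no stones left: the side to move has already lost
    if depth == 0:
        return value_fn(stones)
    best = -1.0
    for m in moves:
        val = -negamax(apply_move(stones, m), depth - 1, value_fn, -beta, -alpha)
        best = max(best, val)
        alpha = max(alpha, val)
        if alpha >= beta:
            break  # cutoff
    return best

print(negamax(10, 2, value_fn))  # → 1.0 (10 stones is a won position)
```

Swapping `value_fn` for a call into the lczero_tools value head is the kind of experiment Dietrich describes.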

Vassilis

unread,
Dec 29, 2018, 13:38:11
to LCZero
Wow Dietrich !

Leela Lite. I didn't know it even existed :)
Thank you so much my friend... :)

Python is not my preferred programming language, so I have very little experience with it.
Java and C++ on the other hand, are ...

However, the libraries that Python uses are very convenient indeed, especially when it comes to Deep Learning, and I guess I'd have major difficulties transcribing the code to either of the aforementioned languages...
Nevertheless I can get some ideas :)
Thanks  again...

Vas

graci...@gmail.com

unread,
Dec 29, 2018, 13:47:30
to LCZero
Depends on how you define an average PC; anything around a $1k setup including a GPU is pretty much an average PC. And anything in the high end today will most likely become mainstream in a few years. The key here is that you can already buy this stuff, and there is already working software for it; it is not super-top-secret special hardware like Google's tensor processing units, which also need specialized software.

Vassilis

unread,
Dec 29, 2018, 13:59:30
to LCZero
Hi Dietrich!

From a quick look into Leela Lite's code, I discovered that YOU are the author of this little gem!
Are you also one of the developers of Lc0?

If yes, I have to congratulate you, and I hope you keep working to improve this project to the maximum degree.

Respect...
Vas

FWCC1

unread,
Dec 29, 2018, 14:52:25
to LCZero
Going with my 6-month theory, we have 3 more months and Leela will RULE SF

NuclearPawn

unread,
Dec 29, 2018, 16:11:44
to LCZero
In my opinion Leela is stronger already. However, her major downside is the endgame phase. Once her endgame gets better she will be the strongest overall.

Dietrich Kappe

unread,
Dec 29, 2018, 19:00:33
to LCZero
I have made some small contributions to the lczero project, mostly around the non-Zero end of things, but I wouldn’t call myself a core developer.

With Leela Lite, I wanted to make it simpler to tinker around with the net and mcts than the highly optimized lc0 made possible.

Check out the lczero_tools that the nn eval comes from and the various branches and forks of Leela Lite to see what folks are up to.

Dietrich Kappe

unread,
Dec 29, 2018, 19:03:50
to LCZero
Let me quote myself from discord:

Leela nets actually play some amazing endgames, for example imbalanced minor-piece endings, where it can drown SF in deep water. But it has too many gaps, like opposite-colored bishop endings.

Veedrac

unread,
Dec 29, 2018, 22:33:23
to LCZero
The unannounced new A0 that DeepMind has in-house, which has been hinted at, is almost certainly stronger than SF or Lc0.

Vassilis

unread,
Dec 30, 2018, 00:34:38
to LCZero
First of all, DeepMind has to prove their A0 is as efficient as Leela, or more so, when running on an ordinary PC and not on their specialized TPU chips. Then we'll talk! Otherwise the comparison between the two is, at the least, "unfair".

Veedrac

unread,
Dec 30, 2018, 05:20:01
to LCZero
Matrices are matrices. A0's matrix multiplies aren't going to be slower than Leela's just because the numbers in them were trained on different hardware...

Vassilis

unread,
Dec 30, 2018, 05:37:36
to LCZero
Absolutely right! Can't argue that!

What I'm saying is that if we want to compare Leela and A0 as they both stand now,
we have to let them play a match (not train) on similar hardware!
Probably A0 will win, but don't be surprised if it is not an easy win :) especially after what I've seen in TCEC from the 32194 draw-master.

Veedrac

unread,
Dec 30, 2018, 06:49:03
to LCZero
Leela would probably give the A0 we've seen in papers a good match, but the new version we haven't seen is bound to be architecturally superior; if it weren't, they wouldn't have made the change. There's no point overthinking it until DeepMind spill the beans, though.

Francesco Tommaso

unread,
Dec 30, 2018, 07:05:49
to LCZero
Right now it is beating Stockfish in the TCEC bonus, leading by two points after 25 games. Maybe it is already the strongest.

Of course, it is a small sample, but who knows. Also, to answer this question one should consider hardware. But with both on extreme hardware, she is probably the strongest already. Just throw in another RTX 2080 Ti and it will become even clearer. Even with a second GPU its hardware is less expensive.

Ivan Ivec

unread,
Dec 30, 2018, 11:41:08
to LCZero
Just go to 100M games with test30.

Good luck!

vladimir...@gmail.com

unread,
Dec 30, 2018, 12:30:06
to LCZero
Not before it gets rid of the Monte Carlo search.

Ingo Weidner

unread,
Dec 30, 2018, 12:53:07
to LCZero
At least on my system/hardware, currently, in 16 games (colors reversed after each game) against Stockfish 10 (release version) at 15min+3s TC, ID 32316 with a score of 9.5/16 = 59.4% and ID 32273 with a score of 8.5/16 = 53.1% are already better than the Stockfish 10 release version.

Hardware used: RAM: 16 GB (1 GB hash), CPU: mobile i7-7700HQ 4x2.8 GHz (3 cores used for engines), GPU: mobile Nvidia GeForce GTX 1050 Ti 4 GB (768 CUDA cores), Leela ratio = 0.64

christian...@gmail.com

unread,
Dec 30, 2018, 14:05:32
to LCZero
With only 16 games the sample size is so small that you can't say one opponent is better than the other unless it scores more than 10.5/16 points (binomial test with p-value 0.05 and a draw rate of 1/2).
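Christian's threshold can be checked exactly with stdlib Python: assuming equal strength and a 1/2 draw rate, each game scores 0, 1/2 or 1 with probabilities 1/4, 1/2, 1/4, which makes twice the total score over n games a Binomial(2n, 1/2) variable.

```python
from math import comb

def p_value(score, games):
    """P(scoring at least this many points | engines equally strong),
    assuming a 1/2 draw rate: each game's score in {0, 1/2, 1} has
    probabilities {1/4, 1/2, 1/4}, i.e. half a Binomial(2, 1/2), so the
    doubled total score over n games is Binomial(2n, 1/2)."""
    n, k = 2 * games, round(2 * score)
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n

print(round(p_value(9.5, 16), 3))   # → 0.189: Ingo's 9.5/16 is far from significant
print(round(p_value(10.5, 16), 3))  # → 0.055: even 10.5/16 just misses p = 0.05
print(round(p_value(11.0, 16), 3))  # → 0.025: 11/16 would clear the bar
```

So a 9.5/16 score has roughly a 19% chance of arising between two equal engines, which is why such a match proves little on its own.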

Ingo Weidner

unread,
Dec 30, 2018, 14:24:35
to LCZero
Sorry, but I think that 16 games at a time control of 15min+3s is enough to get an impression of the strength.

FWIW, ID 32316 at that score of 9.5/16 had 6 wins, 7 draws and 3 losses, with 4 wins in the last 6 games.

christian...@gmail.com

unread,
Dec 30, 2018, 14:27:47
to LCZero
Statistics says otherwise. You are underestimating the randomness of your results.

NuclearPawn

unread,
Dec 30, 2018, 14:30:08
to LCZero
@Christian
It's not like there's only one person running the tests, to say there are not enough games. A lot of us are running tests at different time controls and on different hardware, and the pattern is clear: Leela has improved greatly and, at least currently, is stronger than SF10.

Ingo Weidner

unread,
Dec 30, 2018, 14:51:52
to LCZero
This was posted in my own thread (you might also have a look at the overall score of SF 10 against 14 networks...):


Here is the current table of tested networks (tested up to ID 32319 and ID 11248 included) with scores sorted by percent (%):
 
      Engine                                Score                 Results                       %             Elo
1:   Lc0 v0.20.0rc2_ID32316    9.5/16    =1==0=01==1=1101     59.4    +63  = 3526
2:   Stockfish_10_x64_bmi2  92.0/164  ················                 56.1               3463 (from CCRL 40/40 list)
3:   Lc0 v0.20.0rc2_ID32273    8.5/16    ======0=1=01===1    53.1    +21  = 3484
4:   Lc0 v0.20.0rc2_ID32295    4.5/10    ====000=11                 45.0     -35  = 3428
4:   Lc0 v0.20.0rc2_ID32246    4.5/10    ====010=01                45.0     -35  = 3428
4:   Lc0 v0.20.0rc2_ID32223    4.5/10    ====0===01                45.0     -35  = 3428
7:   Lc0 v0.19.1.1_ID11248     7.0/16    ==========00====    43.8     -42  = 3421 (previous average Elo was 3415)
8:   Lc0 v0.20.0rc2_ID32280    6.5/16    1===0=======0=00    40.6     -63  = 3400
9:   Lc0 v0.20.0rc2_ID32307    4.0/10    =1=10=0=00                 40.0     -70  = 3395
9:   Lc0 v0.20.0rc2_ID32194   4.0/10    0=====0=01                 40.0     -70  = 3393 (average Elo of 3409 in tests done by others)
9:   Lc0 v0.20.0rc2_ID32253    4.0/10    =1=====000                 40.0     -70  = 3393
9:   Lc0 v0.20.0rc2_ID32207    4.0/10    ==0=0=====                 40.0     -70  = 3393
9:   Lc0 v0.20.0rc2_ID32236    4.0/10    0===0=====                 40.0     -70  = 3393
14: Lc0 v0.20.0rc2_ID32301    3.5/10    ===0==0==0                 35.0   -108  = 3355
15: Lc0 v0.20.0rc2_ID32195    3.5/10    0==0=0====                 35.0   -108  = 3355

164 of 164 games played
Level: Blitz 15/3
Hardware: RAM: 16 GB (1 GB hash), CPU: mobile i7-7700HQ 4x2.8 GHz (3 cores used for engines), GPU: mobile Nvidia GeForce GTX 1050 Ti 4 GB (768 CUDA cores), Leela ratio = 0.64
Operating system: Windows 10 Home Edition (Build 9200), 64-bit


Detailed results of the latest test:

-----------------Lc0 v0.19.1.1_ID11248-----------------
Lc0 v0.19.1.1_ID11248 - Stockfish_10_x64_bmi2   : 7,0/16 0-2-14 (==========00====)  44%   -42
-----------------Lc0 v0.20.0rc2_ID32273-----------------
Lc0 v0.20.0rc2_ID32273 - Stockfish_10_x64_bmi2  : 8,5/16 3-2-11 (======0=1=01===1)  53%   +21
-----------------Lc0 v0.20.0rc2_ID32280-----------------
Lc0 v0.20.0rc2_ID32280 - Stockfish_10_x64_bmi2  : 6,5/16 1-4-11 (1===0=======0=00)  41%   -63
-----------------Lc0 v0.20.0rc2_ID32316-----------------
Lc0 v0.20.0rc2_ID32316 - Stockfish_10_x64_bmi2  : 9,5/16 6-3-7 (=1==0=01==1=1101)  59%   +63
-----------------Stockfish_10_x64_bmi2-----------------
Stockfish_10_x64_bmi2 - Lc0 v0.19.1.1_ID11248   : 9,0/16 2-0-14 (==========11====)  56%   +42
Stockfish_10_x64_bmi2 - Lc0 v0.20.0rc2_ID32273  : 7,5/16 2-3-11 (======1=0=10===0)  47%   -21
Stockfish_10_x64_bmi2 - Lc0 v0.20.0rc2_ID32280  : 9,5/16 4-1-11 (0===1=======1=11)  59%   +63
Stockfish_10_x64_bmi2 - Lc0 v0.20.0rc2_ID32316  : 6,5/16 3-6-7 (=0==1=10==0=0010)  41%   -63

christian...@gmail.com

unread,
Dec 30, 2018, 15:07:30
to LCZero
Then the writer should say "the combined results, including xyz games from testers X, Y and Z, suggest that Leela is now better than Stockfish" or something similar. 10 or 16 games with this spread between lc0 and SF are way too few. To conclude that an opponent with a score of 60% is better than the other you need 40 games or more (depending on the p-value).

Thus also from the TCEC bonus games between SF and lc0 net 32329 we only get something like "they are virtually tied".

Owen W

unread,
Dec 30, 2018, 15:10:37
to LCZero
And depending on the positions used the results will vary; you need at least 100 games with various types of structures.

NuclearPawn

unread,
Dec 30, 2018, 15:16:03
to LCZero
Currently TCEC is playing all openings between the two, and by the end of that tournament we can probably say for sure which is the better engine.

Ingo Weidner

unread,
Dec 30, 2018, 15:26:35
to LCZero
I simply do not have the time and a second PC with a strong GPU to play 50 or even 100 games at 15min +3 with each new network.

Doing the recent test with 16 games for 4 networks (= 64 games) already took a bit of time.

The current results and/or my table are a kind of "qualification" for further, deeper tests in the near future, and I will not waste my time letting an obviously weak and/or average network play 100 games at 15min+3s TC (or even longer).

cyrix

unread,
Dec 30, 2018, 15:28:27
to LCZero
No, we can't. The "margin of victory" will be too small.

Testing is like polling in a political landscape. The TCEC bonus games are like one poll asking 40 voters. Maybe 21 of these 40 will vote for Clinton and only 19 for Trump. Does this mean that Clinton will surely win the election? No, the probability that this small sample contains a higher proportion of Clinton voters than the entire population is far too big: only a small error between the poll and the actual vote of all voters (meaning the strength of the engines across all possible chess games) leads to a Trump win (or to the engine which "lost" this short match being the really better engine)...

pwa128

unread,
Dec 30, 2018, 15:48:28
to LCZero
Quite right (assuming the scores remain close).   The probability of the winning engine being the stronger one will be just over 50%, i.e. we will really not know from the TCEC bonus which is stronger.

However, we will be able to conclude with a fair degree of certainty that they are close to each other in strength.

Ingo Weidner

unread,
Dec 30, 2018, 15:49:09
to LCZero
Now that I have added the results of ID 11248 to my table (those are new results, not old ones...), in the future I will also use this as a kind of "baseline" (besides the recently best networks).
Those networks listed below ID 11248 will sooner or later be kicked out of the table.
I have no interest in preparing a huge list of networks but am interested in finding the best one.

As there are always new networks that might be better than the older ones, it only makes sense to invest more time and games into those that already gave promising results, even if they "only" played 16 games at a certain time control and against a certain opponent.

David Bernier

unread,
Dec 30, 2018, 15:49:18
to lcz...@googlegroups.com
Hi NuclearPawn,

On 12/30/18 2:30 PM, NuclearPawn wrote:
> @Christian
> It's not like there's only one person running the test to say there's not enough games. A lot of us are running tests in different time controls and hardware, and the pattern is clear. Leela has improved greatly and at least Currently is stronger than SF10
>

I rather agree with Christian that these are not fool-proof statistical methods.

For example, tests of nets that do poorly (tests that "failed"), for the sake of argument on ID 32316, might go under-reported. We simply don't know unless we ask.

However, I'm all for gathering together all similar tests, for example of Leela net ID 32316 vs. SF 10.

In my calculations in another post, the 68% error bars were wide at e.g. 10 games.


David
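For concreteness, the 1-sigma (68%) error bars David mentions can be approximated by treating each game as an independent trial (a slight overestimate, since draws shrink the variance):

```python
from math import sqrt

def error_68(p, n):
    """One-sigma (~68%) half-width of the error bar on a score fraction p
    over n games, treating games as independent trials (conservative:
    draws actually reduce the per-game variance)."""
    return sqrt(p * (1 - p) / n)

for n in (10, 16, 40, 100):
    print(n, round(100 * error_68(0.5, n), 1))  # half-width in percentage points
```

At 10 games and a 50% score the bar is about plus or minus 16 percentage points; even 100 games still leaves plus or minus 5.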

John D

unread,
Dec 30, 2018, 16:14:07
to LCZero
I haven't understood why this hasn't been attempted since around Leela Chess's inception. Maybe it is difficult, impossible, or silly, but it's such a simple idea that I'd at least think I'd have seen someone either attempt it and fail, or explain why it's nonsense, by now.

On Saturday, December 29, 2018 at 5:50:50 AM UTC-6, Vassilis wrote:
I shall clarify this more, why not?

Isn't Stockfish one of the best engines available? No doubt, I guess. All tests confirm this.
Stockfish is a typical (pure) AB engine, with a fast move generation, a reliable evaluation function, and many AB extensions.

So if we use NNs as evaluation functions, provided they evaluate better than Stockfish's eval, after sufficient training, and are also faster than Stockfish's eval (because we run them on GPU) we already have a better AB engine. To make things better, let's use the same NNs (or other lighter ones) to generate the candidate moves, for better selection and pruning, and let AB be multithreaded, and already we have an even better AB engine. Does it sound weird?

It might be too difficult to implement, or unrealistic, technically speaking. I don't know.
But sounds quite logical...

Be good :)

Ingo Weidner

unread,
Dec 30, 2018, 16:26:26
to LCZero
Another note about my results for IDs 32316 and 32273:
If I find that they play very strongly against Stockfish 10 on my system, but others could not confirm it, would that mean I have to stop using them just because others say they are crap?
I can only give a hint that this might be a good network, and others decide on their own whether it is worth checking too or not.

Of course I will use the network that is the strongest on my OWN system, until there is one that is obviously better.

At the moment I keep these installed in Arena (besides 9 new networks that I am currently testing...):   11248, 32194, 32246, 32273 and 32316.

Dietrich Kappe

unread,
Dec 30, 2018, 16:27:25
to LCZero
Hehe, stronger, you say?

# PLAYER : RATING POINTS PLAYED (%)
1 crafty : 3057.0 693.5 1006 68.9%
2 ID11258 : 3000.4 42.0 100 42.0%
3 ID32293 : 2961.2 22.0 60 36.7%
4 ID32281 : 2938.8 27.0 80 33.8%
5 ID32327 : 2935.5 14.0 42 33.3%
6 ID32251 : 2928.9 32.5 100 32.5%
7 ID32247 : 2924.9 32.0 100 32.0%
8 ID32344 : 2923.4 7.0 22 31.8%
9 ID32286 : 2922.2 19.0 60 31.7%
10 ID32320 : 2906.5 12.5 42 29.8%
11 ID32300 : 2898.0 11.5 40 28.8%
12 ID32305 : 2898.0 11.5 40 28.8%
13 ID32294 : 2894.4 17.0 60 28.3%
14 ID32194 : 2873.7 26.0 100 26.0%
15 ID32273 : 2856.6 14.5 60 24.2%
16 ID32170 : 2855.0 24.0 100 24.0%

Dietrich Kappe

unread,
Dec 30, 2018, 16:39:48
to LCZero
This was tried with Giraffe. It used a much smaller NN than Leela. Unfortunately the NN, while smarter than heuristic evals, was just too slow. It sits around 2400 CCRL.

Keep in mind that MCTS as used by A0 and Leela has two "evals": a value head that predicts the outcome, and a policy head that suggests which moves are most promising. Thinking of the value head as the only important part is incomplete reasoning. The policy is at least as important. With Leela Lite I've experimented with using random move exploration; the policy makes a huge difference.
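The policy head's role is easy to see in the A0-style selection rule (the exact formula and constants vary between lc0 versions; this is the generic PUCT form, with made-up numbers):

```python
import math

def puct_select(priors, q, visits, c_puct=1.5):
    """A0/lc0-style child selection: the value head's running average Q
    plus an exploration bonus scaled by the policy head's prior P."""
    total = sum(visits) + 1
    def score(i):
        u = c_puct * priors[i] * math.sqrt(total) / (1 + visits[i])
        return q[i] + u
    return max(range(len(priors)), key=score)

# With equal Q everywhere, the policy prior alone steers the search:
priors, q, visits = [0.70, 0.20, 0.10], [0.0, 0.0, 0.0], [0, 0, 0]
for _ in range(100):
    visits[puct_select(priors, q, visits)] += 1
print(visits)  # visit counts end up roughly proportional to the prior
```

With the value estimates tied, the search budget follows the prior, which is why a good policy head matters as much as a good value head.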

David Bernier

unread,
Dec 30, 2018, 16:53:15
to LCZero

Hello Ingo,

On 12/30/18 12:53 PM, Ingo Weidner wrote:
At least on my system/hardware, currently, in 16 games (colors reversed after each game) against Stockfish 10 (release version) at 15min+3s TC, ID 32316 with a score of 9.5/16 = 59.4% and ID 32273 with a score of 8.5/16 = 53.1% are already better than the Stockfish 10 release version.

Hardware used: RAM: 16 GB (1 GB hash), CPU: mobile i7-7700HQ 4x2.8 GHz (3 cores used for engines), GPU: mobile Nvidia GeForce GTX 1050 Ti 4 GB (768 CUDA cores), Leela ratio = 0.64


Against Net 11248 from Test 10, here a 40-game match with Test 30 net ID 32316 gave:

+19 =6 -15 for a 55% score for net 32316.

Time Control: 5 minutes blitz.


This, together with other data, strongly suggests to me that the best test 30 nets have equalled or surpassed test 10 net 11248.


David Bernier

