Engine performance

jtsbad

May 3, 2018, 8:06:36 PM
to LCZero
The AlphaZero research paper states that the engine was evaluating 80,000 positions per second. It looks like a GTX 1080 Ti will evaluate about 1,100 positions per second using the latest Leela engine. Does this mean that even if the Leela NN becomes as strong as AlphaZero's, one would need about 70x the GPU capacity of a GTX 1080 Ti to match AlphaZero's performance in its match against Stockfish?
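A quick sketch of the ratio in question, using the two figures above:

```python
# Quick ratio check, using the numbers quoted above.
a0_nps = 80_000    # positions/s claimed for AlphaZero in the paper
leela_nps = 1_100  # positions/s observed on a GTX 1080 Ti

print(f"~{a0_nps / leela_nps:.0f}x")  # ~73x, hence "about 70x"
```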

alvaro...@gmail.com

May 3, 2018, 8:42:14 PM
to LCZero
I believe the 1080 Ti will get you more than 1,100 positions per second (I get a bit more than that on a GTX 980). But yes, a consumer-grade card won't get you there. Also, A0's network was larger than what LCZero has now. With careful use of cuDNN, a Titan V (~$3,000) or a V100 (~$8,500) should be much faster, if you manage to use their tensor cores. My [uneducated] guess is that it would take something like four V100 cards to match the hardware that AlphaZero used.

You may want to wait three or four years until this kind of hardware is affordable. :)

Graham Jones

May 4, 2018, 3:21:54 PM
to LCZero
A0 used first-generation TPUs to generate self-play games; these can do 92 TeraOps/s (https://arxiv.org/abs/1704.04760). Those are 8-bit operations, which are enough for playing but not for training. The GTX 1080 Ti supports similar 8-bit operations, and as far as I can tell (search "GTX 1080 Ti INT8") it can do 44 TeraOps/s, a bit less than half a TPU. A0 used 4 TPUs to play SF, which should be about equivalent to 8 or 9 GTX 1080 Ti's, not 70.
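Spelling that arithmetic out (a quick sketch with the same figures):

```python
# Back-of-envelope INT8 throughput comparison, figures as cited above.
tpu_int8_tops = 92         # first-gen TPU (arxiv.org/abs/1704.04760)
gtx_1080ti_int8_tops = 44  # GTX 1080 Ti INT8, approximate
tpus_in_match = 4          # TPUs A0 used against SF

cards = tpus_in_match * tpu_int8_tops / gtx_1080ti_int8_tops
print(f"~{cards:.1f} GTX 1080 Ti's")  # ~8.4, i.e. 8 or 9 cards
```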

I imagine there's a lot of programming work needed to get the best out of GPUs, and while it hasn't been done yet, I'm sure it will be.

FWCC1

May 4, 2018, 9:08:56 PM
to LCZero
She's gained 100 Elo in 3 days; that's good.

FWCC1

May 4, 2018, 9:20:06 PM
to LCZero
Maybe she needs to go to 100 million games to equal A0 on consumer hardware, i.e. a GTX 1080 Ti.

Jesse Jordache

May 4, 2018, 11:09:32 PM
to LCZero
I think the training stops when the policy network is saturated and isn't getting any smarter (this can be objectively measured).

After that (or even before that) you can pile on the Elo just by running her on a faster card. The wiki's FAQ spreadsheets have a ton of information on that. Just in the past couple of days Leela has gotten to SF 9 strength if Stockfish only sees 100x more nodes than Leela does. Which is nice, but the ratio needs to be at least 1000x for rough equality.

https://docs.google.com/spreadsheets/d/1zcXqNzLNBT8RjTHO_AppL6WN0j8TGmOIh6osLPmaB6E/edit#gid=0

I like this spreadsheet, because it also tells you how close to done the current network is.
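For reference, the standard logistic-Elo conversion behind ratings like those (a minimal sketch; the helper name is mine):

```python
import math

def elo_diff(score: float) -> float:
    """Elo difference implied by a match score in (0, 1),
    via the standard logistic Elo model."""
    return -400 * math.log10(1 / score - 1)

print(round(elo_diff(0.75)))  # 191: a 75% score is roughly +191 Elo
```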

magicmulder

May 5, 2018, 4:32:36 AM
to LCZero
I don't think it's feasible to measure chess performance based on nps. Even among conventional programs these differ wildly (Rybka may hit 400K nps where Fritz hits 30M, or some amateur program hits 90M).
In fact, I would not be surprised if the final version outperforms every other chess program in the world with just 1K nps.

Galaga

May 5, 2018, 6:48:51 AM
to LCZero
The tuning tool reports 1.1 TFLOPS on a GTX 1080 Ti; I don't know how this relates to the units given above.
In more practical terms: a client produces nearly 1 game per minute, which works out to 76 years for 40 million games.
Actually the client uses only 35% of the GPU, as reported by Task Manager.
(Starting another client doesn't help: GPU usage goes up to 50%, but it slows down the first client.)
The game itself uses 50% of the GPU. Maybe there is some room for improvement here.

And a ready-to-use program using CUDA could probably speed things up.
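For the record, the 76 years is simple arithmetic (a quick sketch using the ~1 game/minute rate above):

```python
# One client at ~1 game/minute vs. 40 million games.
games = 40_000_000
minutes_per_year = 60 * 24 * 365

years = games / minutes_per_year  # at 1 game per minute
print(f"~{years:.0f} years")      # ~76 years for a single client
```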

Jesse Jordache

May 5, 2018, 7:59:44 AM
to LCZero
You can measure the chess performance of the same version of Leela by nps. There's a margin of error, but like everything it gets smaller with more trials.
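That margin behaves like the standard error of the match score; a minimal sketch (my own illustration, ignoring draws, which compress the variance):

```python
import math

def elo_margin(score: float, games: int, z: float = 1.96) -> float:
    """Approximate 95% half-width in Elo for a match score over
    `games` games, ignoring draws (a simplification)."""
    se = z * math.sqrt(score * (1 - score) / games)
    lo = -400 * math.log10(1 / (score - se) - 1)
    hi = -400 * math.log10(1 / (score + se) - 1)
    return (hi - lo) / 2

print(round(elo_margin(0.55, 100)))   # ~69 Elo half-width over 100 games
print(round(elo_margin(0.55, 1000)))  # ~22 Elo over 1000 games
```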

Graham Jones

May 5, 2018, 8:28:51 AM
to LCZero
The specs of the GTX 1080 Ti say it can do a maximum of 11 TFLOPS. I don't know why only 1.1 is achieved (though I think your result is typical). You never get the maximum in an actual application, but I think it could be a lot more than 10%. Neural-net calculations are GPU-friendly. The 8-bit operations work 4x as fast as the normal 32-bit ones.
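To relate the units (both figures as reported in this thread; the percentage is just their ratio):

```python
# Relating the tuner's report to the spec sheet, figures from this thread.
spec_fp32_tflops = 11.0  # GTX 1080 Ti peak FP32
tuner_tflops = 1.1       # reported by the tuning tool (see above)

print(f"{tuner_tflops / spec_fp32_tflops:.0%} of peak")  # 10% of peak

# INT8 runs ~4x as fast as FP32 on this card, which is where the
# ~44 TeraOps/s INT8 figure earlier in the thread comes from.
print(spec_fp32_tflops * 4)  # 44.0
```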

I got slightly better games/minute running two clients on my 1070. In Task Manager you should look at the Compute_0 graph (click the little down arrow near 3D or Video decode), not the summary on the left.

Albert Silver

May 5, 2018, 9:16:35 AM
to LCZero
A GTX 1060 here says around 700 GFLOPS (0.7 TFLOPS) in the tuning tool.