INT8 or FP16? Nvidia says that only very new GPUs support full-rate FP16. Better to use INT8


Martin Renneke

Jul 8, 2018, 5:39:02 PM
to LCZero
I think a lot of supporters are using NVIDIA GPUs, and only a small number have the very newest ones. Given that, I suggest investigating INT8 rather than switching to FP16. Refer to the Nvidia comment enclosed below.


Here is some information from NVIDIA (2017):

FP16 is supported but at a low rate. So performance won't be interesting. The driver version you have should be fine. I would recommend using CUDA 8.0.61 (CUDA 8 GA2) which is what is currently publicly available.

The only GPUs with full-rate FP16 performance are Tesla P100, Quadro GP100, and Jetson TX1/TX2.

All GPUs with compute capability 6.1 (e.g. GTX 1050, 1060, 1070, 1080, Pascal Titan X, Titan Xp, Tesla P40, etc.) have low-rate FP16 performance. It's not the fast path on these GPUs. All of these GPUs should support "full rate" INT8 performance, however.
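
To make concrete what "full rate" INT8 means on those compute capability 6.1 cards: it comes from the DP4A instruction, which does a dot product of four packed 8-bit values plus a 32-bit accumulate in a single instruction. A minimal CUDA sketch, illustrative only (not LCZero code; buffer names are made up; compile for sm_61 or newer):

// Minimal illustration of the "full rate" INT8 path on compute capability
// 6.1+ GPUs (GTX 10-series, Tesla P40, ...). Buffer names and sizes are
// made up for the example; this is not code from the LCZero backends.
#include <cuda_runtime.h>

__global__ void int8_dot(const int* a, const int* b, int n, int* out) {
    // Each int holds four packed signed 8-bit values. __dp4a multiplies the
    // four byte pairs and adds them to the 32-bit accumulator in a single
    // instruction, which is where the extra INT8 throughput comes from.
    int acc = 0;
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        acc = __dp4a(a[i], b[i], acc);
    atomicAdd(out, acc);
}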

Nabil Danial

Jul 8, 2018, 7:40:36 PM
to LCZero
According to this Google sheet, most modern Nvidia GPUs get a 4x processing speedup with INT8, but only the Titan V and Tesla V100 get any speedup with FP16; the others take a speed penalty.

Graham Jones

Jul 9, 2018, 2:31:59 AM
to LCZero
The devs know this.  I am not a dev, but I believe that what I said on May 4th (https://groups.google.com/d/msg/lczero/gBwAOMYNDFs/1CuFQWp0BgAJ) still stands, except that int8 is closer now.

A0 used first-generation TPUs to generate self-play games, which can do 92 TeraOps/s (https://arxiv.org/abs/1704.04760). These are 8-bit operations, which are enough for playing, not for training. The GTX 1080 Ti supports similar 8-bit operations, and as far as I can tell (search "GTX 1080 Ti INT8") it can do 44 TeraOps/s, a bit less than half a TPU. A0 used 4 TPUs to play SF, which should be about equivalent to 8 or 9 GTX 1080 Ti's, not 70.

I imagine there's a lot of programming work needed to get the best out of GPUs, and while it hasn't been done yet, I'm sure it will be.
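
Spelling out the back-of-envelope arithmetic above (the figures are the ones quoted in this post; the equivalence is only approximate):

// Back-of-envelope check of the equivalence above, using the figures from
// this post: 92 TeraOps/s per first-generation TPU and ~44 INT8 TeraOps/s
// per GTX 1080 Ti. Purely illustrative arithmetic.
#include <cstdio>

int main() {
    const double tpu_tops = 92.0;
    const double gtx_1080_ti_int8_tops = 44.0;
    const int tpus_used_vs_sf = 4;
    const double equivalent_gpus =
        tpus_used_vs_sf * tpu_tops / gtx_1080_ti_int8_tops;
    std::printf("~%.1f GTX 1080 Ti equivalents\n", equivalent_gpus);  // ~8.4
    return 0;
}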

Alexander Lyashuk

Jul 9, 2018, 3:42:21 AM
to Graham Jones, LCZero
When people tried int8-quantizing the weights, playing strength at the same node count dropped a lot (~150-200 Elo). That's not the same as int8 computation, which would be even worse (int8 would then be used on all layers, not just the input). Whether that loss would be compensated by the faster computation is an open question.

For fp16 it seems fine though.
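
To make the quantization point concrete, here is a rough sketch of symmetric per-tensor int8 weight quantization and the round-trip error it introduces. The scheme, the names, and the example weights are assumptions for illustration, not how the actual experiment was done:

// Rough sketch of symmetric per-tensor int8 weight quantization. The scheme
// and the example weights are illustrative assumptions, not the actual
// quantization used in the experiments mentioned above.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Map [-max|w|, +max|w|] onto [-127, 127] with a single scale factor.
std::vector<int8_t> quantize(const std::vector<float>& w, float& scale) {
    float max_abs = 0.f;
    for (float x : w) max_abs = std::max(max_abs, std::fabs(x));
    scale = max_abs > 0.f ? max_abs / 127.f : 1.f;
    std::vector<int8_t> q(w.size());
    for (size_t i = 0; i < w.size(); ++i)
        q[i] = static_cast<int8_t>(std::lround(w[i] / scale));
    return q;
}

int main() {
    std::vector<float> w = {0.013f, -0.207f, 0.0004f, 0.8901f, -0.0451f};
    float scale = 1.f;
    std::vector<int8_t> q = quantize(w, scale);
    // Dequantize and report the per-weight error: small weights lose most of
    // their precision, which is one plausible source of the Elo drop.
    for (size_t i = 0; i < w.size(); ++i)
        std::printf("%+.4f -> %+.4f (err %.4f)\n",
                    w[i], q[i] * scale, std::fabs(w[i] - q[i] * scale));
    return 0;
}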


Vladimir Prelovac

Jul 9, 2018, 12:03:27 PM
to LCZero
Is it worthwhile to consider using INT8 training only for specific scenarios, like end-game training?

Jhor Vi

Jul 10, 2018, 12:19:42 AM
to LCZero
What did Google DeepMind do to make 8-bit work?

Jhor Vi

Jul 10, 2018, 9:18:27 PM
to LCZero
As I observe the training games, it seems that accuracy is not of primary importance, so we could benefit from the speed of INT8 for game generation during training. The engine could then switch back to 16- or 32-bit during actual game play, where accuracy is a must. A rough sketch of that split follows below.
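
Purely as an illustration of that split; the identifiers here are hypothetical and do not correspond to actual lc0 options or code:

// Hypothetical sketch of the precision split suggested above: int8 for
// self-play game generation, where throughput matters most, and fp16/fp32
// for match play, where accuracy matters most. None of these identifiers
// exist in the real lc0 code base.
#include <cstdio>

enum class Mode { kSelfPlayGeneration, kMatchPlay };
enum class Precision { kInt8, kFp16, kFp32 };

Precision ChoosePrecision(Mode mode, bool gpu_has_fast_fp16) {
    if (mode == Mode::kSelfPlayGeneration)
        return Precision::kInt8;   // fast, lower-accuracy game generation
    return gpu_has_fast_fp16 ? Precision::kFp16   // accurate match play
                             : Precision::kFp32;
}

int main() {
    std::printf("match play precision: %d\n",
                static_cast<int>(ChoosePrecision(Mode::kMatchPlay, false)));
    return 0;
}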