Hi Sergey. I am new to torch and am trying to ramp up on how to convert an existing model to use FP16. I have a Tegra TX1. I am familiar with the CudaHalfTensor, and I have a small piece of code working as follows:
require 'cutorch'
n = 1000
m = 10000
ht1 = torch.rand(n, m):type('torch.CudaHalfTensor')
ht2 = torch.rand(n, m):type('torch.CudaHalfTensor')
nruns = 1000
cutorch.synchronize()          -- let pending GPU work finish before timing
start = os.clock()
for i = 1, nruns do
   ht3 = torch.cmul(ht1, ht2)  -- out-of-place; ht1:cmul(ht2) would overwrite ht1 each iteration
end
cutorch.synchronize()          -- kernel launches are asynchronous, so wait before reading the clock
stoptime = os.clock()
This appears to be using FP16 instructions, because it runs faster than with CudaTensor, which I believe is a float32 type. I cloned your repository, but I can't find any reference to CudaHalfTensor or hasHalf. I am probably missing something critical, but I can't figure out how the models use FP16. Can you point me to the area of the code where FP16 is enabled? Naively, I would have thought that if the input to the network is a CudaHalfTensor, then the computations would be carried out as FP16, but I can't find anywhere in your code where the input type is being changed, so perhaps my assumption is incorrect.
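For reference, this is the kind of conversion I assumed would be enough — a rough sketch, assuming cutorch was built with half support (which is what I understand cutorch.hasHalf reports) and that `model` and `input` already exist:

-- sketch only: cast an nn model and its input to FP16, assuming
-- the cutorch build has half support (cutorch.hasHalf == true)
require 'cunn'
if cutorch.hasHalf then
   model = model:type('torch.CudaHalfTensor')  -- casts all parameters/buffers
   input = input:type('torch.CudaHalfTensor')
   output = model:forward(input)               -- I expected this to run in FP16
end

Is this roughly how it is supposed to work, or does something else in the code have to enable FP16?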
My ultimate goal is to take alexnet.lua, run it in both float32 and FP16, and compare the timings. I found soumith's convnet-benchmarks repo (https://github.com/soumith/convnet-benchmarks), but it requires x86_64 and won't build on arm64.
Thank you!
Chris