It seems I am running out of GPU memory (on a poor man's GTX 1080) in `nnet3-chain-train`, but I can't work out why from the numbers in the log file. About 8 GB should be available, yet `nnet3-chain-train` bails out while trying to allocate about 320 MB when only about 2600 MB is allocated, if I interpret the logs correctly. The relevant lines from the log are:
LOG (nnet3-chain-train[5.2.160~2-51042]:IsComputeExclusive():cu-device.cc:263) CUDA setup operating under Compute Exclusive Process Mode.
LOG (nnet3-chain-train[5.2.160~2-51042]:FinalizeActiveGpu():cu-device.cc:225) The active GPU is [0]: GeForce GTX 1080 free:7966M, used:147M, total:8113M, free/total:0.981882 version 6.1
LOG (nnet3-chain-train[5.2.160~2-51042]:PrintMemoryUsage():cu-allocator.cc:127) Memory usage: 4047244608 bytes currently allocated (max: 4543456848); 2321673216 currently in use by user (max: 3029200088); 1721/13448 calls to Malloc* resulted in CUDA calls.
LOG (nnet3-chain-train[5.2.160~2-51042]:PrintMemoryUsage():cu-allocator.cc:134) Time taken in cudaMallocPitch=-1.10451e+17, in cudaMalloc=-1.25219e+16, in cudaFree=8.43331e+11, in this->MallocPitch()=-1.45745e+20
WARNING (nnet3-chain-train[5.2.160~2-51042]:MallocPitchInternal():cu-allocator.cc:97) Allocation of 6968320 x 48 region failed: freeing some memory and trying again.
LOG (nnet3-chain-train[5.2.160~2-51042]:MallocPitchInternal():cu-allocator.cc:102) To avoid future problems like this, changing memory_factor from 1.5 to 1.1
LOG (nnet3-chain-train[5.2.160~2-51042]:PrintMemoryUsage():cu-allocator.cc:127) Memory usage: 3116352832 bytes currently allocated (max: 4543456848); 2321673216 currently in use by user (max: 3029200088); 1721/13448 calls to Malloc* resulted in CUDA calls.
LOG (nnet3-chain-train[5.2.160~2-51042]:PrintMemoryUsage():cu-allocator.cc:134) Time taken in cudaMallocPitch=-1.10591e+17, in cudaMalloc=-1.25219e+16, in cudaFree=-9.3372e+13, in this->MallocPitch()=-1.45745e+20
WARNING (nnet3-chain-train[5.2.160~2-51042]:MallocPitchInternal():cu-allocator.cc:97) Allocation of 6968320 x 48 region failed: freeing some memory and trying again.
LOG (nnet3-chain-train[5.2.160~2-51042]:PrintMemoryUsage():cu-allocator.cc:127) Memory usage: 2701251904 bytes currently allocated (max: 4543456848); 2321673216 currently in use by user (max: 3029200088); 1721/13448 calls to Malloc* resulted in CUDA calls.
LOG (nnet3-chain-train[5.2.160~2-51042]:PrintMemoryUsage():cu-allocator.cc:134) Time taken in cudaMallocPitch=-1.10732e+17, in cudaMalloc=-1.25219e+16, in cudaFree=-1.87587e+14, in this->MallocPitch()=-1.45745e+20
ERROR (nnet3-chain-train[5.2.160~2-51042]:MallocPitchInternal():cu-allocator.cc:114) Cannot allocate the requested memory (6968320 x 48 = 334479360 bytes)
For now, I've reduced the largest number in `--trainer.num-chunk-per-minibatch`, and that seems to help.
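To double-check my reading of the numbers, here is a small Python sketch that pulls the byte counts out of the "Memory usage" lines and converts them to MiB. The helper names (`to_mib`, `parse_memory_usage`) are my own and not part of Kaldi, the excerpt is trimmed to the fields the regular expression needs, and I am assuming the "M" in the `FinalizeActiveGpu()` line means MiB.

```python
import re

MIB = 1024 * 1024

# Trimmed copies of the three "Memory usage" reports from the log.
log_excerpt = """\
Memory usage: 4047244608 bytes currently allocated (max: 4543456848); 2321673216 currently in use by user (max: 3029200088)
Memory usage: 3116352832 bytes currently allocated (max: 4543456848); 2321673216 currently in use by user (max: 3029200088)
Memory usage: 2701251904 bytes currently allocated (max: 4543456848); 2321673216 currently in use by user (max: 3029200088)
"""

# Captures: currently allocated, max allocated, currently in use by user.
PATTERN = re.compile(
    r"(\d+) bytes currently allocated \(max: (\d+)\); "
    r"(\d+) currently in use by user"
)


def to_mib(num_bytes):
    """Convert a byte count to MiB (hypothetical helper)."""
    return num_bytes / MIB


def parse_memory_usage(text):
    """Return (allocated, max_allocated, in_use) tuples, one per report."""
    return [tuple(int(g) for g in m.groups()) for m in PATTERN.finditer(text)]


usage = parse_memory_usage(log_excerpt)
for allocated, max_allocated, in_use in usage:
    print(
        f"allocated {to_mib(allocated):7.1f} MiB, "
        f"in use {to_mib(in_use):7.1f} MiB, "
        f"peak allocated {to_mib(max_allocated):7.1f} MiB"
    )

# The request that finally fails, as reported in the ERROR line.
failed_request = 6968320 * 48
print(f"failed request: {failed_request} bytes = {to_mib(failed_request):.1f} MiB")

# Allocated after the last free, plus the failed request, versus the
# 7966M reported as free at startup (assumed here to be MiB).
needed = to_mib(usage[-1][0] + failed_request)
print(f"allocated + request: {needed:.1f} MiB of 7966 MiB reported free")
```

By this arithmetic the allocation already held plus the failed request stays far below what the card reported as free at startup, which is why the failure puzzles me.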