Hello all,
You've been very helpful in previous posts, and I'm much further along than I was. I've secured help from a speech recognition consultant to get Kaldi running, but I'm still having trouble with my GPU. I have a Tesla S1070, which contains four C1060s based on the T10 processor, with compute capability 1.3 (I'm hoping that's not too old). I have installed CUDA toolkit 6.5 with the 340.xx drivers. I was told to use CUDA 6.5, and as far as I know both are the latest versions that support this GPU. I recompiled Kaldi after installing this version of CUDA and the drivers.
Here's the problem. I previously ran run.sh in the tedlium recipe without a problem (with a different version of the toolkit and drivers installed), but now I get the following error when running run_nnet2_ms_perturbed.sh (from exp/nnet2_online/nnet_ms_sp/log/train.0.2.log):
nnet-copy-egs --frame=0 ark:exp/nnet2_online/nnet_ms_sp/egs/egs.2.ark ark:-
ERROR (nnet-train-simple:CopyRows():cu-matrix.cc:2138) cudaError_t 8 : "invalid device function " returned from 'cudaGetLastError()'
ERROR (nnet-train-simple:AddDiagMatMat():cu-vector.cc:580) cudaError_t 8 : "invalid device function " returned from 'cudaGetLastError()'
This happens after it appears to create one model successfully:
LOG (nnet-train-transitions:main():nnet-train-transitions.cc:140) Trained transitions of neural network model and wrote it to exp/nnet2_online/nnet_ms_sp/0.mdl
So I ran 'make test' under src/cudamatrix:
Running cu-vector-test .../bin/sh: line 1: 27150 Aborted (core dumped) ./$x > $x.testlog 2>&1
... FAIL cu-vector-test
Running cu-matrix-test .../bin/sh: line 1: 27161 Aborted (core dumped) ./$x > $x.testlog 2>&1
... FAIL cu-matrix-test
Running cu-math-test .../bin/sh: line 1: 27171 Aborted (core dumped) ./$x > $x.testlog 2>&1
... FAIL cu-math-test
Running cu-test .../bin/sh: line 1: 27180 Aborted (core dumped) ./$x > $x.testlog 2>&1
... FAIL cu-test
Running cu-sp-matrix-test .../bin/sh: line 1: 27189 Aborted (core dumped) ./$x > $x.testlog 2>&1
... FAIL cu-sp-matrix-test
Running cu-packed-matrix-test .../bin/sh: line 1: 27198 Aborted (core dumped) ./$x > $x.testlog 2>&1
... FAIL cu-packed-matrix-test
Running cu-tp-matrix-test .../bin/sh: line 1: 27207 Aborted (core dumped) ./$x > $x.testlog 2>&1
... FAIL cu-tp-matrix-test
Running cu-block-matrix-test .../bin/sh: line 1: 27216 Aborted (core dumped) ./$x > $x.testlog 2>&1
... FAIL cu-block-matrix-test
Running cu-matrix-speed-test .../bin/sh: line 1: 27225 Aborted (core dumped) ./$x > $x.testlog 2>&1
... FAIL cu-matrix-speed-test
Running cu-vector-speed-test .../bin/sh: line 1: 27235 Aborted (core dumped) ./$x > $x.testlog 2>&1
... FAIL cu-vector-speed-test
Running cu-sp-matrix-speed-test .../bin/sh: line 1: 27244 Aborted (core dumped) ./$x > $x.testlog 2>&1
... FAIL cu-sp-matrix-speed-test
Running cu-array-test .../bin/sh: line 1: 27253 Aborted (core dumped) ./$x > $x.testlog 2>&1
... FAIL cu-array-test
Running cu-sparse-matrix-test .../bin/sh: line 1: 27262 Aborted (core dumped) ./$x > $x.testlog 2>&1
... FAIL cu-sparse-matrix-test
Running cu-device-test ...... SUCCESS
make: *** [test] Error 1
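For context on the errors above: CUDA error 8 ("invalid device function") is commonly reported when a binary contains no kernel code built for the GPU's architecture, so one thing worth checking is whether the -gencode flags used to build src/cudamatrix cover compute capability 1.3. The Python sketch below shows that check under simplifying assumptions. The function name covers_device and the EXAMPLE_FLAGS string are hypothetical and are not copied from Kaldi's makefiles; the real flags would have to be read from the actual build configuration.

```python
import re

# Hypothetical flag string for illustration only (NOT taken from Kaldi).
EXAMPLE_FLAGS = (
    "-gencode arch=compute_20,code=sm_20 "
    "-gencode arch=compute_30,code=sm_30 "
    "-gencode arch=compute_35,code=sm_35"
)


def covers_device(gencode_flags, major, minor):
    """Return True if any code= target in the flags can run on the device.

    Simplifying assumption: each -gencode entry names a single target,
    e.g. code=sm_13 or code=compute_13 (no quoted or bracketed lists).
    """
    for kind, t_major, t_minor in re.findall(
        r"code=(sm|compute)_(\d+)(\d)\b", gencode_flags
    ):
        t_major, t_minor = int(t_major), int(t_minor)
        if kind == "sm":
            # Binary code runs only on the same major version,
            # on devices whose minor version is at least the target's.
            if t_major == major and t_minor <= minor:
                return True
        else:
            # Embedded PTX can be JIT-compiled for any device whose
            # compute capability is at or above the PTX version.
            if (t_major, t_minor) <= (major, minor):
                return True
    return False


if __name__ == "__main__":
    # Tesla C1060 (T10) reports compute capability 1.3.
    print(covers_device(EXAMPLE_FLAGS, 1, 3))
    print(covers_device(
        EXAMPLE_FLAGS + " -gencode arch=compute_13,code=sm_13", 1, 3))
```

With the example flags, which start at sm_20, the check fails for a 1.3 device; adding an sm_13 entry makes it pass. Whether a given toolkit version can still generate sm_13 code is a separate question that this sketch does not answer.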
So I'm not sure what's wrong, but I'm worried that CUDA 6.5 doesn't fully support my GPUs because they are older. I was told Kaldi needs 6.5, but the makefiles seem to indicate that older toolkits will work. Could I run CUDA 5.5 or 6.0 with Kaldi? They appear to support my GPUs more fully, and the GPUs are plenty powerful for the task at hand if I can get them to work. I'm afraid that running run_nnet2_ms_perturbed.sh in CPU-only mode would take too long (even with quad-socket, quad-core 2.9 GHz CPUs and 64 GB of RAM).
I also read this post on the old forums and tried what was recommended there:
http://sourceforge.net/p/kaldi/discussion/1355348/thread/c9991f50/
What would you recommend I do or try?
Thanks!
Rhiannon