OptiML CUDA help


Robert Wright

May 1, 2014, 9:49:05 AM
to opt...@googlegroups.com
I am having trouble getting the LogRegCompiler example to work with CUDA.  Compiling the generated CUDA code at runtime fails, and I'm not sure why.  CUDA itself appears to be installed properly: I can compile and run all of the examples from NVIDIA.  Any help is appreciated.  Thanks.

When I try to run the example using CUDA here is what happens:

$ delite LogRegCompiler input.txt test.txt -t 8 --cuda 1
== executing application: LogRegCompiler.deg input.txt test.txt
Delite Runtime executing with the following arguments:
LogRegCompiler.deg,input.txt,test.txt
Delite Runtime executing with: 8 Scala thread(s), 0 Cpp thread(s), 1 Cuda(s), 0 OpenCL(s)
/usr/local/cuda/bin/nvcc -I/home/wrightr/optiml/generatedCache/cuda/src/datastructures -I/home/wrightr/optiml/generatedCache/cpp/src/datastructures -I/home/wrightr/optiml/generatedCache/cuda/src/kernels -I/home/wrightr/optiml/generatedCache/cpp/src/kernels -I/home/wrightr/optiml/generatedCache/cuda/src/runtime -I/home/wrightr/optiml/generatedCache/cpp/src/runtime -I/usr/lib/jvm/java-7-oracle/include -I/usr/lib/jvm/java-7-oracle/include/linux -I/home/wrightr/optiml/runtime/cuda -m64 -w -O3 -lcublas -arch compute_11 -code sm_11 -shared -Xcompiler '-fPIC' -o /home/wrightr/optiml/generatedCache/cuda/bin/runtime/cudaHost.so /home/wrightr/optiml/generatedCache/cuda/src/runtime/MultiLoop_GPU_Array_x1483.cu /home/wrightr/optiml/generatedCache/cuda/src/runtime/MultiLoop_GPU_Array_x1827.cu /home/wrightr/optiml/generatedCache/cuda/src/runtime/MultiLoop_GPU_Array_x1879.cu /home/wrightr/optiml/generatedCache/cuda/src/runtime/MultiLoop_GPU_Array_x1930x2154x2119.cu /home/wrightr/optiml/generatedCache/cuda/src/runtime/While_x2040_8.cu /home/wrightr/optiml/generatedCache/cuda/src/runtime/Executable8.cu /home/wrightr/optiml/generatedCache/cpp/src/datastructures/cppList.cpp /home/wrightr/optiml/generatedCache/cpp/src/datastructures/cppRef.cpp /home/wrightr/optiml/generatedCache/cpp/src/datastructures/cppDeliteArray.cpp /home/wrightr/optiml/generatedCache/cuda/src/kernels/cudahelperFuncs.cu /home/wrightr/optiml/runtime/cuda/DeliteCuda.cu
--cuda compile args: /usr/local/cuda/bin/nvcc,-I/home/wrightr/optiml/generatedCache/cuda/src/datastructures,-I/home/wrightr/optiml/generatedCache/cpp/src/datastructures,-I/home/wrightr/optiml/generatedCache/cuda/src/kernels,-I/home/wrightr/optiml/generatedCache/cpp/src/kernels,-I/home/wrightr/optiml/generatedCache/cuda/src/runtime,-I/home/wrightr/optiml/generatedCache/cpp/src/runtime,-I/usr/lib/jvm/java-7-oracle/include,-I/usr/lib/jvm/java-7-oracle/include/linux,-I/home/wrightr/optiml/runtime/cuda,-m64,-w,-O3,-lcublas,-arch,compute_11,-code,sm_11,-shared,-Xcompiler,'-fPIC',-o,/home/wrightr/optiml/generatedCache/cuda/bin/runtime/cudaHost.so,/home/wrightr/optiml/generatedCache/cuda/src/runtime/MultiLoop_GPU_Array_x1483.cu,/home/wrightr/optiml/generatedCache/cuda/src/runtime/MultiLoop_GPU_Array_x1827.cu,/home/wrightr/optiml/generatedCache/cuda/src/runtime/MultiLoop_GPU_Array_x1879.cu,/home/wrightr/optiml/generatedCache/cuda/src/runtime/MultiLoop_GPU_Array_x1930x2154x2119.cu,/home/wrightr/optiml/generatedCache/cuda/src/runtime/While_x2040_8.cu,/home/wrightr/optiml/generatedCache/cuda/src/runtime/Executable8.cu,/home/wrightr/optiml/generatedCache/cpp/src/datastructures/cppList.cpp,/home/wrightr/optiml/generatedCache/cpp/src/datastructures/cppRef.cpp,/home/wrightr/optiml/generatedCache/cpp/src/datastructures/cppDeliteArray.cpp,/home/wrightr/optiml/generatedCache/cuda/src/kernels/cudahelperFuncs.cu,/home/wrightr/optiml/runtime/cuda/DeliteCuda.cu
/home/wrightr/optiml/generatedCache/cuda/src/runtime/Executable8.cu(61): error: identifier "recvViewCPPfromJVM_cppRef_double_" is undefined

/home/wrightr/optiml/generatedCache/cuda/src/runtime/Executable8.cu(62): error: identifier "sendCuda_cudaRef_double_" is undefined

/home/wrightr/optiml/generatedCache/cuda/src/runtime/Executable8.cu(84): error: identifier "recvViewCPPfromJVM_cppRef_int_" is undefined

/home/wrightr/optiml/generatedCache/cuda/src/runtime/Executable8.cu(85): error: identifier "sendCuda_cudaRef_int_" is undefined

/home/wrightr/optiml/generatedCache/cuda/src/runtime/Executable8.cu(87): error: identifier "recvViewCPPfromJVM_cppRef_cppDenseVectorDouble__" is undefined

/home/wrightr/optiml/generatedCache/cuda/src/runtime/Executable8.cu(88): error: identifier "sendCuda_cudaRef_cudaDenseVectorDouble__" is undefined

6 errors detected in the compilation of "/tmp/tmpxft_0000469d_00000000-86_Executable8.cpp1.ii".


Exception in thread "main" java.lang.RuntimeException: cuda compilation failed with exit value 2
at scala.sys.package$.error(package.scala:27)
at ppl.delite.runtime.codegen.CCompile$class.checkError(CCompile.scala:129)
at ppl.delite.runtime.codegen.CudaCompile$.checkError(CudaCompile.scala:17)
at ppl.delite.runtime.codegen.CCompile$class.compile(CCompile.scala:87)
at ppl.delite.runtime.codegen.CudaCompile$.compile(CudaCompile.scala:17)
at ppl.delite.runtime.codegen.CCompile$class.compile(CCompile.scala:73)
at ppl.delite.runtime.codegen.CudaCompile$.compile(CudaCompile.scala:17)
at ppl.delite.runtime.codegen.Compilers$.compileSchedule(Compilers.scala:75)
at ppl.delite.runtime.Delite$.embeddedMain(Delite.scala:119)
at ppl.delite.runtime.Delite$.main(Delite.scala:48)
at ppl.delite.runtime.Delite.main(Delite.scala)
error: Delite execution failed



The details of my setup are:
OS = Debian 7.0 x86_64
Java = java-7-oracle update 55
CUDA = 6.0
OptiML = 0.3.3



Robert Wright

May 1, 2014, 2:27:38 PM
to opt...@googlegroups.com
I switched to CUDA 4.0 and now I am getting this error:

$ delite Example1Compiler --cuda 1
== executing application: Example1Compiler.deg 
Delite Runtime executing with the following arguments:
Example1Compiler.deg
Delite Runtime executing with: 1 Scala thread(s), 0 Cpp thread(s), 1 Cuda(s), 0 OpenCL(s)
/usr/local/cuda/bin/nvcc -I/usr/lib/jvm/java-7-oracle/include -I/usr/lib/jvm/java-7-oracle/include/linux -m64 -w -O3 -lcublas -arch compute_11 -code sm_11 -shared -Xcompiler '-fPIC' -o /home/wrightr/optiml/runtime/cuda/cudaInit.so /home/wrightr/optiml/runtime/cuda/cudaInit.cu
/usr/local/cuda/bin/nvcc -I/home/wrightr/optiml/generatedCache/cuda/src/datastructures -I/home/wrightr/optiml/generatedCache/cpp/src/datastructures -I/home/wrightr/optiml/generatedCache/cuda/src/kernels -I/home/wrightr/optiml/generatedCache/cpp/src/kernels -I/home/wrightr/optiml/generatedCache/cuda/src/runtime -I/home/wrightr/optiml/generatedCache/cpp/src/runtime -I/usr/lib/jvm/java-7-oracle/include -I/usr/lib/jvm/java-7-oracle/include/linux -I/home/wrightr/optiml/runtime/cuda -m64 -w -O3 -lcublas -arch compute_11 -code sm_11 -shared -Xcompiler '-fPIC' -o /home/wrightr/optiml/generatedCache/cuda/bin/runtime/cudaHost.so /home/wrightr/optiml/generatedCache/cuda/src/runtime/MultiLoop_GPU_Array_x1990.cu /home/wrightr/optiml/generatedCache/cuda/src/runtime/MultiLoop_GPU_Array_x1992.cu /home/wrightr/optiml/generatedCache/cuda/src/runtime/MultiLoop_GPU_Array_x2013.cu /home/wrightr/optiml/generatedCache/cuda/src/runtime/MultiLoop_GPU_Array_x2059.cu /home/wrightr/optiml/generatedCache/cuda/src/runtime/MultiLoop_GPU_Array_x2061.cu /home/wrightr/optiml/generatedCache/cuda/src/runtime/MultiLoop_GPU_Array_x2082.cu /home/wrightr/optiml/generatedCache/cuda/src/runtime/Condition_x1471_1.cu /home/wrightr/optiml/generatedCache/cuda/src/runtime/Executable1.cu /home/wrightr/optiml/generatedCache/cpp/src/datastructures/cppList.cpp /home/wrightr/optiml/generatedCache/cpp/src/datastructures/cppRef.cpp /home/wrightr/optiml/generatedCache/cpp/src/datastructures/cppDeliteArray.cpp /home/wrightr/optiml/generatedCache/cuda/src/kernels/cudahelperFuncs.cu /home/wrightr/optiml/runtime/cuda/DeliteCuda.cu
Beginning Execution Run 1
FATAL (tempCudaMemInit): Insufficient device memory for tempCudaMem
error: Delite execution failed

HyoukJoong Lee

May 1, 2014, 2:36:53 PM
to opt...@googlegroups.com
Hi.

I think one problem with your CUDA 4.0 setup is that
$DELITE_HOME/config/delite/CUDA.xml needs to contain <arch> 3.0 </arch> if you are using a Kepler device.
After changing the config file, also try removing $DELITE_HOME/generatedCache and $DELITE_HOME/runtime/cuda/cudaInit.so.

The first problem you have (LogReg) may be related to the recent commits I made to fix the C++ targets.
I'll check and let you know about the fixes.

Cheers,
HyoukJoong.
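
The cleanup step suggested in this reply could be scripted roughly as follows. This is only a minimal sketch: it assumes DELITE_HOME is set and that the two paths are exactly the ones named in the message. The function name clean_delite_cuda_cache is hypothetical and is not part of Delite.

```shell
# Hypothetical helper (not part of Delite): remove the cached CUDA build
# outputs so the runtime recompiles them with the updated <arch> setting.
clean_delite_cuda_cache() {
    # Refuse to run with an unset DELITE_HOME so nothing is deleted from "/".
    if [ -z "${DELITE_HOME:-}" ]; then
        echo "DELITE_HOME is not set" >&2
        return 1
    fi
    # Generated sources and binaries from earlier runs.
    rm -rf "$DELITE_HOME/generatedCache"
    # Precompiled CUDA init library built with the old architecture flags.
    rm -f "$DELITE_HOME/runtime/cuda/cudaInit.so"
}
```

After editing CUDA.xml, one would call clean_delite_cuda_cache and then rerun the failing command (for example, delite Example1Compiler --cuda 1) so that everything is rebuilt from scratch.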



Robert Wright

May 1, 2014, 2:40:18 PM
to opt...@googlegroups.com
Thanks. 

I am using an old Quadro FX 580, which only supports compute capability 1.1.  I made the corresponding changes to the CUDA.xml and cuBLAS.xml files and I still get that error.

HyoukJoong Lee

May 1, 2014, 3:20:03 PM
to opt...@googlegroups.com
Okay. We have not tested on devices with compute capability 1.1,
since we use features that were released after that (e.g. synchronizing streams using events).
Do you have another device with compute capability 2.0 or higher to run on?

Cheers,
HyoukJoong.
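
For reference, the nvcc command lines in the logs above pass -arch compute_11 -code sm_11, which is how compute capability 1.1 is spelled in nvcc's naming scheme. The sketch below shows that naming for other capabilities. The helper nvcc_arch_flags is hypothetical, and whether Delite derives its flags from the <arch> value in exactly this way is an assumption that the thread does not confirm.

```shell
# Hypothetical helper: turn a compute capability such as "2.0" into an
# -arch/-code pair in the style seen in the nvcc command lines in this thread.
nvcc_arch_flags() {
    cc="$1"
    case "$cc" in
        [0-9].[0-9]) ;;   # accept MAJOR.MINOR forms like 1.1, 2.0, 3.0
        *) echo "expected MAJOR.MINOR, got '$cc'" >&2; return 1 ;;
    esac
    # Drop the dot: "2.0" becomes "20".
    digits="${cc%.*}${cc#*.}"
    echo "-arch compute_${digits} -code sm_${digits}"
}
```

Calling nvcc_arch_flags 1.1 prints -arch compute_11 -code sm_11, matching the flags in the logs, while nvcc_arch_flags 2.0 prints -arch compute_20 -code sm_20.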

Robert Wright

May 1, 2014, 3:21:23 PM
to opt...@googlegroups.com
I'll look around the office for one.  Thanks.

Robert Wright

May 2, 2014, 2:09:57 PM
to opt...@googlegroups.com
I managed to scrounge up a GTX 580 (compute capability 2.0).  That fixed things.  Example1Compiler and LogRegCompiler now work with CUDA.  Thanks again for the help.