OptiML CUDA help

Robert Wright

May 1, 2014, 9:49:05 AM

to opt...@googlegroups.com

I am having problems getting the LogRegCompiler example to work with CUDA. The generated CUDA code fails to compile at runtime, and I'm not sure why. CUDA appears to be installed properly: I am able to compile and run all the examples from NVIDIA. Any help is appreciated. Thanks.

When I try to run the example using CUDA, here is what happens:

$ delite LogRegCompiler input.txt test.txt -t 8 --cuda 1

== executing application: LogRegCompiler.deg input.txt test.txt

Delite Runtime executing with the following arguments:

LogRegCompiler.deg,input.txt,test.txt

Delite Runtime executing with: 8 Scala thread(s), 0 Cpp thread(s), 1 Cuda(s), 0 OpenCL(s)

/usr/local/cuda/bin/nvcc -I/home/wrightr/optiml/generatedCache/cuda/src/datastructures -I/home/wrightr/optiml/generatedCache/cpp/src/datastructures -I/home/wrightr/optiml/generatedCache/cuda/src/kernels -I/home/wrightr/optiml/generatedCache/cpp/src/kernels -I/home/wrightr/optiml/generatedCache/cuda/src/runtime -I/home/wrightr/optiml/generatedCache/cpp/src/runtime -I/usr/lib/jvm/java-7-oracle/include -I/usr/lib/jvm/java-7-oracle/include/linux -I/home/wrightr/optiml/runtime/cuda -m64 -w -O3 -lcublas -arch compute_11 -code sm_11 -shared -Xcompiler '-fPIC' -o /home/wrightr/optiml/generatedCache/cuda/bin/runtime/cudaHost.so /home/wrightr/optiml/generatedCache/cuda/src/runtime/MultiLoop_GPU_Array_x1483.cu /home/wrightr/optiml/generatedCache/cuda/src/runtime/MultiLoop_GPU_Array_x1827.cu /home/wrightr/optiml/generatedCache/cuda/src/runtime/MultiLoop_GPU_Array_x1879.cu /home/wrightr/optiml/generatedCache/cuda/src/runtime/MultiLoop_GPU_Array_x1930x2154x2119.cu /home/wrightr/optiml/generatedCache/cuda/src/runtime/While_x2040_8.cu /home/wrightr/optiml/generatedCache/cuda/src/runtime/Executable8.cu /home/wrightr/optiml/generatedCache/cpp/src/datastructures/cppList.cpp /home/wrightr/optiml/generatedCache/cpp/src/datastructures/cppRef.cpp /home/wrightr/optiml/generatedCache/cpp/src/datastructures/cppDeliteArray.cpp /home/wrightr/optiml/generatedCache/cuda/src/kernels/cudahelperFuncs.cu /home/wrightr/optiml/runtime/cuda/DeliteCuda.cu

--cuda compile args: /usr/local/cuda/bin/nvcc,-I/home/wrightr/optiml/generatedCache/cuda/src/datastructures,-I/home/wrightr/optiml/generatedCache/cpp/src/datastructures,-I/home/wrightr/optiml/generatedCache/cuda/src/kernels,-I/home/wrightr/optiml/generatedCache/cpp/src/kernels,-I/home/wrightr/optiml/generatedCache/cuda/src/runtime,-I/home/wrightr/optiml/generatedCache/cpp/src/runtime,-I/usr/lib/jvm/java-7-oracle/include,-I/usr/lib/jvm/java-7-oracle/include/linux,-I/home/wrightr/optiml/runtime/cuda,-m64,-w,-O3,-lcublas,-arch,compute_11,-code,sm_11,-shared,-Xcompiler,'-fPIC',-o,/home/wrightr/optiml/generatedCache/cuda/bin/runtime/cudaHost.so,/home/wrightr/optiml/generatedCache/cuda/src/runtime/MultiLoop_GPU_Array_x1483.cu,/home/wrightr/optiml/generatedCache/cuda/src/runtime/MultiLoop_GPU_Array_x1827.cu,/home/wrightr/optiml/generatedCache/cuda/src/runtime/MultiLoop_GPU_Array_x1879.cu,/home/wrightr/optiml/generatedCache/cuda/src/runtime/MultiLoop_GPU_Array_x1930x2154x2119.cu,/home/wrightr/optiml/generatedCache/cuda/src/runtime/While_x2040_8.cu,/home/wrightr/optiml/generatedCache/cuda/src/runtime/Executable8.cu,/home/wrightr/optiml/generatedCache/cpp/src/datastructures/cppList.cpp,/home/wrightr/optiml/generatedCache/cpp/src/datastructures/cppRef.cpp,/home/wrightr/optiml/generatedCache/cpp/src/datastructures/cppDeliteArray.cpp,/home/wrightr/optiml/generatedCache/cuda/src/kernels/cudahelperFuncs.cu,/home/wrightr/optiml/runtime/cuda/DeliteCuda.cu

/home/wrightr/optiml/generatedCache/cuda/src/runtime/Executable8.cu(61): error: identifier "recvViewCPPfromJVM_cppRef_double_" is undefined

/home/wrightr/optiml/generatedCache/cuda/src/runtime/Executable8.cu(62): error: identifier "sendCuda_cudaRef_double_" is undefined

/home/wrightr/optiml/generatedCache/cuda/src/runtime/Executable8.cu(84): error: identifier "recvViewCPPfromJVM_cppRef_int_" is undefined

/home/wrightr/optiml/generatedCache/cuda/src/runtime/Executable8.cu(85): error: identifier "sendCuda_cudaRef_int_" is undefined

/home/wrightr/optiml/generatedCache/cuda/src/runtime/Executable8.cu(87): error: identifier "recvViewCPPfromJVM_cppRef_cppDenseVectorDouble__" is undefined

/home/wrightr/optiml/generatedCache/cuda/src/runtime/Executable8.cu(88): error: identifier "sendCuda_cudaRef_cudaDenseVectorDouble__" is undefined

6 errors detected in the compilation of "/tmp/tmpxft_0000469d_00000000-86_Executable8.cpp1.ii".

Exception in thread "main" java.lang.RuntimeException: cuda compilation failed with exit value 2

at scala.sys.package$.error(package.scala:27)

at ppl.delite.runtime.codegen.CCompile$class.checkError(CCompile.scala:129)

at ppl.delite.runtime.codegen.CudaCompile$.checkError(CudaCompile.scala:17)

at ppl.delite.runtime.codegen.CCompile$class.compile(CCompile.scala:87)

at ppl.delite.runtime.codegen.CudaCompile$.compile(CudaCompile.scala:17)

at ppl.delite.runtime.codegen.CCompile$class.compile(CCompile.scala:73)

at ppl.delite.runtime.codegen.CudaCompile$.compile(CudaCompile.scala:17)

at ppl.delite.runtime.codegen.Compilers$.compileSchedule(Compilers.scala:75)

at ppl.delite.runtime.Delite$.embeddedMain(Delite.scala:119)

at ppl.delite.runtime.Delite$.main(Delite.scala:48)

at ppl.delite.runtime.Delite.main(Delite.scala)

error: Delite execution failed

The details of my setup are:

OS = Debian 7.0 x86_64

Java = java-7-oracle update 55

CUDA = 6.0

Optiml = 0.3.3

Robert Wright

May 1, 2014, 2:27:38 PM

to opt...@googlegroups.com

I switched to CUDA 4.0 and now I am getting this error:

$ delite Example1Compiler --cuda 1

== executing application: Example1Compiler.deg

Delite Runtime executing with the following arguments:

Example1Compiler.deg

Delite Runtime executing with: 1 Scala thread(s), 0 Cpp thread(s), 1 Cuda(s), 0 OpenCL(s)

/usr/local/cuda/bin/nvcc -I/usr/lib/jvm/java-7-oracle/include -I/usr/lib/jvm/java-7-oracle/include/linux -m64 -w -O3 -lcublas -arch compute_11 -code sm_11 -shared -Xcompiler '-fPIC' -o /home/wrightr/optiml/runtime/cuda/cudaInit.so /home/wrightr/optiml/runtime/cuda/cudaInit.cu

/usr/local/cuda/bin/nvcc -I/home/wrightr/optiml/generatedCache/cuda/src/datastructures -I/home/wrightr/optiml/generatedCache/cpp/src/datastructures -I/home/wrightr/optiml/generatedCache/cuda/src/kernels -I/home/wrightr/optiml/generatedCache/cpp/src/kernels -I/home/wrightr/optiml/generatedCache/cuda/src/runtime -I/home/wrightr/optiml/generatedCache/cpp/src/runtime -I/usr/lib/jvm/java-7-oracle/include -I/usr/lib/jvm/java-7-oracle/include/linux -I/home/wrightr/optiml/runtime/cuda -m64 -w -O3 -lcublas -arch compute_11 -code sm_11 -shared -Xcompiler '-fPIC' -o /home/wrightr/optiml/generatedCache/cuda/bin/runtime/cudaHost.so /home/wrightr/optiml/generatedCache/cuda/src/runtime/MultiLoop_GPU_Array_x1990.cu /home/wrightr/optiml/generatedCache/cuda/src/runtime/MultiLoop_GPU_Array_x1992.cu /home/wrightr/optiml/generatedCache/cuda/src/runtime/MultiLoop_GPU_Array_x2013.cu /home/wrightr/optiml/generatedCache/cuda/src/runtime/MultiLoop_GPU_Array_x2059.cu /home/wrightr/optiml/generatedCache/cuda/src/runtime/MultiLoop_GPU_Array_x2061.cu /home/wrightr/optiml/generatedCache/cuda/src/runtime/MultiLoop_GPU_Array_x2082.cu /home/wrightr/optiml/generatedCache/cuda/src/runtime/Condition_x1471_1.cu /home/wrightr/optiml/generatedCache/cuda/src/runtime/Executable1.cu /home/wrightr/optiml/generatedCache/cpp/src/datastructures/cppList.cpp /home/wrightr/optiml/generatedCache/cpp/src/datastructures/cppRef.cpp /home/wrightr/optiml/generatedCache/cpp/src/datastructures/cppDeliteArray.cpp /home/wrightr/optiml/generatedCache/cuda/src/kernels/cudahelperFuncs.cu /home/wrightr/optiml/runtime/cuda/DeliteCuda.cu

Beginning Execution Run 1

FATAL (tempCudaMemInit): Insufficient device memory for tempCudaMem

error: Delite execution failed

HyoukJoong Lee

May 1, 2014, 2:36:53 PM

to opt...@googlegroups.com

Hi.

I think one problem you have with CUDA 4.0 is that you need to change $DELITE_HOME/config/delite/CUDA.xml to have <arch> 3.0 </arch> if you are using Kepler devices.

And try removing $DELITE_HOME/generatedCache and $DELITE_HOME/runtime/cuda/cudaInit.so after changing the config file.
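The cleanup step above can be scripted. The following is only a minimal Python sketch, not part of Delite: the function name `clear_delite_cuda_cache` is hypothetical, and it assumes the two paths mentioned above (`generatedCache` and `runtime/cuda/cudaInit.so`) sit directly under the Delite home directory passed in.

```python
import shutil
from pathlib import Path


def clear_delite_cuda_cache(delite_home):
    """Remove generatedCache and runtime/cuda/cudaInit.so under delite_home.

    Hypothetical helper; returns the list of paths that were actually removed.
    Paths that do not exist are skipped silently.
    """
    home = Path(delite_home)
    removed = []

    # Generated sources and compiled binaries from earlier runs.
    cache_dir = home / "generatedCache"
    if cache_dir.is_dir():
        shutil.rmtree(cache_dir)
        removed.append(cache_dir)

    # Shared library built with the old arch setting.
    init_lib = home / "runtime" / "cuda" / "cudaInit.so"
    if init_lib.is_file():
        init_lib.unlink()
        removed.append(init_lib)

    return removed
```

It could be called as `clear_delite_cuda_cache(os.environ["DELITE_HOME"])` after editing the config file, so the next run recompiles everything with the new setting.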

The first problem (LogReg) you have may be related to the recent commits I made to fix C++ targets.

I'll check and let you know about the fixes.

Cheers,

HyoukJoong.

Robert Wright

May 1, 2014, 2:40:18 PM

to opt...@googlegroups.com

Thanks.

I am using an old Quadro FX 580 which is compatible with the 1.1 architecture. I made the appropriate changes to the CUDA.xml and cuBLAS.xml files and I still get that error.

HyoukJoong Lee

May 1, 2014, 3:20:03 PM

to opt...@googlegroups.com

Okay. We have not tested on devices with CUDA compute capability 1.1, since we use features released after that (e.g. synchronizing streams using events).

Do you have another device with compute capability 2.0 or higher to run it on?
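For anyone checking their own card against this requirement, here is a small Python sketch. It is illustrative only and not part of Delite: the function names are hypothetical, the 2.0 minimum is taken from the question above, and the flag layout simply mirrors the `-arch compute_11 -code sm_11` pattern visible in the nvcc command lines earlier in the thread (the assumption being that a capability of X.Y maps to `compute_XY` and `sm_XY`).

```python
def parse_capability(text):
    """Turn a compute capability string such as "1.1" into a (major, minor) tuple."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def nvcc_arch_flags(capability):
    """Build -arch/-code flags in the form seen in the nvcc commands in this thread."""
    major, minor = parse_capability(capability)
    return ["-arch", f"compute_{major}{minor}", "-code", f"sm_{major}{minor}"]


def meets_minimum(capability, minimum="2.0"):
    """Return True if the device capability is at least the given minimum."""
    # Tuples compare element by element, so (1, 1) < (2, 0) < (3, 0).
    return parse_capability(capability) >= parse_capability(minimum)
```

With these definitions, `meets_minimum("1.1")` is false and `meets_minimum("2.0")` is true, which matches the situation described in this thread.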

Cheers,

HyoukJoong.

Robert Wright

May 1, 2014, 3:21:23 PM

to opt...@googlegroups.com

I'll look around the office for one. Thanks.

Robert Wright

May 2, 2014, 2:09:57 PM

to opt...@googlegroups.com

I managed to scrounge up a GTX 580 (CUDA compute capability 2.0). That fixed things. Example1Compiler and LogRegCompiler now work with CUDA. Thanks again for the help.
