Re: [theano-dev] Win7 x64 + CUDA + Theano to work


Frédéric Bastien


May 10, 2013, 3:42:03 PM

to theano-dev

Hi,

g++ isn't mandatory, but it is highly recommended. Without it, you won't have any C code, so everything will be slow. This will also disable the GPU code.
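A quick way to see whether g++ is visible before importing Theano is to look for it on PATH. This is only a rough sketch: `find_gxx` is a hypothetical helper, not part of Theano, and Theano's own detection may work differently. It also assumes Python 3.3+ for `shutil.which`; on Python 2.7, `distutils.spawn.find_executable` plays a similar role.

```python
import shutil


def find_gxx(candidates=("g++", "g++.exe")):
    """Return the path of the first candidate found on PATH, or None.

    Hypothetical helper for illustration; not Theano's detection logic.
    """
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


gxx_path = find_gxx()
if gxx_path is None:
    # Matches the situation in the warning: no compiler, so no C code.
    print("g++ not found on PATH")
else:
    print("g++ found at", gxx_path)
```

If this reports that g++ is missing, the "g++ not detected" warning from Theano is expected.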

Note that we do not officially support the GPU on Windows, but we help as we can on the mailing list. Some people have it working. I would suggest using the development version of Theano if you intend to try this, as there have been fixes for this since the last release.

Fred

On Thu, May 9, 2013 at 5:07 PM, Andrzej Gorski <nitro...@gmail.com> wrote:

I'm trying to get my Win7 x64 machine working with Theano + CUDA GPU support. I followed the tutorial on how to set up CUDA on Windows (installed 64-bit drivers, but the 32-bit Toolkit and SDK).

I have Python 2.7.3 32-bit with NumPy and SciPy installed. I also use Visual Studio 2012.

I can run the 32-bit CUDA samples - for example here is the output of the deviceQuery.exe

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win32\Release>deviceQuery.exe
deviceQuery.exe Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 680"
CUDA Driver Version / Runtime Version 5.0 / 5.0
CUDA Capability Major/Minor version number: 3.0

Total amount of global memory: 2048 MBytes (2147483648 bytes)
( 8) Multiprocessors x (192) CUDA Cores/MP: 1536 CUDA Cores

GPU Clock rate: 1085 MHz (1.08 GHz)
Memory Clock rate: 3004 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 524288 bytes

Max Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)
Max Layered Texture Size (dim) x layers 1D=(16384) x 2048, 2D=(16384,16384) x 2048
Total amount of constant memory: 65536 bytes

Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048

Maximum number of threads per block: 1024
Maximum sizes of each dimension of a block: 1024 x 1024 x 64
Maximum sizes of each dimension of a grid: 2147483647 x 65535 x 65535

Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)

Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes

Device has ECC support: Disabled
CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
Device supports Unified Addressing (UVA): No

Device PCI Bus ID / PCI location ID: 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.0, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = GeForce GTX 680

Testing Theano with GPU per instructions here (http://deeplearning.net/software/theano/tutorial/using_gpu.html#using-gpu) I get:

>>>
WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded.

WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu is not available
[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 3.45799994469 seconds

Result is [ 1.23178029 1.61879337 1.52278066 ..., 2.20771813 2.29967761
1.62323284]
Used the cpu

I understand the g++ warning, since I haven't installed MinGW (but it's not listed anywhere as being mandatory. Is it?)

Here is my .theanorc:
[global]
floatX = float32
device = gpu

[nvcc]
flags=-LC:\Python27\libs
compiler_bindir=C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin

fastmath = True

Thoughts?


Frédéric Bastien


May 10, 2013, 3:57:05 PM

to theano-dev

Also, if you have 64-bit Python, you need a 64-bit g++, and the Microsoft and CUDA compilers must be 64-bit compatible as well. Every part of the stack, without exception, must work with the same architecture (32-bit or 64-bit).

The Microsoft compiler is a little different: it needs to generate 64-bit output, but I think the compiler itself is 32-bit.
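To find out which architecture the rest of the stack has to match, you can ask the Python interpreter for its own pointer size. This is a minimal sketch using only the standard library; `python_bits` is a hypothetical helper name. It reports only on Python itself, so the g++, Microsoft and CUDA toolchains still have to be checked separately.

```python
import platform
import struct


def python_bits():
    """Pointer size of the running interpreter, in bits (32 or 64)."""
    return struct.calcsize("P") * 8


bits = python_bits()
print("Python interpreter: %d-bit" % bits)
# platform.architecture() inspects the interpreter binary; shown for comparison.
print("platform.architecture():", platform.architecture()[0])
```

A 32-bit result means the 32-bit CUDA Toolkit and a 32-bit g++ are the ones to pair with it.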