Re: [theano-dev] Win7 x64 + CUDA + Theano to work


Frédéric Bastien

May 10, 2013, 3:42:03 PM
to theano-dev
Hi,

g++ isn't mandatory, but it is highly recommended. Without it, Theano cannot compile any C code, so everything will be slow. This also disables the GPU code.

Note that we do not officially support the GPU on Windows, but we help as much as we can on the mailing list. Some people have it working. I would suggest using the development version of Theano if you intend to try this, as there have been fixes for it since the last release.

Fred
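
A quick way to check whether g++ is visible before importing Theano is to look it up on PATH. This is only a minimal sketch: the helper name find_compiler is made up for illustration, it assumes Python 3.3+ for shutil.which (on Python 2.7, distutils.spawn.find_executable does a similar lookup), and it does not claim to reproduce how Theano itself detects the compiler.

```python
import shutil


def find_compiler(name="g++"):
    """Return the full path to `name` if it is on PATH, otherwise None.

    Hypothetical helper for illustration; not part of Theano.
    """
    return shutil.which(name)


compiler_path = find_compiler()
if compiler_path is None:
    # Assumption: a missing g++ on PATH is what leads to the
    # "g++ not detected" warning quoted in this thread.
    print("g++ not found on PATH; Theano will likely fall back to Python code.")
else:
    print("g++ found at:", compiler_path)
```

If this prints that g++ is missing, installing MinGW and adding its bin directory to PATH is the usual next step on Windows.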



On Thu, May 9, 2013 at 5:07 PM, Andrzej Gorski <nitro...@gmail.com> wrote:
I'm trying to get Theano working with CUDA GPU support on my Win7 x64 machine. I followed the tutorial on how to set up CUDA on Windows (installed the 64-bit drivers, but the 32-bit Toolkit and SDK).

I have Python 2.7.3 32-bit with NumPy and SciPy installed. I also use Visual Studio 2012.

I can run the 32-bit CUDA samples. For example, here is the output of deviceQuery.exe:
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.0\bin\win32\Release>deviceQuery.exe
deviceQuery.exe Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 680"
  CUDA Driver Version / Runtime Version          5.0 / 5.0
  CUDA Capability Major/Minor version number:    3.0
  Total amount of global memory:                 2048 MBytes (2147483648 bytes)
  ( 8) Multiprocessors x (192) CUDA Cores/MP:    1536 CUDA Cores
  GPU Clock rate:                                1085 MHz (1.08 GHz)
  Memory Clock rate:                             3004 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 524288 bytes
  Max Texture Dimension Size (x,y,z)             1D=(65536), 2D=(65536,65536), 3D=(4096,4096,4096)
  Max Layered Texture Size (dim) x layers        1D=(16384) x 2048, 2D=(16384,16384) x 2048
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     2147483647 x 65535 x 65535
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  CUDA Device Driver Mode (TCC or WDDM):         WDDM (Windows Display Driver Model)
  Device supports Unified Addressing (UVA):      No
  Device PCI Bus ID / PCI location ID:           1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 5.0, CUDA Runtime Version = 5.0, NumDevs = 1, Device0 = GeForce GTX 680


Testing Theano with GPU per instructions here (http://deeplearning.net/software/theano/tutorial/using_gpu.html#using-gpu) I get:
>>> 
WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded.
WARNING (theano.sandbox.cuda): CUDA is installed, but device gpu is not available 
[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 3.45799994469 seconds
Result is [ 1.23178029  1.61879337  1.52278066 ...,  2.20771813  2.29967761
  1.62323284]
Used the cpu

The g++ error I understand as I haven't installed MinGW (but it's not listed anywhere as being mandatory - is it?)

Here is my .theanorc:
[global]
floatX = float32
device = gpu

[nvcc]
flags=-LC:\Python27\libs
compiler_bindir=C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin
fastmath = True


Thoughts?


Frédéric Bastien

May 10, 2013, 3:57:05 PM
to theano-dev
Also, if you have 64-bit Python, then g++, the Microsoft compiler, and the CUDA compiler must all be 64-bit compatible. The whole stack, without exception, must work with the same architecture, either 32-bit or 64-bit.

The Microsoft compiler is a little different: it needs to generate 64-bit output, but I think the compiler itself is 32-bit.

Fred
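
To confirm which architecture the rest of the stack has to match, you can ask the Python interpreter for its own word size. This is a small sketch using only the standard library; the function name python_bits is made up for illustration. Note that platform.machine() describes the machine rather than the interpreter, so it may report a 64-bit type even when Python itself is 32-bit.

```python
import platform
import struct


def python_bits():
    """Return 32 or 64 depending on the pointer size of this interpreter.

    Hypothetical helper for illustration; not part of Theano.
    """
    # struct.calcsize("P") is the size in bytes of a C pointer (void *).
    return struct.calcsize("P") * 8


print("Python interpreter:", python_bits(), "bit")
print("Reported machine type:", platform.machine())
```

With the 32-bit Python 2.7.3 described above, this should report a 32-bit interpreter, which would mean the compilers and the CUDA toolkit need to target 32-bit as well.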