Windows support for cuda_convnet


Ian Goodfellow

Apr 8, 2013, 9:19:49 AM
to thean...@googlegroups.com, pylearn-dev
Bogdan Budescu has added support for 64-bit Windows to Alex Krizhevsky's cuda-convnet
library in the pylearn2 GitHub repository. Unfortunately, as it is set up right now,
some of the compiler arguments are hardcoded for Bogdan's system. In particular,
they break compilation on Linux.

What is the right way to handle this kind of configuration?

Ian Goodfellow

Apr 17, 2013, 5:11:25 PM
to thean...@googlegroups.com, pylearn-dev
Fred is on vacation. Can someone else please give some input on this?
Does Theano have a generic system for solving this kind of compilation problem?
If not, is there an existing design pattern that we should aim to be
consistent with?

Olivier Delalleau

Apr 17, 2013, 5:50:44 PM
to thean...@googlegroups.com, pylearn-dev
I'm not sure how the nvcc compilation is handled right now in Theano, but I think a good approach would be to abstract out the compiler (with a Compiler class and subclasses appropriate for various configs). It could also come in handy if we want to support the MS compiler on Windows.

That'd probably be a non-negligible amount of code refactoring though.
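
As a rough illustration of this idea (not Theano's actual API; every class and function name below is hypothetical), a compiler abstraction could map the same abstract settings onto each compiler's own flag syntax. The MSVC branch is simplified to the /I, /link and /LIBPATH: options.

```python
import sys


class Compiler:
    """Hypothetical base class: turns abstract build settings into an argv list."""

    executable = "cc"

    def command(self, source, header_dirs=(), lib_dirs=(), libraries=()):
        # gcc-style syntax is used as the default.
        args = [self.executable, source]
        args += ["-I" + d for d in header_dirs]
        args += ["-L" + d for d in lib_dirs]
        args += ["-l" + name for name in libraries]
        return args


class GccLikeCompiler(Compiler):
    executable = "g++"


class NvccLikeCompiler(Compiler):
    executable = "nvcc"


class MsvcLikeCompiler(Compiler):
    executable = "cl"

    def command(self, source, header_dirs=(), lib_dirs=(), libraries=()):
        # cl.exe takes /I for include dirs; linker options follow /link.
        args = [self.executable, source]
        args += ["/I" + d for d in header_dirs]
        args.append("/link")
        args += ["/LIBPATH:" + d for d in lib_dirs]
        args += [name + ".lib" for name in libraries]
        return args


def pick_compiler(platform=None, use_msvc=False):
    """Choose a compiler object for the given platform string."""
    platform = sys.platform if platform is None else platform
    if platform.startswith("win") and use_msvc:
        return MsvcLikeCompiler()
    return GccLikeCompiler()
```

With this shape, code that needs pthreads would only state "library pthread, found in these directories" and leave the flag spelling to the selected compiler object.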

-=- Olivier

David Warde-Farley

Apr 17, 2013, 6:00:17 PM
to thean...@googlegroups.com
This is kind of what's done in numpy.distutils. I think Theano does something similar with NvccCompiler? Could we reuse that?

Pascal Lamblin

Apr 17, 2013, 6:30:50 PM
to thean...@googlegroups.com
On Wed, Apr 17, 2013, Olivier Delalleau wrote:
> I'm not sure how the nvcc compilation is handled right now in Theano,
> but I think a good approach would be to abstract out the compiler
> (with a Compiler class and subclasses appropriate for various
> configs). It could come in handy as well if we want to support MS
> compiler on Windows.

We already have that sort of thing: a GCC_Compiler class, that handles
compilation of C++ code, and an NVCC_Compiler that handles compilation
of CUDA Code.

Here, the problem is how to specify compilation options for some Ops.
The Op interface allows specifying several things separately:
include directories, library directories, libraries to link with, and other arguments.
There is also a Theano config option that can be used to pass arbitrary
options to a given compiler (here, config.nvcc.flags).

A first solution would be for people using Windows, who want to use GPU
ops needing pthreads (for the moment, cuda_convnet) to specify:
nvcc.flags="-I 'c:\\path\\to\\pthreads' -L 'c:\\path\\to\\pthreads' -l pthreadlibname"

We do not have a generic way of dealing with that in Theano for the
moment. The closest thing we have is for blas, where we specify all
the arguments in config.blas.ldflags. These arguments are then parsed
by a function that sorts them between the 4 supported categories
(header_dirs, lib_dirs, libraries, compile_args), and that is used by
Ops using BLAS.
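
The kind of sorting described above can be sketched as follows. This is not the code of Theano's ldflags(); sort_flags is a hypothetical helper that assumes gcc-style -I, -L and -l prefixes and uses shlex so that quoted paths survive splitting.

```python
import shlex


def sort_flags(flag_string):
    """Sort a gcc-style flag string into the four categories named above."""
    result = {
        "header_dirs": [],
        "lib_dirs": [],
        "libraries": [],
        "compile_args": [],
    }
    prefixes = {"-I": "header_dirs", "-L": "lib_dirs", "-l": "libraries"}
    tokens = shlex.split(flag_string)
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        key = prefixes.get(tok[:2])
        if key is None:
            # Anything that is not -I/-L/-l is passed through unchanged.
            result["compile_args"].append(tok)
        elif len(tok) > 2:
            # Attached form, e.g. "-lpthread".
            result[key].append(tok[2:])
        elif i + 1 < len(tokens):
            # Detached form, e.g. "-l pthread": consume the next token.
            i += 1
            result[key].append(tokens[i])
        # A dangling prefix with no value is ignored.
        i += 1
    return result
```

For example, sort_flags("-I /opt/pthreads/include -lpthread -O3") puts the directory in header_dirs, "pthread" in libraries, and "-O3" in compile_args. A parser like this is tied to one flag syntax, which is the weakness the third solution below tries to avoid.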

A second solution would be to add a new config parameter, say
config.pthreads.ldflags, and either put it in compile_args, or refactor
and reuse the machinery of blas ld flags.

If we want to avoid the complexity of parsing these flags, and be
more robust to different compilers using different ways of passing
flags (for instance, VC uses "/I" instead of "-I", and so on), a third
solution would be to add 4 options: config.pthreads.header_dirs,
config.pthreads.lib_dirs, config.pthreads.libraries, and
config.pthreads.compile_args. We could also do that for blas afterwards.
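
A minimal sketch of the third solution, under stated assumptions: the four option names mirror the ones proposed above, but the options dict, LibrarySettings, load_settings and the separator convention are all hypothetical and stand in for whatever the Theano config system would provide.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LibrarySettings:
    """Compiler-neutral description of one external library."""

    header_dirs: List[str] = field(default_factory=list)
    lib_dirs: List[str] = field(default_factory=list)
    libraries: List[str] = field(default_factory=list)
    compile_args: List[str] = field(default_factory=list)


def load_settings(options: Dict[str, str], prefix: str,
                  sep: str = os.pathsep) -> LibrarySettings:
    """Read '<prefix>.header_dirs' etc. from a flat option mapping."""

    def items(name: str) -> List[str]:
        # Directory and library lists are separated by `sep`, so no
        # compiler-specific flag syntax has to be parsed.
        raw = options.get(prefix + "." + name, "")
        return [part.strip() for part in raw.split(sep) if part.strip()]

    return LibrarySettings(
        header_dirs=items("header_dirs"),
        lib_dirs=items("lib_dirs"),
        libraries=items("libraries"),
        # Extra arguments are free-form, so they are split on whitespace.
        compile_args=options.get(prefix + ".compile_args", "").split(),
    )
```

Because no "-I" or "/I" ever appears in the options, the same settings could be handed to gcc, nvcc or VC, each adding its own prefixes.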

--
Pascal

Olivier Delalleau

Apr 17, 2013, 6:39:15 PM
to thean...@googlegroups.com
Ok I see, thanks. Currently, how does an Op advertise itself as "needing BLAS"?

-=- Olivier

Pascal Lamblin

Apr 17, 2013, 6:58:36 PM
to thean...@googlegroups.com
On Wed, Apr 17, 2013, Olivier Delalleau wrote:
> Ok I see, thanks. Currently, how does an Op advertise itself as "needing BLAS"?

There's no "advertising": Ops that need BLAS define the following (or
inherit from an Op defining the following):

    def c_libraries(self):
        return ldflags()

    def c_compile_args(self):
        return ldflags(libs=False, flags=True)

    def c_lib_dirs(self):
        return ldflags(libs=False, libs_dir=True)

    def c_header_dirs(self):
        return ldflags(libs=False, include_dir=True)


--
Pascal

Olivier Delalleau

Apr 17, 2013, 9:44:42 PM
to thean...@googlegroups.com
2013/4/17 Pascal Lamblin <lamb...@iro.umontreal.ca>
Ok, so that would be by inheritance. Inheritance may not be the best choice for this kind of stuff though, since if we happen to need multiple libraries (pthread today, what else tomorrow?) we won't want to support all combinations.

Maybe a Mixin approach would work (not sure, I haven't thought it through). Alternatively, and I think I like this better, we could define "Library" objects that are used to modify compilation arguments, and the Op would declare its list of library dependencies. Each Library could have subclasses associated with various compilers / platforms / versions, and we would use the one corresponding to the system's config (or at least the closest one). In addition, the library could add config variables (like blas.ldflags) so that the user can override the default behavior whenever needed.
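
One way to picture the "Library" idea is the sketch below. Nothing here exists in Theano: Library, its subclasses and OpWithDeps are hypothetical names, and the library names returned are only examples. The c_* method names are the ones quoted earlier in the thread.

```python
class Library:
    """Hypothetical description of one external dependency."""

    def header_dirs(self):
        return []

    def lib_dirs(self):
        return []

    def libraries(self):
        return []

    def compile_args(self):
        return []


class PthreadsPosix(Library):
    def libraries(self):
        return ["pthread"]


class PthreadsWindows(Library):
    def __init__(self, root):
        # `root` is where the user unpacked a pthreads port (assumed layout).
        self.root = root

    def header_dirs(self):
        return [self.root + "\\include"]

    def lib_dirs(self):
        return [self.root + "\\lib"]

    def libraries(self):
        return ["pthreadVC2"]  # example name, an assumption


class GenericBlas(Library):
    def libraries(self):
        return ["blas"]


class OpWithDeps:
    """Stand-in for an Op that declares its dependencies as a list."""

    def __init__(self, deps):
        self.deps = list(deps)

    def _merge(self, attr):
        # Concatenate each dependency's contribution, dropping duplicates
        # while keeping the declaration order.
        merged = []
        for lib in self.deps:
            for item in getattr(lib, attr)():
                if item not in merged:
                    merged.append(item)
        return merged

    def c_libraries(self):
        return self._merge("libraries")

    def c_lib_dirs(self):
        return self._merge("lib_dirs")

    def c_header_dirs(self):
        return self._merge("header_dirs")

    def c_compile_args(self):
        return self._merge("compile_args")
```

An Op needing both pthreads and BLAS would then be built as OpWithDeps([PthreadsPosix(), GenericBlas()]), with no dedicated base class per combination.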

Just throwing ideas out there, feel free to take / reject them at will :)

Btw, if I understand correctly, lib.ldflags is a bad name since it's not just linking flags, right?

-=- Olivier

Pascal Lamblin

Apr 18, 2013, 10:30:32 AM
to thean...@googlegroups.com
On Wed, Apr 17, 2013, Olivier Delalleau wrote:
> 2013/4/17 Pascal Lamblin <lamb...@iro.umontreal.ca>
>
> > On Wed, Apr 17, 2013, Olivier Delalleau wrote:
> > > Ok I see, thanks. Currently, how does an Op advertise itself as "needing
> > BLAS"?
> >
> > There's no "advertising", Ops that need BLAS define the following (or
> > inherit from an Op defining the following):
> >
> > def c_libraries(self):
> > return ldflags()
> >
> > def c_compile_args(self):
> > return ldflags(libs=False, flags=True)
> >
> > def c_lib_dirs(self):
> > return ldflags(libs=False, libs_dir=True)
> >
> > def c_header_dirs(self):
> > return ldflags(libs=False, include_dir=True)
> >
>
> Ok, so that would be by inheritance.

For the moment, it is done by inheritance for all Ops that inherit from
GemmRelated, and manually for the other ones.

> Inheritance may not be the best choice
> for this kind of stuff though, since if we happen to need multiple
> libraries (pthread today, what else tomorrow?) we won't want to support all
> combinations. Maybe a Mixin approach would work (not sure - I haven't
> thought it through),

Isn't Mixin only a way of organizing multiple inheritance?

> or (which I like better I think) we could define
> "Library" objects that are used to modify compilation arguments, and the Op
> would declare its list of library dependencies. Each Library could have
> subclasses associated to various compilers / platforms / versions, and we
> would use the one corresponding to the system's config (at least - the
> closest one). In addition, the library could add config variables (like
> blas.ldflags) so that the user can override the default behavior whenever
> needed.

At this point, it looks like overkill to me. If we end up regularly
using more external libraries, and combinations of them, then it would
be the right approach.

> Btw, if I understand correctly, lib.ldflags is a bad name since it's not
> just linking flags, right?

Right, it used to be only '-l...' flags, especially since include
directories could be handled by the CPATH environment variable, and lib
dirs could be handled by LIBRARY_PATH at the lab level.
However, that was not suited for all BLAS versions we needed to support,
and the name stayed.

--
Pascal

bbud...@gmail.com

Apr 23, 2013, 5:34:38 AM
to pylea...@googlegroups.com, thean...@googlegroups.com
I added another commit that makes the additional dependencies and search paths conditional on the execution platform. It shouldn't break the Linux build any more, but it still uses hardcoded paths.
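
A sketch of how the platform condition could be combined with a user-supplied location instead of a hardcoded path. It is not what the PR does: the PTHREADS_ROOT variable, the include/lib directory layout and the pthreadVC2 library name are assumptions made for illustration.

```python
import ntpath
import os
import sys


def pthreads_settings(platform=None, environ=None):
    """Return pthreads build settings for the given platform string."""
    platform = sys.platform if platform is None else platform
    environ = os.environ if environ is None else environ

    if not platform.startswith("win"):
        # On Linux and similar systems the system pthread library is enough.
        return {"header_dirs": [], "lib_dirs": [], "libraries": ["pthread"]}

    # On Windows, ask the user where pthreads lives rather than guessing.
    root = environ.get("PTHREADS_ROOT")  # hypothetical variable name
    if not root:
        raise RuntimeError("PTHREADS_ROOT must point to a pthreads install")
    return {
        "header_dirs": [ntpath.join(root, "include")],
        "lib_dirs": [ntpath.join(root, "lib")],
        "libraries": ["pthreadVC2"],  # assumed import library name
    }
```

The same lookup could equally be backed by a Theano config option, as proposed elsewhere in the thread, rather than an environment variable.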

Bogdan

bbud...@gmail.com

Apr 29, 2013, 9:55:55 AM
to thean...@googlegroups.com, pylea...@googlegroups.com
Hi, 

I just realized that I accidentally posted in the thread with the same name from pylearn-dev. There's more in the comments of the PR. Any thoughts on what would be the best way to solve this?

Thanks,

Bogdan

On Tuesday, April 23, 2013 7:50:22 PM UTC+3, Frédéric Bastien wrote:
Hi,

I checked the PR and made a few small comments.

The only thing left is how to specify those functions. I think we should use only one Theano flag, to keep the same interface as blas.ldflags. I would call it pthread.flags. We could reuse part of the function theano/tensor/blas.py:ldflags() for that.

But I don't think this is crucial. I think we should accept any PR that uses one or more Theano flags to specify those parameters.

Bogdan, can you do that? For an example of how to define a Theano flag, grep for AddConfigVar() in the file blas.py.

thanks

Fred


On Sunday, April 28, 2013 1:33:03 AM UTC+3, bbud...@gmail.com wrote:
Hi Fred, 

I added the code for the extra Theano param to PR #225, along with some modifications to accommodate the use of matrix.cpp, as you proposed. Unfortunately, it seems that this extra cpp file causes more trouble than just longer compilation time. With this version of the code, compilation fails on Windows because it can't link to the MKL DLLs. Unlike MinGW, MSVC needs import libs, which are not supplied with EPD.

Another performance issue is that, if MKL is available and matrix.cpp is compiled with USE_MKL, it uses MKL-specific functions to further accelerate execution. These functions are declared in mkl.h, mkl_vsl.h and mkl_vml.h, which are also not delivered by EPD.

If the only thing for which matrix.h and matrix.cpp are required is the definition of the log and sqrt functions, then perhaps it would be easier to just copy them into nvmatrix.cuh and nvmatrix.cu (or just keep the explicit casts).

The dependency on pthreads in cuda-convnet can also be removed by using different API calls depending on the platform.

Thanks,

Bogdan 