It is nice to hear that Theano will be further improved.
By the way, I have written highly GPU-parallelized routines for solving
symmetric positive-definite matrix equations
and auto-regressive systems (Levinson recursion). The routines also support gradients. They are written in C, compiled with CUDA, and linked into Theano through
the Magma interface. (Specifically: I wrote the routines in C, compiled them with CUDA, linked them into libmagma.so, added their headers to magma.h,
and added interface code to skcuda/magma.py. I also added Python class definitions to gpuarray/linalg.py and the corresponding C interface
code to gpuarray/c_code.) I think this code is very useful and fast, and it fills a gap in Theano: there is currently no fast,
fully parallel GPU implementation for solving symmetric positive-definite systems or computing their determinants.
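For reference, the standard way to solve a symmetric positive-definite system A x = b and get its determinant is a Cholesky factorization A = L L^T: one forward substitution, one backward substitution, and log det A = 2 * sum(log L_ii). The pure-Python sketch below only illustrates that technique on the CPU; it is not the CUDA/Magma code described above, and the names cholesky, solve_spd and logdet_spd are hypothetical.

```python
import math


def cholesky(a):
    """Return lower-triangular L with a == L @ L^T; a must be symmetric positive-definite."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not positive-definite")
        L[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            L[i][j] = (a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


def solve_spd(a, b):
    """Solve a x = b: forward-solve L y = b, then back-solve L^T x = y."""
    L = cholesky(a)
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        # Row i of L^T is column i of L, hence L[k][i].
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def logdet_spd(a):
    """log det(a) = 2 * sum(log L_ii); working in log space avoids overflow."""
    L = cholesky(a)
    return 2.0 * sum(math.log(L[i][i]) for i in range(len(a)))
```

For example, with A = [[4, 2], [2, 3]] and b = [2, 1], solve_spd returns [0.5, 0.0] and logdet_spd returns approximately log 8. Both operations reuse the same factorization, which is why a single fast GPU Cholesky covers both the solve and the determinant.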
One example is evaluating the probability density of a multivariate Gaussian distribution, given its symmetric
positive-definite covariance matrix, on a batch of data: in Theano this is very slow using scan, and it is not GPU-parallel.
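The Gaussian case shows why a batched SPD routine helps. With Sigma = L L^T, the log-density is log p(x) = -0.5 * (d * log(2*pi) + log det Sigma + |L^{-1}(x - mu)|^2). The covariance only has to be factored once, after which each data point costs one triangular solve. The pure-Python sketch below (hypothetical function mvn_logpdf_batch, CPU-only, for illustration) makes that structure explicit. On a GPU the per-row loop would presumably become a single batched triangular solve, which is the part that parallelizes well.

```python
import math


def cholesky(a):
    """Return lower-triangular L with a == L @ L^T; a must be symmetric positive-definite."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        s = a[j][j] - sum(L[j][k] ** 2 for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not positive-definite")
        L[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            L[i][j] = (a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L


def mvn_logpdf_batch(xs, mean, cov):
    """Log-density of N(mean, cov) at each row of xs, factoring cov only once."""
    d = len(mean)
    L = cholesky(cov)  # O(d^3), shared by the whole batch
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(d))
    const = -0.5 * (d * math.log(2.0 * math.pi) + logdet)
    out = []
    for x in xs:  # per row: O(d^2) forward substitution
        diff = [x[i] - mean[i] for i in range(d)]
        z = [0.0] * d  # z = L^{-1} (x - mean)
        for i in range(d):
            z[i] = (diff[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
        # Mahalanobis term: diff^T cov^{-1} diff == |z|^2
        out.append(const - 0.5 * sum(v * v for v in z))
    return out
```

Note that only the forward substitution is needed here, because the quadratic form diff^T Sigma^{-1} diff equals the squared norm of L^{-1} diff.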
Perhaps there is some way to merge them into Theano, but I would need help with that. I would suggest
adding them as a separate library, similar to magma. I could provide both the routines and the interface code.