OK, I fixed the bug. I just need someone to commit it to the Theano GitHub, as I don't have access.
I added float64 support for triu/tril in /pygpu/basic.py.
First, I added a version of _generate_kernel specifically for float64. Here is the function:
    def _generate_kernel64(ctx, cols, upper=True):
        # float64 variant of _generate_kernel: the kernel operates on
        # ga_double elements.
        tmpl = Template("""
        #include "cluda.h"
        KERNEL void extract_tri(GLOBAL_MEM ga_double *a, ga_size a_off, ga_uint N) {
            a = (GLOBAL_MEM ga_double *)(((GLOBAL_MEM char *)a) + a_off);
            unsigned int idx = GID_1 * LDIM_0 * GDIM_0 +
                               GID_0 * LDIM_0 + LID_0;
            unsigned int ix = idx/${cols};
            unsigned int iy = idx%${cols};
            if (idx < N) {
                if (ix ${le} iy)
                    a[idx] = 0.0;
            }
        }
        """)
        # ix is the row and iy the column of flat index idx.
        # triu zeroes entries below the diagonal (row > col);
        # tril zeroes entries above it (row < col).
        if upper:
            le = '>'
        else:
            le = '<'
        src = tmpl.substitute(cols=cols, le=le)
        spec = [GpuArray, SIZE, 'uint32']
        k = GpuKernel(src, "extract_tri", spec, context=ctx)
        return k
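To make the kernel's index logic easier to check without a GPU, here is a minimal pure-Python sketch of what each thread does, run serially over a flat row-major list. The name extract_tri_reference is hypothetical and not part of pygpu; it only mirrors the ix/iy comparison in the kernel above.

```python
def extract_tri_reference(flat, cols, upper=True):
    """Serial emulation of the extract_tri kernel on a flat row-major list.

    Hypothetical helper for illustration only; not part of pygpu.
    """
    out = list(flat)
    for idx in range(len(out)):      # the kernel guards with idx < N
        ix = idx // cols             # row index
        iy = idx % cols              # column index
        # upper=True zeroes below the diagonal, upper=False zeroes above it
        zero = ix > iy if upper else ix < iy
        if zero:
            out[idx] = 0.0
    return out


a = [1.0, 2.0, 3.0,
     4.0, 5.0, 6.0,
     7.0, 8.0, 9.0]
upper = extract_tri_reference(a, 3, upper=True)
lower = extract_tri_reference(a, 3, upper=False)
```

For this 3x3 input, upper keeps the diagonal and everything above it, and lower keeps the diagonal and everything below it.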
Then I added a dtype check to both triu and tril, replacing
    k = _generate_kernel(A.context, cols, upper)
with
    if A.dtype == 'float64':
        k = _generate_kernel64(A.context, cols, upper)
    elif A.dtype == 'float32':
        k = _generate_kernel(A.context, cols, upper)
    else:
        raise ValueError("triu/tril only work for float32 and float64")
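An alternative to keeping two nearly identical kernel generators might be to make the element type a template parameter. The sketch below only builds the kernel source string with string.Template and leaves out the GpuKernel call. The names render_extract_tri and CTYPES are assumptions for illustration, not pygpu's API, and ga_float is assumed to be the float32 counterpart of ga_double.

```python
from string import Template

# Assumed mapping from dtype name to the cluda element type. ga_double
# appears in the kernel above; ga_float is an assumption for float32.
CTYPES = {'float32': 'ga_float', 'float64': 'ga_double'}

KERNEL_TMPL = Template("""
#include "cluda.h"
KERNEL void extract_tri(GLOBAL_MEM ${ctype} *a, ga_size a_off, ga_uint N) {
    a = (GLOBAL_MEM ${ctype} *)(((GLOBAL_MEM char *)a) + a_off);
    unsigned int idx = GID_1 * LDIM_0 * GDIM_0 +
                       GID_0 * LDIM_0 + LID_0;
    unsigned int ix = idx/${cols};
    unsigned int iy = idx%${cols};
    if (idx < N) {
        if (ix ${le} iy)
            a[idx] = 0.0;
    }
}
""")


def render_extract_tri(dtype_name, cols, upper=True):
    """Return kernel source for the given dtype name (hypothetical helper)."""
    if dtype_name not in CTYPES:
        raise ValueError("triu/tril only work for float32 and float64")
    le = '>' if upper else '<'
    return KERNEL_TMPL.substitute(ctype=CTYPES[dtype_name], cols=cols, le=le)


src = render_extract_tri('float64', 4, upper=True)
```

With a helper along these lines, triu and tril could pass the dtype name and hand the resulting source to GpuKernel as before, so the dtype check would live in one place.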
This fixes both slinalg.Cholesky and its gradient for float64.
Can someone please commit this fix?
Paul