R[j,j] = sqrt(T[j,j])
for i in range(j-1,-1,-1):
s = 0
for k in range(i+1,j):
s = s + R[i,k]*R[k,j]
R[i,j] = (T[i,j] - s)/(R[i,i] + R[j,j])
for k in range(i+1,j):
s = s + R[i,k]*R[k,j]
with
s = np.dot(R[i,(i+1):j],R[(i+1):j,j])
Run time decreases from 367 to 15.6 seconds. My guess is that you could get considerable further speedup, but I'm pleased with the 15.6 seconds. If you copy the sqrtm function from scipy and make that change I think that you'll see considerable improvement.