Hi folks,
I came up with an idea to speed up sampling for Gaussian processes when X is sampled on a grid, in which case the pairwise distance matrix is highly redundant (in the 1D case, just nX unique distances repeated across the (nX^2-nX)/2 off-diagonal cells of the matrix). By pre-computing the unique pairwise distances and their indices in the full pairwise distance matrix, you can compute the covariance on just the unique values and then fill the full covariance matrix using those indices.
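For concreteness, here's a toy 1D sketch of the trick in NumPy (not the gist code, and the name cov_exp_quad_grid is just my placeholder for a hypothetical grid-aware function):

```python
import numpy as np

def cov_exp_quad(x, alpha, rho):
    # Standard approach: evaluate the exponentiated-quadratic kernel
    # on every cell of the full pairwise distance matrix.
    d = np.abs(x[:, None] - x[None, :])
    return alpha**2 * np.exp(-0.5 * (d / rho)**2)

def cov_exp_quad_grid(x, alpha, rho):
    # Grid trick: evaluate the kernel only on the unique distances,
    # then scatter the results back via the precomputed index map.
    d = np.abs(x[:, None] - x[None, :])
    # On a regular grid only len(x) distinct distances occur; note that
    # floating-point grids may need rounding for exact uniqueness.
    uniq, inv = np.unique(d, return_inverse=True)
    k_uniq = alpha**2 * np.exp(-0.5 * (uniq / rho)**2)
    # Fill the full covariance matrix from the unique kernel values.
    return k_uniq[inv].reshape(d.shape)
```

In a real model the unique distances and the index map would be computed once (e.g. in transformed data), so per-iteration work shrinks from nX^2 kernel evaluations to nX plus an indexed fill.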
This ends up being about 2x faster than a standard GP (also faster than a GP where the pairwise distances are pre-computed, but uniqueness isn't considered) when the covariance matrix is computed by hand.
Now, I will note that computing on the unique distances and then filling the covariance matrix by hand *isn't* much faster than using the built-in cov_exp_quad() function. I have observed a ~30% speedup over cov_exp_quad() in some situations, but that's not much to rave about. But since I have no idea what magic cov_exp_quad() does internally to make it so fast, I wonder if there might be room for further speedups (in, say, a new cov_exp_quad_grid() function that takes the unique distances and their indices as input) by combining my idea with whatever sorcery cov_exp_quad() is doing internally.
Thoughts?
Oh, and here's a gist demonstrating the unique-distances approach, with comparisons to cov_exp_quad() and the non-unique-but-pre-computed-distances approach:
Mike
--
Mike Lawrence
Graduate Student
Department of Psychology & Neuroscience
Dalhousie University
~ Certainty is (possibly) folly ~