Hi Stanners, I'm using vb() in rstan to estimate a GP model, similar to the one described in the manual, for NT = 2500 observations with P = 2 predictors under predictor-specific length scales in a squared exponential covariance function. The Stan script below builds an MCMC model for the parameters of the covariance function. My interest is in extracting the latent (de-noised) function, which I sample from the posterior predictive distribution in the generated quantities block. The ADVI estimate is returned in about 2 hours, but after a day the run has still not finished generating 1000 samples. The bottleneck is not returning samples of the parameters (I have run that separately and verified it), but the Monte Carlo draws from the posterior predictive for the latent function. Since the generated quantities block only draws Monte Carlo samples, I'm assuming there are no gradients to compute. I'd appreciate any tips on how I might reconfigure the generated quantities block to speed up these draws.
/*
y[c] ~ N(u[c], sigma2_y[c]), where c = 1,..,NT indexes domain-time cases; the sigma2_y[c] are known
// squared exponential covariance function for the NT x NT covariance matrix, Sigma
Sigma[i,j] = 1/eta * exp( -sum_{p=1}^{P} (x[i,p] - x[j,p])^{2} / rho[p] ),
i,j = 1,...,NT
u[1],...,u[NT] ~ N_{NT}(0, Sigma)
eta ~ Gamma(1,1) - precision parameter of the squared exponential covariance function
rho[p] ~ Gamma(1,1) - predictor-indexed length-scale parameters
We may marginalize over u to obtain
y ~ N_{NT}(0, Sigma + diag(sigma2_y))
Marginalizing over u adds the known response variances, sigma2_y, to the diagonal,
which stabilizes the Cholesky factorization of Sigma (which is very high-dimensional).
We then use the conditioning properties of the Gaussian in which the nonparametric GP
is embedded to draw u as a post-processing step using sampled values of (rho, eta)
*/
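To make the covariance formula above concrete, here is a minimal NumPy sketch of the same ARD squared exponential kernel (the function name and vectorized form are mine, not part of the Stan script):

```python
import numpy as np

def sq_exp_kernel(x, eta, rho, jitter=0.0):
    """Squared exponential (ARD) covariance:
    K[i, j] = (1/eta) * exp(-sum_p (x[i, p] - x[j, p])**2 / rho[p])."""
    # (NT, NT, P) array of length-scale-weighted squared differences
    d2 = (x[:, None, :] - x[None, :, :]) ** 2 / rho
    K = np.exp(-d2.sum(axis=2)) / eta
    # optional jitter on the diagonal stabilizes a later Cholesky factorization
    K[np.diag_indices_from(K)] += jitter
    return K
```

The vectorized broadcasting replaces the triple loop over (i, j, p) in the transformed parameters block, but computes the same matrix.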
data{
int<lower=1> NT; // number of domain-months
int<lower=1> P; // number of predictors
vector[NT] y; // vectorized response variable, with T faster than N
matrix[NT,P] x; // Set of NT, 1 x P predictors
vector<lower=0>[NT] sigma2_y; // known domain variances
real<lower=0> jitter; // stabilizing constant to add to diagonals of GP cov, Sigma
} /* end data block */
transformed data{
vector[NT] zros_NT;
zros_NT <- rep_vector(0,NT); /* zero mean vector for the marginal likelihood of y */
} /* end transformed data block */
parameters{
// parameters for squared exponential covariance function
real<lower=0> eta; /* squared exponential precision parameter */
real<lower=0> rho[P]; /* length-scale parameters, indexed by predictor, x[,p] */
} /* end parameters block */
transformed parameters{
matrix[NT,NT] Sigma; /* marginal covariance matrix for y: K + diag(sigma2_y) */
matrix[NT,NT] K; /* GP covariance matrix for NT x 1, u */
matrix[NT,NT] L; /* cholesky_decompose(Sigma) */
{ /* local block */
real covformula_ijp;
real s_ij;
// off-diagonal
for( i in 1:(NT-1) )
{
for( j in (i+1):NT )
{
s_ij <- 0;
for( p in 1:P )
{
covformula_ijp <- inv(rho[p])* pow(x[i,p] - x[j,p],2);
s_ij <- s_ij + covformula_ijp ;
}/* end loop p over dimensions of x to fill cells of Sigma */
s_ij <- exp( -s_ij ) ;
s_ij <- inv(eta) * s_ij;
K[i,j] <- s_ij;
K[j,i] <- s_ij;
} /* end loop j over columns to fill upper triangular cells of Sigma */
} /* end loop i over rows to fill upper triangular cells of Sigma */
// diagonal
for( k in 1:NT )
{
K[k,k] <- inv(eta) + jitter;
} /* end loop k over diagonal entries of Sigma */
} /* end local block for building Sigma */
/* add model noise on the diagonal to achieve the marginal distribution */
/* of y */
Sigma <- K + diag_matrix( sigma2_y );
L <- cholesky_decompose( Sigma );
} /* end transformed parameters block */
model{
// priors for mean and covariance cluster location
eta ~ gamma( 1.0, 1.0 );
rho ~ gamma( 1.0, 1.0 );
// observed response likelihood
y ~ multi_normal_cholesky(zros_NT,L);
} /* end model{} block */
generated quantities{
matrix[NT,NT] K_transpose_div_Sigma; /* K'Sigma^-1 */
matrix[NT,NT] Tau; /* posterior predictive covariance matrix for u */
matrix[NT,NT] L_u;
vector[NT] mu_u; /* predictive mean of NT x 1, u */
vector[NT] u; /* GP function drawn from posterior predictive */
/* compute MSE for fit evaluation */
real<lower=0> TSE;
real<lower=0> MSE;
/* posterior predictive quantities */
K_transpose_div_Sigma <- K' / Sigma;
mu_u <- K_transpose_div_Sigma * y;
Tau <- K - K_transpose_div_Sigma * K;
L_u <- cholesky_decompose( Tau );
u <- multi_normal_cholesky_rng(mu_u,L_u);
TSE <- squared_distance(u,y);
MSE <- TSE/(NT);
} /* end generated quantities block */
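As a cross-check on the linear algebra in that block, here is a NumPy sketch (the function name and jitter argument are mine) of the same posterior predictive draw for u, with mu_u = K' Sigma^{-1} y and Tau = K - K' Sigma^{-1} K:

```python
import numpy as np

def latent_posterior_draw(K, sigma2_y, y, rng, jitter=1e-8):
    """Draw u from N(mu_u, Tau), where mu_u = K' Sigma^{-1} y and
    Tau = K - K' Sigma^{-1} K, with Sigma = K + diag(sigma2_y).
    K is symmetric, so K' = K."""
    Sigma = K + np.diag(sigma2_y)
    KtSinv = np.linalg.solve(Sigma, K).T      # K' Sigma^{-1}
    mu_u = KtSinv @ y
    Tau = K - KtSinv @ K
    # symmetrize and jitter before factorizing to absorb round-off
    Tau = 0.5 * (Tau + Tau.T) + jitter * np.eye(len(y))
    L_u = np.linalg.cholesky(Tau)
    return mu_u + L_u @ rng.standard_normal(len(y))
```

Note the explicit symmetrization and jitter before the Cholesky of Tau; without them the factorization can fail on round-off, which is what the commented-out symmetrization loop in the second variant below guards against.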
I also tried a variation of the generated quantities block that reuses the Cholesky decomposition of Sigma already computed in transformed parameters, but it produced no speed improvement.
generated quantities{
matrix[NT,NT] div_L_K; /* L^-1 * K */
matrix[NT,NT] Tau; /* posterior predictive covariance matrix for u */
matrix[NT,NT] L_u; /* cholesky of Tau */
vector[NT] mu_u; /* predictive mean of NT x 1, u */
vector[NT] u; /* GP function drawn from posterior predictive */
/* compute MSE for fit evaluation */
real<lower=0> TSE;
real<lower=0> MSE;
/* posterior predictive quantities */
div_L_K <- mdivide_left_tri_low(L,K);
mu_u <- div_L_K' * mdivide_left_tri_low(L,y);
Tau <- K - div_L_K' * div_L_K;
// for( i in 1:(NT-1))
// {
// for( j in (i+1):NT)
// {
// Tau[i,j] <- Tau[j,i];
// }
// }
L_u <- cholesky_decompose(Tau);
u <- multi_normal_cholesky_rng(mu_u,L_u);
TSE <- squared_distance(u,y);
MSE <- TSE/(NT);
} /* end generated quantities block */
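For reference, this Cholesky-reuse variant corresponds to the following NumPy/SciPy sketch (again an illustration of mine, not Stan code): with A = L^{-1} K and L = chol(Sigma), we get mu_u = A' L^{-1} y and Tau = K - A'A, which equal the direct K Sigma^{-1} y and K - K Sigma^{-1} K:

```python
import numpy as np
from scipy.linalg import solve_triangular

def latent_posterior_chol(K, L, y):
    """Posterior predictive mean and covariance of u via triangular
    solves against L = chol(Sigma), mirroring mdivide_left_tri_low."""
    A = solve_triangular(L, K, lower=True)   # L^{-1} K
    v = solve_triangular(L, y, lower=True)   # L^{-1} y
    mu_u = A.T @ v                            # K Sigma^{-1} y
    Tau = K - A.T @ A                         # K - K Sigma^{-1} K
    return mu_u, 0.5 * (Tau + Tau.T)          # symmetrized
```

Both routes perform O(NT^3) work per draw, which is consistent with seeing no speed difference between the two generated quantities blocks.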
Thanks very much for your help and comments.
Terrance
--
Thank you, Terrance Savitsky