# Hierarchical Gaussian Process modelling

164 views

### Mike Lawrence

Jun 25, 2014, 11:40:19 AM6/25/14
Hi folks,

As noted in a previous missive to the list, I'm learning about GPs in
hopes that it'll be the tool I need to deal with the common scenario I
encounter where I have a continuous predictor variable whose effect
might not be linear. I think I understand the simple GP whereby the
parameters specifying the covariance matrix creation function, eta,
rho, and sigma, reflect the amplitude, wiggliness, and measurement
noise, respectively. However, I'm having trouble wrapping my head
around what happens if one moves to a hierarchical framework with this
parameterization. That is, let's say there is some "true" data
generating function like (R code):

f = function(x,shift,scale,amp,noise){
z = (x-shift)/scale
y = ((sin(z*2)+sin(z))*dnorm(z,0,pi/2))*amp + rnorm(length(x),0,noise)
return(y)
}

And we have a population of units of observation within each of which
we observe the output of this function across a range of x values, but
each unit has it's own value for the parameters "shift", "scale",
"amp" and "noise", sampled hierarchically like (R code):

K = 100 #number of evaluation points on X
x = seq(0,2*pi*10,length.out=K) #actual sample points on X
N = 30 #number of units of observation
shift_samples = exp(rnorm(N,log(20),.1)) #log-normal with mean of 20
scale_samples = exp(rnorm(N,log(3),.2)) #log-normal with mean of 3
amp_samples = exp(rnorm(N,log(1),.3)) #log-normal with mean of 1
noise_samples = exp(rnorm(N,log(.01),.001)) #log-normal with mean of 1

data = matrix(NA,nrow=N,ncol=length(x))
for(i in 1:N){
data[i,] =
f(x,shift_samples[i],scale_samples[i],amp_samples[i],noise_samples[i])
#plot(x,data[i,],type='l') #uncomment to see each plot as the
data is generated
}

If I were to be ignorant of the underlying form of the data generating
function, I might try using a GP, in which case my first instinct
would be to let the parameters of the GP vary hierarchically (stan
code [might not be optimally efficient]):

data {
int<lower=0> N; //units of observation
int<lower=0> K; //# of observations along x
real x[K]; // points on x at which observations are made
real y[N,K]; //resulting observation matrix
}
parameters {
real<lower=0> eta_sq_pop_mean;
real<lower=0> rho_sq_pop_mean;
real<lower=0> sigma_sq_pop_mean;
real<lower=0> eta_sq_pop_sd;
real<lower=0> rho_sq_pop_sd;
real<lower=0> sigma_sq_pop_sd;
real<lower=0> eta_sq[N];
real<lower=0> rho_sq[N];
real<lower=0> sigma_sq[N];
}
model {
//set priors here, else they're uniform over support
// eta_sq_pop_mean ~ ;
// rho_sq_pop_mean ~ ;
// sigma_sq_pop_mean ~ ;
// eta_sq_pop_sd ~ ;
// rho_sq_pop_sd ~ ;
// sigma_sq_pop_sd ~ ;

//iterate over units of observation
for (n in 1:N){
//define sampling of unit-specific parameters
eta_sq[n] ~ lognormal(eta_sq_pop_mean,eta_sq_pop_sd);
rho_sq[n] ~ lognormal(rho_sq_pop_mean,rho_sq_pop_sd);
sigma_sq[n] ~ lognormal(sigma_sq_pop_mean,sigma_sq_pop_sd);
//create the matrix for this unit
matrix[K,K] Sigma;
// off-diagonal elements
for (i in 1:(K-1)) {
for (j in i:K) {
Sigma[i,j] <- eta_sq[n] * exp(-rho_sq[n] *
pow(x[i] - x[j],2));
Sigma[j,i] <- Sigma[i,j];
}
}
// diagonal elements
for (k in 1:K) {
Sigma[k,k] <- eta_sq[n] + sigma_sq[n]; // + jitter
}
increment_log_prob(multi_normal_log(y[n,],0,Sigma));
}
}

If I were to then generate samples from the posterior along x, would
the function reflecting the mean of the posterior samples at each x be
expected to generally approximate the "population" data generating
function (i.e. "f( x , shift=20 , scale=3 , amp=1 , noise=.01 )" )?

--
Mike Lawrence