164 views

Skip to first unread message

Jun 25, 2014, 11:40:19 AM6/25/14

to stan-...@googlegroups.com

Hi folks,

As noted in a previous missive to the list, I'm learning about GPs in

hopes that it'll be the tool I need to deal with the common scenario I

encounter where I have a continuous predictor variable whose effect

might not be linear. I think I understand the simple GP whereby the

parameters specifying the covariance matrix creation function, eta,

rho, and sigma, reflect the amplitude, wiggliness, and measurement

noise, respectively. However, I'm having trouble wrapping my head

around what happens if one moves to a hierarchical framework with this

parameterization. That is, let's say there is some "true" data

generating function like (R code):

f = function(x,shift,scale,amp,noise){

z = (x-shift)/scale

y = ((sin(z*2)+sin(z))*dnorm(z,0,pi/2))*amp + rnorm(length(x),0,noise)

return(y)

}

And we have a population of units of observation within each of which

we observe the output of this function across a range of x values, but

each unit has it's own value for the parameters "shift", "scale",

"amp" and "noise", sampled hierarchically like (R code):

K = 100 #number of evaluation points on X

x = seq(0,2*pi*10,length.out=K) #actual sample points on X

N = 30 #number of units of observation

shift_samples = exp(rnorm(N,log(20),.1)) #log-normal with mean of 20

scale_samples = exp(rnorm(N,log(3),.2)) #log-normal with mean of 3

amp_samples = exp(rnorm(N,log(1),.3)) #log-normal with mean of 1

noise_samples = exp(rnorm(N,log(.01),.001)) #log-normal with mean of 1

data = matrix(NA,nrow=N,ncol=length(x))

for(i in 1:N){

data[i,] =

f(x,shift_samples[i],scale_samples[i],amp_samples[i],noise_samples[i])

#plot(x,data[i,],type='l') #uncomment to see each plot as the

data is generated

}

If I were to be ignorant of the underlying form of the data generating

function, I might try using a GP, in which case my first instinct

would be to let the parameters of the GP vary hierarchically (stan

code [might not be optimally efficient]):

data {

int<lower=0> N; //units of observation

int<lower=0> K; //# of observations along x

real x[K]; // points on x at which observations are made

real y[N,K]; //resulting observation matrix

}

parameters {

real<lower=0> eta_sq_pop_mean;

real<lower=0> rho_sq_pop_mean;

real<lower=0> sigma_sq_pop_mean;

real<lower=0> eta_sq_pop_sd;

real<lower=0> rho_sq_pop_sd;

real<lower=0> sigma_sq_pop_sd;

real<lower=0> eta_sq[N];

real<lower=0> rho_sq[N];

real<lower=0> sigma_sq[N];

}

model {

//set priors here, else they're uniform over support

// eta_sq_pop_mean ~ ;

// rho_sq_pop_mean ~ ;

// sigma_sq_pop_mean ~ ;

// eta_sq_pop_sd ~ ;

// rho_sq_pop_sd ~ ;

// sigma_sq_pop_sd ~ ;

//iterate over units of observation

for (n in 1:N){

//define sampling of unit-specific parameters

eta_sq[n] ~ lognormal(eta_sq_pop_mean,eta_sq_pop_sd);

rho_sq[n] ~ lognormal(rho_sq_pop_mean,rho_sq_pop_sd);

sigma_sq[n] ~ lognormal(sigma_sq_pop_mean,sigma_sq_pop_sd);

//create the matrix for this unit

matrix[K,K] Sigma;

// off-diagonal elements

for (i in 1:(K-1)) {

for (j in i:K) {

Sigma[i,j] <- eta_sq[n] * exp(-rho_sq[n] *

pow(x[i] - x[j],2));

Sigma[j,i] <- Sigma[i,j];

}

}

// diagonal elements

for (k in 1:K) {

Sigma[k,k] <- eta_sq[n] + sigma_sq[n]; // + jitter

}

increment_log_prob(multi_normal_log(y[n,],0,Sigma));

}

}

If I were to then generate samples from the posterior along x, would

the function reflecting the mean of the posterior samples at each x be

expected to generally approximate the "population" data generating

function (i.e. "f( x , shift=20 , scale=3 , amp=1 , noise=.01 )" )?

--

Mike Lawrence

Graduate Student

Department of Psychology & Neuroscience

Dalhousie University

~ Certainty is (possibly) folly ~

As noted in a previous missive to the list, I'm learning about GPs in

hopes that it'll be the tool I need to deal with the common scenario I

encounter where I have a continuous predictor variable whose effect

might not be linear. I think I understand the simple GP whereby the

parameters specifying the covariance matrix creation function, eta,

rho, and sigma, reflect the amplitude, wiggliness, and measurement

noise, respectively. However, I'm having trouble wrapping my head

around what happens if one moves to a hierarchical framework with this

parameterization. That is, let's say there is some "true" data

generating function like (R code):

f = function(x,shift,scale,amp,noise){

z = (x-shift)/scale

y = ((sin(z*2)+sin(z))*dnorm(z,0,pi/2))*amp + rnorm(length(x),0,noise)

return(y)

}

And we have a population of units of observation within each of which

we observe the output of this function across a range of x values, but

each unit has it's own value for the parameters "shift", "scale",

"amp" and "noise", sampled hierarchically like (R code):

K = 100 #number of evaluation points on X

x = seq(0,2*pi*10,length.out=K) #actual sample points on X

N = 30 #number of units of observation

shift_samples = exp(rnorm(N,log(20),.1)) #log-normal with mean of 20

scale_samples = exp(rnorm(N,log(3),.2)) #log-normal with mean of 3

amp_samples = exp(rnorm(N,log(1),.3)) #log-normal with mean of 1

noise_samples = exp(rnorm(N,log(.01),.001)) #log-normal with mean of 1

data = matrix(NA,nrow=N,ncol=length(x))

for(i in 1:N){

data[i,] =

f(x,shift_samples[i],scale_samples[i],amp_samples[i],noise_samples[i])

#plot(x,data[i,],type='l') #uncomment to see each plot as the

data is generated

}

If I were to be ignorant of the underlying form of the data generating

function, I might try using a GP, in which case my first instinct

would be to let the parameters of the GP vary hierarchically (stan

code [might not be optimally efficient]):

data {

int<lower=0> N; //units of observation

int<lower=0> K; //# of observations along x

real x[K]; // points on x at which observations are made

real y[N,K]; //resulting observation matrix

}

parameters {

real<lower=0> eta_sq_pop_mean;

real<lower=0> rho_sq_pop_mean;

real<lower=0> sigma_sq_pop_mean;

real<lower=0> eta_sq_pop_sd;

real<lower=0> rho_sq_pop_sd;

real<lower=0> sigma_sq_pop_sd;

real<lower=0> eta_sq[N];

real<lower=0> rho_sq[N];

real<lower=0> sigma_sq[N];

}

model {

//set priors here, else they're uniform over support

// eta_sq_pop_mean ~ ;

// rho_sq_pop_mean ~ ;

// sigma_sq_pop_mean ~ ;

// eta_sq_pop_sd ~ ;

// rho_sq_pop_sd ~ ;

// sigma_sq_pop_sd ~ ;

//iterate over units of observation

for (n in 1:N){

//define sampling of unit-specific parameters

eta_sq[n] ~ lognormal(eta_sq_pop_mean,eta_sq_pop_sd);

rho_sq[n] ~ lognormal(rho_sq_pop_mean,rho_sq_pop_sd);

sigma_sq[n] ~ lognormal(sigma_sq_pop_mean,sigma_sq_pop_sd);

//create the matrix for this unit

matrix[K,K] Sigma;

// off-diagonal elements

for (i in 1:(K-1)) {

for (j in i:K) {

Sigma[i,j] <- eta_sq[n] * exp(-rho_sq[n] *

pow(x[i] - x[j],2));

Sigma[j,i] <- Sigma[i,j];

}

}

// diagonal elements

for (k in 1:K) {

Sigma[k,k] <- eta_sq[n] + sigma_sq[n]; // + jitter

}

increment_log_prob(multi_normal_log(y[n,],0,Sigma));

}

}

If I were to then generate samples from the posterior along x, would

the function reflecting the mean of the posterior samples at each x be

expected to generally approximate the "population" data generating

function (i.e. "f( x , shift=20 , scale=3 , amp=1 , noise=.01 )" )?

--

Mike Lawrence

Graduate Student

Department of Psychology & Neuroscience

Dalhousie University

~ Certainty is (possibly) folly ~

Reply all

Reply to author

Forward

0 new messages

Search

Clear search

Close search

Google apps

Main menu