Gaussian process in Stan


Linas Mockus

Jun 18, 2014, 9:44:31 AM
to stan-...@googlegroups.com
Hi,

Does anybody have experience fitting and predicting with a Gaussian process in Stan, where the model is
g(x) = f(x) + h(x) * beta, with f(x) ~ GP(0, k(x, x'))?
In other words, a GP with explicit basis functions, eq. (2.39) in C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning.

The Stan code is attached; however, the fit and predictions using eq. (2.41) from Rasmussen are different (and, I think, better) than the ones from Stan.  In the figure below the blue line represents predictions from Stan, while the red line is based on eq. (2.41) from Rasmussen.  Black dots are actual measurements, and the green line shows the basis function.  Any comments are appreciated.
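For reference, here is a minimal numpy sketch of the eq. (2.41) predictor with explicit basis functions, in the vague-prior limit for beta. The kernel, data, and basis below are illustrative placeholders, not the attached SE.stan:

```python
import numpy as np

def se_kernel(a, b, sigmaf_sq=1.0, l_sq=0.1):
    # squared-exponential covariance k(x, x') = sigmaf^2 * exp(-(x - x')^2 / (2 l^2))
    d = a[:, None] - b[None, :]
    return sigmaf_sq * np.exp(-0.5 * d ** 2 / l_sq)

def gp_predict_basis(x, y, xs, h, sigman_sq=1e-4):
    """Posterior mean of g(x*) = f(x*) + h(x*)' beta, R&W eq. (2.41),
    with a vague prior on beta."""
    Ky = se_kernel(x, x) + sigman_sq * np.eye(len(x))  # K + sigma_n^2 I
    Ks = se_kernel(x, xs)                              # K(X, X*)
    H, Hs = h(x), h(xs)                                # basis rows: (p, n), (p, n*)
    Kinv_y = np.linalg.solve(Ky, y)
    # beta_bar = (H Ky^-1 H')^-1 H Ky^-1 y
    beta_bar = np.linalg.solve(H @ np.linalg.solve(Ky, H.T), H @ Kinv_y)
    f_bar = Ks.T @ Kinv_y                              # plain GP mean, eq. (2.19)
    R = Hs - H @ np.linalg.solve(Ky, Ks)
    return f_bar + R.T @ beta_bar

# illustrative data: linear trend (captured by the basis) plus a smooth wiggle
x = np.linspace(0.0, 1.0, 10)
y = 1.0 + 2.0 * x + 0.3 * np.sin(2.0 * np.pi * x)
h = lambda t: np.vstack([np.ones_like(t), t])          # h(x) = (1, x)
pred = gp_predict_basis(x, y, np.linspace(0.0, 1.0, 50), h)
```

With small noise the predictor interpolates the training points, which gives a quick sanity check against the Stan output.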


Linas
SE.stan

Andrew Gelman

Jun 18, 2014, 10:14:52 AM
to stan-...@googlegroups.com, Michael Betancourt, Aki Vehtari
Michael, Aki:
Any thoughts on this?  You two are the GP experts.
Andrew

--
You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Linas Mockus

Jun 18, 2014, 10:15:48 AM
to stan-...@googlegroups.com
Sorry, the attachment with the figure was not shown...
SE.png

Michael Betancourt

Jun 18, 2014, 1:20:21 PM
to stan-...@googlegroups.com
Is the blue line one sample or an average?

One immediate issue is that Rasmussen and Williams typically show the conditional mean, E[x_{n} | x_{\setminus n}],
whereas you are sampling from the full joint distribution, which would give the marginal distribution.
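A tiny numpy illustration of the difference, with hypothetical numbers just to make the point: for a joint Gaussian over (y1, y2), draws of y2 from the joint reproduce the marginal mean, not the conditional mean given an observed y1.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.zeros(2)
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
y1_obs = 2.0

# conditional mean E[y2 | y1 = 2] = mu2 + (S21 / S11) * (y1 - mu1) = 1.6
cond_mean = mu[1] + (Sigma[1, 0] / Sigma[0, 0]) * (y1_obs - mu[0])

# sampling (y1, y2) jointly and looking at y2 ignores the observation:
# the draws center on the marginal mean mu2 = 0, nowhere near 1.6
draws = rng.multivariate_normal(mu, Sigma, size=100_000)
marg_mean = draws[:, 1].mean()
```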


Linas Mockus

Jun 18, 2014, 2:05:11 PM
to stan-...@googlegroups.com
Blue is the 0.5 quantile of the predictions (the y2 variable in the Stan code).  I tried the mean and the results are the same.
It is possible that I messed up in the generated quantities block, but my expectation is that the 0.5 quantile should go through the experimental points.

The prediction is defined as:
f.bar.star = E[f* | X, y, X*], where X is the training data, y the observed output, and X* the test data.
Thank you,
Linas

Linas Mockus

Jun 19, 2014, 10:48:19 AM
to stan-...@googlegroups.com
Michael,

Can you suggest how to update the Stan code to get Rasmussen's behavior?
Linas

Michael Betancourt

Jun 19, 2014, 4:38:56 PM
to stan-...@googlegroups.com
Again, you’re not quite doing the same thing as R&W, because in the generated quantities
block you’re sampling jointly from the GP rather than making conditional estimates at each point.
But that shouldn’t produce a bias as large as the one you’re seeing.

What I don’t get is why you have a scalar mean for the GP — I think what you want is something
like

parameters {
  vector[N] y;
}

model {
  y1 ~ multi_normal((beta[1] + beta[2] * y1_cfd) * y, Sigma1);
}

I’ve attached a fun GP model that does work for comparison.

gp_agg.stan

Michael Betancourt

Jun 19, 2014, 5:21:03 PM
to stan-...@googlegroups.com
Forget the comment about the mean — I see what you’re doing. I’m not sure what’s
going wrong, but I can say that the comparison isn’t exactly fair, because in eq. (2.41) the
hyperparameters (and indeed the basis function) must be constant, whereas in the
Stan model you’re marginalizing over hyperparameter uncertainty.

Stan’s still giving you something weird, but there’s nothing obviously wrong with
the model. Are you sure that you converged?
Linas Mockus

Jun 19, 2014, 5:29:23 PM
to stan-...@googlegroups.com
Michael,

I agree that I didn't do exactly the same thing as R&W, since I got different results.

Thanks for the example.  In my application, however, the mean is not a scalar: it is a linear combination of basis functions.  One of the basis functions is nonlinear and is essentially a B-spline surrogate of a CFD simulation.

I have used R&W as a starting point but would like to impose different priors on the coefficients beta of those basis functions.  I feel that Stan is the right way to go, but I am still struggling.

I cleaned my code up a little.  Basically, my objective is to write Stan code that mimics the R&W results and then continue exploring how well different priors on beta predict the data.  Can you help me with this?

Thank you,
Linas

SE.stan

Linas Mockus

Jun 20, 2014, 4:42:52 PM
to stan-...@googlegroups.com
Michael, Herra:

Thanks for your input.

Stan converges; see stan1.emf.  I have tried relaxing the Cauchy priors: stan2.5.emf corresponds to cauchy(0, 2.5) on all GP parameters, and stan15.emf corresponds to cauchy(0, 15) on all GP parameters.

Please note that there are several local optima of the likelihood function.  I am using a global optimization algorithm, differential evolution.

The latest stan model is also attached.
Thank you,
Linas
On 6/19/2014 6:01 PM, Herra Huu wrote:
To me it seems like Stan is relying more on the basis-function h(x)*beta part of the model than on the GP f(x). The prior distribution for the betas is quite weak, I think much weaker than for the GP parameters. What happens if you increase the scale parameters of the Cauchy priors? After all, it's only 20 data points; the priors could have quite a big impact on the results.
stan1.emf
stan2.5.emf
stan15.emf
SE.stan

Michael Betancourt

Jun 20, 2014, 4:51:37 PM
to stan-...@googlegroups.com
Linas,

I’ll try to take a look later — but I can only do so much with a model without data.  If the data is sensitive,
try to reduce the problem to a simple example where you can emulate the data/inits; then we’d
be able to look into the problem more deeply.

-Mike


Linas Mockus

Jun 20, 2014, 6:21:41 PM
to stan-...@googlegroups.com
Michael,

Thanks a lot.  The R/Stan code and data are attached.
Thank you,
Linas
SE.stan
merge3.r
surrogate.r
values_g.bar.star.RData
GPstanfit.r
test_data.csv
train_data.csv

Herra Huu

Jun 20, 2014, 8:53:24 PM
to stan-...@googlegroups.com
I think the problem is in the generated quantities section. In the current code we are just sampling from a multivariate normal distribution with mean = (beta[1] + beta[2] * y_cfd). But I don't think that is what we actually want. Shouldn't we be sampling from the conditional distribution y2 | y1?

Following the example in the Stan manual (p. 148), this could be done as follows (results shown in the attached image):

parameters {
    real<lower=0> sigmaf_sq;
    real<lower=0> sigman_sq;
    real<lower=0> l_sq;
    vector[N2] y2;
    vector[2] beta;
}
transformed parameters {
    vector[N] y;
    for (n in 1:N1) y[n] <- y1[n];
    for (n in 1:N2) y[N1 + n] <- y2[n];
}
model {
    matrix[N, N] Sigma;

    for (i in 1:N)
        for (j in 1:N)
            Sigma[i, j] <- sigmaf_sq * exp(-0.5 * pow(x[i] - x[j], 2) / l_sq)
                + if_else(i == j, sigman_sq, 0.0);

    sigman_sq ~ cauchy(0, 5);
    sigmaf_sq ~ cauchy(0, 5);
    l_sq ~ cauchy(0, 5);
    beta ~ normal(0, 100);
    y ~ multi_normal(beta[1] + beta[2] * y_cfd, Sigma);
}
SE_modified.png
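To spell out what the reparameterization buys: with y2 declared as parameters and y1 as data, Stan samples y2 from p(y2 | y1), whose mean is the usual GP predictor K21 (K11 + sigma_n^2 I)^-1 y1. A small numpy sketch of that conditional mean, with an assumed squared-exponential kernel and made-up data:

```python
import numpy as np

def se_cov(a, b, sigmaf_sq=1.0, l_sq=0.05):
    # same covariance form as Sigma in the Stan model above
    d = a[:, None] - b[None, :]
    return sigmaf_sq * np.exp(-0.5 * d ** 2 / l_sq)

def gp_cond_mean(x1, y1, x2, sigman_sq=1e-4):
    # E[y2 | y1] = K21 (K11 + sigman_sq I)^-1 y1 for the joint Gaussian over (y1, y2)
    K11 = se_cov(x1, x1) + sigman_sq * np.eye(len(x1))
    K12 = se_cov(x1, x2)
    return K12.T @ np.linalg.solve(K11, y1)

# made-up training/test inputs for illustration
x1 = np.linspace(0.0, 1.0, 8)
y1 = np.sin(2.0 * np.pi * x1)
x2 = np.linspace(0.05, 0.95, 5)
pred = gp_cond_mean(x1, y1, x2)
```

With small noise the conditional mean interpolates the observed points, which is exactly the behavior the 0.5 quantile of y2 should show.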

Linas Mockus

Jun 20, 2014, 9:56:03 PM
to stan-...@googlegroups.com
Thanks.  I felt that I had messed up somewhere and that Stan is the right tool.

Linas

Michael Betancourt

Jun 21, 2014, 4:02:25 AM
to stan-...@googlegroups.com
Right — this is what I was trying to get across when I said "One immediate issue is that Rasmussen
and Williams typically show the conditional mean, E[x_{n} | x_{\setminus n}], whereas you are sampling
from the full joint distribution, which would give the marginal distribution.”

Thanks for digging further and fixing it, Herra!


Linas Mockus

Jul 3, 2014, 12:55:36 PM
to stan-...@googlegroups.com
Michael, Herra:

Thanks for all your help.  It did work.

Here is a more challenging example.  In this case Stan (in blue, see the attached png) gives somewhat different results than Rasmussen (in red).  The basis function is in green.  To provide some background: the basis function is computer-simulated data, and the black dots are experimental data.  In this case I don't think the Rasmussen GP gives a good fit, because of the kink at 0.45.  I don't think Stan does a good job either, because between 0.5 and 1 it doesn't go through the experimental data; rather, it follows the shape of the basis function.  This might be explained by multiple modes.

Anyway, I would be very happy to hear any comments on how to remove the kink and make Rasmussen's GP and the Stan results closer.

The R code, Stan code, test and training data, and Rasmussen's GP results (.RData) are attached.  I am using Stan 2.3.0, but the results were the same with 2.2.0.

Thanks,
Linas
GPStan.png
values_g.bar.star.RData
train_data.csv
test_data.csv
GPStan.r
SE.stan