Gaussian process in Stan


Linas Mockus

Jun 18, 2014, 9:44:31 AM
to stan-...@googlegroups.com
Hi,

Does anybody have experience fitting and predicting with a Gaussian process in Stan, where the model is
g(x) = f(x) + h(x) * beta, with f(x) ~ GP(0, k(x, x'))?
In other words, a GP with explicit basis functions, eq. (2.39) in C. E. Rasmussen & C. K. I. Williams, Gaussian Processes for Machine Learning.

The Stan code is attached; however, the fit and predictions using eq. (2.41) from Rasmussen are different (and, I think, better) than the ones from Stan.  In the figure below the blue line represents predictions from Stan, while the red line is based on eq. (2.41) from Rasmussen.  Black dots are actual measurements, and the green line shows the basis function.  Any comments are appreciated.
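For reference, here is a minimal numpy sketch of the eq. (2.41) predictor with explicit basis functions, in the vague-prior limit for beta. The kernel, data, and basis below are illustrative placeholders, not the attached SE.stan:

```python
import numpy as np

def se_kernel(a, b, sigmaf_sq=1.0, l_sq=0.1):
    # squared-exponential covariance k(x, x') = sigmaf^2 * exp(-(x - x')^2 / (2 l^2))
    d = a[:, None] - b[None, :]
    return sigmaf_sq * np.exp(-0.5 * d ** 2 / l_sq)

def gp_predict_basis(x, y, xs, h, sigman_sq=1e-4):
    """Posterior mean of g(x*) = f(x*) + h(x*)' beta, R&W eq. (2.41),
    with a vague prior on beta."""
    Ky = se_kernel(x, x) + sigman_sq * np.eye(len(x))  # K + sigma_n^2 I
    Ks = se_kernel(x, xs)                              # K(X, X*)
    H, Hs = h(x), h(xs)                                # basis rows: (p, n), (p, n*)
    Kinv_y = np.linalg.solve(Ky, y)
    # beta_bar = (H Ky^-1 H')^-1 H Ky^-1 y
    beta_bar = np.linalg.solve(H @ np.linalg.solve(Ky, H.T), H @ Kinv_y)
    f_bar = Ks.T @ Kinv_y                              # plain GP mean, eq. (2.19)
    R = Hs - H @ np.linalg.solve(Ky, Ks)
    return f_bar + R.T @ beta_bar

# illustrative data: linear trend (captured by the basis) plus a smooth wiggle
x = np.linspace(0.0, 1.0, 10)
y = 1.0 + 2.0 * x + 0.3 * np.sin(2.0 * np.pi * x)
h = lambda t: np.vstack([np.ones_like(t), t])          # h(x) = (1, x)
pred = gp_predict_basis(x, y, np.linspace(0.0, 1.0, 50), h)
```

With small noise the predictor interpolates the training points, which gives a quick sanity check against the Stan output.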


Linas
SE.stan

Andrew Gelman

Jun 18, 2014, 10:14:52 AM
to stan-...@googlegroups.com, Michael Betancourt, Aki Vehtari
Michael, Aki:
Any thoughts on this?  You two are the GP experts.
Andrew

--
You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Linas Mockus

Jun 18, 2014, 10:15:48 AM
to stan-...@googlegroups.com
Sorry, the attachment with the figure was not shown...
SE.png

Michael Betancourt

Jun 18, 2014, 1:20:21 PM
to stan-...@googlegroups.com
Is the blue line one sample or an average?

One immediate issue is that Rasmussen and Williams typically show the conditional mean, E[x_{n} | x_{\setminus n}],
whereas you are sampling from the full joint distribution, which would give the marginal distribution.
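A tiny numpy illustration of the difference, with hypothetical numbers just to make the point: for a joint Gaussian over (y1, y2), draws of y2 from the joint reproduce the marginal mean, not the conditional mean given an observed y1.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.zeros(2)
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
y1_obs = 2.0

# conditional mean E[y2 | y1 = 2] = mu2 + (S21 / S11) * (y1 - mu1) = 1.6
cond_mean = mu[1] + (Sigma[1, 0] / Sigma[0, 0]) * (y1_obs - mu[0])

# sampling (y1, y2) jointly and looking at y2 ignores the observation:
# the draws center on the marginal mean mu2 = 0, nowhere near 1.6
draws = rng.multivariate_normal(mu, Sigma, size=100_000)
marg_mean = draws[:, 1].mean()
```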


Linas Mockus

Jun 18, 2014, 2:05:11 PM
to stan-...@googlegroups.com
Blue is the 0.5 quantile of the predictions (the y2 variable in the Stan code).  I tried the mean and the results are the same.
It is possible that I messed up in the generated quantities block, but my expectation is that the 0.5 quantile should go through the experimental points.

The prediction is defined as:
f.bar.star = E[f* | X, y, X*], where X is the training data, y the observed output, and X* the test data.
Thank you,
Linas

Linas Mockus

Jun 19, 2014, 10:48:19 AM
to stan-...@googlegroups.com
Michael,

Can you suggest how to update the Stan code to get Rasmussen's behavior?
Linas

Michael Betancourt

Jun 19, 2014, 4:38:56 PM
to stan-...@googlegroups.com
Again, you’re not quite doing the same thing as R&W, because in the generated quantities
block you’re sampling jointly from the GP rather than making conditional estimates at each point.
But that shouldn’t produce a bias as large as the one you’re seeing.

What I don’t get is why you have a scalar mean for the GP — I think what you want is something
like

parameters {
  vector[N] y;
}

model {
  y1 ~ multi_normal((beta[1] + beta[2] * y1_cfd) * y, Sigma1);
}

I’ve attached a fun GP model that does work for comparison.

gp_agg.stan

Michael Betancourt

Jun 19, 2014, 5:21:03 PM
to stan-...@googlegroups.com
Forget the comment about the mean — I see what you’re doing. I’m not sure what’s
going wrong, but I can say that the comparison isn’t exactly fair, because in eq. (2.41) the
hyperparameters (and indeed the basis function) must be constant, whereas in the
Stan model you’re marginalizing over hyperparameter uncertainty.

Stan’s still giving you something weird, but there’s nothing obviously wrong with
the model. Are you sure that you converged?
Linas Mockus

Jun 19, 2014, 5:29:23 PM
to stan-...@googlegroups.com
Michael,

I agree that I didn't do exactly the same thing as R&W, since I got different results.

Thanks for the example.  In my application, however, the mean is not a scalar: it is a linear combination of basis functions.  One of the basis functions is nonlinear and is essentially a B-spline surrogate of a CFD simulation.

I have used R&W as a starting point but would like to impose different priors on the coefficients beta of those basis functions.  I feel that Stan is the right way to go, but I am still struggling.

I cleaned my code up a little.  Basically, my objective is to write Stan code that mimics the R&W results and then continue exploring how well different priors on beta predict the data.  Can you help me with this?

Thank you,
Linas

SE.stan

Linas Mockus

Jun 20, 2014, 4:42:52 PM
to stan-...@googlegroups.com
Michael, Herra:

Thanks for your input.

Stan converges; see stan1.emf.  I have tried relaxing the Cauchy priors: stan2.5.emf corresponds to cauchy(0, 2.5) on all GP parameters, and stan15.emf corresponds to cauchy(0, 15) on all GP parameters.

Please note that there are several local optima of the likelihood function.  I am using a global optimization algorithm, differential evolution.

The latest stan model is also attached.
Thank you,
Linas
On 6/19/2014 6:01 PM, Herra Huu wrote:
To me it seems like Stan is relying more on the basis-function h(x)*beta part of the model than on the GP f(x). The prior distribution for the betas is quite weak, I think much weaker than for the GP parameters. What happens if you increase the scale parameters of the Cauchy priors? After all, it's only 20 data points; the priors could have quite a big impact on the results.
stan1.emf
stan2.5.emf
stan15.emf
SE.stan

Michael Betancourt

Jun 20, 2014, 4:51:37 PM
to stan-...@googlegroups.com
Linas,

I’ll try to take a look later — but I can only do so much with a model without data.  If the data is sensitive,
try to reduce the problem to a simple example where you can emulate the data/inits; then we’d
be able to look into the problem more deeply.

-Mike


Linas Mockus

Jun 20, 2014, 6:21:41 PM
to stan-...@googlegroups.com
Michael,

Thanks a lot.  The R/Stan code and data are attached.
Thank you,
Linas
SE.stan
merge3.r
surrogate.r
values_g.bar.star.RData
GPstanfit.r
test_data.csv
train_data.csv

Herra Huu

Jun 20, 2014, 8:53:24 PM
to stan-...@googlegroups.com
I think the problem is in the generated quantities section. In the current code we are just sampling from a multivariate normal distribution with mean = (beta[1] + beta[2] * y_cfd). But I don't think that is what we actually want. Shouldn't we be sampling from the conditional distribution y2 | y1?

Following the example in the Stan manual (p. 148), this could be done as follows (results shown in the attached image):

parameters {
    real<lower=0> sigmaf_sq;
    real<lower=0> sigman_sq;
    real<lower=0> l_sq;
    vector[N2] y2;
    vector[2] beta;
}
transformed parameters {
    vector[N] y;
    for (n in 1:N1) y[n] <- y1[n];
    for (n in 1:N2) y[N1 + n] <- y2[n];
}
model {
    matrix[N, N] Sigma;

    for (i in 1:N)
        for (j in 1:N)
            Sigma[i, j] <- sigmaf_sq * exp(-0.5 * pow(x[i] - x[j], 2) / l_sq)
                + if_else(i == j, sigman_sq, 0.0);

    sigman_sq ~ cauchy(0, 5);
    sigmaf_sq ~ cauchy(0, 5);
    l_sq ~ cauchy(0, 5);
    beta ~ normal(0, 100);
    y ~ multi_normal(beta[1] + beta[2] * y_cfd, Sigma);
}
SE_modified.png
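To spell out what the reparameterization buys: with y2 declared as parameters and y1 as data, Stan samples y2 from p(y2 | y1), whose mean is the usual GP predictor K21 (K11 + sigma_n^2 I)^-1 y1. A small numpy sketch of that conditional mean, with an assumed squared-exponential kernel and made-up data:

```python
import numpy as np

def se_cov(a, b, sigmaf_sq=1.0, l_sq=0.05):
    # same covariance form as Sigma in the Stan model above
    d = a[:, None] - b[None, :]
    return sigmaf_sq * np.exp(-0.5 * d ** 2 / l_sq)

def gp_cond_mean(x1, y1, x2, sigman_sq=1e-4):
    # E[y2 | y1] = K21 (K11 + sigman_sq I)^-1 y1 for the joint Gaussian over (y1, y2)
    K11 = se_cov(x1, x1) + sigman_sq * np.eye(len(x1))
    K12 = se_cov(x1, x2)
    return K12.T @ np.linalg.solve(K11, y1)

# made-up training/test inputs for illustration
x1 = np.linspace(0.0, 1.0, 8)
y1 = np.sin(2.0 * np.pi * x1)
x2 = np.linspace(0.05, 0.95, 5)
pred = gp_cond_mean(x1, y1, x2)
```

With small noise the conditional mean interpolates the observed points, which is exactly the behavior the 0.5 quantile of y2 should show.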

Linas Mockus

Jun 20, 2014, 9:56:03 PM
to stan-...@googlegroups.com
Thanks.  I felt that I had messed up somewhere and that Stan is the right tool.

Linas

Michael Betancourt

Jun 21, 2014, 4:02:25 AM
to stan-...@googlegroups.com
Right — this is what I was trying to get across when I said "One immediate issue is that Rasmussen
and Williams typically show the conditional mean, E[x_{n} | x_{\setminus n}], whereas you are sampling
from the full joint distribution, which would give the marginal distribution.”

Thanks for digging further and fixing it, Herra!


Linas Mockus

Jul 3, 2014, 12:55:36 PM
to stan-...@googlegroups.com
Michael, Herra:

Thanks for all your help.  It did work.

Here is a more challenging example.  In this case Stan (in blue, see the attached png) gives somewhat different results than Rasmussen (in red).  The basis function is in green.  To provide some background: the basis function is computer-simulated data, and the black dots are experimental data.  In this case I don't think the Rasmussen GP gives a good fit, because of the kink at 0.45.  I don't think Stan does a good job either, because between 0.5 and 1 it doesn't go through the experimental data; rather, it follows the shape of the basis function.  This might be explained by multiple modes.

Anyway, I would be very happy to hear any comments on how to remove the kink and make Rasmussen's GP and the Stan results closer.

The R code, Stan code, test and training data, and Rasmussen's GP results (.RData) are attached.  I am using Stan 2.3.0, but the results were the same with 2.2.0.

Thanks,
Linas
GPStan.png
values_g.bar.star.RData
train_data.csv
test_data.csv
GPStan.r
SE.stan