Trouble with Matern covariance functions

John Reid

unread,

Oct 31, 2014, 3:33:37 PM10/31/14

to stan-...@googlegroups.com

Hi,

I'm running into all sorts of numerical issues with covariance matrices in Stan. In general a squared exponential covariance function works more or less with some warning messages just as Stan warms up. Noise on the diagonal always seems to help as does having fewer dimensions. I'd like to try Matern covariance functions as I don't really believe in the smoothness of the squared exponential for my data. However I just can't get them to work at all. For 40 or so dimensions I always get not positive definite error messages whatever I do. Maybe I'm doing something wrong but the code below seems to be a minimal example. When COVFN is set to 1 or 2 nothing seems to work. I'm taking the Matern covariance functions from Williams and Rasmussen's book http://www.gaussianprocess.org/gpml/ :

library(rstan)  
stan.code <- '                                                                                                                                                                                                     
data {                                                                                                                                                                                                             
    int<lower=1> T;  # Number of time points                                                                                                                                                                       
    real noise;  # Noise to add to covariance                                                                                                                                                                      
    int<lower=1,upper=3> COVFN;  # Choose covariance function                                                                                                                                                      
}                                                                                                                                                                                                                  
parameters {                                                                                                                                                                                                       
    vector[T] tau;  # Pseudotime                                                                                                                                                                                   
}                                                                                                                                                                                                                  
transformed parameters {                                                                                                                                                                                           
    cov_matrix[T] Sigma;  # Covariance matrix over pseudotime                                                                                                                                                      
    for (t1 in 1:T) {                                                                                                                                                                                              
        for (t2 in 1:t1) {                                                                                                                                                                                         
            real d;                                                                                                                                                                                                
            d <- tau[t1] - tau[t2];                                                                                                                                                                                
            if (1 == COVFN) {                                                                                                                                                                                      
                # Matern32 covariance                                                                                                                                                                              
                real x32;                                                                                                                                                                                          
                x32 <- sqrt(3) * d;                                                                                                                                                                                
                Sigma[t1,t2] <- (1 + x32) * exp(-x32);                                                                                                                                                             
            } else if (2 == COVFN) {                                                                                                                                                                               
                # Matern52 covariance                                                                                                                                                                              
                real x52;                                                                                                                                                                                          
                x52 <- sqrt(5) * d;                                                                                                                                                                                
                Sigma[t1,t2] <- (1 + x52 + pow(x52, 2) / 3) * exp(-x52);                                                                                                                                           
            } else if (3 == COVFN) {                                                                                                                                                                               
                # Squared exponential covariance                                                                                                                                                                   
                Sigma[t1,t2] <- exp(- pow(d, 2) / 2);                                                                                                                                                              
            }                                                                                                                                                                                                      
            if (t1 != t2) {                                                                                                                                                                                        
                Sigma[t2,t1] <- Sigma[t1,t2];                                                                                                                                                                      
            } else {                                                                                                                                                                                               
                Sigma[t1,t1] <- Sigma[t1,t1] + noise;  # Add noise                                                                                                                                                 
            }                                                                                                                                                                                                      
        }                                                                                                                                                                                                          
    }                                                                                                                                                                                                              
}                                                                                                                                                                                                                  
model {                                                                                                                                                                                                            
    tau ~ normal(0, 1);                                                                                                                                                                                            
}                                                                                                                                                                                                                  
'                                                                                                                                                                                                                  
compiled <- stan(model_code=stan.code, chains=0)                                                                                                                                                                   
stan.data <- list(T=100, noise=0.0, COVFN=3)                                                                                                                                                                       
# Define a function to initialise the chains                                                                                                                                                                       
init.chain <- function() { list(tau=rnorm(stan.data$T)) }                                                                                                                                                          
# Try one iteration to check everything is OK                                                                                                                                                                      
fit <- stan(fit=compiled, data=stan.data, init=init.chain, iter=1, chains=1)

Has any one got any helpful ideas?

Thanks for any help,
John.

> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rstan_2.4.0    inline_0.3.13  Rcpp_0.11.3    vimcom_0.9-93  setwidth_1.0-3

loaded via a namespace (and not attached):
[1] codetools_0.2-9 stats4_3.0.2    tools_3.0.2

Ben Goodrich

unread,

Oct 31, 2014, 6:20:07 PM10/31/14

to stan-...@googlegroups.com

On Friday, October 31, 2014 3:33:37 PM UTC-4, John Reid wrote:

Has any one got any helpful ideas?

If you are sure that you are using a positive definite kernel (which Matern is) to construct the covariance matrix, then go ahead and declare it as a matrix[T,T] rather than a cov_matrix[T]. That will circumvent the check for positive definiteness, which is numerically unstable.

Ben

John Reid

unread,

Oct 31, 2014, 6:45:52 PM10/31/14

to stan-users

I've tried that trick but then it falls over using it in multi_normal_log so I'm not sure what to try next.

Michael Betancourt

unread,

Oct 31, 2014, 6:55:40 PM10/31/14

to stan-...@googlegroups.com

Add a constant term to the diagonal — Matern is theoretically guaranteed to be

positive-definite but that doesn’t mean that it will be positive-definite in practice,

especially when using floating-point arithmetic.

--
You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ben Goodrich

unread,

Oct 31, 2014, 8:52:40 PM10/31/14

to stan-...@googlegroups.com

On Friday, October 31, 2014 6:45:52 PM UTC-4, John Reid wrote:

> If you are sure that you are using a positive definite kernel (which Matern is) to construct the covariance matrix, then go ahead and declare it as a matrix[T,T] rather than a cov_matrix[T]. That will circumvent the check for positive definiteness, which is numerically unstable.

I've tried that trick but then it falls over using it in multi_normal_log so I'm not sure what to try next.

y ~ multi_normal_cholesky(mu, cholesky_decompose(Sigma));

Ben

John Reid

unread,

Nov 1, 2014, 5:02:38 AM11/1/14

to stan-...@googlegroups.com

On 31/10/14 22:55, Michael Betancourt wrote:

Add a constant term to the diagonal — Matern is theoretically guaranteed to be
positive-definite but that doesn’t mean that it will be positive-definite in practice,

especially when using floating-point arithmetic.

That constant term is what I was referring to as noise. It doesn't always help. It seems to help the squared exponential covariances more than the Matern.
John.

John Reid

unread,

Nov 1, 2014, 5:13:28 AM11/1/14

to stan-...@googlegroups.com

That gives me this message

> fit <- stan(fit=compiled,
          data=stan.data, init=init.chain, iter=1, chains=1)


          SAMPLING FOR MODEL 'stan.code' NOW (CHAIN 1).

          Error in eval(expr, envir, enclos) : 

            Rejecting user-specified initialization because of vanishing
          density.

          error occurred during calling the sampler; sampling not done

Ben Goodrich

unread,

Nov 1, 2014, 10:24:50 AM11/1/14

to stan-...@googlegroups.com

What are your inits? It shouldn't be necessary to specify them, but if you are starting with an identity matrix or something and getting this message, then probably there is a problem with your Stan program.

Ben

John Reid

unread,

Nov 1, 2014, 4:59:34 PM11/1/14

to stan-users

On 1 Nov 2014 14:24, "Ben Goodrich" <goodri...@gmail.com> wrote:

> What are your inits? It shouldn't be necessary to specify them, but if you are starting with an identity matrix or something and getting this message, then probably there is a problem with your Stan program.

Well the program is very simple, just what I sent in the first post with what you recommended added. The inits are just a random draw from a unit normal. I will post it when I get back to my machine.
John.

John Reid

unread,

Nov 2, 2014, 4:59:49 AM11/2/14

to stan-...@googlegroups.com


On Saturday, 1 November 2014 20:59:34 UTC, John Reid  wrote:



On 1 Nov 2014 14:24, "Ben Goodrich" <> wrote:
> What are your inits? It shouldn't be necessary to 
specify them, but if you are starting with an identity matrix or 
something and getting this message, then probably there is a problem 
with your Stan program.
Well the program is very simple, just what I sent in the 
first post with what you recommended added. The inits are just a random 
draw from a unit normal. I will post it when I get back to my machine. 

John.

Here's the code. Playing around with stan.data

 allows you to quickly see what is possible and what isn't:


library(rstan)                                                                                                                                                                                                     
stan.code <- '

data {                                                                                                                                                                                                             
    int<lower=1> T;  # Number of time points                                                                                                                                                                       
    real noise;  # Noise to add to covariance                                                                                                                                                                      
    int<lower=1,upper=3> COVFN;  # Choose covariance function                                                                                                                                                      
}                                                                                                                                                                                                                  
parameters {                                                                                                                                                                                                       
    vector[T] tau;  # Pseudotime

    vector[T] y;  # Outcomes                                                                                                                                                                                       
}                                                                                                                                                                                                                  
transformed parameters {                                                                                                                                                                                           
    matrix[T,T] Sigma;  # Covariance matrix over pseudotime


    for (t1 in 1:T) {                                                                                                                                                                                              
        for (t2 in 1:t1) {                                                                                                                                                                                         
            real d;                                                                                                                                                                                                
            d <- tau[t1] - tau[t2];                                                                                                                                                                                
            if (1 == COVFN) {                                                                                                                                                                                      
                # Matern32 covariance                                                                                                                                                                              
                real x32;                                                                                                                                                                                          
                x32 <- sqrt(3) * d;                                                                                                                                                                                
                Sigma[t1,t2] <- (1 + x32) * exp(-x32);                                                                                                                                                             
            } else if (2 == COVFN) {                                                                                                                                                                               
                # Matern52 covariance                                                                                                                                                                              
                real x52;                                                                                                                                                                                          
                x52 <- sqrt(5) * d;                                                                                                                                                                                
                Sigma[t1,t2] <- (1 + x52 + pow(x52, 2) / 3) * exp(-x52);                                                                                                                                           
            } else if (3 == COVFN) {                                                                                                                                                                               
                # Squared exponential covariance                                                                                                                                                                   
                Sigma[t1,t2] <- exp(- pow(d, 2) / 2);                                                                                                                                                              
            }                                                                                                                                                                                                      
            if (t1 != t2) {                                                                                                                                                                                        
                Sigma[t2,t1] <- Sigma[t1,t2];                                                                                                                                                                      
            } else {                                                                                                                                                                                               
                Sigma[t1,t1] <- Sigma[t1,t1] + noise;  # Add noise                                                                                                                                                 
            }                                                                                                                                                                                                      
        }                                                                                                                                                                                                          
    }                                                                                                                                                                                                              
}                                                                                                                                                                                                                  
model {                                                                                                                                                                                                            
    tau ~ normal(0, 1);

    # y ~ multi_normal(rep_vector(0, T), Sigma);                                                                                                                                                                   
    y ~ multi_normal_cholesky(rep_vector(0, T), cholesky_decompose(Sigma));                                                                                                                                        
}                                                                                                                                                                                                                  
'                                                                                                                                                                                                                  
compiled <- stan(model_code=stan.code, chains=0)                                                                                                                                                                   
stan.data <- list(T=100, noise=0.0, COVFN=3)

# Define a function to initialise the chains


init.chain <- function() { with(stan.data, list(tau=rnorm(T), y=rep(0,T))) }

# Try one iteration to check everything is OK


fit <- stan(fit=compiled, data=stan.data, init=init.chain, iter=1, chains=1)

                                                                                                                                       
sessionInfo()

Michael Betancourt

unread,

Nov 2, 2014, 5:30:05 AM11/2/14

to stan-...@googlegroups.com

Those aren’t Matern covariance functions -- replace d with | d |.

John Reid

unread,

Nov 2, 2014, 5:35:34 AM11/2/14

to stan-...@googlegroups.com

On 02/11/14 10:30, Michael Betancourt wrote:

Those aren’t Matern covariance functions -- replace d with | d |.

Oops, thanks for pointing that out. Sorted.

John.

Reply all

Reply to author

Forward