transformed parameters leading to "validate transformed params: ... is not symmetric."


Lloyd Elliott

Sep 24, 2016, 1:04:23 PM
to Stan users mailing list
Hello all,

I'm trying to implement a mixture of multivariate Gaussians in Stan. I'm using rstan version 2.12.1. I have simulated bimodal data for the two dimensional case, with means at (-7, 6) and (6, -7), and I'm using a LKJ prior on the covariance matrices (lambda[k]) of the mixture components. It works perfectly. But when I try to transform lambda[k] (thereby using a more complicated prior on the covariance matrices of the mixture), I'm running into trouble. Instead of using lambda[k] I'd like to use omega[k] = f(lambda[k]), for a given function f (which could involve other random variables). I'm running into trouble even when f(X) = X. Here's the code for the mixture of multivariate Gaussians before adding f (this code works perfectly):

library(rstan)
model = " // Multivariate mixture of Gaussians
  data {
    int<lower=2> n;
    int<lower=2> k;
    int<lower=2> d;
    row_vector[d] x[n];
  }

  parameters {
    simplex[k] rho;
    row_vector[d] mu[k];
    corr_matrix[d] lambda[k];
  }

  model {
    rho ~ dirichlet(rep_vector(1.0, k));
    for (j in 1:k) {
      mu[j] ~ normal(0, 1);
      lambda[k] ~ lkj_corr(1);
    }

    for (i in 1:n) {
      real p[k];
      for (j in 1:k) {
        p[j] = log(rho[j]) + multi_normal_lpdf(x[i] | mu[j], lambda[j]);
      }

      target += log_sum_exp(p);
    }
  }
"

d = 2
library(mvtnorm) # for rmvnorm
x1 = rmvnorm(200, c(-7, 6), sigma = diag(d))
x2 = rmvnorm(100, c(6, -7), sigma = diag(d))
x = rbind(x1, x2)
n = dim(x)[1]
k = 3

data = list(x = x, n = n, d = d, k = k)
pars = c("rho", "mu", "lambda")
identities = list(diag(rep(1, d), diag(rep(1, d))))
init = list(list(simplex = rep(1/k, k),
  mu = matrix(0, nrow = k, ncol = d),
  lambda = identities))

samples = stan(model_code = model,
  data = data,
  pars = pars,
  iter = 100,
  chains = 1,
  thin = 1,
  seed = 1337,
  init = init)

print(samples)

The output is as follows:

...
Chain 1, Iteration: 100 / 100 [100%]  (Sampling)
 Elapsed Time: 2.80285 seconds (Warm-up)
               3.98537 seconds (Sampling)
               6.78822 seconds (Total)

The following numerical problems occured the indicated number of times after warmup on chain 1
                                                                       count
Exception thrown at line 19: lkj_corr_log: y is not positive definite.    12
When a numerical problem occurs, the Hamiltonian proposal gets rejected.
If the number in the 'count' column is small, do not ask about this message on stan-users.
Inference for Stan model: 23b4bc39bbc3c199b918718a2a7e8c32.
1 chains, each with iter=100; warmup=50; thin=1; 
post-warmup draws per chain=50, total post-warmup draws=50.

                  mean se_mean   sd     2.5%      25%      50%      75%    97.5% n_eff Rhat
rho[1]            0.33    0.00 0.02     0.30     0.32     0.33     0.34     0.37    37 1.00
rho[2]            0.00    0.00 0.00     0.00     0.00     0.00     0.01     0.02    17 1.02
rho[3]            0.66    0.00 0.02     0.63     0.65     0.66     0.68     0.70    50 0.99
mu[1,1]           5.97    0.01 0.09     5.82     5.89     5.97     6.02     6.16    50 1.02
mu[1,2]          -7.09    0.01 0.09    -7.24    -7.17    -7.11    -7.03    -6.93    50 1.04
mu[2,1]           0.02    0.21 0.95    -1.78    -0.47    -0.07     0.80     1.43    21 1.01
mu[2,2]          -0.11    0.14 0.72    -1.45    -0.57    -0.13     0.24     1.07    27 1.04
mu[3,1]          -7.01    0.01 0.08    -7.17    -7.06    -7.01    -6.95    -6.85    50 1.01
mu[3,2]           5.99    0.01 0.07     5.88     5.95     5.98     6.03     6.13    50 1.01
...

So, my prior parameterization is with 3 components (k = 3), but I've correctly found two posterior mixture components with positive values of rho (rho[1] = .33, rho[3] = .66), and I've found the right means (mu[1] = 5.97, -7.09, mu[3] = -7.01, 5.99) for these components. Perfect!

But now, when I extend this code to include omega[k] = f(lambda[k]), even with f(X) = X, I'm running into trouble. I'm trying to implement f through a "transformed parameters" block. The code is as follows:

library(rstan)
model = " // Multivariate mixture of Gaussians
  data {
    int<lower=2> n;
    int<lower=2> k;
    int<lower=2> d;
    row_vector[d] x[n];
  }

  parameters {
    simplex[k] rho;
    row_vector[d] mu[k];
    corr_matrix[d] lambda[k];
  }

  transformed parameters {
    cov_matrix[d] omega[k];
    for (j in 1:k) {
      omega[k] = lambda[k]; // omega = f(lambda), with f(X) = X
    }
  }

  model {
    rho ~ dirichlet(rep_vector(1.0, k));
    for (j in 1:k) {
      mu[j] ~ normal(0, 1);
      lambda[k] ~ lkj_corr(1);
    }

    for (i in 1:n) {
      real p[k];
      for (j in 1:k) {
        p[j] = log(rho[j]) + multi_normal_lpdf(x[i] | mu[j], omega[j]);
      }

      target += log_sum_exp(p);
    }
  }
"

d = 2
library(mvtnorm) # for rmvnorm
x1 = rmvnorm(200, c(-7, 6), sigma = diag(d))
x2 = rmvnorm(100, c(6, -7), sigma = diag(d))
x = rbind(x1, x2)
n = dim(x)[1]
k = 3

data = list(x = x, n = n, d = d, k = k)
pars = c("rho", "mu", "lambda")
identities = list(diag(rep(1, d), diag(rep(1, d))))
init = list(list(simplex = rep(1/k, k),
  mu = matrix(0, nrow = k, ncol = d),
  lambda = identities))

samples = stan(model_code = model,
  data = data,
  pars = pars,
  iter = 100,
  chains = 1,
  thin = 1,
  seed = 1337,
  init = init)

print(samples)

The output is as follows:

...
[595] "Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:"                        
[596] "validate transformed params: omega[k0__] is not symmetric. omega[k0__][1,2] = nan, but omega[k0__][2,1] = nan"   
[597] "If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,"
[598] "but if this warning occurs often then your model may be either severely ill-conditioned or misspecified."                              
[599] "Rejecting initial value:"                                                                                                              
[600] "  Error evaluating the log probability at the initial value."                                                                          
[601] "Initialization between (-2, 2) failed after 100 attempts. "                                                                            
[602] " Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model."                                 
[603] "Error in eval(ei, envir) : "                                                                                                           
error occurred during calling the sampler; sampling not done
Stan model '5244928831b01aff407c40bddf406215' does not contain samples.

The "Informal message" is repeated many times. I thought that maybe omega wasn't being initialized correctly, so I modified the line in the R code where I define "init" as follows (i.e., I specify init so that lambda and omega are initialized to the same values):

init = list(list(simplex = rep(1/k, k),
  mu = matrix(0, nrow = k, ncol = d),
  lambda = identities,
  omega = identities))

With this modification, I still get the same error. I also tried adding "omega" to "pars" (on the off chance that "pars" was specifying which variables were initialized by my "init" argument to "stan"). But that didn't work either. Can anyone help me debug this? How do I correctly use the "transformed parameters" block in this situation?

Thanks a lot,
L

Ben Goodrich

Sep 24, 2016, 2:11:18 PM
to Stan users mailing list
On Saturday, September 24, 2016 at 1:04:23 PM UTC-4, Lloyd Elliott wrote:
I'm trying to implement a mixture of multivariate Gaussians in Stan. I'm using rstan version 2.12.1. I have simulated bimodal data for the two dimensional case, with means at (-7, 6) and (6, -7), and I'm using a LKJ prior on the covariance matrices (lambda[k]) of the mixture components. It works perfectly. But when I try to transform lambda[k] (thereby using a more complicated prior on the covariance matrices of the mixture), I'm running into trouble. Instead of using lambda[k] I'd like to use omega[k] = f(lambda[k]), for a given function f (which could involve other random variables). I'm running into trouble even when f(X) = X. Here's the code for the mixture of multivariate Gaussians before adding f (this code works perfectly):

If you are absolutely sure that f() outputs a positive definite matrix by construction and can only fail to be positive definite numerically, then you can declare omega to be an array of "plain" matrices, like

transformed parameters {
  matrix[d,d] omega[k];
  for (j in 1:k) {
    omega[k] = f(lambda[k]);
  }
}

That will drop the positive definiteness check for omega. Conversely, if f() is not positive definite by construction, then this strategy will fail to produce a reasonable posterior, but so will your original formulation, because HMC cannot easily account for a measurable hole in the parameter space.

Ben

Lloyd Elliott

Sep 28, 2016, 6:16:32 AM
to stan-...@googlegroups.com
Hi Ben,

Thanks for your reply. In the second block of code I provided, the function f I used was f(X) = X, and so I'm sure it's outputting a positive definite matrix (as X is defined as a corr_matrix). The relevant code is here:

transformed parameters {
  cov_matrix[d] omega[k];
  for (j in 1:k) {
    omega[k] = lambda[k]; // omega = f(lambda), with f(X) = X
  }
}

That is the code that was causing problems. Eventually I want to use f(X) = HXH, with H diagonal, but I'm already running into trouble with the "transformed parameters" block even with f(X) = X. I've now tried replacing "cov_matrix[d] omega[k];" with "matrix[d, d] omega[k];" based on your suggestion (this change was made to the second code block from my first post). When I do that, I get the following error:

SAMPLING FOR MODEL 'a9cdf93d5bc3b2355caa9fa0aa8f3b52' NOW (CHAIN 1).
[1] "Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:"                                   
[2] "Exception thrown at line 32: multi_normal_log: Covariance matrix is not symmetric. Covariance matrix[1,2] = nan, but Covariance matrix[2,1] = nan"
[3] "If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,"           
[4] "but if this warning occurs often then your model may be either severely ill-conditioned or misspecified."                                         
[5] "Rejecting initial value:"                                                                                                                         
[6] "  Error evaluating the log probability at the initial value."                                                                                     
[7] "Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:"                                   
[8] "Exception thrown at line 32: multi_normal_log: Covariance matrix is not symmetric. Covariance matrix[1,2] = nan, but Covariance matrix[2,1] = nan"
[9] "If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,"           
[10] "but if this warning occurs often then your model may be either severely ill-conditioned or misspecified."                                         
[11] "Rejecting initial value:"                                                                                                                         
[12] "  Error evaluating the log probability at the initial value."                                                                                     
[13] "Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:"                                   
[14] "Exception thrown at line 32: multi_normal_log: Covariance matrix is not symmetric. Covariance matrix[1,2] = nan, but Covariance matrix[2,1] = nan"
[15] "If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,"           
[16] "but if this warning occurs often then your model may be either severely ill-conditioned or misspecified."

...

error occurred during calling the sampler; sampling not done
Stan model 'a9cdf93d5bc3b2355caa9fa0aa8f3b52' does not contain samples.

The "Informational Message" is repeated many times. This error is slightly different than the one I was getting with the "cov_matrix" code I was using in my last post. The error is now in the "multi_normal_log" function instead of the "validate transformed params:" operation. But there is some similarity in that in both cases the transformed parameters are complaining about being "nan".

I thought that I could inline a version of "multi_normal_log" that didn't require its argument to be positive definite.

I did this by replacing "p[j] = log(rho[j]) + multi_normal_lpdf(x[i] | mu[j], omega[j]);" with the following code:

row_vector[d] z;
z = x[i] - mu[j];
p[j] = log(rho[j]) - log_determinant(omega[k])/2 - multiply(z, multiply(omega[j], z'))/2;

and changing the transformed parameters block so that omega was the inverse of lambda:

transformed parameters {
  cov_matrix[d] omega[k];
  for (j in 1:k) {
    omega[k] = inverse(lambda[k]); // omega = f(lambda), with f(X) = X^{-1}
  }
}

When I did this, I get the following error, which seems to be complaining that the entries of omega are "nan":

...
[589] "Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:"                        
[590] "Exception thrown at line 34: multiply: A[1] is nan, but must not be nan!"                                                              
[591] "If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,"
[592] "but if this warning occurs often then your model may be either severely ill-conditioned or misspecified."                              
[593] "Rejecting initial value:"                                                                                                              
[594] "  Error evaluating the log probability at the initial value."                                                                          
[595] "Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:"                        
[596] "Exception thrown at line 34: multiply: A[1] is nan, but must not be nan!"                                                              
[597] "If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,"
[598] "but if this warning occurs often then your model may be either severely ill-conditioned or misspecified."                              
[599] "Rejecting initial value:"                                                                                                              
[600] "  Error evaluating the log probability at the initial value."                                                                          
[601] "Initialization between (-2, 2) failed after 100 attempts. "                                                                            
[602] " Try specifying initial values, reducing ranges of constrained values, or reparameterizing the model."                                 
[603] "Error in eval(ei, envir) : "                                                                                                           
error occurred during calling the sampler; sampling not done
Stan model 'a2d1881831356711c29d61facd9e5a3a' does not contain samples.

So in conclusion, it seems that the real problem is that my transformed parameters are nan, and that this happens regardless of whether they are declared as cov_matrix or matrix. Any more ideas as to how to fix this?

Thanks,
L

PS --- version of code with the inlined multi_normal_lpdf  follows (complete rstan script): 


library(rstan)
model = " // Multivariate mixture of Gaussians
  data {
    int<lower=2> n;
    int<lower=2> k;
    int<lower=2> d;
    row_vector[d] x[n];
  }

  parameters {
    simplex[k] rho;
    row_vector[d] mu[k];
    corr_matrix[d] lambda[k];
  }

  transformed parameters {
    matrix[d, d] omega[k];
    for (j in 1:k) {
      omega[k] = inverse(lambda[k]); // omega = f(lambda), with f(X) = X^{-1}
    }
  }

  model {
    rho ~ dirichlet(rep_vector(1.0, k));
    for (j in 1:k) {
      mu[j] ~ normal(0, 1);
      lambda[k] ~ lkj_corr(1);
    }

    for (i in 1:n) {
      real p[k];
      for (j in 1:k) {
        row_vector[d] z;
        z = x[i] - mu[j];
        p[j] = log(rho[j]) + log_determinant(omega[k])/2 - multiply(z, multiply(omega[j], z'))/2;

Daniel Lee

Sep 28, 2016, 7:14:36 AM
to stan-...@googlegroups.com
Hi Lloyd,

The problem you're having is a lot simpler than you think. 

In your for loop, the index variable is j, but you're using k inside the loop body, so elements 1:(k-1) of omega are never assigned reasonable values. The error messages were pointing out the problem.
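Concretely (a hedged sketch showing only the two affected blocks from the earlier post; the rest of the model is unchanged), the fix is to index with the loop variable:

```
transformed parameters {
  cov_matrix[d] omega[k];
  for (j in 1:k) {
    omega[j] = lambda[j]; // was omega[k] = lambda[k]: only omega[k] ever got assigned
  }
}

model {
  rho ~ dirichlet(rep_vector(1.0, k));
  for (j in 1:k) {
    mu[j] ~ normal(0, 1);
    lambda[j] ~ lkj_corr(1); // was lambda[k] ~ lkj_corr(1)
  }
  // ... likelihood as before
}
```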



Daniel


Lloyd Elliott

Sep 29, 2016, 10:17:25 AM
to stan-...@googlegroups.com
Hello Daniel,

Thanks so much for your reply. Looks like I had that bug in the first code sample I posted, but stan was recovering from it (I didn't have priors on lambda[1] ... lambda[k-1], only a prior on lambda[k] --- but stan didn't seem to really mind those missing priors). When I added the "transformed parameters" block (in the second code sample I posted) I seem to have blasted that bug everywhere :).

When I fixed the bug you pointed out, the "transformed parameters" block works fine. I've now implemented a mixture of multivariate Gaussians in rstan with a prior on the covariance matrix of each mixture component given by HLH, where H is a diagonal matrix with exponentially distributed entries and L is uniformly distributed over correlation matrices. I feel like this is a good prior, because L can capture arbitrary correlation structure, and then H can scale the extent to which that structure affects each coordinate of the data, with some robustness added through the L1 penalty induced by the exponential prior on H.

In case anyone is interested, I've wrapped inference based on this model into an R S3 object with a predict method, which can impute sporadically missing values in a test set using the conditional distribution induced by the samples produced by stan. The training and target data are rescaled internally into a range for which the prior makes sense, so not much preprocessing or parameter tweaking needs to be done, aside from choosing k (here k is the number of mixture components, not a for loop index).

Usage is like this:

fit = mgfit(x)
pred = predict(fit, y)

Here, x is an nxd matrix with no NAs, and y is an mxd matrix with any pattern of NAs. pred will be mxd and it will have the predictions for NA values of y, found by averaging the predictions (conditional multivariate Gaussian means) over all collected MCMC iterations and all components (weighted by posterior component assignment probabilities).

I'm calling this v0.1 because there may still be bugs in my for loop indices :) and I haven't tested corner cases. Here's an example on the iris dataset, showing better than chance performance:

# load dataset
x = iris[, 1:4]

# split into test set and train set
x = x[sample(length(x))]
n = dim(x)[1]
d = dim(x)[2]
n.train = floor(n/2)
x.train = x[1:n.train, ]
x.validate = x[-(1:n.train), ]
x.test = x.validate
n.test = dim(x.test)[1]

# hold out 1/2 entries sporadically in test set
for (j in 1:d) {
  x.test[sample(n.test, floor(n.test/2)), j] = NA
}

x.validate[!is.na(x.test)] = NA

# fit mixture of Gaussians
fit = mgfit(x.train)

# predict into test set
x.pred = predict(fit, x.test)

# print RMSEs of model, compared to baseline
cat('RMSEs:\n')
cat('VARIABLE MODEL BASE\n')
for (j in 1:d) {
  mask = is.na(x.test[, j])
  model = x.pred[mask, j]
  truth = x.validate[mask, j]
  base = mean(x.train[, j])
  model.rmse = sqrt(mean(truth - model)^2)
  base.rmse = sqrt(mean(truth - base)^2)
  cat(sprintf("%s %.4f %.4f\n", colnames(x)[j], model.rmse, base.rmse))
}

The output is:

...
RMSEs:
VARIABLE MODEL BASE
Sepal.Length 0.2311 1.0938
Petal.Length 1.1687 2.5691
Petal.Width 0.8141 1.2664
Sepal.Width 0.1416 0.2485

Not bad! Here's the code for mgfit and predict.mg (mg.r):

# mgfit/predict v0.1: model and predict variables using a mixture of Gaussians
# Copyright (c) 2016, Lloyd T. Elliott
# All rights reserved.

# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:

# 1. Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.

# 2. Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.

# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

# fit a model 
mgfit = function(x, # dataset
    k = 7, # number of components to use in mixture
    seed = NULL, # random seed for stan (NULL = generate random seed)
    quiet = TRUE, # suppress all output (currently unimplemented)
    iter = 1000, # number of iterations
    thin = 10, # amount of thinning (collected = iter/thin)
    burnin = .75 # proportion of initial iterations to throw away as burnin
    ) {

  if (is.null(seed)) {
    seed = sample(.Machine$integer.max, 1)
  }

  model = "// Multivariate mixture of Gaussians
    data {
      int<lower=2> n;
      int<lower=2> k;
      int<lower=2> d;
      row_vector[d] x[n];
    }

    parameters {
      simplex[k] rho;
      row_vector[d] mu[k];
      corr_matrix[d] lambda[k];
      vector<lower=0>[d] h[k];
    }

    transformed parameters {
      cov_matrix[d] omega[k];
      for (j in 1:k) {
        vector[d] al;
        for (i in 1:d) {
          al[i] = 1.0/h[j][i];
        }
        omega[j] = diag_matrix(al) * inverse(lambda[j]) * diag_matrix(al);
      }
    }

    model {
      rho ~ dirichlet(rep_vector(1.0, k));
      for (j in 1:k) {
        mu[j] ~ normal(0, 1);
        lambda[j] ~ lkj_corr(1);
        h[j] ~ exponential(1);
      }

      for (i in 1:n) {
        real p[k];
        for (j in 1:k) {
          row_vector[d] z;
          z = x[i] - mu[j];
          p[j] = log(rho[j]) + multi_normal_prec_lpdf(x[i] | mu[j], omega[j]);
        }

        target += log_sum_exp(p);
      }
    }"

  library(rstan)
  n = dim(x)[1]
  d = dim(x)[2]

  # scale training set
  means = colMeans(x)
  stds = apply(x, 2, sd)
  x = t(apply(x, 1, function(v) { return((v - means)/stds) }))
  pars = c("rho", "mu", "lambda", "omega", "h")
  identities = list()
  ones = list()
  for (j in 1:k) {
    identities[[j]] = diag(rep(1, d))
    ones[[j]] = rep(1, d)
  }

  init = list(list(rho = rep(1/k, k),
    mu = matrix(0, nrow = k, ncol = d),
    lambda = identities,
    omega = identities,
    h = ones))

  data = list(x = x, n = n, d = d, k = k)
  samples = stan(model_code = model,
    data = data,
    pars = pars,
    chains = 1,
    iter = iter,
    thin = thin,
    seed = seed,
    init = init)

  result = structure(list(k = k,
    d = d,
    samples = samples,
    means = means,
    stds = stds,
    count = iter/thin,
    burnin = burnin), class = "mg")

  return(result)
}
 
# helper functions

# unnormalized multivariate Gaussian log density: omega is a precision matrix
# and logdet is the log determinant of the corresponding covariance matrix;
# the -d/2 * log(2*pi) constant is dropped because it cancels in logsumexp
mvnpdf = function(x, mu, logdet, omega) {
  v = matrix(x - mu)
  return(-logdet/2 - (t(v) %*% omega %*% v)/2)
}

logsumexp = function(x) {
  m = max(x)
  return(log(sum(exp(x - m))) + m)
}
 
# perform predictions based on model
predict.mg = function(mg, x) {

  # scale test set
  x = t(apply(x, 1, function(v) { return((v - mg$means)/mg$stds) }))
  k = mg$k
  d = mg$d
  n = dim(x)[1]
  z = matrix(0, nrow = n, ncol = d)
  count = mg$count
  burnin = mg$burnin
  samples = mg$samples@sim$samples[[1]]
  denom = 0
  for (iter in ceiling(count * burnin):count) {
    denom = denom + 1

    # extract sample from stan's trace
    rho = rep(NA, k)
    mus = list()
    lambdas = list()
    for (j in 1:k) {
      token = sprintf("rho[%d]", j)
      rho[j] = unlist(samples[token])[iter]
      mu = matrix(NA, nrow = d, ncol = 1)
      for (i in 1:d) {
        token = sprintf("mu[%d,%d]", j, i)
        mu[i, 1] = unlist(samples[token])[iter]
      }

      omega = matrix(NA, nrow = d, ncol = d)
      for (i1 in 1:d) {
        for (i2 in 1:d) {
          token = sprintf("omega[%d,%d,%d]", j, i1, i2)
          omega[i1, i2] = unlist(samples[token])[iter]
        }
      }

      mus[[j]] = mu
      lambdas[[j]] = solve(omega)
    }

    transforms = list()
    for (i in 1:n) {

      # find the missingness pattern
      pattern = is.na(x[i, ])
      x1 = matrix(x[i, pattern])
      x2 = matrix(x[i, !pattern])
      if (sum(pattern) == d) {

        # completely missing row: leave z[i, ] at 0, which back-transforms
        # to the column means
        next
      }

      if (sum(pattern) == 0) {

        # fully observed row: nothing to predict
        next
      }

      p = rep(0, k)
      for (j in 1:k) {
        key = sprintf("%s%d", paste(sprintf("%d", pattern), collapse = ""), j)
        if (is.null(transforms[[key]])) {

          # we haven't seen this pattern: calculate the conditionals
          lambda22 = lambdas[[j]][!pattern, !pattern]
          omega22 = solve(lambda22)
          if (is.matrix(lambda22)) {
            logdet22 = unlist(determinant(lambda22, logarithm = TRUE))[[1]]
          } else {
            logdet22 = log(lambda22)
          }

          lambda12 = lambdas[[j]][pattern, !pattern]
          transform = lambda12 %*% solve(lambda22)
          info = list(omega22 = omega22,
            logdet22 = logdet22,
            transform = transform)

          transforms[[key]] = info
        } else {

          # we've already seen this pattern: retrieve the conditionals
          info = transforms[[key]]
          omega22 = info$omega22
          logdet22 = info$logdet22
        }

        # compute posterior component assignment
        mu2 = mus[[j]][!pattern, 1]
        p[j] = log(rho[j]) + mvnpdf(x2, mu2, logdet22, omega22)
      }

      # normalize posterior component assignments
      p = exp(p - logsumexp(p))
      for (j in 1:k) {

        # average the conditional predictions
        key = sprintf("%s%d", paste(sprintf("%d", pattern), collapse = ""), j)
        transform = transforms[[key]]$transform
        mu1 = matrix(mus[[j]][pattern, 1])
        mu2 = matrix(mus[[j]][!pattern, 1])
        inc = p[j] * (mu1 + transform %*% (x2 - mu2))
        z[i, pattern] = z[i, pattern] + inc
      }
    }
  }

  z = z/denom

  # transform the predictions back to the original space
  z = t(apply(z, 1, function(v) { return(v * mg$stds + mg$means) }))

  # remove observed values from predictions
  z[!is.na(x)] = NA
  return(z)
}
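As a sanity check on the conditioning step in predict.mg (the 'transform = lambda12 %*% solve(lambda22)' line), here is a minimal sketch of the same conditional-mean formula, E[x_mis | x_obs] = mu_mis + Sigma12 Sigma22^{-1} (x_obs - mu_obs), in the two-dimensional case. This is in Python rather than R, and all numbers are made up for illustration:

```python
# Conditional mean of a bivariate Gaussian with one coordinate observed.
# Coordinate 0 is missing, coordinate 1 is observed (values hypothetical).
mu = [1.0, 2.0]            # component mean
sigma = [[2.0, 0.8],
         [0.8, 1.0]]       # covariance matrix
x_obs = 3.0                # observed value of coordinate 1

# transform = Sigma12 * Sigma22^{-1}; a scalar here because only one
# coordinate is observed (the 'transform' quantity in predict.mg)
transform = sigma[0][1] / sigma[1][1]
x_pred = mu[0] + transform * (x_obs - mu[1])
print(x_pred)  # 1.0 + 0.8/1.0 * (3.0 - 2.0) = 1.8
```

For d > 2 the same formula holds with matrix operations in place of the scalar division, which is exactly what the R code computes per missingness pattern.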

That seems to work for rstan version 2.12.1 and R version 3.3.1.

But I've found that my call to stan is pretty slow. Does the following block introduce inefficiency in stan?

vector[d] al;
for (i in 1:d) {
  al[i] = 1.0/h[j][i];
}

I know that sometimes software like stan can be sped up if for loops are rolled into a single function call. Is that true for stan? I tried "al = 1.0/h[j];" but that gave me an error. I'm not having much luck profiling this code.

L

Bob Carpenter

Sep 29, 2016, 5:39:35 PM
to stan-...@googlegroups.com
Thanks for sharing.

The biggest speedup would come from using a Cholesky factor
data type to parameterize a correlation matrix. See the manual
chapter on regression for an example. And this is going to
be really slow:

omega[j] = diag_matrix(al) * inverse(lambda[j]) * diag_matrix(al);

First, we have quad_form_diag() function that'd make it much more
efficient than filling diagonal matrices, but you don't want to do
that, either.

Then use multi_normal_cholesky with the scaled cholesky factor, and you
won't need to use the precision form, and won't have to do inverses yourself.
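A hedged sketch of that reparameterization, adapted from the mgfit model above (same data and parameter names assumed): diag_pre_multiply(h[j], L[j]) is the Cholesky factor of the HLH covariance, since (diag(h) L)(diag(h) L)' = H (L L') H.

```
parameters {
  simplex[k] rho;
  row_vector[d] mu[k];
  cholesky_factor_corr[d] L[k];  // replaces corr_matrix[d] lambda[k]
  vector<lower=0>[d] h[k];
}

model {
  rho ~ dirichlet(rep_vector(1.0, k));
  for (j in 1:k) {
    mu[j] ~ normal(0, 1);
    L[j] ~ lkj_corr_cholesky(1);
    h[j] ~ exponential(1);
  }

  for (i in 1:n) {
    real p[k];
    for (j in 1:k) {
      p[j] = log(rho[j])
             + multi_normal_cholesky_lpdf(x[i] | mu[j], diag_pre_multiply(h[j], L[j]));
    }
    target += log_sum_exp(p);
  }
}
```

This works with the covariance directly, so there are no inverse() calls and no precision-form density.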

- Bob
