Latent variable model

310 views
Skip to first unread message

choko04

unread,
Feb 14, 2013, 11:29:14 AM2/14/13
to stan-...@googlegroups.com
Consider the following "small" model (latent variable model) written in STAN:

### simulate a data set --- 
N=10
n1=2
n2=3
d=2 
set.seed(2001)
A=matrix(rnorm(n1*d),byrow=TRUE,ncol=n1)
#A=matrix(c(.8,0.5,0.95,.9),byrow=TRUE,ncol=2)
set.seed(20013)
B=matrix(rnorm(n2*d),ncol=n2,byrow=TRUE)
#B=matrix(c(.7,.4,.8,.95,.8,.5),ncol=3,byrow=TRUE)
d=nrow(A)
al0=.1
bet0=10
sig_te=1
sig_lb=1
set.seed(200014)
Z=rmvnorm(N, mean=rep(1,d), sigma= diag(rep(1,d)), method=c("svd")) ## latent variable
set.seed(200015)
teta=Z%*%A  + rmvnorm(nrow(Z), mean=rep(0,ncol(A)), sigma= diag(rep(sig_te,ncol(A))),method=c("svd"))  ## construct the mean of the response X (counts)
set.seed(200016)
lambd=Z%*%B + rmvnorm(nrow(Z), mean=rep(0,ncol(B)), sigma= diag(rep(sig_lb,ncol(B))),method=c("svd")) ## construct the mean of the response X (counts)


X=apply(exp(teta),c(1,2),rpois,n=1) ## Simulate a matrix of counts

Y=apply(exp(lambd),c(1,2),rpois,n=1)  ## Simulate a matrix of counts

### stan code looks something like

Bcca_mod <- '
  data {
int<lower=1> N; // sample size
int<lower=1> n1;
int<lower=1> n2;
int<lower=1> d;
int X[n1,N];
int Y[n2,N];
}
parameters {
real<lower=0> bet[d];
real<lower=0> sig_te;
real<lower=0> sig_lb;
matrix[d,N] Z;
matrix[n1,d] A; 
matrix[n2,d] B; 
matrix<lower=0>[n1,N] teta;
matrix<lower=0>[n2,N] lambd;
}
transformed parameters {
matrix[n1,N] mu_te;
matrix[n2,N] mu_lb;
real<lower=0> sigte_sr;
real<lower=0> siglb_sr;
real<lower=0> beta_sqt[d];
 
mu_te <- A*Z;
mu_lb <- B*Z;
siglb_sr <- sqrt(sig_lb);
sigte_sr <- sqrt(sig_te);
for(i in 1:d)
beta_sqt[i] <- sqrt(bet[i]);
}
model {
bet ~ inv_gamma(.1, .1);
sig_lb ~ scaled_inv_chi_square(1, .1) ;
sig_te ~ scaled_inv_chi_square(1, .1) ;
for(i in 1:n1)
for(j in 1:d)
A[i,j] ~ normal(0, beta_sqt[j]);

for(i in 1:n2)
for(j in 1:d)
B[i,j] ~ normal(0, beta_sqt[j]);

for(j in 1:N)
{
for(i in 1:n1)
{
log(teta[i,j]) ~ normal(mu_te[i,j], sigte_sr);
lp__ <- lp__ - log(fabs(teta[i,j]));
X[i,j] ~ poisson(teta[i,j]);  
}
for(k in 1:n2)
{
log(lambd[k,j]) ~ normal(mu_lb[k,j], siglb_sr); 
lp__ <- lp__ - log(fabs(lambd[k,j]));
Y[k,j] ~ poisson(lambd[k,j]);
}
}
}
'

## Load the data
bcca_dat<-list("X"=t(X_nmct),"Y"=t(Y_nmct),"n1"=n1,"n2"=n2,"N"=N,"d"=d)

## run the Stan code
fit <- stan(model_code = Bcca_mod, data = bcca_dat, thin=3, iter = 1000, chains =3 ,pars=c("A","B","Z","sig_te","sig_lb","bet"))


## I am fitting this latent variable model and all the elements of the matrix A and B are estimated to be zero, while the element of Z are all estimated very large (see printout bellow)

Inference for Stan model: Bcca_mod.
3 chains: each with iter=20000; warmup=10000; thin=3; 6667 iterations saved.

               mean    se_mean         sd         2.5%         25%         50%        75%      97.5% n_eff Rhat
A[1,1]          0.0        0.0        0.0          0.0         0.0         0.0        0.0        0.0     2  1.6
A[1,2]          0.0        0.0        0.0          0.0         0.0         0.0        0.0        0.0     2  2.0
A[2,1]          0.0        0.0        0.0          0.0         0.0         0.0        0.0        0.0     3  1.6
A[2,2]          0.0        0.0        0.0          0.0         0.0         0.0        0.0        0.0     4  2.1
B[1,1]          0.0        0.0        0.0          0.0         0.0         0.0        0.0        0.0     8  1.5
B[1,2]          0.0        0.0        0.0          0.0         0.0         0.0        0.0        0.0     2  2.9
B[2,1]          0.0        0.0        0.0          0.0         0.0         0.0        0.0        0.0     3  2.4
B[2,2]          0.0        0.0        0.0          0.0         0.0         0.0        0.0        0.0     3  2.3
B[3,1]          0.0        0.0        0.0          0.0         0.0         0.0        0.0        0.0     2  1.6
B[3,2]          0.0        0.0        0.0          0.0         0.0         0.0        0.0        0.0     3  1.5
Z[1,1]   -3245858.6  3027342.4  4694869.0   -8296523.8  -8287517.5  -2200344.4   318705.7  6124306.6     2  2.0
Z[1,2]    4504070.7  3684314.0  7250106.8   -2874141.0   -657176.8   4265641.0  4280775.2 26880865.4     4  1.6
Z[1,3]   19626308.7 22365727.0 31572636.9   -1064155.1   -557280.3    350743.2 31023577.3 97536345.9     2  2.0
Z[1,4]   11655709.4 17601811.2 24830371.8   -8987598.7  -8958613.4   3195439.8 22584286.8 73142855.9     2  2.4
Z[1,5]   -1122785.7  1150233.5  5007368.7  -13019383.0  -2837632.1     23611.4   617058.3  9846651.0    19  1.2
Z[1,6]   13407266.6 16947374.1 22742390.5   -7224670.5  -7181384.1   7740008.7 22675889.7 69516936.7     2  2.3
Z[1,7]   10338674.6  9285371.6 15815139.6    -730800.5    101530.3   3995719.8 12034942.7 55220472.6     3  1.9
Z[1,8]    7637098.9  7952136.4 13203756.9   -1226119.8     38935.5   1973112.7  9378422.6 45014451.8     3  1.5
Z[1,9]    1210542.8  1061282.0  1974234.5   -1194412.8   -164422.0   1245553.9  1254125.8  6725986.0     3  1.8
Z[1,10]  -2693903.5  2686573.4  3841212.4   -7148124.2  -7139701.0   -685440.4   141490.1  4113091.7     2  2.0
Z[2,1]   -5801621.5  9253713.8 14810936.4  -46321837.9  -9843362.2  -1341058.0  5800328.4  5816089.0     3  1.6
Z[2,2]   10096596.5 10222673.9 13586522.8   -6755993.9  -6735483.5  13484506.7 17905026.5 36518931.4     2  2.9
Z[2,3]    2800418.4 16855696.0 25850461.6  -26269696.8 -26218933.4   7560253.9 23756635.8 48619880.5     2  2.8
Z[2,4]    4772436.9 24295180.2 32893391.8  -48369262.5 -11745741.8  -9802489.5 39142586.9 60667291.8     2  2.3
Z[2,5]  -18865730.6 21355658.6 27715893.3  -75868998.5 -40927244.9  -1598305.9  1489068.9  1516422.9     2  6.3
Z[2,6]    2154677.3  8229186.5 23962144.5  -42000505.6 -11155852.2 -11115477.7 18637537.0 57604470.1     8  1.4
Z[2,7]   10948755.3 21065524.9 28735783.6  -16035994.5 -16006463.7   5552546.1 22052034.6 89614348.3     2  2.1
Z[2,8]  -11040127.8  9346621.3 34092517.8 -100445107.4  -9607665.2  -9582981.7  3254107.2 67018467.2    13  1.1
Z[2,9]    1951657.3  1878997.1  5318720.3   -9089229.1  -1078605.5    292002.1  4742178.3 14580104.7     8  1.6
Z[2,10]  -2725101.6  8803446.4 11911511.2  -29859881.7 -10526394.2   -917495.2  9189455.9  9206906.8     2  2.6
sig_te          1.8        1.5        2.7          0.3         0.5         0.7        1.9        8.6     3  1.4
sig_lb          2.6        1.3        1.9          0.4         1.3         2.1        3.5        7.3     2  1.7
bet[1]          3.4        0.3        2.2          1.4         2.6         2.8        3.4        9.2    45  1.0
bet[2]          3.5        0.6        2.8          1.3         2.4         2.5        4.1        9.4    21  1.1
lp__          336.9        5.5       11.7        312.5       328.5       339.3      345.5      358.0     5  1.3


Is it something wrong with my code and/or model?  The MCMC seems to converge in Openbugs, after a somewhat long run (200000).

Any help will be appreciated.




Bob Carpenter

unread,
Feb 14, 2013, 3:24:44 PM2/14/13
to stan-...@googlegroups.com
The suspicious part of this is that the A and B matrices aren't sampling at all.  You're declaring A as a parameter here:

  matrix[d,N] Z;
  matrix[n1,d] A; 

and you define a transformed parameter:

  mu_te <- A*Z;

and sample in the model with:

  for(i in 1:n1)
    for(j in 1:d)
      A[i,j] ~ normal(0, beta_sqt[j]);

but I don't see any prior on Z itself, only its use through the transformed
parameter mu_te.  

  log(teta[i,j]) ~ normal(mu_te[i,j], sigte_sr);

This seems like a recipe for unified results where A tries
to be as small as possible and Z as big as possible.

I'd start by printing all the parameters, including the transformed parameters.  The only way I can see A staying at 0 is if its standard deviation parameter stays at 0.  Or if RStan is doing too much rounding of numbers in its prints.  Maybe you could look at them in R itself using RStan's extract functions?

Some other tips:

You might want to use the built-in lognormal distribution, which can be vectorized and also unfolds the change-of-variables adjustment directly into the probability function, so it's much more efficient.  You do the change-of-variables yourself here:

  log(teta[i,j]) ~ normal(mu_te[i,j], sigte_sr);
  lp__ <- lp__ - log(fabs(teta[i,j]));

You don't actually need the fabs, because teta is constrained to be positive in its declaration.

But that won't change convergence -- it looks like you have the correct adjustment.

In addition, you wouldn't need to take an absolute value of lambd, because it's declared with a lower bound of 0.

You can vectorize the normal distributions, replacing

for(i in 1:n2)
  for(j in 1:d)
    B[i,j] ~ normal(0, beta_sqt[j]);

with

  for (i in 1:n2)
    B[i] ~ normal(0,beta_sqt);

which will be much faster.

- Bob

choko04

unread,
Feb 15, 2013, 1:40:43 AM2/15/13
to stan-...@googlegroups.com
 Bob,

Indeed I omit to define Z. Things seem to be working better now that I properly define it. But the very low effective sample size is still worrying some, say about 25 for 100,000 samples. 

Thanks.
Reply all
Reply to author
Forward
0 new messages