Latent variable model

310 views

Skip to first unread message

choko04

unread,

Feb 14, 2013, 11:29:14 AM2/14/13

to stan-...@googlegroups.com

Consider the following "small" model (latent variable model) written in STAN:

### simulate a data set ---

N=10

n1=2

n2=3

d=2

set.seed(2001)

A=matrix(rnorm(n1*d),byrow=TRUE,ncol=n1)

#A=matrix(c(.8,0.5,0.95,.9),byrow=TRUE,ncol=2)

set.seed(20013)

B=matrix(rnorm(n2*d),ncol=n2,byrow=TRUE)

#B=matrix(c(.7,.4,.8,.95,.8,.5),ncol=3,byrow=TRUE)

d=nrow(A)

al0=.1

bet0=10

sig_te=1

sig_lb=1

set.seed(200014)

Z=rmvnorm(N, mean=rep(1,d), sigma= diag(rep(1,d)), method=c("svd")) ## latent variable

set.seed(200015)

teta=Z%*%A + rmvnorm(nrow(Z), mean=rep(0,ncol(A)), sigma= diag(rep(sig_te,ncol(A))),method=c("svd")) ## construct the mean of the response X (counts)

set.seed(200016)

lambd=Z%*%B + rmvnorm(nrow(Z), mean=rep(0,ncol(B)), sigma= diag(rep(sig_lb,ncol(B))),method=c("svd")) ## construct the mean of the response X (counts)

X=apply(exp(teta),c(1,2),rpois,n=1) ## Simulate a matrix of counts

Y=apply(exp(lambd),c(1,2),rpois,n=1) ## Simulate a matrix of counts

### stan code looks something like

Bcca_mod <- '

data {

int<lower=1> N; // sample size

int<lower=1> n1;

int<lower=1> n2;

int<lower=1> d;

int X[n1,N];

int Y[n2,N];

}

parameters {

real<lower=0> bet[d];

real<lower=0> sig_te;

real<lower=0> sig_lb;

matrix[d,N] Z;

matrix[n1,d] A;

matrix[n2,d] B;

matrix<lower=0>[n1,N] teta;

matrix<lower=0>[n2,N] lambd;

}

transformed parameters {

matrix[n1,N] mu_te;

matrix[n2,N] mu_lb;

real<lower=0> sigte_sr;

real<lower=0> siglb_sr;

real<lower=0> beta_sqt[d];

mu_te <- A*Z;

mu_lb <- B*Z;

siglb_sr <- sqrt(sig_lb);

sigte_sr <- sqrt(sig_te);

for(i in 1:d)

beta_sqt[i] <- sqrt(bet[i]);

}

model {

bet ~ inv_gamma(.1, .1);

sig_lb ~ scaled_inv_chi_square(1, .1) ;

sig_te ~ scaled_inv_chi_square(1, .1) ;

for(i in 1:n1)

for(j in 1:d)

A[i,j] ~ normal(0, beta_sqt[j]);

for(i in 1:n2)

for(j in 1:d)

B[i,j] ~ normal(0, beta_sqt[j]);

for(j in 1:N)

{

for(i in 1:n1)

{

log(teta[i,j]) ~ normal(mu_te[i,j], sigte_sr);

lp__ <- lp__ - log(fabs(teta[i,j]));

X[i,j] ~ poisson(teta[i,j]);

}

for(k in 1:n2)

{

log(lambd[k,j]) ~ normal(mu_lb[k,j], siglb_sr);

lp__ <- lp__ - log(fabs(lambd[k,j]));

Y[k,j] ~ poisson(lambd[k,j]);

}

## Load the data

bcca_dat<-list("X"=t(X_nmct),"Y"=t(Y_nmct),"n1"=n1,"n2"=n2,"N"=N,"d"=d)

## run the Stan code

fit <- stan(model_code = Bcca_mod, data = bcca_dat, thin=3, iter = 1000, chains =3 ,pars=c("A","B","Z","sig_te","sig_lb","bet"))

## I am fitting this latent variable model and all the elements of the matrix A and B are estimated to be zero, while the element of Z are all estimated very large (see printout bellow)

Inference for Stan model: Bcca_mod.

3 chains: each with iter=20000; warmup=10000; thin=3; 6667 iterations saved.

mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat

A[1,1] 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2 1.6

A[1,2] 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2 2.0

A[2,1] 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3 1.6

A[2,2] 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 4 2.1

B[1,1] 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 8 1.5

B[1,2] 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2 2.9

B[2,1] 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3 2.4

B[2,2] 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3 2.3

B[3,1] 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2 1.6

B[3,2] 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3 1.5

Z[1,1] -3245858.6 3027342.4 4694869.0 -8296523.8 -8287517.5 -2200344.4 318705.7 6124306.6 2 2.0

Z[1,2] 4504070.7 3684314.0 7250106.8 -2874141.0 -657176.8 4265641.0 4280775.2 26880865.4 4 1.6

Z[1,3] 19626308.7 22365727.0 31572636.9 -1064155.1 -557280.3 350743.2 31023577.3 97536345.9 2 2.0

Z[1,4] 11655709.4 17601811.2 24830371.8 -8987598.7 -8958613.4 3195439.8 22584286.8 73142855.9 2 2.4

Z[1,5] -1122785.7 1150233.5 5007368.7 -13019383.0 -2837632.1 23611.4 617058.3 9846651.0 19 1.2

Z[1,6] 13407266.6 16947374.1 22742390.5 -7224670.5 -7181384.1 7740008.7 22675889.7 69516936.7 2 2.3

Z[1,7] 10338674.6 9285371.6 15815139.6 -730800.5 101530.3 3995719.8 12034942.7 55220472.6 3 1.9

Z[1,8] 7637098.9 7952136.4 13203756.9 -1226119.8 38935.5 1973112.7 9378422.6 45014451.8 3 1.5

Z[1,9] 1210542.8 1061282.0 1974234.5 -1194412.8 -164422.0 1245553.9 1254125.8 6725986.0 3 1.8

Z[1,10] -2693903.5 2686573.4 3841212.4 -7148124.2 -7139701.0 -685440.4 141490.1 4113091.7 2 2.0

Z[2,1] -5801621.5 9253713.8 14810936.4 -46321837.9 -9843362.2 -1341058.0 5800328.4 5816089.0 3 1.6

Z[2,2] 10096596.5 10222673.9 13586522.8 -6755993.9 -6735483.5 13484506.7 17905026.5 36518931.4 2 2.9

Z[2,3] 2800418.4 16855696.0 25850461.6 -26269696.8 -26218933.4 7560253.9 23756635.8 48619880.5 2 2.8

Z[2,4] 4772436.9 24295180.2 32893391.8 -48369262.5 -11745741.8 -9802489.5 39142586.9 60667291.8 2 2.3

Z[2,5] -18865730.6 21355658.6 27715893.3 -75868998.5 -40927244.9 -1598305.9 1489068.9 1516422.9 2 6.3

Z[2,6] 2154677.3 8229186.5 23962144.5 -42000505.6 -11155852.2 -11115477.7 18637537.0 57604470.1 8 1.4

Z[2,7] 10948755.3 21065524.9 28735783.6 -16035994.5 -16006463.7 5552546.1 22052034.6 89614348.3 2 2.1

Z[2,8] -11040127.8 9346621.3 34092517.8 -100445107.4 -9607665.2 -9582981.7 3254107.2 67018467.2 13 1.1

Z[2,9] 1951657.3 1878997.1 5318720.3 -9089229.1 -1078605.5 292002.1 4742178.3 14580104.7 8 1.6

Z[2,10] -2725101.6 8803446.4 11911511.2 -29859881.7 -10526394.2 -917495.2 9189455.9 9206906.8 2 2.6

sig_te 1.8 1.5 2.7 0.3 0.5 0.7 1.9 8.6 3 1.4

sig_lb 2.6 1.3 1.9 0.4 1.3 2.1 3.5 7.3 2 1.7

bet[1] 3.4 0.3 2.2 1.4 2.6 2.8 3.4 9.2 45 1.0

bet[2] 3.5 0.6 2.8 1.3 2.4 2.5 4.1 9.4 21 1.1

lp__ 336.9 5.5 11.7 312.5 328.5 339.3 345.5 358.0 5 1.3

Is it something wrong with my code and/or model? The MCMC seems to converge in Openbugs, after a somewhat long run (200000).

Any help will be appreciated.

Bob Carpenter

unread,

Feb 14, 2013, 3:24:44 PM2/14/13

to stan-...@googlegroups.com

The suspicious part of this is that the A and B matrices aren't sampling at all. You're declaring A as a parameter here:

matrix[d,N] Z;

matrix[n1,d] A;

and you define a transformed parameter:

mu_te <- A*Z;

and sample in the model with:

for(i in 1:n1)

for(j in 1:d)

A[i,j] ~ normal(0, beta_sqt[j]);

but I don't see any prior on Z itself, only its use through the transformed

parameter mu_te.

log(teta[i,j]) ~ normal(mu_te[i,j], sigte_sr);

This seems like a recipe for unified results where A tries

to be as small as possible and Z as big as possible.

I'd start by printing all the parameters, including the transformed parameters. The only way I can see A staying at 0 is if its standard deviation parameter stays at 0. Or if RStan is doing too much rounding of numbers in its prints. Maybe you could look at them in R itself using RStan's extract functions?

Some other tips:

You might want to use the built-in lognormal distribution, which can be vectorized and also unfolds the change-of-variables adjustment directly into the probability function, so it's much more efficient. You do the change-of-variables yourself here:

log(teta[i,j]) ~ normal(mu_te[i,j], sigte_sr);

lp__ <- lp__ - log(fabs(teta[i,j]));

You don't actually need the fabs, because teta is constrained to be positive in its declaration.

But that won't change convergence -- it looks like you have the correct adjustment.

In addition, you wouldn't need to take an absolute value of lambd, because it's declared with a lower bound of 0.

You can vectorize the normal distributions, replacing

for(i in 1:n2)

for(j in 1:d)

B[i,j] ~ normal(0, beta_sqt[j]);

with

for (i in 1:n2)

B[i] ~ normal(0,beta_sqt);

which will be much faster.

- Bob

choko04

unread,

Feb 15, 2013, 1:40:43 AM2/15/13

to stan-...@googlegroups.com

Bob,

Indeed I omit to define Z. Things seem to be working better now that I properly define it. But the very low effective sample size is still worrying some, say about 25 for 100,000 samples.