Help fitting a simple Bayesian Generalized Method of Moments (GMM) model

Jaron

unread,

Mar 23, 2017, 5:17:10 PM3/23/17

to Stan users mailing list

For those not familiar with bayesian generalized method of moments (GMM), the basic idea is really simple:

1) construct estimating equations u for parameters of interest theta

we are interested in solving u=0 to obtain estimates of theta

2) then the sample mean U = 1/N * sum(u_1,....,u_N) will be asymptotically normal with mean 0 and covariance which is consistently estimated by Sigma_hat (formula is given in Yin's paper).

3) Apply a normal likelihood function to U (with mean 0 and covariance Sigma_hat) and flat priors for theta, then use MCMC to estimate the posterior distribution for all parameters of interest

Yin's 2009 paper proposed the Bayesian GMM which I am trying to fit in STAN. Essentially the model I'm trying to fit is a Bayesian anlog to the frequentist GEE model using an indepdence working correlation structure. I simulated some simple data which consists of N 2-person families, thus we do not have independent subjects since the response (Y) will be correlated within each family. However, GMM can be used to conduct valid inference on the mean parameters despite the fact that our working independent correlation structure is wrong.

I attached some simple simulated data and my current .stan file. Lastly, here is my R code to fit the model in Rstan:

library(rstan)
load("GEE1data1.RData")
data=list(Y=simdata$Y,X1=simdata$X1,N=length(table(simdata$famID)),N2=100.0)
fit = stan(file = "stan_GEE1_ind.stan",data = data, cores = 1, chains = 1, iter = 10)

However the model fails to fit. Specifically, it appears the main problem is with the final line in my .stan file where i attempt to define the multivariate normal likelihood from step (3):

target += -0.5*U'*inverse(Sigma)*U;

Any help getting this model to fit would be greatly appreciated!! I will note that I was unable to get this model to fit in JAGS, and I'm now trying to decide if it's possible to fit bayesian GMM models in STAN.

Thanks

stan_GEE1_ind.stan

GEE1data1.RData

Ben Goodrich

unread,

Mar 23, 2017, 6:04:07 PM3/23/17

to Stan users mailing list

On Thursday, March 23, 2017 at 5:17:10 PM UTC-4, Jaron wrote:

However the model fails to fit. Specifically, it appears the main problem is with the final line in my .stan file where i attempt to define the multivariate normal likelihood from step (3):
target += -0.5*U'*inverse(Sigma)*U;

The actual problem (or at least one of them) is that you define Sigma as (on line 67)

Sigma= 1/N2^2 * u2temp - 1/N2 * U*U';

while utemp2 is incremented in the loop ending on line 60 as

u2temp = u2temp + u*u';

but u2temp is declared but not defined on line 30

matrix[3,3] u2temp;

in which case it is initialized to NaN, in which case the elements of Sigma are all NaN, and the pseudo-likelihood fails.

Perhaps you meant to initialize u2temp to a matrix of zeros before starting the loop, in which case you could change line 30 to

matrix[3,3] u2temp = rep_matrix(0, 3, 3);

Ben

Jaron

unread,

Mar 23, 2017, 7:26:17 PM3/23/17

to Stan users mailing list

My model appears to fit fine now, thanks a lot! Interestingly I was unable to get this model to converge in JAGS but it fits fine in STAN, looks like I'm going to be using STAN a lot more from now on.

Thanks

Jonah Gabry

unread,

Mar 23, 2017, 8:06:14 PM3/23/17

to Stan users mailing list

Glad to hear that Stan has been useful for you. Let us know if you have questions in the future and we always like to hear about the interestinf work people are doing with Stan, so keep us posted!

Jonah

Bob Carpenter

unread,

Mar 23, 2017, 9:22:20 PM3/23/17

to stan-...@googlegroups.com

Glad it finally fit for you, but there are problems with this
from a Stan perspective. First, you could use some indentation and
spacing around operators so that it's more readable.

These two:

real<lower=0> phi; //variance
...
phi~uniform(0,100000);

are not compatible. You really should be declaring

real<lower=0, upper=1000000> phi;

because Stan requires every variable meeting the constraints
to have a finite log density (2000000 meets your constraint
but has 0 log density because of the bounds on the uniform).

But then you'll get initializations around 500,000 because
you've said the values are equally likely to be anywhere
between 0 and 1M. Is that what you intended? If not,
we recommend actually using priors that do express what
you know ahead of time about phi.

You can also just get rid of the uniform(...) and use
an improper prior. Then you initialize around 1 because
that's the center of the exp() transform to get values
to be positive.

For efficiency, don't duplicate calculations:

sk[1]=(Y[2*i-1]-yhat[1])^2;
sk[2]=(Y[2*i]-yhat[2])^2;
Fk[1]=Y[2*i-1]-yhat[1];
Fk[2]=Y[2*i]-yhat[2];

Instead, you can vectorize the definition of sk as

Fk[1] = Y[2*i-1] - yhat[1];
Fk[2] = Y[2*i] - yhat[2];
sk[1] = square(Fk[1])
sk[2] = square(Fk[2]);

or

Fk[1:2] = Y[(2 * i - 1):(2 * i)] - yhat[1:2];
sk[1:2] = square(Fk[1:2]);

Also later,

V[4, 4] = V[3, 3];

You never want to compute inverses, so instead of

u= D'*inverse(V)*Fk;

it's much more stable to write

D' / V * Fk;

where D' / V =def= D' * inverse(V). Same with later calc.

We have a quad form, but you're inverting, so you can't use that.
We should probably have an inverse quad form, and also a more
general multiply_self_transpose---as is, we don't have the inverse
quad form and only have multiply_self_transpose for triangular
matrices.

- Bob

> On Mar 23, 2017, at 5:17 PM, Jaron <arbe...@umn.edu> wrote:
>
> For those not familiar with bayesian generalized method of moments (GMM), the basic idea is really simple:
> 1) construct estimating equations u for parameters of interest theta
> we are interested in solving u=0 to obtain estimates of theta
> 2) then the sample mean U = 1/N * sum(u_1,....,u_N) will be asymptotically normal with mean 0 and covariance which is consistently estimated by Sigma_hat (formula is given in Yin's paper).
> 3) Apply a normal likelihood function to U (with mean 0 and covariance Sigma_hat) and flat priors for theta, then use MCMC to estimate the posterior distribution for all parameters of interest
>
> Yin's 2009 paper proposed the Bayesian GMM which I am trying to fit in STAN. Essentially the model I'm trying to fit is a Bayesian anlog to the frequentist GEE model using an indepdence working correlation structure. I simulated some simple data which consists of N 2-person families, thus we do not have independent subjects since the response (Y) will be correlated within each family. However, GMM can be used to conduct valid inference on the mean parameters despite the fact that our working independent correlation structure is wrong.
>
> I attached some simple simulated data and my current .stan file. Lastly, here is my R code to fit the model in Rstan:
> library(rstan)
> load("GEE1data1.RData")
> data=list(Y=simdata$Y,X1=simdata$X1,N=length(table(simdata$famID)),N2=100.0)
> fit = stan(file = "stan_GEE1_ind.stan",data = data, cores = 1, chains = 1, iter = 10)
>
> However the model fails to fit. Specifically, it appears the main problem is with the final line in my .stan file where i attempt to define the multivariate normal likelihood from step (3):
> target += -0.5*U'*inverse(Sigma)*U;

You really don't want to implement this this way.
You can do U' / Sigma * U, and it should be more stable.

>
> Any help getting this model to fit would be greatly appreciated!! I will note that I was unable to get this model to fit in JAGS, and I'm now trying to decide if it's possible to fit bayesian GMM models in STAN.
>
> Thanks
>

> --
> You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
> To post to this group, send email to stan-...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
> <stan_GEE1_ind.stan><GEE1data1.RData>

Jaron

unread,

Mar 24, 2017, 12:02:18 PM3/24/17

to Stan users mailing list

Hi Bob,

Thanks for the STAN coding advice. Here are my comments:

I simply want a flat non-informative prior for phi and I know STAN applies flat priors by default so I will remove the "uniform(0,100000);" prior specification since it is unnecessary and incorrect as you mentioned
I will try to vectorize my code more. As you probably noticed, I am filling in the elements of a vector or matrix one element at a time which is tedious... I look forward to the future STAN 2.15 update you mentioned (in my other thread) that will allow me to easily fill in the elements of matrices and vectors.
Not sure what you mean by "Also later, V[4, 4] = V[3, 3];".... I think you may be confused about V and Sigma. V is a 4x4 matrix used in the construction of the estimating equation u , and u is a 3x1 vector. Sigma is the 3x3 estimated covariance matrix of u.
Instead of using the inverse() function to calculate the inverse of a matrix, I will use the division operator "/" instead as you mentioned

Thanks again

Michael Betancourt

unread,

Mar 24, 2017, 2:03:29 PM3/24/17

to stan-...@googlegroups.com

Uniform priors are not non-informative. See

http://mc-stan.org/documentation/case-studies/weakly_informative_shapes.html

for some discussion.

Bob Carpenter

unread,

Mar 25, 2017, 4:50:20 PM3/25/17

to stan-...@googlegroups.com

> On Mar 24, 2017, at 12:02 PM, Jaron <arbe...@umn.edu> wrote:
>

> ...
> • Not sure what you mean by "Also later, V[4, 4] = V[3, 3];".... I think you may be confused about V and Sigma. V is a 4x4 matrix used in the construction of the estimating equation u , and u is a 3x1 vector. Sigma is the 3x3 estimated covariance matrix of u.

I just meant your program contained these two statements:

V[3,3]=2*phi^2;
V[4,4]=2*phi^2;

and the second should be replaced with:

V[4, 4] = V[3, 3];

It saves a multiply and square going forward and an equivalent
amount of work in propagating derivatives (though it's worse because
it's like interpreted code) going backwards. In general, you want
to reuse values rather than recompute them. When they're recomputed,
they take up more space on the expression graph.

Maybe Sean will get around to finding them in code and optimizing
our graphs so it'll do this automatically under the hood. But that's
not going into 2.15!

- Bob

Reply all

Reply to author

Forward