Modeling a K-distribution

461 views
Skip to first unread message

Luc Coffeng

unread,
May 27, 2013, 7:16:12 PM5/27/13
to
Dear Group-members, Stan developers,

I am trying to fit a latent K-distribution model to data. In short, a K-distribution is the distribution of the product of two random, gamma-distributed variables. In my case, the latter two random variables have expectation 1.0 (i.e. each gamma distribution has parameters alpha == beta), and for one of the random variables I also know the variance (alpha == beta == 3.5, so variance is 1/3.5). My objective is to estimate the unknown variance of the other (unobserved) variable, given (simulated) data.

To make things a little more complicated, the K-distributed variable of interest is unobserved (latent), and I only know whether it is above or below some threshold value. This means that I have to define the likelihood of the data in terms of the CDF for the K-distribution. After some hours of skull-cracking thinking, I (think I) managed to come up with a formulation based on the CDF of the inverse gamma distribution. This approach is based on the nice feature of the gamma distribution that when c*X ~ gamma(alpha,beta), X ~ gamma(c*alpha,beta).

However, when I run the model, it goes haywire. Below I posted the model syntax for rstan (v1.3.0) which I used in R 3.0.0 on my MacBook Pro. Mind that because for each gamma-distribution alpha==beta, only one parameter (shape1 or shape2) is specified for each gamma distribution. Further down, I posted the error message :D.

I suspect that I may be missing something Jacobian-related in the log-likelihood with regards to the first random gamma-distributed variable for which I know the variance (and which is used to scale the second gamma-distributed variable with unknown variance), but I'm not sure what it is. My brain is stuck there, and I'm not even sure whether this is the reason why rstan goes haywire (forgetting stuff in the log-likelihood function should just lead to wrong parameter estimates, not a model crashing).

Many thanks in advance for any input/thoughts!

Luc

------------------------------------

R-syntax:

rm(list=ls());

library(rstan);

set_cppo("fast");  # for best stan running speed

# Simulate data
N = 1000;
var1 <- 1/3.5;
var2 <- 1/20;
thres <- 2;
sim_data <- rgamma(N,1/var1,1/var1) * rgamma(N,1/var2,1/var2) > thres;

DATA <- list(
  N = N,
  outcome = as.numeric(sim_data),
  shape1 = 1/var1,
  thres = thres
);

# Model specification in STAN language
CODE <- '
  data {
  int<lower=1> N;                                   // Number of observations;
  int<lower=0,upper=1> outcome[N];                  // Outcome (0 = no, 1 = yes);
  real<lower=0> shape1;                             // Shape parameter of the first gamma distribution (with mean one);
  real<lower=0> thres;                              // Outcome threshold;
 }
parameters {
  real<lower=0> inv.sqrt.shape2;                    // Standard deviation of the second gamma distribution (with mean one);
}
transformed parameters {
  real<lower=0> shape2;                             // Shape of the second gamma distribution (with mean one);
  shape2 <- pow(inv.sqrt.shape2,-2);
}
model {
  real rscale[N];
   
  for (i in 1:N) {
    rscale[i] ~ gamma(shape2,shape2);
    lp__ <- lp__ + log( (1-outcome[i]) * inv_gamma_cdf(1/thres,rscale[i]*shape1,shape1) + outcome[i] * (1-inv_gamma_cdf(1/thres,rscale[i]*shape1,shape1)) );
  }
}
'

# Fit model with STAN;
fit <- stan(model_name = "Test K-distribution", model_code = CODE, data = DATA, iter = 1, chains = 0, savedso = TRUE);
fit.new <- stan(fit = fit,  data = DATA, pars = c("shape2"), iter = 2000, chains = 1, savedso = TRUE, verbose = FALSE);

R-console:
> fit <- stan(model_name = "Test K-distribution", model_code = CODE, data = DATA, iter = 1, chains = 0, savedso = TRUE);

TRANSLATING MODEL 'Test K-distribution' FROM Stan CODE TO C++ CODE NOW.
COMPILING THE C++ CODE FOR MODEL 'Test K-distribution' NOW.
During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C"
2: Setting LC_COLLATE failed, using "C"
3: Setting LC_TIME failed, using "C"
4: Setting LC_MESSAGES failed, using "C"
5: Setting LC_PAPER failed, using "C"
file522a5e77eb93.cpp: In member function 'void model522a71334adc_Test_K_distribution_namespace::model522a71334adc_Test_K_distribution::transform_inits(const stan::io::var_context&, std::vector<int, std::allocator<int> >&, std::vector<double, std::allocator<double> >&)':
file522a5e77eb93.cpp:112: error: expected initializer before '.' token
file522a5e77eb93.cpp:113: error: 'inv' was not declared in this scope
file522a5e77eb93.cpp: In member function 'T__ model522a71334adc_Test_K_distribution_namespace::model522a71334adc_Test_K_distribution::log_prob_poly(std::vector<T, std::allocator<_T2> >&, std::vector<int, std::allocator<int> >&, std::ostream*)':
file522a5e77eb93.cpp:138: error: expected initializer before '.' token
file522a5e77eb93.cpp:139: error: 'inv' was not declared in this scope
file522a5e77eb93.cpp: In member function 'void model522a71334adc_Test_K_distribution_namespace::model522a71334adc_Test_K_distribution::write_array(RNG&, std::vector<double, std::allocator<double> >&, std::vector<int, std::allocator<int> >&, std::vector<double, std::allocator<double> >&, std::ostream*)':
file522a5e77eb93.cpp:202: error: expected initializer before '.' token
file522a5e77eb93.cpp:203: error: 'inv' was not declared in this scope
file522a5e77eb93.cpp: In member function 'void model522a71334adc_Test_K_distribution_namespace::model522a71334adc_Test_K_distribution::write_csv(RNG&, std::vector<double, std::allocator<double> >&, std::vector<int, std::allocator<int> >&, std::ostream&, std::ostream*)':
file522a5e77eb93.cpp:249: error: expected initializer before '.' token
file522a5e77eb93.cpp:250: error: 'inv' was not declared in this scope
make: *** [file522a5e77eb93.o] Error 1

ERROR(s) during compilation: source code errors or compiler configuration errors!

Program source:
  1:
  2: // includes from the plugin
  3:
  4:
  5: // user includes
  6: #include <rstan/rstaninc.hpp>
  7: // Code generated by Stan version 1.3
  8:
  9: #include <stan/model/model_header.hpp>
 10:
 11: namespace model522a71334adc_Test_K_distribution_namespace {
 12:
 13: using std::vector;
 14: using std::string;
 15: using std::stringstream;
 16: using stan::agrad::var;
 17: using stan::model::prob_grad_ad;
 18: using stan::math::get_base1;
 19: using stan::math::stan_print;
 20: using stan::io::dump;
 21: using std::istream;
 22: using namespace stan::math;
 23: using namespace stan::prob;
 24: using namespace stan::agrad;
 25:
 26: typedef Eigen::Matrix<double,Eigen::Dynamic,1> vector_d;
 27: typedef Eigen::Matrix<double,1,Eigen::Dynamic> row_vector_d;
 28: typedef Eigen::Matrix<double,Eigen::Dynamic,Eigen::Dynamic> matrix_d;
 29:
 30: class model522a71334adc_Test_K_distribution : public prob_grad_ad {
 31: private:
 32:     int N;
 33:     vector<int> outcome;
 34:     double shape1;
 35:     double thres;
 36: public:
 37:     model522a71334adc_Test_K_distribution(stan::io::var_context& context__,
 38:         std::ostream* pstream__ = 0)
 39:         : prob_grad_ad::prob_grad_ad(0) {
 40:         static const char* function__ = "model522a71334adc_Test_K_distribution_namespace::model522a71334adc_Test_K_distribution(%1%)";
 41:         (void) function__; // dummy call to supress warning
 42:         size_t pos__;
 43:         (void) pos__; // dummy call to supress warning
 44:         std::vector<int> vals_i__;
 45:         std::vector<double> vals_r__;
 46:         context__.validate_dims("data initialization", "N", "int", context__.to_vec());
 47:         N = int(0);
 48:         vals_i__ = context__.vals_i("N");
 49:         pos__ = 0;
 50:         N = vals_i__[pos__++];
 51:         context__.validate_dims("data initialization", "outcome", "int", context__.to_vec(N));
 52:         stan::math::validate_non_negative_index("outcome", "N", N);
 53:         outcome = std::vector<int>(N,int(0));
 54:         vals_i__ = context__.vals_i("outcome");
 55:         pos__ = 0;
 56:         size_t outcome_limit_0__ = N;
 57:         for (size_t i_0__ = 0; i_0__ < outcome_limit_0__; ++i_0__) {
 58:             outcome[i_0__] = vals_i__[pos__++];
 59:         }
 60:         context__.validate_dims("data initialization", "shape1", "double", context__.to_vec());
 61:         shape1 = double(0);
 62:         vals_r__ = context__.vals_r("shape1");
 63:         pos__ = 0;
 64:         shape1 = vals_r__[pos__++];
 65:         context__.validate_dims("data initialization", "thres", "double", context__.to_vec());
 66:         thres = double(0);
 67:         vals_r__ = context__.vals_r("thres");
 68:         pos__ = 0;
 69:         thres = vals_r__[pos__++];
 70:         // validate data
 71:         try {
 72:             check_greater_or_equal(function__,N,1,"N");
 73:         } catch (std::domain_error& e) { throw std::domain_error(std::string("Invalid value of N: ") + std::string(e.what())); };
 74:         for (int k0__ = 0; k0__ < N; ++k0__) {
 75:             try {
 76:                 check_greater_or_equal(function__,outcome[k0__],0,"outcome[k0__]");
 77:                 check_less_or_equal(function__,outcome[k0__],1,"outcome[k0__]");
 78:             } catch (std::domain_error& e) { throw std::domain_error(std::string("Invalid value of outcome: ") + std::string(e.what())); };
 79:         }
 80:         try {
 81:             check_greater_or_equal(function__,shape1,0,"shape1");
 82:         } catch (std::domain_error& e) { throw std::domain_error(std::string("Invalid value of shape1: ") + std::string(e.what())); };
 83:         try {
 84:             check_greater_or_equal(function__,thres,0,"thres");
 85:         } catch (std::domain_error& e) { throw std::domain_error(std::string("Invalid value of thres: ") + std::string(e.what())); };
 86:
 87:         // validate transformed data
 88:
 89:         set_param_ranges();
 90:     } // dump ctor
 91:
 92:     void set_param_ranges() {
 93:         num_params_r__ = 0U;
 94:         param_ranges_i__.clear();
 95:         ++num_params_r__;
 96:     }
 97:
 98:     void transform_inits(const stan::io::var_context& context__,
 99:                          std::vector<int>& params_i__,
100:                          std::vector<double>& params_r__) {
101:         stan::io::writer<double> writer__(params_r__,params_i__);
102:         size_t pos__;
103:         std::vector<double> vals_r__;
104:         std::vector<int> vals_i__;
105:
106:
107:         if (!(context__.contains_r("inv.sqrt.shape2")))
108:             throw std::runtime_error("variable inv.sqrt.shape2 missing");
109:         vals_r__ = context__.vals_r("inv.sqrt.shape2");
110:         pos__ = 0U;
111:         context__.validate_dims("initialization", "inv.sqrt.shape2", "double", context__.to_vec());
112:         double inv.sqrt.shape2(0);
113:         inv.sqrt.shape2 = vals_r__[pos__++];
114:         writer__.scalar_lb_unconstrain(0,inv.sqrt.shape2);
115:         params_r__ = writer__.data_r();
116:         params_i__ = writer__.data_i();
117:     }
118:
119:     var log_prob(vector<var>& params_r__,
120:                  vector<int>& params_i__,
121:                  std::ostream* pstream__ = 0) {
122:       return log_prob_poly<true,var>(params_r__,params_i__,pstream__);
123:     }
124:
125:     template <bool propto__, typename T__>
126:     T__ log_prob_poly(vector<T__>& params_r__,
127:                       vector<int>& params_i__,
128:                       std::ostream* pstream__ = 0) {
129:
130:         T__ DUMMY_VAR__(std::numeric_limits<double>::quiet_NaN());
131:         (void) DUMMY_VAR__;  // suppress unused var warning
132:
133:         T__ lp__(0.0);
134:
135:         // model parameters
136:         stan::io::reader<T__> in__(params_r__,params_i__);
137:
138:         T__ inv.sqrt.shape2 = in__.scalar_lb_constrain(0,lp__);
139:         (void) inv.sqrt.shape2;  // supress unused variable warning
140:
141:         // transformed parameters
142:         T__ shape2;
143:
144:         // initialized transformed params to avoid seg fault on val access
145:                 stan::agrad::fill(shape2,DUMMY_VAR__);
146:
147:         assign(shape2, pow(inv.sqrt.shape2,-(2)));
148:
149:         // validate transformed parameters
150:         if (stan::agrad::is_uninitialized(shape2)) {
151:             std::stringstream msg__;
152:             msg__ << "Undefined transformed parameter: shape2";
153:             throw std::runtime_error(msg__.str());
154:         }
155:
156:         const char* function__ = "validate transformed params %1%";
157:         (void) function__; // dummy to suppress unused var warning
158:         try {
159:             check_greater_or_equal(function__,shape2,0,"shape2");
160:         } catch (std::domain_error& e) { throw std::domain_error(std::string("Invalid value of shape2: ") + std::string(e.what())); };
161:         // model body
162:         {
163:             vector<T__> rscale(N);
164:             for (int i = 1; i <= N; ++i) {
165:                 lp__ += stan::prob::gamma_log<true>(get_base1(rscale,i,"rscale",1), shape2, shape2);
166:                 assign(lp__, (lp__ + log((((1 - get_base1(outcome,i,"outcome",1)) * inv_gamma_cdf((1 / thres),(get_base1(rscale,i,"rscale",1) * shape1),shape1)) + (get_base1(outcome,i,"outcome",1) * (1 - inv_gamma_cdf((1 / thres),(get_base1(rscale,i,"rscale",1) * shape1),shape1)))))));
167:             }
168:         }
169:
170:         return lp__;
171:
172:     } // log_prob(...var...)
173:
174:
175:     void get_param_names(std::vector<std::string>& names__) {
176:         names__.resize(0);
177:         names__.push_back("inv.sqrt.shape2");
178:         names__.push_back("shape2");
179:     }
180:
181:
182:     void get_dims(std::vector<std::vector<size_t> >& dimss__) {
183:         dimss__.resize(0);
184:         std::vector<size_t> dims__;
185:         dims__.resize(0);
186:         dimss__.push_back(dims__);
187:         dims__.resize(0);
188:         dimss__.push_back(dims__);
189:     }
190:
191:     template <typename RNG>
192:     void write_array(RNG& base_rng__,
193:                      std::vector<double>& params_r__,
194:                      std::vector<int>& params_i__,
195:                      std::vector<double>& vars__,
196:                      std::ostream* pstream__ = 0) {
197:         vars__.resize(0);
198:         stan::io::reader<double> in__(params_r__,params_i__);
199:         static const char* function__ = "model522a71334adc_Test_K_distribution_namespace::write_array(%1%)";
200:         (void) function__; // dummy call to supress warning
201:         // read-transform, write parameters
202:         double inv.sqrt.shape2 = in__.scalar_lb_constrain(0);
203:         vars__.push_back(inv.sqrt.shape2);
204:
205:         // declare and define transformed parameters
206:         double lp__ = 0.0;
207:         (void) lp__; // dummy call to supress warning
208:         double shape2(0.0);
209:
210:         assign(shape2, pow(inv.sqrt.shape2,-(2)));
211:
212:         // validate transformed parameters
213:         try {
214:             check_greater_or_equal(function__,shape2,0,"shape2");
215:         } catch (std::domain_error& e) { throw std::domain_error(std::string("Invalid value of shape2: ") + std::string(e.what())); };
216:
217:         // write transformed parameters
218:         vars__.push_back(shape2);
219:
220:         // declare and define generated quantities
221:
222:
223:         // validate generated quantities
224:
225:         // write generated quantities
226:     }
227:
228:
229:     void write_csv_header(std::ostream& o__) {
230:         stan::io::csv_writer writer__(o__);
231:         writer__.comma();
232:         o__ << "inv.sqrt.shape2";
233:         writer__.comma();
234:         o__ << "shape2";
235:         writer__.newline();
236:     }
237:
238:     template <typename RNG>
239:     void write_csv(RNG& base_rng__,
240:                    std::vector<double>& params_r__,
241:                    std::vector<int>& params_i__,
242:                    std::ostream& o__,
243:                    std::ostream* pstream__ = 0) {
244:         stan::io::reader<double> in__(params_r__,params_i__);
245:         stan::io::csv_writer writer__(o__);
246:         static const char* function__ = "model522a71334adc_Test_K_distribution_namespace::write_csv(%1%)";
247:         (void) function__; // dummy call to supress warning
248:         // read-transform, write parameters
249:         double inv.sqrt.shape2 = in__.scalar_lb_constrain(0);
250:         writer__.write(inv.sqrt.shape2);
251:
252:         // declare, define and validate transformed parameters
253:         double lp__ = 0.0;
254:         (void) lp__; // dummy call to supress warning
255:         double shape2(0.0);
256:
257:         assign(shape2, pow(inv.sqrt.shape2,-(2)));
258:
259:         try {
260:             check_greater_or_equal(function__,shape2,0,"shape2");
261:         } catch (std::domain_error& e) { throw std::domain_error(std::string("Invalid value of shape2: ") + std::string(e.what())); };
262:
263:         // write transformed parameters
264:         writer__.write(shape2);
265:
266:         // declare and define generated quantities
267:
268:
269:         // validate generated quantities
270:
271:         // write generated quantities
272:         writer__.newline();
273:     }
274:
275: }; // model
276:
277: } // namespace
278:
279: /**
280:  * Define Rcpp Module to expose stan_fit's functions to R.
281:  */
282: RCPP_MODULE(stan_fit4model522a71334adc_Test_K_distribution_mod){
283:   Rcpp::class_<rstan::stan_fit<model522a71334adc_Test_K_distribution_namespace::model522a71334adc_Test_K_distribution,
284:                boost::random::ecuyer1988> >("stan_fit4model522a71334adc_Test_K_distribution")
285:     // .constructor<Rcpp::List>()
286:     .constructor<SEXP, SEXP>()
287:     // .constructor<SEXP, SEXP>()
288:     .method("call_sampler",
289:             &rstan::stan_fit<model522a71334adc_Test_K_distribution_namespace::model522a71334adc_Test_K_distribution, boost::random::ecuyer1988>::call_sampler)
290:     .method("param_names",
291:             &rstan::stan_fit<model522a71334adc_Test_K_distribution_namespace::model522a71334adc_Test_K_distribution, boost::random::ecuyer1988>::param_names)
292:     .method("param_names_oi",
293:             &rstan::stan_fit<model522a71334adc_Test_K_distribution_namespace::model522a71334adc_Test_K_distribution, boost::random::ecuyer1988>::param_names_oi)
294:     .method("param_fnames_oi",
295:             &rstan::stan_fit<model522a71334adc_Test_K_distribution_namespace::model522a71334adc_Test_K_distribution, boost::random::ecuyer1988>::param_fnames_oi)
296:     .method("param_dims",
297:             &rstan::stan_fit<model522a71334adc_Test_K_distribution_namespace::model522a71334adc_Test_K_distribution, boost::random::ecuyer1988>::param_dims)
298:     .method("param_dims_oi",
299:             &rstan::stan_fit<model522a71334adc_Test_K_distribution_namespace::model522a71334adc_Test_K_distribution, boost::random::ecuyer1988>::param_dims_oi)
300:     .method("update_param_oi",
301:             &rstan::stan_fit<model522a71334adc_Test_K_distribution_namespace::model522a71334adc_Test_K_distribution, boost::random::ecuyer1988>::update_param_oi)
302:     .method("param_oi_tidx",
303:             &rstan::stan_fit<model522a71334adc_Test_K_distribution_namespace::model522a71334adc_Test_K_distribution, boost::random::ecuyer1988>::param_oi_tidx)
304:     .method("grad_log_prob",
305:             &rstan::stan_fit<model522a71334adc_Test_K_distribution_namespace::model522a71334adc_Test_K_distribution, boost::random::ecuyer1988>::grad_log_prob)
306:     .method("log_prob",
307:             &rstan::stan_fit<model522a71334adc_Test_K_distribution_namespace::model522a71334adc_Test_K_distribution, boost::random::ecuyer1988>::log_prob)
308:     .method("unconstrain_pars",
309:             &rstan::stan_fit<model522a71334adc_Test_K_distribution_namespace::model522a71334adc_Test_K_distribution, boost::random::ecuyer1988>::unconstrain_pars)
310:     .method("constrain_pars",
311:             &rstan::stan_fit<model522a71334adc_Test_K_distribution_namespace::model522a71334adc_Test_K_distribution, boost::random::ecuyer1988>::constrain_pars)
312:     .method("num_pars_unconstrained",
313:             &rstan::stan_fit<model522a71334adc_Test_K_distribution_namespace::model522a71334adc_Test_K_distribution, boost::random::ecuyer1988>::num_pars_unconstrained)
314:     ;
315: }
316:
317: // declarations
318: extern "C" {
319: SEXP file522a5e77eb93( ) ;
320: }
321:
322: // definition
323:
324: SEXP file522a5e77eb93(  ){
325:  return Rcpp::wrap("Test K-distribution");
326: }
327:
328:
Error in compileCode(f, code, language = language, verbose = verbose) :
  Compilation ERROR, function(s)/method(s) not created! During startup - Warning messages:
1: Setting LC_CTYPE failed, using "C"
2: Setting LC_COLLATE failed, using "C"
3: Setting LC_TIME failed, using "C"
4: Setting LC_MESSAGES failed, using "C"
5: Setting LC_PAPER failed, using "C"
file522a5e77eb93.cpp: In member function 'void model522a71334adc_Test_K_distribution_namespace::model522a71334adc_Test_K_distribution::transform_inits(const stan::io::var_context&, std::vector<int, std::allocator<int> >&, std::vector<double, std::allocator<double> >&)':
file522a5e77eb93.cpp:112: error: expected initializer before '.' token
file522a5e77eb93.cpp:113: error: 'inv' was not declared in this scope
file522a5e77eb93.cpp: In member function 'T__ model522a71334adc_Test_K_distribution_namespace::model522a71334adc_Test_K_distribution::log_prob_poly(std::vector<T, std::allocator<_T2> >&, std::vector<int, std::allocator<int> >&, std::ostream*)':
file522a5e77eb93.cpp:138: error: expected ini
In addition: Warning message:
running command '/Library/Frameworks/R.framework/Resources/bin/R CMD SHLIB file522a5e77eb93.cpp 2> file522a5e77eb93.cpp.err.txt' had status 1
> fit.new <- stan(fit = fit,  data = DATA, pars = c("shape2"), iter = 2000, chains = 1, savedso = TRUE, verbose = FALSE);
Error in is(fit, "stanfit") : object 'fit' not found




Ben Goodrich

unread,
May 27, 2013, 7:45:50 PM5/27/13
to stan-...@googlegroups.com
On Monday, May 27, 2013 7:08:57 PM UTC-4, Luc Coffeng wrote:
Many thanks in advance for any input/thoughts!

For starters, don't use periods for symbol names anywhere in a Stan model (e.g. inv.sqrt.shape2). I think we used to catch this problem, but somehow didn't in this case. Then, you have this part
 

model {
  real rscale[N];
   
  for (i in 1:N) {
    rscale[i] ~ gamma(shape2,shape2);
    lp__ <- lp__ + log( (1-outcome[i]) * inv_gamma_cdf(1/thres,rscale[i]*shape1,shape1) + outcome[i] * (1-inv_gamma_cdf(1/thres,rscale[i]*shape1,shape1)) );
  }
}


where you try to model rscale[i] with a gamma log-likelihood without having first filled in the N elements of rscale. That causes R to crash, although we've been trying to reduce such crashes. I didn't exactly follow what you are trying to do, but at a minimum, you have to fill in rscale first. And if the expression for the elements of rscale involves a function of the parameter(s), then you do need to adjust the lp__ for the change-of-variables.

Ben


Bob Carpenter

unread,
May 28, 2013, 1:55:51 AM5/28/13
to stan-...@googlegroups.com
On 5/27/13 7:08 PM, Luc Coffeng wrote:
> Dear Group-members, Stan developers,
>
> I am trying to fit a latent K-distribution model to data. In short, a K-distribution is the distribution of the product
> of two random, gamma-distributed variables. In my case, the latter two random variables have expectation 1.0 (i.e. each
> distribution has parameters alpha == beta), and for one of the random variables I also know the variance (alpha == beta
> == 3.5, so variance is 1/3.5). My objective is to estimate the unknown variance of the other (unobserved) variable,
> given (simulated) data.
>
> To make things a little more complicated, the K-distributed variable of interest is unobserved (latent), and I only know
> whether it is above or below some threshold value. This means that I have to define the likelihood of the data in terms
> of the CDF for the K-distribution. After some hours of skull-cracking thinking, I (think I) managed to come up with a
> formulation based on the CDF of the inverse gamma distribution. This approach is based on the nice feature of the gamma
> distribution that when c*X ~ gamma(alpha,beta), X ~ gamma(c*alpha,beta).

The equations are easy enough to set up:

K(y|a1,b1,a2,b2) = INT gamma(u|a1,b1) * gamma(y/u|a2,b2) du

No Jacobian needed here. But you need to evaluate
the integral, and I'm terrible at this. And then
evaluate the the integral for the CDF.

> The model compiles succesfully. However, when I run the model, it goes haywire. Below I posted the model syntax for
> rstan (v1.3.0) which I used in R 3.0.0 on my MacBook Pro. Mind that because for each gamma-distribution alpha==beta,
> only one parameter (shape1 or shape2) is specified for each gamma distribution. Further down, I posted the error message
> :D. I suspect that I may be missing something Jacobian-related in the log-likelihood with regards to the first random
> gamma-distributed variable for which I know the variance (and which is used to scale the second gamma-distributed
> variable with unknown variance), but I'm not sure what it is. My brain is stuck there, and I'm not even sure whether
> this is the reason why rstan goes haywire (forgetting stuff in the log-likelihood function should just lead to wrong
> parameter estimates, not a model crashing).

Crashing is a bug and we're looking into fixing that
for the next release --- we need to initialize local
variables.

> thres <- 2;
> sim_data <- rgamma(N,1/var1,1/var1) * rgamma(N,1/var2,1/var2) > thres;
>
> DATA <- list(
> N = N,
> outcome = as.numeric(sim_data),
> shape1 = 1/var1,
> thres = thres
> );
>
> # Model specification in STAN language
> CODE <- '
> data {
> int<lower=1> N; // Number of observations;
> // Outcome (0 = no, 1 = yes);
> real<lower=0> shape1; // Shape parameter of the first gamma distribution (with mean one);
> real<lower=0> thres; // Disease threshold;
> }
> parameters {
> real<lower=0> inv.sqrt.shape2; // Standard deviation of the second gamma distribution (with mean one);
> }
> transformed parameters {
> real<lower=0> shape2; // Shape of the second gamma distribution (with mean one);
> shape2 <- pow(inv.sqrt.shape2,-2);

As Stan's written now, it's more efficient to write

shape2 <- 1 / (iss2 * iss2);

We'll eventually rewrite pow() so that it recognizes the -2
exponent as a special case.

> }
> model {
> real rscale[N];
>
> for (i in 1:N) {
> rscale[i] ~ gamma(shape2,shape2);
> lp__ <- lp__ + log( (1-outcome[i]) * inv_gamma_cdf(1/thres,rscale[i]*shape1,shape1) + outcome[i] *
> (1-inv_gamma_cdf(1/thres,rscale[i]*shape1,shape1)) );
> }
> }

The real problem here conceptually for Stan is that the ~ is
not an assignment operator. So you can't take an undefined local
variable rscale[i] and use ~ on it. The way Stan works,

y ~ foo(theta);

translates to the same thing as:

lp__ <- lp__ + foo_log(y,theta);

So you need to define rscale as a parameter.

That's not so easy because of the constraint. Stan only allows homogeneous
constraints on parameters. So what you need to do is break the data
down into an array of constrained-above and an array of constrained below
values. Say their counts are N1 and N2, then you'd have:

real<lower=0,upper=1/thres> rscale_upper[N1];
real<lower=1/thres> rscale_lower[N2];

Then just sample without the (1 - outcome[i]) and outcome[i] indicators in
two separate loops.

The values N1 and N2 and rscale_upper and rscale_lower can all be
calculated in Stan.


Coding this the way you did in Stan is very inefficient because all
the functions get calculated. Instead of:

> lp__ <- lp__ + log( (1-outcome[i]) * inv_gamma_cdf(1/thres,rscale[i]*shape1,shape1) + outcome[i] *
> (1-inv_gamma_cdf(1/thres,rscale[i]*shape1,shape1)) );

you would have wanted (if you didn't have to break rscale in two
because of the constraints):

if (outcome[i])
lp__ <- lp__ + log(1 - inv_gamma_cdf(1/thres,rscale[i]*shape1,shape1));
else
lp__ <- lp__ + log(inv_gamma_cdf(1/thres,rscale[i]*shape1,shape1));

For arithmetic stability, you also want log1m(z) instead of log(1 - z)
everywhere.

- Bob






Luc Coffeng

unread,
May 28, 2013, 11:02:17 AM5/28/13
to
Thanks Bob and Ben for your feedback on my K-distribution problem! I'm trying to integrate your comments, will report results later :).

Bob proposed to define the likelihood of data y as INT gamma(u|a1,b1) * gamma(y/u|a2,b2) du. This integral is solveable (see wikipedia on K-distribution), but leads to a PDF that contains a modified Bessel function of the second kind which has the form K(alpha,x) = INT[0,Inf] exp(-x*cosh(t)) * cosh(alpha*t) dt. Not surprisingly, this type of function is not supported by Stan. I don't even attempt to think what the CDF would be for this.

I also understand now why I have to define rscale as a parameter, though I'm not sure why it should be constrained (other than being positive real). rscale is just another parameter which feeds into the likelihood of observation y occuring or not, and y[i] can theoretically occur given any value of rscale[i].

Will get back with results :)

- Luc

P.s.: most commercial companies could learn a lot from the level of 'service' that is given in this group! Thnx guys!

Bob Carpenter

unread,
May 28, 2013, 5:18:32 PM5/28/13
to stan-...@googlegroups.com


On 5/28/13 11:01 AM, Luc Coffeng wrote:
> Thanks Bob and Ben for your feedback on my K-distribution problem! I'm trying to integrate your comments, will report
> results later :).
>
> Bob proposed to define the likelihood of data y as INT gamma(u|a1,b1) * gamma(y/u|a2,b2) du. This integral is solveable
> (see wikipedia on K-distribution), but leads to a PDF that contains a modified Bessel function of the second kind which
> has the form K(alpha,x) = INT[0,Inf] exp(-x*cosh(t) * cosh(alpha*t) dt. Not surprisingly, this type of function is not
> supported by Stan. I don't even attempt to think what the CDF would be for this.

Michael Betancourt may know. He did the
CDFs and their derivatives with respect to
parameters for Stan. It's pretty hairy. We have
exposing Bessel functions in Stan on our to-do
list.

> I also understand now why I have to define rscale as a parameter, though I'm not sure why it should be constrained
> (other than being positive real). rscale is just another parameter which feeds into the likelihood of observation y
> occuring or not, and y[i] can theoretically occur given any value of rscale[i].

You need to constrain a parameter to its support.
So you're right --- I was misreading what was being
truncated. You should only need to constrain rscale[n]
to be positive.

> Will get back with results :)
>
> - Luc
>
> P.s.: most commercial companies could learn a lot from the level of 'service' that is given in this group! Thnx guys!

You're welcome. It really helps us to engage users, not just
for their benefit, but for our own, so we can see what's
being done with Stan and what we need to add. We've been
trying to take user feedback seriously, especially on
error handling/reporting and on doc. And we want to see how
far the system can be pushed in terms of model complexity.

- Bob

> On Tuesday, 28 May 2013 07:55:51 UTC+2, Bob Carpenter wrote:
>
> --
> You received this message because you are subscribed to the Google Groups "stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

Michael Betancourt

unread,
May 28, 2013, 5:29:52 PM5/28/13
to stan-...@googlegroups.com
It wouldn't be much different from the CDFs, but certainly not trivial.

Ben Goodrich

unread,
May 29, 2013, 1:05:47 AM5/29/13
to stan-...@googlegroups.com
On Monday, May 27, 2013 7:08:57 PM UTC-4, Luc Coffeng wrote:
I am trying to fit a latent K-distribution model to data. In short, a K-distribution is the distribution of the product of two random, gamma-distributed variables. In my case, the latter two random variables have expectation 1.0 (i.e. each gamma distribution has parameters alpha == beta), and for one of the random variables I also know the variance (alpha == beta == 3.5, so variance is 1/3.5). My objective is to estimate the unknown variance of the other (unobserved) variable, given (simulated) data.

I think this would be easier in Stan if you thought of the K distribution as a compound distribution as described in WIkipedia:

http://en.wikipedia.org/wiki/K-distribution

"The model is that X has a gamma distribution with mean sigma and shape parameter L, with sigma being treated as a random variable having another gamma distribution, this time with mean mu and shape parameter nu."

So if X were fully observed, the .stan file would be something like the following (with a reparameterization from shape-mean to shape-rate)

data {
 
int<lower=1> N;
  real
<lower=0> X[N];
}
parameters
{
  real
<lower=0> sigma; // mean
  real
<lower=0> L;     // shape
  real
<lower=0> mu;    // mean
  real
<lower=0> nu;    // shape
}
model
{
  sigma
~ gamma(nu, nu / mu);
  X
~ gamma(L,L / sigma);
 
// priors on parameters
}
 
To make things a little more complicated, the K-distributed variable of interest is unobserved (latent), and I only know whether it is above or below some threshold value.

In this case, X is latent but above or below a known point, so we need to constrain its support and renormalize the second gamma distribution. This is easiest if you partition X into X_0 (cases below the threshold) and X_1 (cases above the threshold):

  real<lower=0> sigma;
  real
<lower=0> L;
  real
<lower=0> mu;
  real
<lower=0> nu;

I think that is correct up to typos, although there apparently is no gamma_cdf() function in v1.3.0 (it is already in for the next version of Stan)

Ben

Luc Coffeng

unread,
May 29, 2013, 10:12:26 AM5/29/13
to stan-...@googlegroups.com
Thanks Ben,

Your proposal for the compounded gamma distributions seem to work, but only when there is more information on at least one of the unobserved variable X and Y. In other words, estimating parameters shape2 is very difficult if there is only one K-distributed variable Z. This is most likely due to the negative correlation of the two unobserved, gamma-distributed random variables X and Y (of which Z is the observed product). The PDF for the K-distribution would solve this problem, I suppose, as it only needs two shape parameters (of which one is known as we know the distribution of X!). Until such a function becomes available (or the Bessel function for that matter), the compounded gamma distribution can only be adequately modelled given data Z1 = X*Y1 and Z2 = X*Y2. Then things become much easier to estimate as the model now has a better idea of what X should be, given Z1 and Z2 and the distribution of X.

Notes on the side:
- there being no gamma_cdf() function is not a problem because Stan has a function for the inverse gamma CDF and: gamma_cdf(x) = 1 - invgamma_cdf(1/x)
- with regard to your proposal to model "X ~ gamma(L,L / sigma)", Stan currently does not allow multiplication of scalar L with (the inverse of) vector sigma. I've moved the transformation to the transformed parameter section (I guess this should be done anyway because of the Jacobian, right?), but also there, multiplication of vector and scalar did not work, and I had to loop things over the elements of sigma.

For future work: I found a definition of the K-distributioned PDF and CDF in Mathetica, where they are defined it in terms of two shape parameters. Both the PDF and CDF use gamma functions and modified Bessel functions of the second kind. In addition, the CDF uses a Pochhammer symbol (which can be defined in terms of the gamma function). If the mean of the K-distribution is 1, one of the shape parameters can be expressed in terms of the other shape parameter using the Pochhammer symbol. I have tried to work the formulation in Mathematica around to a function defined in terms of the shape parameters of the compounded gamma distributions, but to no avail. If anybody has an idea how this can be done, this would be great!

Below I have pasted an example syntax of the K-distribution problem, formulated as two compounded gamma-distributions.

Luc

--------------------------------------------------------
rm(list=ls());

library(rstan);


# Simulate data;
N = 1000;
var1 <- 1/3.5;  # Lvl 1: gamma variation X;
var2 <- 1/20;   # Lvl 2: gamma variation Y1;
var3 <- 1/10;   # Lvl 2: gamma variation Y2;
thres <- 2;
X <- rgamma(N,1/var1,1/var1);
Y1 <- rgamma(N,1/var2,1/var2);
Y2 <- rgamma(N,1/var3,1/var3);
Z1 <- X * Y1;
Z2 <- X * Y2;


DATA <- list(
  N = N,
  Z1 = Z1,
  Z2 = Z2,
  shape1 = 1/var1

);

# Model specification in STAN language
CODE <- '
  data {
  int<lower=1> N;                                   // Number of observations;
  real<lower=0> Z1[N];                              // Outcome 1;
  real<lower=0> Z2[N];                              // Outcome 2;
  real<lower=0> shape1;                             // Shape parameter of the first gamma distribution;
 }
parameters {
  real<lower=0> X[N];                               // Random variable from the first gamma distribution (with expectation one);
  real<lower=0> inv_sqrt_shape2;                    // Standard deviation of the second gamma distribution;
  real<lower=0> inv_sqrt_shape3;                    // Standard deviation of the third gamma distribution;
}
transformed parameters {
  real<lower=0> shape2;                             // Shape of the second gamma distribution;
  real<lower=0> shape3;                             // Shape of the third gamma distribution;
  real<lower=0> rate2[N];                           // Rate of the second gamma distribution;
  real<lower=0> rate3[N];                           // Rate of the third gamma distribution;
  shape2 <- 1/(inv_sqrt_shape2 * inv_sqrt_shape2);
  shape3 <- 1/(inv_sqrt_shape3 * inv_sqrt_shape3);

  for (i in 1:N) {
    rate2[i] <- shape2 / X[i];
    rate3[i] <- shape3 / X[i];
  }
}
model {   
  X ~ gamma(shape1,shape1);
  Z1 ~ gamma(shape2,rate2);
  Z2 ~ gamma(shape3,rate3);

}
'

# Fit model with STAN;
fit <- stan(model_name = "Test nested gamma distribution", model_code = CODE, data = DATA, iter = 1, chains = 0, savedso = TRUE);
fit.new <- stan(fit = fit,  data = DATA, pars = c("shape2","shape3"), iter = 2000, chains = 2, savedso = TRUE, verbose = FALSE);



--
You received this message because you are subscribed to a topic in the Google Groups "stan users mailing list" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/stan-users/Lib1Wxj6x70/unsubscribe?hl=en.
To unsubscribe from this group and all its topics, send an email to stan-users+...@googlegroups.com.

Bob Carpenter

unread,
May 29, 2013, 1:20:07 PM5/29/13
to stan-...@googlegroups.com


On 5/29/13 10:12 AM, Luc Coffeng wrote:
> Your proposal for the compounded gamma distributions seem to work, but
> only when there is more information on at least
> one of the unobserved variable X and Y. In other words, estimating parameters shape2
> is very difficult if there is only
> one K-distributed variable Z. This is most likely due to the negative correlation of the two unobserved,
> gamma-distributed random variables X and Y (of which Z is the observed product).

Is the problem mainly due to identifiability issue? And if
so, does it matter for the variables you care about?

> - with regard to your proposal to model "|X ~ gamma(L,L / sigma)|", Stan currently does not allow multiplication of
> scalar L with (the inverse of) vector sigma. I've moved the transformation to the transformed parameter section (I guess
> this should be done anyway because of the Jacobian, right?), but also there, multiplication of vector and scalar did not
> work, and I had to loop things over the elements of sigma.

Are you expecting y = L / sigma to have elements
y[n] = L / sigma[n]? You're right that Stan doesn't
do that directly -- it's not matrix division.

We could do something elementwise, though, so that with

y <- L ./ sigma;

we would have y[n] = L / sigma[n]. It's not quite an elementwise
op, either.

With Stan now, you could do:

y <- rep_vector(L,size(sigma)) ./ sigma;

But the loop should be faster here and arguably easier to comprehend.

> For future work: I found a definition of the K-distributioned PDF and CDF in Mathetica, where they are defined it in
> terms of two shape parameters. Both the PDF and CDF use gamma functions and modified Bessel functions of the second
> kind. In addition, the CDF uses a Pochhammer symbol (which can be defined in terms of the gamma function).

It'd be easy enough to add the Pochhammer symbols:

falling_factorial(x,n) = x! / n!

rising_factorial(x,n) = (x + n - 1)! / (x - 1)!

> If the mean
> of the K-distribution is 1, one of the shape parameters can be expressed in terms of the other shape parameter using the
> Pochhammer symbol. I have tried to work the formulation in Mathematica around to a function defined in terms of the
> shape parameters of the compounded gamma distributions, but to no avail. If anybody has an idea how this can be done,
> this would be great!

As far as I can tell from grepping, we don't have any of the Bessel
functions implemented internally yet. This also shouldn't be too hard
to do given Boost.

We might be able to squeeze them in for the next release.

> Below I have pasted an example syntax of the K-distribution problem, formulated as two compounded gamma-distributions.

Thanks for sharing the model code.

- Bob

Luc Coffeng

unread,
May 30, 2013, 5:56:26 AM5/30/13
to stan-...@googlegroups.com
On Wednesday, 29 May 2013 19:20:07 UTC+2, Bob Carpenter wrote:
On 5/29/13 10:12 AM, Luc Coffeng wrote:
 > Your proposal for the compounded gamma distributions seem to work, but
 > only when there is more information on at least
 > one of the unobserved variable X and Y. In other words, estimating parameters shape2
 > is very difficult if there is only
 > one K-distributed variable Z. This is most likely due to the negative correlation of the two unobserved,
 > gamma-distributed random variables X and Y (of which Z is the observed product).

Is the problem mainly due to identifiability issue?  And if
so, does it matter for the variables you care about?

The problem is that observation Z = X*Y and both X and Y are latent. If I do 1,000 NUTS II samples, I get an effective number of samples of about five to ten for the shape parameter of the gamma distribution of Y (which I would like to now). Same happens when I use NUTS I or the full mass matrix approach. Apparently, none of these algorithms is currently very efficient at numerically integrating over all values of X (of which we happen to know the distribution). I'm not sure whether this answers your question.
 

> - with regard to your proposal to model "|X ~ gamma(L,L / sigma)|", Stan currently does not allow multiplication of
> scalar L with (the inverse of) vector sigma. I've moved the transformation to the transformed parameter section (I guess
> this should be done anyway because of the Jacobian, right?), but also there, multiplication of vector and scalar did not
> work, and I had to loop things over the elements of sigma.

Are you expecting y = L / sigma to have elements
y[n] = L / sigma[n]?  You're right that Stan doesn't
do that directly -- it's not matrix division.

We could do something elementwise, though, so that with

    y <- L ./ sigma;

we would have y[n] = L / sigma[n].  It's not quite an elementwise
op, either.

With Stan now, you could do:

    y <- rep_vector(L,size(sigma)) ./ sigma;

But the loop should be faster here and arguably easier to comprehend.


My bad, I just found out I was still using arrays instead of vectors. By modeling the inverse of sigma with an inv-gamma-distribution, I could just do L * inv-sigma without the need for a loop.
 
> For future work: I found a definition of the K-distributioned PDF and CDF in Mathetica, where they are defined it in
> terms of two shape parameters. Both the PDF and CDF use gamma functions and modified Bessel functions of the second
> kind. In addition, the CDF uses a Pochhammer symbol (which can be defined in terms of the gamma function).

It'd be easy enough to add the Pochhammer symbols:

    falling_factorial(x,n) = x! / n!

    rising_factorial(x,n) = (x + n - 1)! / (x - 1)!

> If the mean
> of the K-distribution is 1, one of the shape parameters can be expressed in terms of the other shape parameter using the
> Pochhammer symbol. I have tried to work the formulation in Mathematica around to a function defined in terms of the
> shape parameters of the compounded gamma distributions, but to no avail. If anybody has an idea how this can be done,
> this would be great!

As far as I can tell from grepping, we don't have any of the Bessel
functions implemented internally yet.  This also shouldn't be too hard
to do given Boost.

We might be able to squeeze them in for the next release.


Looking forward to it!
 

Luc Coffeng

unread,
May 30, 2013, 10:36:26 AM5/30/13
to stan-...@googlegroups.com
Dear all,

I have taken the model to its next stage: from the K-distribution to the censored K-distribution (i.e. the original goal). Based on the working example of the K-distribution (nested gamma distributions) model that I shared in a previous post, I have now changed the continuous response Z3 to a dichotomous variable that indicates whether Z3 is below or above some known threshold. As before, I modelled the likelihood of Z3 with the gamma CDF; for this I used one minus the inv_gamma_cdf(1/x) as there is no gamma_cdf in Stan. When I start sampling, either of the three following things happens:

1) Error : Error in function boost::math::tgamma<e>(e): Result of tgamma is too large to represent.
error occurred during calling the sampler; sampling not done

2) The model start adapting, but at a slower and slower pace, until it coming to a grinding halt, freezing R.

3) The model flies through the simulations, resulting in effective sample size = 1.

I have attached the R script. If you take out the modelling of Z3, or set it to be continuous again, there is no problem whatsoever.

Is this a bug? Can I resolve it?

Luc
test censored nested gamma distribution STAN.r

Ben Goodrich

unread,
May 30, 2013, 12:37:50 PM5/30/13
to stan-...@googlegroups.com
I haven't looked at the new code in detail yet, but for any model with latent variables, you almost certainly will need proper priors on the (shape) parameters to get it to work well.

Ben

Luc Coffeng

unread,
May 30, 2013, 4:08:44 PM5/30/13
to stan-...@googlegroups.com
Thanks for pointing that out Ben. Unfortunately, specifying proper priors or even informative priors (as in attached script) did not solve the problem. What it did change though is that R just freezes in the adaption phase at every attempt, whereas before, I sometimes still got an error message.
test censored nested gamma distribution STAN.r

Ben Goodrich

unread,
May 30, 2013, 5:05:25 PM5/30/13
to stan-...@googlegroups.com
We just merged something that should hopefully prevent the hanging

https://github.com/stan-dev/stan/issues/82

Usually, that is a sign that something is amiss with the model, but the samplers in Stan are still being ironed out right now.

Ben

Bob Carpenter

unread,
May 30, 2013, 5:15:43 PM5/30/13
to stan-...@googlegroups.com


On 5/30/13 5:05 PM, Ben Goodrich wrote:
> We just merged something that should hopefully prevent the hanging
>
> https://github.com/stan-dev/stan/issues/82
>
> Usually, that is a sign that something is amiss with the model, but the samplers in Stan are still being ironed out
> right now.

Ben's referring to Stan 2.0. The Stan 1.3 samplers in
our current release should be fine as far as behavior
goes if they work. We're still trying to trap errors
so there's no freezing or crashing of R from ill-behaved
models (indexes out of bounds, inits out of support,
constrained support for parameters not matching declarations,
etc.).

- Bob


>
> Ben
>
> On Thursday, May 30, 2013 4:08:44 PM UTC-4, Luc Coffeng wrote:
>
> Thanks for pointing that out Ben. Unfortunately, specifying proper priors or even informative priors (as in attached
> script) did not solve the problem. What it did change though is that R just freezes in the adaption phase at every
> attempt, whereas before, I sometimes still got an error message.
>
> On Thursday, 30 May 2013 18:37:50 UTC+2, Ben Goodrich wrote:
>
> I haven't looked at the new code in detail yet, but for any model with latent variables, you almost certainly
> will need proper priors on the (shape) parameters to get it to work well.
>
> Ben
>
> --
> You received this message because you are subscribed to the Google Groups "stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.

Ben Goodrich

unread,
May 30, 2013, 5:34:28 PM5/30/13
to stan-...@googlegroups.com
If you include the test_grad = TRUE argument in the call to stan(), you can see that the partial derivatives of the log_posterior with respect to the three shape parameters are very large in magnitude.

...
       
500               0        -497.507        -497.507        6.84e-06
       
501               0        -497.505        -497.505       1.123e-05
       
502               0        -72.2355        -72.2357      0.00025697

where the columns are the parameter number, the starting value (in the unbounded space), the auto-differentiation derivative, the numerical derivative, and the difference between the previous two. It is going to have a hard time adapting if the surface were really so steep. This suggests to me that maybe there is a mistake in one of the identities you use, but I didn't see one at first glance.

Ben

Bob Carpenter

unread,
May 30, 2013, 6:20:44 PM5/30/13
to stan-...@googlegroups.com
The good news is that Peter's already mostly done with
the Bessel functions of the first and second kind (for
constant integer order) and the rising and falling
factorials.

Given what Ben observed about the gradients, one thing
to check if you adapt for a bit is what the varying step
sizes are for each of the components. We should be able to
adapt to this amount of scale, but the combination of varying
scales and correlations in the posterior can be difficult.

More inline.

On 5/30/13 10:36 AM, Luc Coffeng wrote:
> Dear all,
>
> I have taken the model to its next stage: from the K-distribution to the censored K-distribution (i.e. the original
> goal). Based on the working example of the K-distribution (nested gamma distributions) model that I shared in a previous
> post, I have now changed the continuous response Z3 to a dichotomous variable that indicates whether Z3 is below or
> above some known threshold. As before, I modelled the likelihood of Z3 with the gamma CDF; for this I used one minus the
> inv_gamma_cdf(1/x) as there is no gamma_cdf in Stan. When I start sampling, either of the three following things happens:
>
> 1) Error : Error in function boost::math::tgamma<e>(e): Result of tgamma is too large to represent.
> error occurred during calling the sampler; sampling not done

Can you work on the log scale? tgamma() is the gamma function
and lgamma() is the log of the gamma function.

> 2) The model start adapting, but at a slower and slower pace, until it coming to a grinding halt, freezing R.

It may just have found a very small step size and be taking a very
large number of iterations. You can try turning down the max number
of iterations.

> 3) The model flies through the simulations, resulting in effective sample size = 1.

That'll happen if it doesn't move.

> I have attached the R script. If you take out the modelling of Z3, or set it to be continuous again, there is no problem
> whatsoever.
>
> Is this a bug? Can I resolve it?

The usual problem is a variable out of support or a variable
without a prior. I don't see either of those issues in your
model.

Is the X in the model supposed to be the same as the
X in the R code to generate data? If so, I don't
understand why it's not:

X ~ gamma(1/var1,1/var1);

instead of what you have, which is essentially:

X ~ inv_gamma(1/var1, 1/var1);

Our gamma is parameterized the same way as R's (R's "rate"
is our "inverse scale").

I can't imagine this would cause sampling to grind to
a halt even if it were misspecified.

And are you sure that the inv_gamma is doing what you want?
What you'd like to write is:

log(gamma_cdf(thres, shape4, shape4 * X[i]));

but we don't have gamma_cdf, so you used:

log(inv_gamma_cdf(1/thres,shape4,shape4*X[i])

Can someone else verify this is OK?

As a minor point, you want to declare this:

int<lower=0> Z3[N]; // Outcome 3 (0 = under threshold, 1 = above threshold);

with upper=1, since it's going to be boolean.

- Bob

Bob Carpenter

unread,
May 30, 2013, 6:31:13 PM5/30/13
to stan-...@googlegroups.com
On 5/30/13 6:20 PM, Bob Carpenter wrote:
> The good news is that Peter's already mostly done with
> the Bessel functions of the first and second kind (for
> constant integer order) and the rising and falling
> factorials.

And now that I look at the form of the K distribution, it
looks like we need the modified (i.e., hyperbolic) Bessel
functions. Looks like Peter's going to be busy :-)

- Bob

Luc Coffeng

unread,
May 31, 2013, 4:19:28 AM5/31/13
to stan-...@googlegroups.com
Wow, thnx again for all the input guys.

I appreciate the point that you might not want to put too much stuff into Stan. The K-distribution is a distribution used in the field of sonar/radar engineering, which is definitely not my field. Personally, I hope to introduce it to the biomedical field :D.

On the point of the Bessel function needed for the K-distribution: yes, it requires a modified (hyperbolic) Bessel function of the second kind.

For the sake of comparison, I attach two scripts, each to model three K-distributed variables Z1, Z2 and Z3, and the only difference between the scripts being that in one script, Z3 is censored (plus informative prior on the shape parameter for the distribution of Z3). As I mentioned before, the uncensored Z3-script works like a charm, whereas the censored one gives the problem we are discussing. I put some responses inline below.


On Friday, 31 May 2013 00:20:44 UTC+2, Bob Carpenter wrote:

On 5/30/13 10:36 AM, Luc Coffeng wrote:
> 1) Error : Error in function boost::math::tgamma<e>(e): Result of tgamma is too large to represent.
> error occurred during calling the sampler; sampling not done

Can you work on the log scale?  tgamma() is the gamma function
and lgamma() is the log of the gamma function.

I actually tried defining all calculation on the log-scale, but I would need the log_gamma_cdf(), log_inv_gamma_cdf() or at least the log_incompl_gamma() function for that. With the latter, you can manually define the log_inv_gamma_cdf().
 

> 2) The model start adapting, but at a slower and slower pace, until it coming to a grinding halt, freezing R.

It may just have found a very small step size and be taking a very
large number of iterations.  You can try turning down the max number
of iterations.

> 3) The model flies through the simulations, resulting in effective sample size = 1.

That'll happen if it doesn't move.

> I have attached the R script. If you take out the modelling of Z3, or set it to be continuous again, there is no problem
> whatsoever.
>
> Is this a bug? Can I resolve it?

The usual problem is a variable out of support or a variable
without a prior.  I don't see either of those issues in your
model.

My point :). And specifying informative priors (and appropriately constraining the support!) doesn't help either.
 

Is the X in the model supposed to be the same as the
X in the R code to generate data?  If so, I don't
understand why it's not:

    X ~ gamma(1/var1,1/var1);

instead of what you have, which is essentially:

    X ~ inv_gamma(1/var1, 1/var1);

Our gamma is parameterized the same way as R's (R's "rate"
is our "inverse scale").

I can't imagine this would cause sampling to grind to
a halt even if it were misspecified.

And are you sure that the inv_gamma is doing what you want?

Yes absolutely. My bad is that in the model, I should have called this 'X' 'Xinv' instead. I use the inverse of X because at some point I have to divide a scalar ('shape#'') by the vector 'X'. To speed things up (which it absolutely does!), I therefore model 'Xinv' and multiply this with the scalar. Nifty but perhaps not so handy for the sake of comprehension.
 
What you'd like to write is:

    log(gamma_cdf(thres, shape4, shape4 * X[i]));

but we don't have gamma_cdf, so you used:

    log(inv_gamma_cdf(1/thres,shape4,shape4*X[i])

 
N.B.: If an observation Z is known to be above 'thres', than 1/Z is known to be under 1/thres. The following statement corresponds to the log-probability of Z being above thres according to the gamma_cdf:
log1m(gamma_cdf(thres, shape4, shape4 * X[i]))

Therefore, the following corresponds to 1/Z being under 1/thres:
log(inv_gamma_cdf(1/thres,shape4,shape4*X[i])

Conversely, if Z is known to be under thres (1/Z is above 1/thres):
log1m(inv_gamma_cdf(1/thres,shape4,shape4*X[i])

 
test censored nested gamma distribution STAN.r
test nested gamma STAN.r

Luc Coffeng

unread,
May 31, 2013, 4:31:45 AM5/31/13
to stan-...@googlegroups.com
I tested the gradients for the uncensored model as well (which works like a charm), and there they are even extremer than those Ben found for the censored model:

...
    
  500       -0.106031         1080.54         1080.54      5.9033e-06
       501          1.9808        -911.981        -911.981    -1.00584e-05
       502        0.769544        -517.485        -517.485     5.98373e-06

Luc

Bob Carpenter

unread,
May 31, 2013, 12:26:22 PM5/31/13
to stan-...@googlegroups.com


On 5/31/13 4:19 AM, Luc Coffeng wrote:
> Wow, thnx again for all the input guys.

This has brought up a lot of interesting issues on our side
as you may have seen if you're following our stan-dev list.

> I appreciate the point that you might not want to put too much stuff into Stan. The K-distribution is a distribution
> used in the field of sonar/radar engineering, which is definitely not my field. Personally, I hope to introduce it to
> the biomedical field :D.

One of the things we've been discussion is why more
highly parameterized distributions and compound distributions
aren't used more, and think it may largely be a matter of
convenience. If we can fit these in Stan, it'd be a huge
win not only for us, but for statistical practice in general!


> On the point of the Bessel function needed for the K-distribution: yes, it requires a modified (hyperbolic) Bessel
> function of the second kind.

Peter Li's knocked that out already (Bessel of first and second
kind, and modified Bessel of first and second kind, as well as
the rising/falling factorials in straight-up and log form).
It is a tradeoff and Stan's still more like a programming
language than a pure statistical modeling language in that
people will always rewrite models to make them faster.

I like to write out the simpler models, then gradually improve
them for speed. If you keep the earlier models around, it
provides a guide to others.

You can also add a little doc, but that's often difficult to
do without including so much side commentary that it's distracting.

> What you'd like to write is:
>
> log(gamma_cdf(thres, shape4, shape4 * X[i]));
>
> but we don't have gamma_cdf, so you used:
>
> log(inv_gamma_cdf(1/thres,shape4,shape4*X[i])
>
> N.B.: If an observation Z is known to be above 'thres', than 1/Z is known to be under 1/thres. The following statement
> corresponds to the log-probability of Z being *above* thres according to the gamma_cdf:
> log*1m*(gamma_cdf(thres, shape4, shape4 * X[i]))
>
> Therefore, the following corresponds to 1/Z being under 1/thres:
> log(inv_gamma_cdf(1/thres,shape4,shape4*X[i])
>
> Conversely, if Z is known to be under thres (1/Z is above 1/thres):
> log*1m*(inv_gamma_cdf(1/thres,shape4,shape4*X[i])

Thanks for the explanation. We've stepped squarely into Ben's territory now.

- Bob
Reply all
Reply to author
Forward
0 new messages