Non-NULL weights can be used to indicate that different observations have different dispersions (with the values in weights being inversely proportional to the dispersions); or equivalently, when the elements of weights are positive integers w_i, that each response y_i is the mean of w_i unit-weight observations. For a binomial GLM prior weights are used to give the number of trials when the response is the proportion of successes: they would rarely be used for a Poisson GLM.
If non-NULL, weighted least squares is used with weights weights (that is, minimizing sum(w*e^2)); otherwise ordinary least squares is used. Non-NULL weights can be used to indicate that different observations have different variances (with the values in weights being inversely proportional to the variances); or equivalently, when the elements of weights are positive integers w_i, that each response y_i is the mean of w_i unit-weight observations (including the case that there are w_i observations equal to y_i and the data have been summarized).
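The equivalence described above (positive integer weights w_i behave like w_i summarized observations equal to y_i) can be checked numerically. The following is a minimal Python sketch, not R's implementation: the helper `wls_line` and the toy data are hypothetical, and it only handles a straight-line fit with a closed-form solution.

```python
def wls_line(x, y, w=None):
    """Fit y = a + b*x by minimizing sum(w * e^2).

    Hypothetical helper for illustration; w=None means ordinary least squares.
    """
    if w is None:
        w = [1.0] * len(x)
    if any(wi < 0 for wi in w):
        raise ValueError("negative weights not allowed")
    sw = sum(w)
    # Weighted means of x and y.
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted cross-product and sum of squares about the weighted means.
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


# Toy summarized data (made up): each y[i] stands for w[i] identical observations.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 2.5, 2.0, 4.0]
w = [1, 3, 2, 1]

# Expand the summary back into one row per unit-weight observation.
x_rep = [xi for xi, wi in zip(x, w) for _ in range(wi)]
y_rep = [yi for yi, wi in zip(y, w) for _ in range(wi)]

weighted_fit = wls_line(x, y, w)        # weighted fit on the summary
replicated_fit = wls_line(x_rep, y_rep)  # unweighted fit on the expanded rows
```

Under these assumptions the two fits agree up to floating-point error, which is the sense in which integer weights and replicated rows are interchangeable for the point estimates.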
On Jun 23, 2015, at 7:00 PM, Ben Goodrich <goodri...@gmail.com> wrote:
For now I guess we should follow Bob’s advice and use increment_log_prob() with weights, and when weights are specified in stan_regression, we spit out a warning and also print out the sum of the weights.
Other opinions?
Ben
I still don't see the problem.
I think we should allow non-integer weights. R returns an error if you give it negative weights, so we can do that too.
I agree that it’s not Bayesian; it’s what is sometimes called a quasi-likelihood in that it acts mathematically as a likelihood function but there is no generative model. Still, it is a well-defined target distribution and I think we should allow it. Cos people are gonna want to do it.
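The policy discussed in this thread (accept non-integer weights, reject negative ones as R does, and warn while reporting the sum of the weights) could be sketched as follows. This is only an illustration in Python: the function name `check_weights` and the warning text are hypothetical and are not part of any Stan or R interface.

```python
import math
import warnings


def check_weights(weights):
    """Validate prior weights and report their sum.

    Hypothetical helper: non-integer weights are accepted, while missing (NaN)
    or negative weights raise an error, mirroring what the thread says R does.
    """
    w = [float(wi) for wi in weights]
    if any(math.isnan(wi) or wi < 0 for wi in w):
        raise ValueError("missing or negative weights not allowed")
    total = sum(w)
    # The warning text is made up; it just flags that a weighted target is a
    # quasi-likelihood rather than a generative model.
    warnings.warn(
        "weights define a quasi-likelihood, not a generative model; "
        f"sum of weights = {total:g}"
    )
    return w, total
```

Calling `check_weights([0.5, 1.5, 2])` would return the weights as floats together with their sum and emit one warning, while `check_weights([1, -1])` would raise a `ValueError`.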
We should be able to handle the weights without needing separate .stan files by using if clauses. For example, here's how it currently looks for discrete outcomes (right now this means the likelihood is Bernoulli or Poisson, but this will eventually be extended to others too):
if (has_weights == 0) { // unweighted log-likelihoods
  if (family == 1) { // family = binomial
    if (link == 1) y ~ bernoulli_logit(eta);
    else {
      vector[N] pi;
      pi <- linkinv_binom(eta, link);
      y ~ bernoulli(pi);
    }
  }
  else { // family = poisson
    if (link == 1) y ~ poisson_log(eta);
    else {
      vector[N] phi;
      phi <- linkinv_pois(eta, link);
      y ~ poisson(phi);
    }
  }
}
else { // weighted log-likelihoods
  vector[N] summands;
  if (family == 1) // bernoulli
    summands <- pw_binom(y, eta, link);
  else // poisson
    summands <- pw_pois(y, eta, link);
  increment_log_prob(dot_product(weights, summands));
}
(The linkinv_binom and linkinv_pois functions apply the appropriate inverse link function to the linear predictor, and the pw_binom and pw_pois functions compute the pointwise log-likelihoods.)
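To make the weighted branch concrete, here is a minimal Python sketch of the same idea: compute the pointwise log-likelihoods, then take their dot product with the weights. The function names are hypothetical stand-ins (they are not the actual pw_binom and pw_pois definitions, which are not shown in this thread), and only the canonical links (logit and log) are covered.

```python
import math


def pw_bernoulli_logit(y, eta):
    """Pointwise Bernoulli log-likelihoods under a logit link (illustrative)."""
    # For y = 1: log(inv_logit(eta)) = -log1p(exp(-eta)).
    # For y = 0: log(1 - inv_logit(eta)) = -log1p(exp(eta)).
    return [
        -math.log1p(math.exp(-e)) if yi == 1 else -math.log1p(math.exp(e))
        for yi, e in zip(y, eta)
    ]


def pw_poisson_log(y, eta):
    """Pointwise Poisson log-likelihoods under a log link (illustrative)."""
    # log Poisson(y | exp(eta)) = y * eta - exp(eta) - log(y!)
    return [yi * e - math.exp(e) - math.lgamma(yi + 1) for yi, e in zip(y, eta)]


def log_lik(y, eta, family, weights=None):
    """Total log-likelihood; with weights, the dot product of weights and summands."""
    if family == "bernoulli":
        summands = pw_bernoulli_logit(y, eta)
    elif family == "poisson":
        summands = pw_poisson_log(y, eta)
    else:
        raise ValueError("unknown family: " + str(family))
    if weights is None:
        return sum(summands)
    return sum(wi * si for wi, si in zip(weights, summands))


# Toy values (made up): integer weights versus explicitly replicated rows.
y = [1, 0, 1]
eta = [0.3, -1.2, 2.0]
weights = [2, 1, 3]

y_rep = [yi for yi, wi in zip(y, weights) for _ in range(wi)]
eta_rep = [e for e, wi in zip(eta, weights) for _ in range(wi)]

weighted = log_lik(y, eta, "bernoulli", weights)
replicated = log_lik(y_rep, eta_rep, "bernoulli")
```

In this sketch `weighted` and `replicated` agree up to floating-point error, and passing weights of all ones reproduces the unweighted total. With non-integer weights there is no replicated data set to compare against, which is the quasi-likelihood case raised earlier in the thread.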
The cleaner the stan_regression files are, the easier it will be for people to customize them.
Maybe what we should do is really improve the pedagogical content in the example models library (repository) so that the R package can point users there.
What would be cool is if stan_glm would print a link to a webpage with an example model using the same outcome variable type, likelihood, and link as the user's model.