Adding weight to the likelihood as glm in R does


andre.p...@googlemail.com

Jul 29, 2014, 6:15:56 PM
to stan-...@googlegroups.com
Dear all,

I'd like to weight my samples the same way R's glm function does with its weights parameter.
So I calculated the log likelihood, multiplied it by weights[i], and incremented
the log probability by hand.

       ll <- poisson_log(counts[i],la); 
       increment_log_prob(weights[i] * ll); 

See attached file weight_example.R for a very short example. 
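One way to sanity-check this weighting scheme (a Python sketch with made-up data, not Stan; the Poisson log-pmf is hand-rolled): an integer weight w on observation i should give the same log likelihood as repeating observation i w times.

```python
import math

def poisson_log_pmf(k, lam):
    # log of the Poisson pmf: k*log(lam) - lam - log(k!)
    return k * math.log(lam) - lam - math.lgamma(k + 1)

counts = [3, 0, 5]
weights = [2, 1, 3]
lam = 2.5

# weighted log likelihood, as in the Stan snippet above
weighted_ll = sum(w * poisson_log_pmf(c, lam)
                  for c, w in zip(counts, weights))

# equivalent: replicate each observation weights[i] times
replicated = [c for c, w in zip(counts, weights) for _ in range(w)]
replicated_ll = sum(poisson_log_pmf(c, lam) for c in replicated)

assert abs(weighted_ll - replicated_ll) < 1e-12
```

For non-integer weights the replication picture no longer applies literally, which is where the question of what posterior you actually want comes in.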

I'm not sure if this is the correct way.
Happy to read your comments,

Andre


weight_example.R

Bob Carpenter

Jul 29, 2014, 6:28:20 PM
to stan-...@googlegroups.com
It would help if you told us in formulas what you want
the posterior to be. I didn't understand the rather
ad-hoc looking use of weights in the glm() function in R:

Non-NULL weights can be used to indicate that different observations
have different dispersions (with the values in weights being inversely
proportional to the dispersions); or equivalently, when the elements of
weights are positive integers w_i, that each response y_i is the mean
of w_i unit-weight observations. For a binomial GLM prior weights are
used to give the number of trials when the response is the proportion
of successes: they would rarely be used for a Poisson GLM.

If you just want to treat each instance as the equivalent of
weights[i] fractional instances, then what you have is correct.

But instead of this:

la <- exp(a);
for (i in 1:N) {
  ll <- poisson_log(counts[i], la);
  increment_log_prob(weights[i] * ll);
}
you're better off for arithmetic stability using

for (i in 1:N)
  increment_log_prob(weights[i] * poisson_log_log(counts[i], a));

That is, use the log-scale version of Poisson.
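The stability point can be illustrated outside Stan (a Python sketch; these helper functions are hand-rolled stand-ins, not Stan's actual implementations). The two parameterizations agree in value, but the log-scale version works directly with the linear predictor instead of round-tripping through exp() and log().

```python
import math

def poisson_log_pmf(k, lam):
    # log of the Poisson pmf with rate lam
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def poisson_log_log_pmf(k, alpha):
    # same pmf with log-scale rate alpha = log(lam):
    # k*alpha - exp(alpha) - log(k!)
    return k * alpha - math.exp(alpha) - math.lgamma(k + 1)

# identical up to rounding, but the log-scale version never
# computes log(exp(alpha))
diff = abs(poisson_log_pmf(3, math.exp(1.7)) - poisson_log_log_pmf(3, 1.7))
assert diff < 1e-12
```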

I'd also recommend tighter priors than normal(0,100) unless
your data is on that scale.

- Bob

Marco Inacio

Jul 29, 2014, 7:17:25 PM
to stan-...@googlegroups.com
Are you actually interested in exposure rather than weights? That is, suppose you are modeling the number of accidents on some roads as a Poisson random variable.

The exposure would be the period of time each road was observed (maybe you observed one road for 1.2 years and another for 2 years).

Just asking, since exposure is pretty common in Poisson models; if that's your case, you shouldn't be using the weights parameter in glm, but rather offset = log(exposures).
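The offset trick amounts to an identity on the log scale (a Python sketch with made-up numbers; the log-pmf helper is hand-rolled): a Poisson mean of exposure * rate is the same as adding log(exposure) to the linear predictor.

```python
import math

def poisson_log_pmf(k, lam):
    # log of the Poisson pmf: k*log(lam) - lam - log(k!)
    return k * math.log(lam) - lam - math.lgamma(k + 1)

# hypothetical numbers: 3 accidents on a road observed for 2 years,
# with log accident rate a per year
a = 0.4
exposure = 2.0
k = 3

# mean = exposure * exp(a)  <=>  log mean = log(exposure) + a,
# i.e. offset = log(exposure) on the log scale
ll_mean = poisson_log_pmf(k, exposure * math.exp(a))
ll_offset = poisson_log_pmf(k, math.exp(math.log(exposure) + a))
assert abs(ll_mean - ll_offset) < 1e-12
```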

andre.p...@googlemail.com

Jul 30, 2014, 3:59:34 AM
to stan-...@googlegroups.com
Dear all,

I'd like to adjust the weight as stated here:


(X^T W X) * beta = X^T W y, with w_ii = sqrt(W_ii) as the weight
matrix.

I didn't see how this can be done in Stan. I just showed a
Poisson realization, because the generalized linear models
differ only in the link function. Once we know how to
implement the above, we would like to be able to weight
all different kinds of distributions.

The code shown was just my two cents' workaround for the above
problem.

Andre


Bob Carpenter

Jul 30, 2014, 12:15:48 PM
to stan-...@googlegroups.com
You mean in this section:

http://en.wikipedia.org/wiki/Weighted_least_squares#Weighted_least_squares

I couldn't quite understand the least-squares formulation.

Andrew might jump in with a more Bayesian model-based solution,
because the application looks to my untrained eye a lot like his pet
example, 8 schools.

The way to work out what happens in Stan is to just look
at the posterior. If you have a standard linear regression with
no priors and coefficient and noise scale parameters beta and sigma,
then the log posterior is proportional to

log p(beta,sigma|y,x) propto SUM_n log normal(y[n] | x[n] * beta, sigma)

= SUM_n -log(sigma) - 0.5 * ((y[n] - x[n] * beta) / sigma)^2


Where I'm using propto loosely on the log scale to mean equal up to
an additive constant that doesn't depend on beta or sigma.

So if you do

for (n in 1:N)
  increment_log_prob(normal_log(y[n], x[n] * beta, sigma));

this is the posterior you get.

If you multiply the term added to the log prob by w[i], as in

for (n in 1:N)
  increment_log_prob(w[n] * normal_log(y[n], x[n] * beta, sigma));

then the posterior will be

log p(beta, sigma | y, x, w) propto

SUM_n w[n] * { -log(sigma) - 0.5 * ((y[n] - x[n] * beta) / sigma)^2 }

If that's what you want, then you're good to go.
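For a single coefficient with sigma held fixed, that weighted posterior has a closed-form mode, and it is exactly the weighted least squares solution. A Python sketch with made-up data (the helper and the data are illustrative, not from the attached example):

```python
import math

def weighted_loglik(beta, y, x, w, sigma=1.0):
    # SUM_n w[n] * normal log density (up to an additive constant)
    return sum(wn * (-math.log(sigma)
                     - 0.5 * ((yn - xn * beta) / sigma) ** 2)
               for yn, xn, wn in zip(y, x, w))

y = [1.2, -0.4, 2.1, 0.7]
x = [0.5, 1.0, -0.3, 0.9]
w = [2.0, 0.5, 1.5, 1.0]

# closed-form weighted least squares for one coefficient:
# beta_hat = sum(w*x*y) / sum(w*x*x)
beta_hat = (sum(wn * xn * yn for yn, xn, wn in zip(y, x, w))
            / sum(wn * xn * xn for xn, wn in zip(x, w)))

# the WLS solution maximizes the weighted log likelihood
ll_hat = weighted_loglik(beta_hat, y, x, w)
assert ll_hat >= weighted_loglik(beta_hat + 1e-3, y, x, w)
assert ll_hat >= weighted_loglik(beta_hat - 1e-3, y, x, w)
```

So for optimization the weighted increment_log_prob formulation and WLS coincide; for full Bayesian inference the weights also reshape the posterior's spread, which is the part that needs more careful thought.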

Even if you think it's right analytically, I would run some tests
on problems with known answers to make sure you're getting the
right answers. (Of course, that's tricky because the MLE isn't
necessarily close to the posterior mean until you get a lot of data.)

- Bob

Andrew Gelman

Jul 30, 2014, 1:11:19 PM
to stan-...@googlegroups.com
If you want to do optimization, you can do this with no problem.  The code in the Stan model sets up the unnormalized log-posterior, which is the objective function that is being maximized.  So you get the weighted least squares solution.  If you want to do inference, you need a probability model, and in that case you can still do this but then you have to think a bit more carefully about what your model is.

 

andre.p...@googlemail.com

Jul 30, 2014, 1:15:02 PM
to stan-...@googlegroups.com
Bob, 

Yes, I meant this section. But I thought for the normal probability it's something like:

for (n in 1:N)
  increment_log_prob(normal_log(y[n] * w[n], (x[n] * beta) * w[n], mean(w[n] * sigma)));

I've seen that in Matlab they use the inverse of the link function, multiply both sides by
w[n], and then use least squares to minimize. I think this is too tricky.

Thanks for replying back,
Andre

Bob Carpenter

Jul 30, 2014, 1:22:55 PM
to stan-...@googlegroups.com

On Jul 30, 2014, at 1:15 PM, andre.p...@googlemail.com wrote:

> Bob,
>
> Yes, I meant this section. But I thought for the normal probability it's something like:
>
> for (n in 1:N)
> increment_log_prob(normal_log(y[n]*w[n], (x[n] * beta) * w[n], mean(w[n]*sigma)));
>

I don't understand the term mean(w[n] * sigma) because the argument is
just a scalar.

If you work through what that does (dropping the mean), then you get

-log(w[n] * sigma) - 0.5 * ((w[n] * y[n] - w[n] * x[n] * beta) / (w[n] * sigma))^2

= -log(w[n]) - log(sigma) - 0.5 * ((y[n] - x[n] * beta) / sigma)^2

which isn't quite the same as multiplying the original log prob by w[n].

If you just want to weight the instances so that they count as w[n]
instances, then you do just want to multiply the original log prob.

If that's not what's going on (and I didn't understand the Wikipedia article),
then you have to figure out the posterior log probability you want and
code that up.
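What that substitution actually does can be checked numerically (a Python sketch with made-up numbers; this normal_log drops the -0.5*log(2*pi) constant): scaling y, the location, and the scale all by w shifts the log density by exactly -log(w), with the quadratic term unchanged.

```python
import math

def normal_log(y, mu, sigma):
    # log normal density, dropping the -0.5*log(2*pi) constant
    return -math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

y, mu, sigma, w = 1.7, 0.9, 1.2, 3.0

# scaling y, mu, and sigma all by w only shifts the log density
# by -log(w); the w cancels inside the square, so this is not
# the same as w * normal_log(y, mu, sigma)
scaled = normal_log(w * y, w * mu, w * sigma)
base = normal_log(y, mu, sigma)
assert abs(scaled - (base - math.log(w))) < 1e-12
```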

- Bob

