weighted regression


Dirk Nachbar

Mar 23, 2017, 5:34:59 AM
to Stan users mailing list
How would you implement a weighted model in Stan, where certain observations have higher weights (more impact on the parameters) than others?

Dirk

Dirk Nachbar

Mar 23, 2017, 8:44:23 AM
to Stan users mailing list
I found http://stats.stackexchange.com/questions/133553/bayesian-weighted-linear-regression which gives me a hint that I need to divide sigma by the weight.
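
One way to read that hint is as a regression whose noise scale shrinks for heavily weighted observations. The sketch below is only an illustration under assumptions: the names `N`, `K`, `x`, `y`, and `w` and the priors are placeholders, not taken from the linked answer. Note that multiplying the normal log density by `w[n]` matches dividing sigma by `sqrt(w[n])`, not by `w[n]`, in the quadratic term, and the two forms still differ in the `log(sigma)` term, so they are not equivalent when sigma is estimated.

```stan
data {
  int<lower=1> N;            // number of observations (assumed name)
  int<lower=1> K;            // number of predictors (assumed name)
  matrix[N, K] x;            // design matrix
  vector[N] y;               // outcomes
  vector<lower=0>[N] w;      // weights, treated as known relative precisions;
                             // must be strictly positive for the scale to be finite
}
parameters {
  vector[K] beta;
  real<lower=0> sigma;
}
model {
  beta ~ normal(0, 5);       // placeholder priors, not from the thread
  sigma ~ normal(0, 2);
  // A larger w[n] gives a smaller standard deviation, so observation n
  // pulls harder on beta.
  y ~ normal(x * beta, sigma * inv_sqrt(w));
}
```

Unlike a weighted log likelihood, this form is a generative model: it says observation n has standard deviation `sigma / sqrt(w[n])`.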

CoreySparks

Mar 23, 2017, 10:06:31 AM
to stan-...@googlegroups.com
Are you using data from a complex survey? I have an RPubs document at http://rpubs.com/corey_sparks/157901 that illustrates using the brms package to include survey weights in the analysis. I believe the functions simply weight the likelihood by the supplied weight. I typically normalize the weights first (dividing by the mean weight) so that they do not totally overpower the analysis.

There's also a thread here https://groups.google.com/forum/#!topic/stan-users/wZUb1KclmB4 on using stan for survey analysis.

-Corey

Bob Carpenter

Mar 23, 2017, 2:55:22 PM
to stan-...@googlegroups.com
Instead of, say

y ~ normal(x * beta, sigma);

in a regression, you want to first unfold

for (n in 1:N)
  y[n] ~ normal(x[n] * beta, sigma);

then switch to incrementing

for (n in 1:N)
  target += normal_lpdf(y[n] | x[n] * beta, sigma);

and then you can weight

for (n in 1:N)
  target += w[n] * normal_lpdf(y[n] | x[n] * beta, sigma);

It's as if item n had been observed w[n] times rather than
once.
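
For reference, the last step can be placed in a complete program. This is a minimal sketch, not part of the original message: the data declarations, the names `N`, `K`, and `w`, and the priors are assumptions added so the program compiles.

```stan
data {
  int<lower=1> N;            // number of observations (assumed name)
  int<lower=1> K;            // number of predictors (assumed name)
  matrix[N, K] x;            // design matrix; x[n] is row n
  vector[N] y;               // outcomes
  vector<lower=0>[N] w;      // fixed observation weights supplied as data
}
parameters {
  vector[K] beta;
  real<lower=0> sigma;
}
model {
  beta ~ normal(0, 5);       // placeholder priors, not from the thread
  sigma ~ normal(0, 2);
  // Each observation contributes w[n] copies of its log density.
  for (n in 1:N) {
    target += w[n] * normal_lpdf(y[n] | x[n] * beta, sigma);
  }
}
```

With all `w[n]` equal to 1 this reduces to the unweighted regression. Multiplying every weight by a common constant changes the posterior width, which is one reason to think about the scale of the weights before using them.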

- Bob

Jonah Gabry

Mar 23, 2017, 11:45:39 PM
to Stan users mailing list
What Bob showed is technically how to do it, but with these kinds of fixed weights on the log likelihood it's not a generative model, so you've stepped outside the fully Bayesian inference we recommend. Where do the weights come from in your case? Are these survey weights or something else?

Jonah

Jonah Gabry

Mar 23, 2017, 11:50:49 PM
to Stan users mailing list
I should clarify that I was thinking about the survey weight case in particular. Weighting the log likelihood by survey weights is not a generative model for any data-generating process. We typically recommend not using weights and instead conditioning on the relevant variables and then post-stratifying in generated quantities (or after running Stan).
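
The condition-then-post-stratify idea can be sketched as follows. This is an illustration under assumptions, not a recommended specification: the cell structure, the names `J`, `cell`, `pop`, and `y_pop`, and the priors are all hypothetical. It uses the `array[N] int` syntax of recent Stan releases; Stan versions from the time of this thread would write `int cell[N]`.

```stan
data {
  int<lower=1> N;                          // respondents
  int<lower=1> J;                          // post-stratification cells (assumed)
  array[N] int<lower=1, upper=J> cell;     // cell membership of each respondent
  vector[N] y;                             // outcomes
  vector<lower=0>[J] pop;                  // known population count per cell
}
parameters {
  real mu;
  real<lower=0> tau;
  real<lower=0> sigma;
  vector[J] alpha_raw;                     // non-centered cell effects
}
transformed parameters {
  vector[J] theta = mu + tau * alpha_raw;  // cell means
}
model {
  mu ~ normal(0, 5);                       // placeholder priors
  tau ~ normal(0, 1);
  sigma ~ normal(0, 1);
  alpha_raw ~ std_normal();
  y ~ normal(theta[cell], sigma);          // unweighted likelihood
}
generated quantities {
  // Population-level estimate: cell means averaged by population share.
  real y_pop = dot_product(pop, theta) / sum(pop);
}
```

Here the information that would have gone into survey weights enters through the cell variables and the population counts, so the likelihood itself stays unweighted.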

Jonah

Dirk Nachbar

Mar 24, 2017, 5:37:34 AM
to stan-...@googlegroups.com
Thank you all for the ideas.

The weights are my own; this is for a time series model where I want to over-weight recent observations.




Andrew Gelman

Mar 24, 2017, 12:33:25 PM
to stan-...@googlegroups.com
Rather than over-weighting certain observations, I recommend that you alter your model so that it expresses why you want these observations to count more.
A

Michael Peck

Mar 24, 2017, 1:46:56 PM
to Stan users mailing list


On Thursday, March 23, 2017 at 10:45:39 PM UTC-5, Jonah Gabry wrote:
> What Bob showed is technically how to do it but with these kinds of fixed weights on the log likelihood it's not a generative model and so you've stepped outside the fully Bayesian inference we recommend. Where do the weights come from in your case? Are these survey weights or something else?
>
> Jonah


This is probably changing the subject a little bit, but it's quite common for astrophysical data to have at least notionally known variances that vary between observations. See (for example) https://arxiv.org/abs/1509.00908, which has a companion R package. Someday I may try to reproduce his model in Stan.

Yes, there is always a generative model for the variance.

Andrew Gelman

Mar 24, 2017, 1:48:29 PM
to stan-...@googlegroups.com
Yes, if variance varies between observations, this is very easy to model in Stan.
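
A minimal sketch of that case, assuming each observation comes with a reported measurement standard deviation: the names `se` and `tau`, the single predictor, and the priors are illustrative assumptions, and this is not a reproduction of the model in the paper linked above.

```stan
data {
  int<lower=1> N;
  vector[N] x;                 // predictor
  vector[N] y;                 // measured outcomes
  vector<lower=0>[N] se;       // known measurement sd for each observation
}
parameters {
  real alpha;
  real beta;
  real<lower=0> tau;           // intrinsic scatter beyond measurement error
}
model {
  alpha ~ normal(0, 10);       // placeholder priors
  beta ~ normal(0, 10);
  tau ~ normal(0, 5);
  // Total variance is measurement variance plus intrinsic variance, so
  // precisely measured points constrain the fit more than noisy ones.
  y ~ normal(alpha + beta * x, sqrt(square(se) + square(tau)));
}
```

Observations with small `se[n]` end up counting more, but as a consequence of the stated noise model rather than of an external weight.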

Michael Betancourt

Mar 24, 2017, 2:04:26 PM
to stan-...@googlegroups.com
Yup.  And there are numerous time-series models where
more recent data is weighted more heavily than older data,
for example moving average models which are discussed
in the manual.
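
As a rough illustration of the kind of model meant here, a first-order moving average, MA(1), could be sketched like this. The priors, the bounds on `theta`, and the treatment of the first error term are assumptions made for this sketch; the Stan manual's own moving average example should be treated as the reference.

```stan
data {
  int<lower=2> T;              // number of time points
  vector[T] y;                 // observed series
}
parameters {
  real mu;                     // series mean
  real<lower=-1, upper=1> theta;   // MA(1) coefficient
  real<lower=0> sigma;         // innovation scale
}
transformed parameters {
  vector[T] epsilon;           // error at each time point
  // The first error is taken relative to the mean alone, i.e. the
  // unobserved error before the series starts is treated as zero.
  epsilon[1] = y[1] - mu;
  for (t in 2:T) {
    epsilon[t] = y[t] - mu - theta * epsilon[t - 1];
  }
}
model {
  mu ~ normal(0, 5);           // placeholder priors
  theta ~ normal(0, 1);
  sigma ~ normal(0, 2);
  // Each prediction depends only on the previous error, so recent
  // history drives the prediction and older history does not enter directly.
  for (t in 2:T) {
    y[t] ~ normal(mu + theta * epsilon[t - 1], sigma);
  }
}
```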