weighted least squares, extract coef, and std of error term

49 views
Skip to first unread message

Tamas Papp

unread,
Aug 22, 2016, 7:27:00 AM8/22/16
to julia...@googlegroups.com
suppose I have variables y, x, and weights w, eg

x = randn(100)
σ = 0.1
w = rand(length(x))
y = x + σ*randn(length(x))./w

and I want to estimate

1. the coefficient and the intercept (something around 0, and 1)

2. and some simple estimate for the standard deviation of the error term
(eg something around 0.1).

What's the recommended way of doing this in Julia? GLM allows me to fit
the model, but I could not figure out how to do it with weights, and I
have a hard time extracting residuals etc to calculate an estimator for
sigma (maybe I am doing it wrong, but residuals is not supported for the
fit object).

Julia 0.5-rc2, using latest DataFrames, last released GLM.

To be clear, this does what I want:

function myfit(y, X, w)
W = Diagonal(sqrt(w))
WX = W*X
Wy = W*y
β = WX \ Wy
ϵ = Wy-WX*β
(β, sqrt(dot(ϵ, ϵ)/(size(X, 1)-size(X, 2))))
end

myfit(y, hcat(ones(length(x)), x), w)

but I want to learn the "standard" way of doing this.

Best,

Tamas

Milan Bouchet-Valat

unread,
Aug 22, 2016, 9:34:54 AM8/22/16
to julia...@googlegroups.com
Unfortunately, I don't think lm/LinearModel supports weights currently,
though the code is prepared in some places to handle them.

You can fit a generalized linear model with a normal distribution
instead, but weights will be interpreted as case weights, while
weighted least squares would correspond to inverse-variance/precision
weights. This is currently underdocumented. The code to do that would
be:

df = DataFrame(x=x, y=y, w=w)
fit(GeneralizedLinearModel, y ~ x, df, Normal(), IdentityLink(), wts=w)


Contributions would be welcome of course!


Regards

> Best,
>
> Tamas
>

Tamas Papp

unread,
Aug 22, 2016, 9:57:38 AM8/22/16
to julia...@googlegroups.com
Thanks. A clarifying question: GLM uses the weight to multiply the
likelihood for each observation, is this correct? (that's what it looks
like from sources).

Also, how should I go about extracting some estimate of the variance of
the error term from GLM? Is that supported?

Best,

Tamas
Reply all
Reply to author
Forward
0 new messages