measurement errors - where are they?

josef...@gmail.com

Sep 9, 2016, 11:27:22 PM
to pystatsmodels
distortion from measurement errors and correcting them

What happens if we only observe variables with measurement errors?
How can we correct the resulting biases if the measurement error
variances or distributions are known?
What can we do if we only have repeated samples, a validation sample,
or instruments?

A simple example, variance and correlation with known measurement error
variance, to get a taste for what's going on:

small Monte Carlo

true variables are bivariate normal with unit variance and 0.5 correlation
measurement error variance = 0.2 and 0.4 for the two variables
we only observe the variables with measurement errors

reliability: true over observed variance is around (0.83, 0.71)

measurement error heteroscedasticity:
the first 30% of observations have half the measurement error of the
remaining 70%
sample size nobs = 100
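
For anyone who wants to play with the setup, here is a minimal sketch of one simulated sample. It is not the script behind the numbers in this post. The function name `simulate_sample` is made up, and I am assuming that "half the measurement error" means half the error variance, rescaled so that the average error variance over the sample equals the stated 0.2 and 0.4.

```python
import numpy as np


def simulate_sample(nobs=100, seed=0):
    """One sample of true and error-contaminated bivariate normal data."""
    rng = np.random.default_rng(seed)
    sig = np.array([[1.0, 0.5], [0.5, 1.0]])  # true covariance
    tau = np.array([0.2, 0.4])                # average measurement error variance

    # Assumption: the first 30% of observations get half the error variance
    # of the rest, scaled so the sample average error variance equals tau.
    n_low = int(round(0.3 * nobs))
    scale = np.ones(nobs)
    scale[:n_low] = 0.5
    scale /= scale.mean()
    err_var = scale[:, None] * tau            # per-observation error variances

    x_true = rng.multivariate_normal(np.zeros(2), sig, size=nobs)
    # Independent normal measurement errors added to the true values
    x_obs = x_true + np.sqrt(err_var) * rng.standard_normal((nobs, 2))
    return x_true, x_obs, err_var


x_true, x_obs, err_var = simulate_sample()
print(np.cov(x_true, rowvar=False))
print(np.cov(x_obs, rowvar=False))
```

Only `x_obs` and the error variances would be available in practice; `x_true` is returned just to compare.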


>>> print(sig)
[[ 1.   0.5]
 [ 0.5  1. ]]
>>> tau
array([ 0.2, 0.4])
>>> rel
array([ 0.83333333, 0.71428571])
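
The reliability numbers are just arithmetic on `sig` and `tau`. A small sketch, assuming the measurement errors are independent of the true values and of each other, also gives the correlation we should expect to see in the contaminated data (about 0.386 instead of 0.5):

```python
import numpy as np

sig = np.array([[1.0, 0.5], [0.5, 1.0]])  # true covariance
tau = np.array([0.2, 0.4])                # measurement error variances

var_true = np.diag(sig)
rel = var_true / (var_true + tau)         # reliability: true over observed variance

# Independent errors leave the covariance unchanged and inflate the variances,
# so the observed correlation is the true one shrunk by sqrt(rel_1 * rel_2).
corr_true = sig[0, 1] / np.sqrt(var_true.prod())
corr_obs_expected = corr_true * np.sqrt(rel.prod())

print(rel)
print(corr_obs_expected)
```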



items in the MC results:
variance 1, covariance (1,2), variance 2 and the correlation coefficient
values are mean estimates over 1000 Monte Carlo replications with random
normal data

true is [1, 0.5, 1, 0.5]

raw variances and correlations are biased:
the variance estimates are too large, the correlation is too small

cov, corr data
[ 1.19862791 0.49077207 1.38944785 0.37886929]

three correction methods based on method of moments:

cov, corr simple
[ 0.99862791 0.49077207 0.98944785 0.49468383]
cov, corr weighted1
[ 0.97962066 0.49032111 0.91916492 0.51638522]
cov, corr weighted2
[ 0.99788211 0.49032111 0.9894017 0.49460425]


methods:
simple is a one liner
the weighted methods are trying to increase efficiency by using
weighted instead of simple unweighted moment estimates (details are
"home made").
All of those look much better than using raw covariances and correlations.
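
A rough sketch of what the simple correction amounts to: subtract the known measurement error variances from the diagonal of the observed covariance, then recompute the correlation. This is a reconstruction for illustration, not the code that produced the numbers above. It uses homoscedastic errors for brevity, the helper names are made up, and the weighted variants are not shown.

```python
import numpy as np


def cov_corr_items(cov):
    """Return [var1, cov12, var2, corr] from a 2x2 covariance matrix."""
    corr = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
    return np.array([cov[0, 0], cov[0, 1], cov[1, 1], corr])


def mc_simple_correction(nrep=1000, nobs=100, seed=12345):
    """Mean raw and corrected estimates over nrep Monte Carlo replications."""
    rng = np.random.default_rng(seed)
    sig = np.array([[1.0, 0.5], [0.5, 1.0]])
    tau = np.array([0.2, 0.4])
    raw = np.empty((nrep, 4))
    corrected = np.empty((nrep, 4))
    for i in range(nrep):
        x_true = rng.multivariate_normal(np.zeros(2), sig, size=nobs)
        # Assumption: homoscedastic, independent normal measurement errors
        x_obs = x_true + np.sqrt(tau) * rng.standard_normal((nobs, 2))
        cov_obs = np.cov(x_obs, rowvar=False)
        raw[i] = cov_corr_items(cov_obs)
        # The "one liner": remove the known error variances from the diagonal
        corrected[i] = cov_corr_items(cov_obs - np.diag(tau))
    return raw.mean(0), corrected.mean(0)


raw_mean, corrected_mean = mc_simple_correction()
print("cov, corr data  ", raw_mean)
print("cov, corr simple", corrected_mean)
```

In small samples, or with large error variances, the corrected variance can turn out negative, so a real implementation would need to handle that case.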

The same happens in a linear regression setting: OLS coefficient
estimates are biased when the regressors are observed with measurement
error.
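
To make the regression case concrete, a minimal sketch with a single regressor and known measurement error variance. The numbers (`beta = 1`, `tau = 0.4`, noise standard deviation 0.5) are made up for illustration, and the moment correction shown is the textbook one, not anything that exists in statsmodels yet.

```python
import numpy as np

rng = np.random.default_rng(2016)
nobs, beta, tau = 5000, 1.0, 0.4   # tau: known measurement error variance

x_true = rng.standard_normal(nobs)
y = beta * x_true + 0.5 * rng.standard_normal(nobs)
x_obs = x_true + np.sqrt(tau) * rng.standard_normal(nobs)  # what we observe

# Sample moments of the observed data
xc = x_obs - x_obs.mean()
yc = y - y.mean()
s_xx = xc @ xc / nobs
s_xy = xc @ yc / nobs

beta_ols = s_xy / s_xx             # attenuated towards zero by the reliability
beta_mom = s_xy / (s_xx - tau)     # method of moments correction

print(beta_ols, beta_mom)
```

With unit true variance the OLS slope should come out near beta times the reliability 1 / 1.4, while the corrected slope should be close to beta.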
---

In case anyone is interested,
https://github.com/statsmodels/statsmodels/issues/3187
has my comments from about one week of trying to figure out what
measurement errors in statistics are all about.

I just learned enough to have a rough overview and so that I'm able to
review PRs that you could submit now :)

(personally:
Some corrected versions of OLS based on method of moments look
relatively easy, and I might start when I'm bored for a day or three.
I doubt that I will start with implementing measurement error
corrections for GLM and similar until we have the related IV versions
for endogeneity.)


I have no idea what R has to offer. But if anyone has suggestions for
methods, or for other packages whose results we should replicate, to put
on the wish or todo list, then any additional information that helps to
get some structure and priorities into this would be useful.


Josef
too many topics