Also, out of curiosity -- what approaches do people on the list tend to use
for missing data analysis? I have used EM for lower rates of missing data
(when appropriate) and Schafer's NORM multiple imputation program for
datasets with a larger percentage of missing data. My experience with NORM
was positive, and multiple imputation seems more defensible for handling
larger rates of missing data, but the process of going back and forth
between SPSS and NORM and then combining results from the multiply imputed
datasets was extremely cumbersome (hence my question on limits to EM).
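For anyone who has not done the combining step by hand, here is a minimal sketch of Rubin's rules for pooling one parameter across imputed datasets. The function name pool_rubin and its arguments are my own for illustration; this is not NORM's code, and it assumes you have already run the same analysis on each imputed file and collected the estimate and its squared standard error from each run.

```python
import math


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets using Rubin's rules.

    estimates: the point estimate from each imputed dataset
    variances: the squared standard error from each imputed dataset
    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lists")

    q_bar = sum(estimates) / m                  # pooled point estimate
    within = sum(variances) / m                 # average sampling variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between      # total variance of q_bar

    # Degrees of freedom for the t reference distribution;
    # infinite when the estimates do not vary across imputations.
    if between == 0:
        df = math.inf
    else:
        df = (m - 1) * (1 + within / ((1 + 1 / m) * between)) ** 2

    return q_bar, math.sqrt(total), df
```

Calling pool_rubin once per coefficient, with the values copied from each SPSS run, replaces the hand arithmetic; the between-imputation term is what carries the extra uncertainty due to the missing data.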
Christian M. Connell, Ph.D.
Postdoctoral Psychology Fellow, The Consultation Center
Yale University School of Medicine
ccon...@theconsultationcenter.org
> Is anyone familiar with references that address limitations of EM (Missing
> Data Analysis) with respect to level of missing data? I was under the
> impression that EM should not be used if rates were above 20%, but cannot
> find any specific references that address the issue.
< snip, Q about other approaches people use. >
I consider it hazardous to replace Missing with some estimate.
The more complicated that it is to do and to describe, the more
hazardous it is, just because you can't keep track of odd influences.
I try to find a way around the Missing problem so that I can describe
where my numbers come from, and what biases they might include.
Computing a "composite score" is half a solution - it takes the
problem out of the computer program that wants "complete data".
Why estimate any missing? - You have to be careful that you don't skew
what you are attempting to test. Or estimate.
If I need "complete data" so that a particular algorithm (computer
program) will run, I think the limit has to depend on the particular
application. If 20% replacement of missing is okay, why not 30%? I
am not familiar with 20% as a guideline; I don't know what sort of
data that it should apply to. I think I would be wary about *any*
analysis on medical/clinical data that had over 10%, and 5% might
seem high for most.
Using what I know about the statistical technique, I try to consider,
for the data on hand, how robust the analysis will be. Then, it is
fine to go beyond that a priori judgment, if possible: that is, it is
good to test the robustness by using other analyses that may be less
complete or less powerful or less informative. (Simplified model? a
test on ranked or dichotomized scores?)
What is really a no-no is when your Estimation procedure has "created"
the significant effect by counting the same evidence more than once,
or by claiming extra degrees of freedom that are not in the data.
I don't know how easy it is to learn to spot those.
--
Rich Ulrich, wpi...@pitt.edu
http://www.pitt.edu/~wpilib/index.html
I believe there is a bug in SPSS's implementation of EM. The imputed
covariance matrix is okay, but the raw data it imputes have attenuated
variance (i.e., they are inconsistent with the covariance matrix). If you can
use the covariance matrix in your analysis (regression, SEM, etc.) this is
not a problem. If SPSS has fixed this problem, I would like to be so
informed.
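To illustrate the kind of attenuation I mean, here is a small simulation sketch. It is not SPSS's algorithm, and I am only assuming that filling in conditional means is the mechanism involved; the function name, the sample size, the correlation, and the missing rate are all made up for the example. It compares the variance of a variable when missing values are replaced by their regression prediction with the variance when a random residual is added to each prediction.

```python
import math
import random
import statistics


def simulate_attenuation(n=2000, rho=0.5, miss_rate=0.4, seed=1):
    """Compare the variance of y after two ways of filling in missing values."""
    rng = random.Random(seed)

    # Bivariate normal data with unit variances and correlation rho.
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rho * xi + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for xi in x]

    # Delete y completely at random; x stays fully observed.
    missing = [rng.random() < miss_rate for _ in range(n)]
    obs = [(xi, yi) for xi, yi, m in zip(x, y, missing) if not m]
    ox = [p[0] for p in obs]
    oy = [p[1] for p in obs]

    # Regression of y on x, fitted to the complete cases.
    mx, my = statistics.mean(ox), statistics.mean(oy)
    sxx = sum((a - mx) ** 2 for a in ox)
    sxy = sum((a - mx) * (b - my) for a, b in obs)
    slope = sxy / sxx
    intercept = my - slope * mx
    resid_sd = math.sqrt(
        sum((b - (intercept + slope * a)) ** 2 for a, b in obs) / (len(obs) - 2)
    )

    # (a) Fill in the conditional mean only.
    mean_filled = [
        intercept + slope * xi if m else yi for xi, yi, m in zip(x, y, missing)
    ]
    # (b) Fill in the conditional mean plus a random residual.
    noise_filled = [
        intercept + slope * xi + rng.gauss(0, resid_sd) if m else yi
        for xi, yi, m in zip(x, y, missing)
    ]

    return {
        "observed": statistics.variance(oy),
        "conditional_mean": statistics.variance(mean_filled),
        "with_noise": statistics.variance(noise_filled),
    }
```

In this setup the conditional-mean column comes out with a smaller variance than the observed values, because every filled-in point sits exactly on the regression line, while adding the residual draw brings the variance back up.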
Better imputation software is available free of charge at:
http://methcenter.psu.edu/mde.shtml
for NORM and
http://www.jamesarbuckle.com/amos/applications/index.htm
for a beta program.
Alan Acock
--
***************************************************
The Acock's
Alan Acock's Address is ac...@home.com
Toni Acock's is antoni...@home.com
"Rich Ulrich" <wpi...@pitt.edu> wrote in message
news:c40d2tg640889hh1c...@4ax.com...