
How to replace missing values with the mean in SPSS


zhon...@aol.com

Oct 7, 2007, 10:04:52 PM
Suppose we have 100 variables, each with some missing values, and I
would like to replace the missing values with a mean. Assume all
missing values are coded as "9999" or as a dot ".". There are two ways
to calculate the means. (1) Calculate 100 means, one per variable, and
use each mean to replace the missing values in the corresponding
variable. (2) Calculate one column of means across all 100 variables,
one mean per case, and use each mean to replace all missing values in
the corresponding case. I would like to do this with one click, so
there should be two programs, one for each approach.
Also, assume we have 1000 similar SPSS data sets with the same
problem. How do we write one Microsoft Windows program that processes
all of them in one go? Your kind help will be greatly appreciated!
-----
Frank
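
To make the difference between the two approaches concrete, here is a minimal sketch in Python rather than SPSS syntax (chosen purely for illustration; the thread itself is about SPSS). It assumes missing entries appear either as None, standing in for the SPSS dot, or as the code 9999 mentioned in the question. The function names are hypothetical.

```python
MISSING_CODE = 9999  # sentinel from the question; None stands in for "."


def is_missing(value):
    return value is None or value == MISSING_CODE


def mean_of_valid(values):
    """Mean of the non-missing entries, or None if there are none."""
    valid = [v for v in values if not is_missing(v)]
    return sum(valid) / len(valid) if valid else None


def impute_column_means(rows):
    """Approach (1): one mean per variable (column)."""
    col_means = [mean_of_valid(col) for col in zip(*rows)]
    return [
        [col_means[j] if is_missing(v) else v for j, v in enumerate(row)]
        for row in rows
    ]


def impute_row_means(rows):
    """Approach (2): one mean per case (row)."""
    result = []
    for row in rows:
        row_mean = mean_of_valid(row)
        result.append([row_mean if is_missing(v) else v for v in row])
    return result


# Three cases (rows) by three variables (columns).
data = [
    [1.0, 2.0, None],
    [4.0, 9999, 6.0],
    [7.0, 8.0, 9.0],
]

print(impute_column_means(data))  # [[1.0, 2.0, 7.5], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
print(impute_row_means(data))     # [[1.0, 2.0, 1.5], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
```

The first case shows that the two approaches can give quite different fill-in values (7.5 versus 1.5) for the same missing cell.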

Bruce Weaver

Oct 8, 2007, 8:36:00 AM

The MVA module has mean substitution. But before using that method, you
might want to take a look at these references posted by Frank Harrell
earlier this year in another group.


@Article{don06rev,
  author = {Donders, A. Rogier T. and {van der Heijden}, Geert J. M. G. and Stijnen, Theo and Moons, Karel G. M.},
  title = {Review: {A} gentle introduction to imputation of missing values},
  journal = {J Clin Epi},
  year = 2006,
  volume = 59,
  pages = {1087-1091},
  annote = {missing data;imputation;simple demonstration of failure of indicator (new category) method}
}

@Article{hei06imp,
  author = {{van der Heijden}, Geert J. M. G. and Donders, A. Rogier T. and Stijnen, Theo and Moons, Karel G. M.},
  title = {Imputation of missing values is superior to complete case analysis and the missing-indicator method in multivariable diagnostic research: {A} clinical example},
  journal = {J Clin Epi},
  year = 2006,
  volume = 59,
  pages = {1102-1109},
  annote = {missing data;imputation;invalidity of adding extra categories or missing value indicators;bias;precision;complete case analysis;single imputation}
}


--
Bruce Weaver
bwe...@lakeheadu.ca
www.angelfire.com/wv/bwhomedir
"When all else fails, RTFM."

monia9PL

Oct 8, 2007, 10:13:52 AM
Try Sax Basic (the scripting language for SPSS, similar to VBA). With
it you can open many files one after another and perform the same
operations on each of them.

2) For the second approach (one mean per case):

compute v_mean = mean(first_variable_name to last_variable_name).
do repeat x = first_variable_name to last_variable_name.
if mis(x) x = v_mean.
end repeat.
del var v_mean x.
exe.

This is syntax you can call from Sax Basic. With a data set open, you
can look up the names of the first and last variables needed here.
If they are named systematically (e.g. var1, var2, ...), you can
simply write var1 to var100.

1) For the first approach (one mean per variable): set this up for one
variable from the menu, paste the syntax, and use "do repeat" as above
to loop it over the variables. That gives you syntax you can call from
your Sax Basic code.

Hope that helps, and don't hesitate to ask if anything is unclear.
Monika
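
The "open many files one after another" idea can be sketched outside Sax Basic as well. The following is a rough Python illustration, not a Sax Basic script: it assumes each data set has been exported to a CSV file with a header row and numeric columns, and that missing cells are blank, a dot, or 9999. The function names and folder layout are hypothetical.

```python
import csv
from pathlib import Path

MISSING_CODE = 9999.0  # sentinel from the question


def parse_cell(text):
    """Turn a CSV cell into a float, or None if it is missing."""
    text = text.strip()
    if text in ("", "."):
        return None
    value = float(text)
    return None if value == MISSING_CODE else value


def impute_file(src, dst):
    """Read one CSV, replace missing cells by their column mean, write it out."""
    with open(src, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = [[parse_cell(c) for c in line] for line in reader if line]

    col_means = []
    for col in zip(*rows):
        valid = [v for v in col if v is not None]
        col_means.append(sum(valid) / len(valid) if valid else None)

    with open(dst, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for row in rows:
            filled = [col_means[j] if v is None else v for j, v in enumerate(row)]
            # A column with no valid values at all stays blank.
            writer.writerow(["" if v is None else v for v in filled])


def impute_folder(in_dir, out_dir):
    """Apply impute_file to every *.csv in in_dir; return the names processed."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    processed = []
    for src in sorted(Path(in_dir).glob("*.csv")):
        impute_file(src, out / src.name)
        processed.append(src.name)
    return processed
```

A call such as impute_folder("exports", "imputed") would then handle every file in the (hypothetical) "exports" folder in a single run and leave the originals untouched.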




zhon...@aol.com

Oct 8, 2007, 12:29:04 PM
I have tried it and it works very well. Thank you very much for your help!
Frank


Felix...@gmx.de

Oct 9, 2007, 4:31:57 AM
Google "mean imputation" and think again about whether you really want to use it!
Felix
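
One well-known reason for this caution is that filling in the mean leaves the mean unchanged but shrinks the variance, so the data look less spread out than they really are. The small Python sketch below illustrates the effect with made-up numbers (the values are hypothetical, not from any data set in this thread).

```python
from statistics import mean, variance

observed = [2.0, 4.0, 6.0, 8.0]  # values that were actually recorded
n_missing = 4                    # cases with no recorded value

# Mean imputation: every missing case receives the observed mean.
filled = observed + [mean(observed)] * n_missing

var_observed = variance(observed)  # sample variance of the recorded values
var_filled = variance(filled)      # sample variance after imputation

print("mean before/after:", mean(observed), mean(filled))
print("variance before/after:", var_observed, var_filled)
```

Here the sample variance drops from 20/3 to 20/7, because the sum of squared deviations stays at 20 while the number of cases doubles. Standard errors computed from the filled-in data would be too small for the same reason.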

monia9PL

Oct 9, 2007, 11:08:38 AM
I'm happy I could help.

Felix is right: in many cases mean imputation is not a good
solution.

However, I assume you have a great deal of data to process and can't
consider every single variable individually.

An improvement on this solution, which I've seen in some articles, is
to eliminate (delete or filter out) variables and cases that have a
relatively large percentage of missing values (>10%, >20%, >30%?). If
you are going to run analyses on your data, I would recommend doing so.

Regards
Monika


zhon...@aol.com

Oct 9, 2007, 11:26:56 AM
Hi Monika,

Thank you very much for your help! I do appreciate it greatly.
But how can I delete the variables with more than 30% missing values?
With just one or two variables it is easy to check by hand, but
suppose we have hundreds of variables.

Frank
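
The thread does not answer this question for SPSS. As a language-neutral illustration of the logic, here is a minimal Python sketch: compute the fraction of missing values per variable and keep only the variables at or below the threshold. It assumes the data are held as a mapping from variable name to a non-empty list of values, with None or 9999 marking missing entries; all names are hypothetical.

```python
MISSING_CODE = 9999  # sentinel from the original question


def is_missing(value):
    return value is None or value == MISSING_CODE


def missing_fraction(values):
    """Share of missing entries in a non-empty list of values."""
    return sum(1 for v in values if is_missing(v)) / len(values)


def drop_sparse_variables(table, threshold=0.30):
    """Split variables into those kept and those dropped for too many missings.

    table maps variable name -> list of values (one per case).
    Returns (kept_table, dropped_names).
    """
    kept = {}
    dropped = []
    for name, values in table.items():
        if missing_fraction(values) > threshold:
            dropped.append(name)
        else:
            kept[name] = values
    return kept, dropped


table = {
    "var1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],               # 0% missing
    "var2": [1, None, 3, 9999, 5, 6, 7, 8, 9, 10],         # 20% missing
    "var3": [None, None, 9999, 9999, 5, 6, 7, 8, 9, 10],   # 40% missing
}

kept, dropped = drop_sparse_variables(table)
print(dropped)      # ['var3']
print(list(kept))   # ['var1', 'var2']
```

The same loop scales to hundreds of variables, and the list of dropped names could be used to build a delete command for the statistics package.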

zhon...@aol.com

Oct 9, 2007, 11:30:35 AM
On Oct 8, 9:13 am, monia9PL <moleks...@gmail.com> wrote:
> Try to use Sax Basic (script language for SPSS similar to VBA). With
> it you can open many files one after another and make some operations
> on each of them.
>
> 2) compute v_mean = mean(first_variable_name to last_variable_name).
> do repeat x = first_variable_name to last_variable_name.
> if mis(x) x = v_mean.
> end repeat.
> del var v_mean x.
> exe.
>

By the way, "del var v_mean x." should come after "exe.", with
another "exe." after it.
Frank
