Handling Missing Data in Linear Mixed & Generalized Linear Modeling

Ryan

Apr 19, 2008, 7:47:36 PM

Hi all,

Linear mixed can include incomplete cases. My understanding is that
the reason Linear Mixed can handle cases with missing data is b/c
solutions are based on MLE.

I have three questions:

1. When running linear mixed in SPSS, do you have to specify that you
want cases with missing data to be used, or is this the
default/done automatically?

2. Does Generalized Linear Modeling/Generalized Estimating Equations
in SPSS handle missing data in a similar way as linear mixed when you
select MLE as the scale parameter estimation method?

3. Assuming the answer to 2 is yes, when running Generalized Linear
Modeling/Generalized Estimating Equations using MLE, do you need to
specify that you want incomplete cases to be used in the analysis or
is this done automatically?

Any thoughts/clarifications/references would be appreciated.

Thanks,

Ryan

Bruce Weaver

Apr 21, 2008, 8:38:12 AM

On Apr 19, 7:47 pm, Ryan <Ryan.Andrew.Bl...@gmail.com> wrote:
> Hi all,
>
> Linear mixed can include incomplete cases. My understanding is that
> the reason Linear Mixed can handle cases with missing data is b/c
> solutions are based on MLE.

I think it is because the file structure is long rather than wide
(i.e., one row per data point, not one row per subject). Note that
you can trick SPSS into doing repeated measures ANOVA via UNIANOVA if
you restructure the data to a long file format. And in that case, you
can keep subjects who do not have complete data for the repeated
measures. UNIANOVA does not use MLE.

>
> I have three questions:
>
> 1. When running linear mixed in SPSS, do you have to specify that you
> want cases with missing data to be used, or is this the
> default/done automatically?
>
> 2. Does Generalized Linear Modeling/Generalized Estimating Equations
> in SPSS handle missing data in a similar way as linear mixed when you
> select MLE as the scale parameter estimation method?
>
> 3. Assuming the answer to 2 is yes, when running Generalized Linear
> Modeling/Generalized Estimating Equations using MLE, do you need to
> specify that you want incomplete cases to be used in the analysis or
> is this done automatically?
>
> Any thoughts/clarifications/references would be appreciated.
>
> Thanks,
>
> Ryan

In order to answer those specific questions, I'd have to RTFM.

--
Bruce Weaver
bwe...@lakeheadu.ca
www.angelfire.com/wv/bwhomedir
"When all else fails, RTFM."

Ryan

Apr 21, 2008, 9:27:00 AM

Thanks. When I stated "handle missing data," I meant it can use all
possible pieces of information (b/c the estimation procedure is a type
of MLE). I don't believe the structure of the dataset is the
underlying reason linear mixed can "handle cases with missing data."
You could restructure repeated measures data (short format to long
format) and run it in UNIANOVA, but UNIANOVA is based on partitioning
SS, which does not include incomplete cases in analyses.

Linear Mixed can handle different covariance structures, random
effects, and missing data b/c it uses MLE, I think. If I'm wrong I
welcome a correction. Consequently, if I'm not interested in
estimating random effects (e.g. subjects effects as random), or
repeated measures, but I have a small percentage of data missing at
random, I think Generalized Linear Modeling is the way to go. I would
really love to find out if Generalized Linear Modeling when using MLE
handles missing data the same way linear mixed can. I don't have the
latest manual, so I don't know if Generalized Linear Modeling uses all
pieces of information when estimating through MLE.

----
After writing the above paragraph I decided to test my theory in an
inelegant fashion. I ran data (nonrepeating/independent and some
missing) using linear mixed, generalized linear modeling, and
UNIANOVA. As expected, linear mixed and generalized linear modeling
provided identical results, while UNIANOVA was different.

If all three were the same (in terms of parameter estimates), I could
have concluded that linear mixed and generalized linear modeling were
not including incomplete cases
(http://www.psy.vanderbilt.edu/faculty/palmeri/P351-modeling/readings/myung-tutorial-mle.pdf).
I realize this would only occur with this type of data.

Perhaps this answers my question. I will hopefully be able to confirm
this belief after I purchase the latest manual.

Below is the syntax I used:

MIXED
  scores BY group
  /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(1)
    SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE)
    LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE)
  /FIXED = group | SSTYPE(3)
  /METHOD = ML .

* Generalized Linear Models.
GENLIN
  scores BY group (ORDER=ASCENDING)
  /MODEL group INTERCEPT=YES
    DISTRIBUTION=NORMAL LINK=IDENTITY
  /CRITERIA SCALE=MLE COVB=MODEL PCONVERGE=1E-006(ABSOLUTE)
    SINGULAR=1E-012 ANALYSISTYPE=3 CILEVEL=95 LIKELIHOOD=FULL
  /MISSING CLASSMISSING=EXCLUDE
  /PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION.

UNIANOVA
  scores BY group
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /CRITERIA = ALPHA(.05)
  /DESIGN = group .

If anyone thinks/knows my logic is faulty, please tell me.

Ryan

Apr 21, 2008, 10:24:01 AM


Sorry for double posting, but it's necessary for a correction:

I stated

> After writing the above paragraph I decided to test my theory in an
> inelegant fashion. I ran data (nonrepeating/independent and some
> missing) using linear mixed, generalized linear modeling, and
> UNIANOVA. As expected, linear mixed and generalized linear modeling
> provided identical results, while UNIANOVA was different.

That is not true. The F (from linear mixed) and Wald Chi Square (from
Generalized Linear Modeling) were identical, while F (from ANOVA) was
different. The parameter estimates were the same for all three tests,
but the standard errors were only the same for Mixed and Generalized.
If you'd like to test it, you need to include parameter estimates in
the above syntax.

>
> If all three were the same (in terms of parameter estimates), I could
> have concluded that linear mixed and generalized linear modeling were
> not including incomplete cases
> (http://www.psy.vanderbilt.edu/faculty/palmeri/P351-modeling/readings/myung-tutorial-mle.pdf).

This conclusion may not be true. I believe the standard errors would
be different between MLE and ANOVA with or without missing data, so this
does not answer my original question.
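
The point about standard errors can be illustrated outside SPSS. The following is a minimal Python sketch, assuming a simple one-way layout with made-up scores (none of the numbers come from this thread). It relies on the textbook result that least squares divides the residual sum of squares by n - k while maximum likelihood divides by n; whether the SPSS procedures compute their standard errors in exactly this way is an assumption here, not something confirmed from the manual.

```python
import math

# Hypothetical scores for two independent groups (not data from this thread).
groups = {
    "a": [4.0, 5.0, 6.0, 7.0],
    "b": [6.0, 8.0, 9.0],
}

n = sum(len(v) for v in groups.values())   # total number of rows
k = len(groups)                            # number of estimated group means
means = {g: sum(v) / len(v) for g, v in groups.items()}

# Residual sum of squares around the group means.
sse = sum((y - means[g]) ** 2 for g, v in groups.items() for y in v)

var_ols = sse / (n - k)   # ANOVA / least-squares error variance (unbiased)
var_ml = sse / n          # maximum-likelihood error variance (divides by n)

# Standard error of the group difference under each variance estimate.
scale = sum(1 / len(v) for v in groups.values())
se_ols = math.sqrt(var_ols * scale)
se_ml = math.sqrt(var_ml * scale)

print("difference in means:", means["b"] - means["a"])
print("SE (least squares):", se_ols)
print("SE (maximum likelihood):", se_ml)
print("ratio:", se_ml / se_ols, "expected:", math.sqrt((n - k) / n))
```

The difference in means does not depend on which variance estimate is used, so the parameter estimate is the same either way, while the two standard errors differ by the factor sqrt((n - k) / n) even though every row here is complete.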

> I realize this
> would only occur with this type of data.
>
> Perhaps this answers my question. I will hopefully be able to confirm
> this belief after I purchase the latest manual.

I spoke with someone who runs linear mixed frequently in SPSS, and I
was told that linear mixed automatically uses all pieces of
information (data from incomplete cases). Of course, this is not enough.
I need to purchase the manual or find a reference to bolster this
belief. If the default of linear mixed is to include incomplete cases,
then the answer to question 2 is "yes," and 3 is "no."

Ryan

Bruce Weaver

Apr 21, 2008, 10:48:18 AM

On Apr 21, 9:27 am, Ryan <Ryan.Andrew.Bl...@gmail.com> wrote:

> On Apr 21, 8:38 am, Bruce Weaver <bwea...@lakeheadu.ca> wrote:
>
> > I think it is because the file structure is long rather than wide
> > (i.e., one row per data point, not one row per subject). Note that
> > you can trick SPSS into doing repeated measures ANOVA via UNIANOVA if
> > you restructure the data to a long file format. And in that case, you
> > can keep subjects who do not have complete data for the repeated
> > measures. UNIANOVA does not use MLE.
>

> Thanks. When I stated "handle missing data," I meant it can use all
> possible pieces of information (b/c the estimation procedure is a type
> of MLE). I don't believe the structure of the dataset is the
> underlying reason linear mixed can "handle cases with missing data."
> You could restructure repeated measures data (short format to long
> format) and run it in UNIANOVA, but UNIANOVA is based on partitioning
> SS, which does not include incomplete cases in analyses.

Perhaps I misunderstand what you mean by not including incomplete
cases in the analysis. When I restructure a repeated measures data
set from wide to long and use UNIANOVA to do the analysis, I get
exactly the same F-tests as I get with GLM Repeated Measures. If I
then punch some holes in the data set, any incomplete cases are
excluded altogether by GLM Repeated Measures, but not by UNIANOVA. In
that situation, the two methods yield different results. There are
some simple examples here:

www.angelfire.com/wv/bwhomedir/spss/repmeas_ANOVA_with_long_file.SPS
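
The wide-versus-long contrast described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the subjects, time points, and scores are hypothetical, and the two helper functions are stand-ins for what a wide-file procedure (drop the whole subject) and a long-file procedure (drop only the missing row) do with holes in the data.

```python
# Hypothetical wide-format data: one row per subject, None marks a missing score.
wide = {
    "s1": {"t1": 10.0, "t2": 12.0, "t3": 13.0},
    "s2": {"t1": 9.0, "t2": None, "t3": 11.0},
    "s3": {"t1": 14.0, "t2": 15.0, "t3": None},
    "s4": {"t1": 8.0, "t2": 9.0, "t3": 10.0},
}


def listwise_complete(wide_rows):
    """Keep only subjects with no missing score (wide-file behaviour)."""
    return {
        subject: row
        for subject, row in wide_rows.items()
        if all(score is not None for score in row.values())
    }


def to_long(wide_rows):
    """Restructure to one row per observed data point, dropping only missing cells."""
    return [
        (subject, time, score)
        for subject, row in wide_rows.items()
        for time, score in row.items()
        if score is not None
    ]


complete = listwise_complete(wide)
long_rows = to_long(wide)
subjects_in_long = {subject for subject, _, _ in long_rows}

print(len(complete), "of", len(wide), "subjects survive listwise deletion")
print(len(long_rows), "data points survive in the long file")
print(len(subjects_in_long), "subjects contribute to the long-file analysis")
```

In this made-up example the wide file loses half of its subjects, while the long file loses only the two missing data points and every subject still contributes something.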

>
> Linear Mixed can handle different covariance structures, random
> effects, and missing data b/c it uses MLE, I think. If I'm wrong I
> welcome a correction. Consequently, if I'm not interested in
> estimating random effects (e.g. subjects effects as random), or
> repeated measures, but I have a small percentage of data missing at
> random, I think Generalized Linear Modeling is the way to go. I would
> really love to find out if Generalized Linear Modeling when using MLE
> handles missing data the same way linear mixed can. I don't have the
> latest manual, so I don't know if Generalized Linear Modeling uses all
> pieces of information when estimating through MLE.

I thought the online help files (and syntax reference manual) had all
the same stuff as the printed manuals. But I may be wrong.

From what you said (i.e., nonrepeating/independent), and judging by
those commands, the data set has one row per subject, right? I'm
probably missing something, but I don't understand how that kind of
example will help answer your question. Surely, in the absence of
some sort of imputation, rows with missing data must be excluded in
all three analyses.

For general information, here are the help file entries on the MISSING
subcommand for all 3 procedures.

MISSING Subcommand (GENLIN command)
The MISSING subcommand specifies how missing values are handled.

• Cases with system missing values on any variable used by the GENLIN
procedure are excluded from the analysis.

• Cases must have valid data for the dependent variable or the events
and trials variables, any covariates, the OFFSET variable if it
exists, the SCALEWEIGHT variable if it exists, and any SUBJECT and
WITHINSUBJECT variables. Cases with missing values for any of these
variables are not used in the analysis.

• The CLASSMISSING keyword specifies whether user-missing values of
any factors are treated as valid.

EXCLUDE Exclude user-missing values among any factor or subpopulation
variables. Treat user-missing values for these variables as invalid
data. This is the default.
INCLUDE Include user-missing values among any factor or subpopulation
variables. Treat user-missing values for these variables as valid
data.

-------

MISSING Subcommand (MIXED command)
The MISSING subcommand specifies the way to handle cases with user-
missing values.

• If this subcommand is not specified, the default is EXCLUDE.

• Cases, which contain system-missing values in one of the variables,
are always deleted.

• The keywords EXCLUDE and INCLUDE are mutually exclusive. Only one of
them can be specified at once.

EXCLUDE Exclude both user-missing and system-missing values. This is
the default.
INCLUDE User-missing values are treated as valid. System-missing
values cannot be included in the analysis.

------

MISSING Subcommand (UNIANOVA command)
By default, cases with missing values for any of the variables on the
UNIANOVA variable list are excluded from the analysis. The MISSING
subcommand allows you to include cases with user-missing values.

• If MISSING is not specified, the default is EXCLUDE.

• Pairwise deletion of missing data is not available in UNIANOVA.

• Keywords INCLUDE and EXCLUDE are mutually exclusive.

• If more than one MISSING subcommand is specified, only the last one
is in effect.

EXCLUDE Exclude both user-missing and system-missing values. This is
the default when MISSING is not specified.
INCLUDE User-missing values are treated as valid. System-missing
values cannot be included in the analysis.

--
Bruce Weaver
bwe...@lakeheadu.ca
www.angelfire.com/wv/bwhomedir

Ryan

Apr 21, 2008, 2:02:15 PM

Thanks for your thoughtful response, Bruce.

I'll respond to each section of your response.

> Perhaps I misunderstand what you mean by not including incomplete
> cases in the analysis. When I restructure a repeated measures data
> set from wide to long and use UNIANOVA to do the analysis, I get
> exactly the same F-tests as I get with GLM Repeated Measures. If I
> then punch some holes in the data set, any incomplete cases are
> excluded altogether by GLM Repeated Measures, but not by UNIANOVA. In
> that situation, the two methods yield different results. There are
> some simple examples here:

I did not realize that. I've gone through the examples. Very helpful!
Thank you! Have you ever converted from short to long just to deal
with data missing at random in GLM? Have you published studies or seen
published studies in your area that have used this approach to deal
with missing data using ANOVA?

> From what you said (i.e., nonrepeating/independent), and judging by
> those commands, the data set has one row per subject, right? I'm
> probably missing something, but I don't understand how that kind of
> example will help answer your question. Surely, in the absence of
> some sort of imputation, rows with missing data must be excluded in
> all three analyses.
>

Yes. I see the fatal flaw in my test. If I wanted to test whether or
not linear mixed included "participants" with missing data, a
"participant" would need at least one row with full information to be
included in the analysis. I ran some data with this type of dataset,
and in fact found that both generalized linear modeling and linear
mixed included participants who had some missing data. And after going
through the syntax you sent me, it seems that GLM (UNIANOVA with a
long file) does the same thing.

So the ability to include cases with some missing data has nothing to
do with MLE, but has to do with the format of the dataset? If yes,
then if I'm interested in keeping participants with some data missing
at random, I need to set up the dataset in long format and run
UNIANOVA, Linear Mixed, or Generalized Linear Modeling, provided I
meet the assumptions of these procedures and they answer my research
question.

Here's the syntax to run these analyses for a simple repeated measures
dataset (when the file is in long format):

* Generalized Estimating Equations.
GENLIN
  scores BY time (ORDER=ASCENDING)
  /MODEL time INTERCEPT=YES
    DISTRIBUTION=NORMAL LINK=IDENTITY
  /CRITERIA SCALE=MLE PCONVERGE=1E-006(ABSOLUTE)
    SINGULAR=1E-012 ANALYSISTYPE=3 CILEVEL=95
  /REPEATED SUBJECT=subjects WITHINSUBJECT=time
    SORT=YES CORRTYPE=INDEPENDENT ADJUSTCORR=YES COVB=ROBUST
  /MISSING CLASSMISSING=EXCLUDE
  /PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION.

MIXED
  scores BY time
  /CRITERIA = CIN(95) MXITER(100) MXSTEP(5) SCORING(1)
    SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE)
    LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE)
  /FIXED = time | SSTYPE(3)
  /METHOD = ML
  /PRINT = SOLUTION
  /REPEATED = time | SUBJECT(subjects) COVTYPE(DIAG) .

UNIANOVA
  scores BY time subjects
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /PRINT = PARAMETER
  /CRITERIA = ALPHA(.05)
  /DESIGN = time subjects .

I expected to get identical parameter estimates with these three
analyses if I assume the working corr matrix to be independent with no
random effects. Linear Mixed and Generalized Linear provided identical
parameter estimates, but UNIANOVA was slightly different. I thought
"MLE should equal ANOVA estimation with this type of model." Then I
realized the reason the parameter estimate in the ANOVA might be
different was b/c subjects was being estimated. After removing
subjects as a factor, then all parameter estimates equaled each
other.
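
A small Python sketch of why the subjects factor can move the time estimate once some rows are missing. The data are hypothetical and deliberately exaggerated, and the example is restricted to two time points, an assumption made only because the least-squares time effect with subjects in the model then reduces to the mean within-subject change among subjects observed at both times.

```python
# Hypothetical long-format rows (subject, time, score).
# s3 and s4 each have only one of the two time points.
rows = [
    ("s1", 1, 10.0), ("s1", 2, 13.0),
    ("s2", 1, 8.0), ("s2", 2, 9.0),
    ("s3", 1, 20.0),
    ("s4", 2, 5.0),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def effect_without_subjects(data):
    """Time effect with no subjects factor: difference of the raw time means."""
    time2 = mean(y for _, t, y in data if t == 2)
    time1 = mean(y for _, t, y in data if t == 1)
    return time2 - time1


def effect_with_subjects(data):
    """Time effect with subjects as a factor (two time points only).

    Least squares reduces to the mean within-subject change among
    subjects observed at both times; single-row subjects are absorbed
    by their own subject parameter.
    """
    by_subject = {}
    for subject, t, y in data:
        by_subject.setdefault(subject, {})[t] = y
    return mean(d[2] - d[1] for d in by_subject.values() if 1 in d and 2 in d)


complete_rows = [r for r in rows if r[0] in ("s1", "s2")]

print("all rows, no subjects factor:  ", effect_without_subjects(rows))
print("all rows, subjects as a factor:", effect_with_subjects(rows))
print("complete subjects only:        ",
      effect_without_subjects(complete_rows), effect_with_subjects(complete_rows))
```

With complete data the two estimates coincide; with incomplete subjects they part ways, because the single-row subjects still shift the raw time means but carry no information about within-subject change.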

Bottom line--

All three tests:

1. Provided identical parameter estimates and slightly different
standard errors (when subjects is removed from UNIANOVA).
2. Kept the same number of rows for analysis.
3. Had very close test statistic values and p values (when subjects is
in the model in UNIANOVA).

> For general information, here are the help file entries on the MISSING
> subcommand for all 3 procedures.

If I'm reading correctly, these subcommands in SPSS provide the option
to include values designated as missing values (e.g. "99") as valid
values. Thanks but this is not an interest of mine.

---------------------------------------
I wonder why the following article says that GLM does "listwise
deletion" while LMM does not? If you look under FAQ, you'll see an
answer that states that LMM can handle missing data while GLM cannot.

http://www2.chass.ncsu.edu/garson/pa765/multilevel.htm

It seems that I can actually use any of these three procedures in long
format (UNIANOVA, Linear Mixed, or Generalized Linear Modeling),
obtain nearly identical results, and handle missing data in the same
way. Linear Mixed doesn't seem to provide an advantage in handling
missing data if you have equal time intervals, assume sphericity, and
have random effects. Or am I wrong?

Ryan

Apr 21, 2008, 2:04:04 PM

Cont'd

Bruce Weaver

Apr 21, 2008, 3:29:18 PM

On Apr 21, 2:02 pm, Ryan <Ryan.Andrew.Bl...@gmail.com> wrote:
> On Apr 21, 10:48 am, Bruce Weaver <bwea...@lakeheadu.ca> wrote:

> Thanks for your thoughtful response, Bruce.
>
> I'll respond to each section of your response.
>
> > Perhaps I misunderstand what you mean by not including incomplete
> > cases in the analysis. When I restructure a repeated measures data
> > set from wide to long and use UNIANOVA to do the analysis, I get
> > exactly the same F-tests as I get with GLM Repeated Measures. If I
> > then punch some holes in the data set, any incomplete cases are
> > excluded altogether by GLM Repeated Measures, but not by UNIANOVA. In
> > that situation, the two methods yield different results. There are
> > some simple examples here:
>
> I did not realize that. I've gone through the examples. Very helpful!
> Thank you! Have you ever converted from short to long just to deal
> with data missing at random in GLM? Have you published studies or seen
> published studies in your area that have used this approach to deal
> with missing data using ANOVA?

No, I am not aware of any published studies that have done this.
FWIW, I think that nowadays, a multilevel model approach (using MIXED)
would be preferred over UNIANOVA with a long file format. But I'm
only just starting to get familiar with MIXED*, so cannot offer very
much help with that. There is a "multilevel" mailing list though, and
I believe that some members use SPSS MIXED.

* Jos Twisk's book, "Applied Multilevel Analysis: A Practical
Guide" (2006, Cambridge) has been very helpful. Having read it, I
hope I'll now be able to make more sense of other books, such as
Snijders & Bosker (1999).

--
Bruce Weaver
bwe...@lakeheadu.ca
www.angelfire.com/wv/bwhomedir
