On 10/10/2012 04:06 AM, Kevin Hallgren wrote:
> Hello,
>
> When using lavaan with FIML estimation and missing data, do the
> variables that are included in the dataset but not explicitly specified
> in the model get used as auxiliary variables to help with estimating
> missing values? Or do you need to use the auxiliary() function in
> semTools to include auxiliary variables?
>
> The lavaan manual says about the missing parameter: 'If |"direct"| or
> |"ml"| or |"fiml"| and the estimator is maximum likelihood, Full
> Information Maximum Likelihood (FIML) estimation is used using all
> available data in the data frame.' So I wasn't clear if they meant they
> use "all available data" to help give the best estimate for missing
> values (i.e., they are auxiliary), or if that just meant it doesn't use
> listwise deletion.
I agree this is not very clear. But lavaan does NOT use auxiliary
variables. It only uses the (observed) variables that are included in
your model.
However, just to see whether it makes a difference, you can easily trick
lavaan into using auxiliary variables: add them to the model, but make
sure they have no effect, for example:
model1 = 'T4_PHD ~ 1 + THERAPY + 0*NALTREXO + 0*AGE'
Here, both NALTREXO and AGE will be used as auxiliary variables (THERAPY
stays in the model as the predictor, so it must not also be fixed to 0).
Note, however, that this assumes all of these variables are continuous,
because we assume multivariate normality (I'm not sure about THERAPY,
but GENDER should not be included!). Another caveat is that the degrees
of freedom will be off, but you can take them from the original
analysis. The same is true for the auxiliary() function in semTools:
only use continuous variables!
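
To check whether the trick makes a difference, one could fit both models
and put the estimates side by side. The following is only a rough
sketch: the data are simulated stand-ins (the variable names are
borrowed from the question, the values are made up), and AGE is assumed
to be a continuous variable used as the zero-effect term.

```r
library(lavaan)

set.seed(1)
n <- 200

# Hypothetical, simulated data; NOT the poster's dataset.
dat <- data.frame(THERAPY = rnorm(n), AGE = rnorm(n))
dat$T4_PHD <- 0.5 * dat$THERAPY + 0.3 * dat$AGE + rnorm(n)
dat$T4_PHD[sample(n, 40)] <- NA  # make some outcome values missing

# Original model, and the same model with AGE added at a fixed effect of 0
model.orig <- 'T4_PHD ~ 1 + THERAPY'
model.aux  <- 'T4_PHD ~ 1 + THERAPY + 0*AGE'

fit.orig <- sem(model.orig, data = dat, estimator = "ML", missing = "fiml")
fit.aux  <- sem(model.aux,  data = dat, estimator = "ML", missing = "fiml")

# Free parameters of each fit, for a side-by-side look
coef(fit.orig)
coef(fit.aux)

# Degrees of freedom: report the ones from the original model
df.orig <- fitMeasures(fit.orig, "df")
df.aux  <- fitMeasures(fit.aux,  "df")
```

If the coefficients for THERAPY barely move between the two fits, the
extra variable is not adding much for this particular model.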
> My code is a simple regression model:
>
> auxvars =
> c("NALTREXO","THERAPY","GENDER","AGE","DEPNDSX","T0_PHD","T4_PHD")
> model1 = 'T4_PHD ~ 1 + THERAPY'
> fit.fiml = sem(model1, data=raw.dataset[,auxvars], estimator="ML",
> missing="FIML")
> summary(fit.fiml)
>
> Although only the T4_PHD and THERAPY variables are used in the model, I
> would like the 5 additional auxiliary variables to help give the most
> accurate estimate of the model given missing data for T4_PHD.
In my understanding, auxiliary variables are not used when estimating
the model parameters! They are only used to fit the unrestricted (h1)
model (i.e., the covariance matrix of the incomplete data). The latter is
only needed to compute the model test statistic. Your estimates (and
standard errors) will be fine without them.
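
As a small illustration of that distinction, the estimates and standard
errors on the one hand, and the model test statistic on the other, can
be pulled out of a fitted object separately. This is a sketch on
simulated data (hypothetical values, with names borrowed from the
question), not a rerun of the original analysis.

```r
library(lavaan)

set.seed(2)
n <- 200

# Hypothetical, simulated data; NOT the poster's dataset.
dat <- data.frame(THERAPY = rnorm(n))
dat$T4_PHD <- 0.5 * dat$THERAPY + rnorm(n)
dat$T4_PHD[sample(n, 40)] <- NA  # make some outcome values missing

fit <- sem('T4_PHD ~ 1 + THERAPY', data = dat,
           estimator = "ML", missing = "fiml")

# Parameter estimates and their standard errors
pe <- parameterEstimates(fit)
pe[, c("lhs", "op", "rhs", "est", "se")]

# Model test statistic, which is where the unrestricted (h1) model comes in
test.stat <- fitMeasures(fit, c("chisq", "df"))
```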
Yves.