Collinearity problem in imputing a repeated measure covariate

phayat...@g.ucla.edu

Jun 8, 2017, 2:36:18 PM

to Missing Data

Hi Jonathan,

I have a question regarding imputing repeated measure covariates using MVN and/or MICE.

I'm working on a cluster-randomized trial in which 24 neighborhoods were randomly assigned to either an intervention or a control group, and mothers and their newborn children in each group were followed up over 5 time-points.

The research question of interest is the effect of the intervention on different child outcomes. For example, one of the outcomes is the child's height-for-age Z-score (HAZ), measured at all 5 time-points.

The analysis model used previously was a linear mixed model with fixed effects for the mother's HIV status (measured at the 5 time-points), neighborhood, and time-point, plus a random intercept for child and a random slope for time.
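One way to write this model down, as a sketch with my own notation (assuming time enters the fixed part as categorical indicators and the random slope uses time coded 1 to 5): for child $i$ at time-point $t$, with $j(i)$ the child's neighborhood and $\mathrm{HIV}_{it}$ the mother's status,

```latex
\mathrm{HAZ}_{it} = \beta_0 + \beta_1\,\mathrm{HIV}_{it}
  + \sum_{k=2}^{24} \gamma_k\,\mathbb{1}[j(i) = k]
  + \sum_{s=2}^{5} \delta_s\,\mathbb{1}[t = s]
  + u_{0i} + u_{1i}\,t + \varepsilon_{it},
\qquad (u_{0i}, u_{1i})^\top \sim N(\mathbf{0}, \Sigma),
\quad \varepsilon_{it} \sim N(0, \sigma^2)
```

Because $\mathrm{HIV}_{it}$ is time-varying, its counterpart in a wide-format imputation model is one variable per time-point ($\mathrm{HIV}_1, \dots, \mathrm{HIV}_5$).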

As expected, both the outcome and the mother's HIV status have missing observations at every time-point. I'm planning further analyses using multiple imputation (MI), assuming MAR, under different scenarios (for example, with or without auxiliary variables in the imputation models).

Although I'm aware that missing HIV status is more likely to be MNAR, as a starting point I want to assume MAR for both HAZ and HIV status.

After exploring the data, I realized that there are different patterns of missingness in HIV status over time: for example, some mothers were lost to follow-up, and some were HIV positive at some time-points, missing at the middle time-points, and then positive again at the final time-point.
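To see how common each pattern is, a small sketch in Python (standard library only) can encode each mother's HIV1-HIV5 row as a pattern string and flag intermittent (non-monotone) missingness as opposed to dropout. The data and the names `hiv`, `pattern`, and `is_monotone` are hypothetical, not taken from the trial.

```python
from collections import Counter

# Hypothetical wide-format HIV status per mother at time-points 1..5:
# 1 = positive, 0 = negative, None = missing. Illustrative values only.
hiv = {
    "m01": [1, 1, None, None, 1],
    "m02": [0, 0, 0, None, None],
    "m03": [1, None, None, None, None],
    "m04": [0, 0, 0, 0, 0],
    "m05": [0, 0, None, None, None],
}


def pattern(row):
    """Encode a row as a string: 'O' = observed, '.' = missing."""
    return "".join("." if v is None else "O" for v in row)


def is_monotone(row):
    """True if, once a value is missing, all later values are missing (dropout)."""
    seen_missing = False
    for v in row:
        if v is None:
            seen_missing = True
        elif seen_missing:
            return False  # observed again after a gap: intermittent missingness
    return True


# Count how many mothers share each missingness pattern.
patterns = Counter(pattern(r) for r in hiv.values())
for pat, n in patterns.most_common():
    print(pat, n)

# Mothers with a gap followed by an observed value.
intermittent = [m for m, r in hiv.items() if not is_monotone(r)]
print("intermittent:", intermittent)
```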

So it seemed a bit odd to me to impute missing HIV status for cases whose status we can more or less infer from the preceding or following time-points.

My concerns are as follows, and I would appreciate it if you could comment on them.

1. I'm not sure whether it is good practice to fill in missing observations for cases whose HIV status over time can be inferred. For example, suppose someone was HIV positive at time-points 1, 2, and 5 and missing at time-points 3 and 4. Filling in "positive" at time-points 3 and 4 seemed more sensible to me than imputing these values, since they might be imputed as HIV negative.
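The fill-in rule in point 1 can be written down explicitly. A minimal sketch, assuming a hypothetical coding of 1 = positive, 0 = negative, None = missing: a run of missing values is set to positive only when the nearest observed values on both sides are positive, and every other gap (including dropout at the end) is left for MI. The function name `fill_bracketed_positive` is hypothetical.

```python
from typing import List, Optional

Status = Optional[int]  # 1 = positive, 0 = negative, None = missing (assumed coding)


def fill_bracketed_positive(row: List[Status]) -> List[Status]:
    """Fill a run of missing values with 1 only if the observed values
    immediately before and after the run are both 1. Other gaps stay missing."""
    out = list(row)  # copy so the input row is not modified
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        # Find the extent of this run of missing values: out[start:i].
        start = i
        while i < n and out[i] is None:
            i += 1
        before = out[start - 1] if start > 0 else None  # None if run starts the row
        after = out[i] if i < n else None  # None if run is trailing dropout
        if before == 1 and after == 1:
            for j in range(start, i):
                out[j] = 1
    return out


print(fill_bracketed_positive([1, 1, None, None, 1]))     # [1, 1, 1, 1, 1]
print(fill_bracketed_positive([1, None, None, None, None]))  # unchanged: dropout
```

The rule is deliberately conservative; any deterministic rule like this is an assumption about how HIV status evolves, and the cases it cannot resolve would still go into the imputation model.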

2. As I mentioned above, the analysis model includes HIV status over time, so to make the imputation model compatible with the analysis model, I included HIV status at time-points 1 to 5 as distinct variables in the imputation model (HIV1-HIV5). As expected, these variables were highly collinear, and I received an error message about collinearity. I excluded HIV1 and HIV2 from the imputation model, since their correlation was equal to 1. After this exclusion, I ran MICE (in Stata) and it worked well. However, when I ran the analysis model (the linear mixed model) on the imputed datasets, HIV status still had missing data at time-points 1 and 2. Does it make sense to run such a model after MI when the covariate still has missing observations?
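Perfectly correlated wide-format columns can be detected before running MI. A sketch with hypothetical data for three time-points (HIV1-HIV3), computing the Pearson correlation on pairwise-complete cases and flagging pairs with |r| above an arbitrary threshold of 0.999; `pairwise_corr`, `wide`, and the threshold are illustrative assumptions, not Stata's own collinearity check.

```python
import math
from itertools import combinations
from typing import List, Optional


def pairwise_corr(x: List[Optional[float]], y: List[Optional[float]]) -> Optional[float]:
    """Pearson correlation over rows where both x and y are observed.
    Returns None if fewer than 2 complete pairs or either variable is constant."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        return None
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None  # correlation undefined for a constant variable
    return sxy / math.sqrt(sxx * syy)


# Hypothetical wide data (1 = positive, 0 = negative, None = missing), one entry per mother.
wide = {
    "HIV1": [1, 0, 0, 1, None, 0],
    "HIV2": [1, 0, 0, 1, 0, None],
    "HIV3": [1, 0, None, 1, 1, 1],
}

flagged = []
for a, b in combinations(wide, 2):
    r = pairwise_corr(wide[a], wide[b])
    if r is not None and abs(r) > 0.999:
        flagged.append((a, b))
print(flagged)
```

A flagged pair carries the same information among the cases where both are observed. A variable dropped from the imputation model is not imputed, so its missing values remain in the imputed datasets.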

I'm also aware of the two-fold fully conditional specification (FCS) algorithm, although I haven't used it yet and don't know how useful it would be in this situation.

I would like to know your opinion on imputing a covariate such as HIV status in my example, and how I can deal with this kind of collinearity. Please feel free to point me to any references you think would be useful for the issue I'm facing.

Thanks for your help and time.

Best,

Panteha