Use of auxiliary variables collected post-randomisation in multiple imputation model

darren....@primoriscs.co.uk

Jan 29, 2017, 7:58:17 AM

to Missing Data

Hi, is it OK to use auxiliary variables that are collected post-randomisation in a multiple imputation model? For example, if the main variable of interest is only measured at Week 0 (pre-treatment baseline) and Week 16, but other auxiliary variables are measured at every visit (Week 0, 4, 8, 12 and 16), is it valid to use these auxiliary variables from the post-randomisation visits when developing the multiple imputation model that 'fills in' subjects who have a missing Week 16 outcome for the main variable of interest?

In my example, the main outcome variable is total body fat at Week 16 measured by MRI scan, and the auxiliary variable is body weight, recorded at 4-week intervals. It is expected that some subjects will drop out prior to Week 16, but these subjects will have body weight data recorded up until the point that they drop out. I want to make use of their body weight data in the multiple imputation model for total body fat at Week 16, as it is thought that body weight changes are correlated with total body fat changes.

In addition, once a subject discontinues from the study, they are requested to come back to the clinic for an early termination visit, where they will have an MRI scan to record total body fat (the main outcome variable). This early termination visit could be at any time between Week 0 and Week 16. Can this early termination visit data for the main outcome variable be used in any way in my multiple imputation model to inform what the outcome variable value might have been at Week 16?

So I am thinking my imputation model would be something like:

Total body fat at Week 16 = beta0 + beta1*treatment + beta2*total body fat Week 0 + beta3*body weight Week 0 + beta4*body weight Week 4 + beta5*body weight Week 8 + beta6*body weight Week 12 + beta7*body weight Week 16 + error

plus a term for the early termination value of the outcome variable, if I can use it?

Assuming this is OK, my next question is what to do in cases where values for the auxiliary variables are missing. For example, subjects who drop out at Week 8 won't have body weight values for Week 12 or Week 16. Do I need to develop a model to fill in values for all the auxiliary variables first, before I develop the MI model for my main outcome variable?

The study design is a 2-treatment parallel group study, with a test group and a placebo group.

I know you are not supposed to use covariates collected post-randomisation in the main analysis model for testing hypotheses of a treatment effect, hence my question.

Many thanks

Darren

Jonathan Bartlett

Feb 8, 2017, 3:41:50 PM

to Missing Data

Hi Darren

1) Is it OK to use auxiliary variables that are collected post-randomisation in a multiple imputation model? Yes. Doing so should improve the precision of your estimated treatment effect at the final time point, because it reduces the uncertainty about the missing values, and it may also make the MAR assumption more plausible.

2) Regarding missingness in the auxiliary variables: yes, if they have missing values you will need to impute these too. This isn't ordinarily an issue: the imputation model doesn't know which variables are auxiliaries and which are not. You will need to include in the imputation model(s) the variables you want to impute, the auxiliaries, and any other covariates (e.g. treatment group) which are involved in the analysis model and therefore must be allowed for in the imputation model.
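
To make point 2 concrete, here is a rough sketch in Python (NumPy only) of imputing the auxiliaries and the outcome together, one variable at a time in visit order. Everything in it is an assumption for illustration: the simulated data, the effect sizes, the dropout pattern and the function names are invented, and the bootstrap-then-regress step is just one simple way to reflect parameter uncertainty. It relies on dropout being monotone, as in your description.

```python
import numpy as np

rng = np.random.default_rng(2017)

def simulate_trial(n=200):
    """Invented trial data with monotone dropout (NaN = missing)."""
    treat = rng.integers(0, 2, n).astype(float)
    fat0 = rng.normal(30.0, 5.0, n)
    weight = np.empty((n, 5))                      # Weeks 0, 4, 8, 12, 16
    weight[:, 0] = 60.0 + 1.2 * fat0 + rng.normal(0.0, 4.0, n)
    for k in range(1, 5):
        weight[:, k] = weight[:, k - 1] - 0.5 * treat + rng.normal(0.0, 1.0, n)
    fat16 = fat0 + 0.6 * (weight[:, 4] - weight[:, 0]) + rng.normal(0.0, 1.0, n)
    # Index of the last attended visit (1 = Week 4, ..., 4 = Week 16).
    last_visit = rng.choice([1, 2, 3, 4], size=n, p=[0.05, 0.05, 0.1, 0.8])
    for k in range(1, 5):
        weight[last_visit < k, k] = np.nan
    fat16[last_visit < 4] = np.nan
    # Columns: treat, fat0, w0, w4, w8, w12, w16, fat16
    return np.column_stack([treat, fat0, weight, fat16])

def ols(X, y):
    """Least-squares coefficients and residual standard deviation."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, np.sqrt(resid @ resid / (len(y) - X.shape[1]))

def impute_once(data):
    """One completed dataset: regress each variable on all earlier columns."""
    out = data.copy()
    n = len(out)
    for j in range(3, out.shape[1]):               # w4, w8, w12, w16, fat16
        miss = np.isnan(data[:, j])
        if not miss.any():
            continue
        X = np.column_stack([np.ones(n), out[:, :j]])
        obs_rows = np.flatnonzero(~miss)
        # Bootstrap the observed rows so each imputation uses different coefficients.
        boot = rng.choice(obs_rows, size=obs_rows.size, replace=True)
        beta, sigma = ols(X[boot], out[boot, j])
        out[miss, j] = X[miss] @ beta + rng.normal(0.0, sigma, miss.sum())
    return out

def analyse(completed):
    """Analysis model fat16 ~ treat + fat0: treatment estimate and its variance."""
    X = np.column_stack([np.ones(len(completed)), completed[:, 0], completed[:, 1]])
    beta, sigma = ols(X, completed[:, -1])
    return beta[1], (sigma**2 * np.linalg.inv(X.T @ X))[1, 1]

def pool(estimates, variances):
    """Rubin's rules: pooled estimate and standard error."""
    m = len(estimates)
    total = np.mean(variances) + (1 + 1 / m) * np.var(estimates, ddof=1)
    return np.mean(estimates), np.sqrt(total)

data = simulate_trial()
results = [analyse(impute_once(data)) for _ in range(20)]
est, se = pool(*zip(*results))
print(f"pooled treatment effect: {est:.2f} (SE {se:.2f})")
```

Note that the analysis model only contains treatment and baseline body fat; the post-randomisation weights appear in the imputation step alone, which is what keeps them out of the hypothesis test.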

3) Regarding using the early termination information: this is a bit more difficult, I think. You need a model for how total body fat at Week 16 is predicted by total body fat at an earlier discontinuation visit, but presumably you don't have any patients with both of these; they have either one or the other. I'm not sure what the best way forward would be with this, to be honest. One option would be to use an assumed relationship/model between these two measurements, probably with some effect for the time between the discontinuation visit and Week 16, and then repeat the analysis using different values of the parameters in this assumed model. It would be easier if you had measurements of total body fat during the trial in those who complete to Week 16, because then you could directly estimate the required model, but from your description it doesn't sound like you have this in your study.
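
As a very rough sketch of what that sensitivity analysis could look like in Python (NumPy only): the early termination scan is projected forward to Week 16 under an assumed linear change per remaining week, and the analysis is repeated over a few assumed rates. The data, the linear form, the rates, the noise level and the function names are all invented for illustration; none of them can be estimated from a design like yours, which is exactly why they are varied.

```python
import numpy as np

rng = np.random.default_rng(16)

# Invented data: completers have a Week 16 scan; dropouts instead have an
# early termination (ET) scan at week et_week.
n = 120
treat = rng.integers(0, 2, n)
fat0 = rng.normal(30.0, 5.0, n)
dropout = rng.random(n) < 0.25
et_week = np.where(dropout, rng.integers(2, 16, n), 16)
true_rate = np.where(treat == 1, -0.15, -0.02)    # change per week, invented
fat_last = fat0 + true_rate * et_week + rng.normal(0.0, 0.8, n)

def impute_week16(rate_active, rate_placebo, noise_sd):
    """Project ET scans to Week 16 under an ASSUMED linear change per week."""
    rate = np.where(treat == 1, rate_active, rate_placebo)
    weeks_left = 16 - et_week                     # 0 for completers
    noise = rng.normal(0.0, noise_sd, n) * dropout
    return fat_last + rate * weeks_left + noise

def treatment_effect(fat16):
    """Coefficient of treatment in fat16 ~ treat + fat0."""
    X = np.column_stack([np.ones(n), treat, fat0])
    beta, *_ = np.linalg.lstsq(X, fat16, rcond=None)
    return beta[1]

# Repeat the analysis under different assumed post-dropout rates in the active arm.
for rate_active in (0.0, -0.05, -0.15):
    effects = [treatment_effect(impute_week16(rate_active, 0.0, 0.8))
               for _ in range(20)]
    print(f"assumed active-arm rate {rate_active:+.2f}/week: "
          f"mean effect {np.mean(effects):.2f}")
```

If the conclusion about the treatment effect holds across the plausible range of assumed rates, that is reassuring; if it flips, the result depends on an assumption the data cannot check.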

Best wishes

Jonathan
