Multiple imputation over repeated observation data


roland andersson

Apr 28, 2013, 10:55:45 AM
to medstats
I have clinical data on patients examined one to three times at intervals
of a few hours.

The data looks like this in the wide format
age, sex, diagnosis, time1, var1_1, var2_1, var3_1, time2, var1_2,
var2_2, var3_2, time3, var1_3, var2_3 etc

Some patients have only one examination, others up to three.

I have previously identified and used the latest examination and
imputed missing values for only that examination, using all the
available data in all variables, including the repeat examinations.

However, I doubt that this takes full advantage of the dynamic effect
of time. The problem is how MI or ICE (I use Stata) can associate the
time of each examination with the results obtained at that time, and
so capture the dynamic change.

Any suggestions?

Roland andersson

Munyaradzi Dimairo

May 1, 2013, 4:21:19 AM
to MedStats
You could also treat time as a cluster variable and consider the following approach

http://www.stata.com/support/faqs/statistics/clustering-and-mi-impute/

cheers

Munya



--
--
To post a new thread to MedStats, send email to MedS...@googlegroups.com .
MedStats' home page is http://groups.google.com/group/MedStats .
Rules: http://groups.google.com/group/MedStats/web/medstats-rules

---
You received this message because you are subscribed to the Google Groups "MedStats" group.
To unsubscribe from this group and stop receiving emails from it, send an email to medstats+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.





--

"Learn from yesterday, live for today, hope for tomorrow. The important thing is not to stop questioning" ~ Albert Einstein

***********************************
Munyaradzi Dimairo
NIHR Research Fellow in Medical Statistics
University of Sheffield
School of Health and Related Research (ScHARR)
Clinical Trials Research Unit
Regent Court, 30 Regent Street
Sheffield
S1 4DA

Physical address:
Room 3.10, Innovation Centre

email: m.di...@sheffield.ac.uk 
          mdim...@gmail.com
Twitter: @mdimairo [views are my own, retweeting is not endorsement]
 fax: +44 (0) 114 222 0870
http://www.sheffield.ac.uk/scharr/sections/dts/ctru/  
***********************************

Frank Harrell

May 3, 2013, 8:42:15 AM
to meds...@googlegroups.com
In many cases it is better to think of this not as an imputation problem but as a "use all available data" problem.  If missings are missing at random (i.e., missing completely at random given the non-missing data), and if you use a full likelihood approach (e.g., generalized least squares or mixed effects models), it does not help to impute missing data.  I prefer continuous-time models in both the mean structure (using e.g. regression spline in time) and in the correlation structure (e.g., AR1).  This makes maximum use of available data.  Note that non-full-likelihood methods such as GEE require multiple imputation to be used.
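
A minimal sketch of the continuous-time AR1 correlation structure mentioned above, assuming correlation between two measurements on the same patient decays as rho raised to the absolute time gap. The function name `car1_correlation`, the value of `rho`, and the examination times are illustrative assumptions, not taken from any dataset in this thread. It only builds the correlation matrix; fitting the model would still need a generalized least squares or mixed-model routine.

```python
def car1_correlation(times, rho):
    """Correlation matrix for one patient's measurements taken at `times`.

    Continuous-time AR(1): corr(y_j, y_k) = rho ** |t_j - t_k|,
    so irregular spacing and a varying number of visits are both allowed.
    """
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must be in [0, 1)")
    return [[rho ** abs(tj - tk) for tk in times] for tj in times]


# Hypothetical patient examined at 0, 5 and 12 hours after arrival.
full = car1_correlation([0.0, 5.0, 12.0], rho=0.9)

# A patient with only two examinations simply contributes a smaller matrix;
# nothing has to be imputed for the visit that never happened.
partial = car1_correlation([0.0, 5.0], rho=0.9)
```

Because the matrix is built from the actual examination times, patients with one, two or three examinations all enter the likelihood with whatever data they have.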

Last observation carried forward should be avoided at all costs.

Frank

John Sorkin

May 3, 2013, 8:59:31 AM
to harr...@gmail.com, meds...@googlegroups.com
Professor Harrell,
I believe that your statement may overlook one important point. It is my understanding that while a full likelihood approach (and multiple imputation) will, when data are missing either at random or completely at random, give unbiased estimates, multiple imputation has the added benefit of potentially increasing precision (i.e. reducing the size of the SE of the estimates). That said, the difference between the SEs obtained by multiple imputation and by full likelihood may be small, as multiple imputation adds noise to each imputation to account for the uncertainty inherent in imputing values. I don't know if anyone has formally explored this question. Of course, if the pattern of missing data is neither missing completely at random nor missing at random, both a full likelihood approach and multiple imputation can lead to biased estimates.
I would be interested to hear your perspective on this issue.
John


Frank Harrell

May 3, 2013, 9:25:24 AM
to meds...@googlegroups.com, harr...@gmail.com
Hi John,

The purpose of imputation is to make optimum use of non-missing data (vs. for example casewise deletion), not to restore information that was never there (exception: when a true surrogate measure is used to impute a sometimes missing variable).  So imputation does not increase precision in most cases, when compared against full models.  This issue has been explored in the repeated measures context in some recent papers, none of which I can put my finger on at the moment.  As you referenced, imputation can also help with computing the correct precision that is penalized for imputation uncertainty.  A fatal flaw in last observation carried forward is that it treats all imputed values as real values.
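
The penalty for imputation uncertainty mentioned above is usually computed with Rubin's rules: the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The sketch below assumes each of the m imputed datasets has already been analysed; the function name `pool_rubin` and the numbers are made up for illustration.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m imputed-data analyses with Rubin's rules.

    estimates: point estimate from each imputed dataset
    variances: squared standard error from each imputed dataset
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching lengths")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 divisor)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Made-up numbers for illustration only.
est, tot = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.04, 0.04])
```

The total variance can never be smaller than the average within-imputation variance, which is the sense in which the pooled precision is penalized rather than inflated.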

Imputation increases precision when compared against methods that exclude more data than imputation excludes.

Frank

John Whittington

May 3, 2013, 11:34:36 AM
to meds...@googlegroups.com, harr...@gmail.com
At 06:25 03/05/2013 -0700, Frank Harrell wrote:
>The purpose of imputation is to make optimum use of non-missing data (vs.
>for example casewise deletion), not to restore information that was never
>there (exception: when a true surrogate measure is used to impute a
>sometimes missing variable). So imputation does not increase precision in
>most cases, when compared against full models.

Thanks, Frank. I've always had some difficulties with the whole concept of
imputation - and that's one of the most sensible/reasonable statements
about imputation that I've ever seen. So often there seems to be an
implication that imputation can somehow 'create' information over and above
that which actually exists! As for 'true surrogates' (or even just highly
correlated variables), if one has some missing data, but the other doesn't,
then I would be tempted to suggest that one might consider simply ignoring
the former. However, if both have some (but usually not co-incident)
missing data, I can see that one can be used to impute the value of the
other, which may result in an improvement in precision.

>A fatal flaw in last observation carried forward is that it treats all
>imputed values as real values.

I understand at least some of the statistical/mathematical objections to
last-observation-carried-forward ('LOCF'). If, nevertheless, one were
using it, one presumably would at least not allow the imputed values to
increase the degrees of freedom, hence not totally treating the imputed
values as real.

However, if one moves away from the statistical/mathematical issues, I
can't help but feel that there are situations in which LOCF might be an
arguably valid approach to looking at 'worst case' scenarios. A common
situation in clinical trials is that, far from being missing at random,
most of the missing data is absent due to subjects having been withdrawn
from the trial (or defaulted from the trial) because of poor treatment
efficacy, reflected in 'bad' outcome variable values at the time of the
'last available observation'. One will nearly always want to analyse on an
'intention-to-treat' basis in such a situation and if one wants to be
'conservative' (i.e. considering the 'worst case' possibilities), LOCF
would seem to be one approach to that. Any method of imputation which
resulted in imputed values which appeared to be showing 'improvement' after
withdrawal/default from the trial because of 'poor efficacy' would seem to
be moving away from that 'worst-case/conservative' approach - and, even if
statistically justifiable, would have little real basis in terms of
clinical considerations.

>Imputation increases precision when compared against methods that exclude
>more data than imputation excludes.

Again, that seems to be a very reasonable and sensible statement.

Kind Regards,


John

----------------------------------------------------------------
Dr John Whittington, Voice: +44 (0) 1296 730225
Mediscience Services Fax: +44 (0) 1296 738893
Twyford Manor, Twyford, E-mail: Joh...@mediscience.co.uk
Buckingham MK18 4EL, UK
----------------------------------------------------------------

Thompson,Paul

May 3, 2013, 11:38:32 AM
to meds...@googlegroups.com, harr...@gmail.com
Yes, agree entirely. I remember going to a talk by a noted proponent of imputation, who demonstrated that his tool was able to "impute" the value for an entire wave in which no actual data value was observed for the item in question. While technically impressive, I was less enthralled by the notion of actually doing this.

Frank Harrell

May 3, 2013, 12:05:21 PM
to meds...@googlegroups.com, harr...@gmail.com
Hi John,

There have been some great papers written about LOCF, especially by Craig Mallinckrodt.  I think it is very rare for LOCF to give the right answer for the treatment comparison, and I have NEVER seen an example where the user of LOCF computed the correct (larger) standard error to penalize for imputation.  So in that sense it is always wrong.
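
To make the standard-error point concrete, here is a small sketch of what LOCF does to one subject's visit sequence. The function name `locf` and the values are made up for illustration; the flags it returns are an addition meant to show how many entries are copies rather than observations.

```python
def locf(values):
    """Last observation carried forward for one subject's visit sequence.

    `None` marks a missing visit. Leading missing values stay missing
    because there is nothing to carry forward yet.
    Returns (filled values, flags marking which entries were carried forward).
    """
    filled, carried = [], []
    last = None
    for v in values:
        if v is None and last is not None:
            filled.append(last)
            carried.append(True)
        else:
            filled.append(v)
            carried.append(False)
            if v is not None:
                last = v
    return filled, carried


# Made-up subject who dropped out after the second of four visits.
filled, carried = locf([7.0, 6.5, None, None])
n_real = sum(1 for v, c in zip(filled, carried) if v is not None and not c)
# filled has four numbers, but only n_real of them were actually observed;
# a standard error computed as if n = len(filled) would be too small.
```

Once the filled list is passed to a standard routine without the flags, the copied values are indistinguishable from real ones.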

Frank

John Whittington

May 3, 2013, 12:15:24 PM
to meds...@googlegroups.com, harr...@gmail.com
At 09:05 03/05/2013 -0700, Frank Harrell wrote:
>Hi John, There have been some great papers written about LOCF especially
>by Craig Mallinckrodt. I think it is very rare for LOCF to give the right
>answer for the treatment comparison, and I have NEVER seen an example
>where the user of LOCF computed the correct (larger) standard error to
>penalize for imputation. So in that sense it is always wrong.

I have to agree that if they 'always do it wrong' (i.e. don't penalise for
imputation), then it will always be wrong - but they could do it 'more
right'! I suspect that one of the problems is that people tend to create an
"LOCFd" data set, then feed it (as if it were all real data) into a
standard statistical package which does not offer them any (or any easy)
way of introducing the penalties they should be applying.

You are probably right that LOCF never gives the 'right answer'. However,
I was suggesting that, in the sort of situation I described, LOCF is likely
to be biased in the 'conservative direction' (clinically or
pharmaceutically speaking), whereas it looks to me as if other methods of
imputation would be at some risk of producing results which were
'non-conservative' in relation to 'the truth'.

Frank Harrell

May 3, 2013, 1:14:04 PM
to meds...@googlegroups.com, harr...@gmail.com
Some of the papers discussing this in detail point out that the bias can sometimes be anti-conservative when intuitively you'd think it's conservative.  I highly recommend the following. -Frank

@Article{mal08rec,
  author = {Mallinckrodt, Craig H. and Lane, Peter W. and Schnell, Dan and Peng, Yahong and Mancuso, James P.},
  title = {Recommendations for the primary analysis of continuous endpoints in longitudinal clinical trials},
  journal = {Drug Information Journal},
  year = 2008,
  volume = 42,
  pages = {303-319},
  annote = {missing data;longitudinal data;serial data;primary analysis;clinical trials;RCT;last observation carried forward;LOCF;excellent comprehensive review of deficiencies of LOCF and a push for model-based analysis;problems of full models such as mixed models when missing is not at random are actually worse with LOCF;explanation of biases of LOCF and its use of non-design-based endpoint or improper imputation;paper falsely assumed change scores are appropriate;emphasized saturated correlation and time model;LOCF's conservatism in one setting may be seen as anti-conservative in another, e.g., non-inferiority trial;did not address time zero response issue or ANCOVA}
}
@Article{jan06ana,
  author = {Jansen, Ivy and Beunckens, Caroline and Molenberghs,
Geert and Verbeke, Geert and Mallinckrodt, Craig},
  title = {Analyzing incomplete discrete longitudinal clinical
trial data},
  journal = {Statistical Science},
  year = 2006,
  volume = 21,
  pages = {52-69},
  annote = {complete case analysis;ignorability;GEE;GLMM;missing
at random;missing completely at random;missing not at
random;sensitivity analysis;LOCF assumes unchanging profile after
dropout, an assumption too strong to hold in general}
}
@Article{bar06mul,
  author = {Barnes, Sunni A. and Lindborg, Stacy R. and Seaman,
John W.},
  title = {Multiple imputation techniques in small sample
clinical trials},
  journal = {Stat in Med},
  year = 2006,
  volume = 25,
  pages = {233-245},
  annote = {multiple imputation;predictive mean matching;last
observation carried forward;bad performance of LOCF including high
bias and poor confidence interval coverage;simulation
setup;longitudinal data;serial data;RCT;dropout;assumed missing at
random (MAR);approximate Bayesian bootstrap;Bayesian least
squares;missing data;nice background summary;new completion score
method based on fitting a Poisson model for the number of completed
clinic visits and using donors and approximate Bayesian bootstrap}
}
@Article{beu05dir,
  author = {Beunckens, Caroline and Molenberghs, Geert and
Kenward, Michael G.},
  title = {Direct likelihood analysis versus simple forms of
imputation for missing data in randomized clinical trials},
  journal = {Clinical Trials},
  year = 2005,
  volume = 2,
  pages = {379-386},
  annote = {dropouts;RCT;serial data;longitudinal data;bias in
LOCF;mixed models used for likelihood analysis}
}
@Article{tan05com,
  author = {Tang, Lingqi and Song, Juwon and Belin, Thomas
R. and Un\"utzer, J\"urgen},
  title = {A comparison of imputation methods in a longitudinal
randomized clinical trial},
  journal = {Stat in Med},
  year = 2005,
  volume = 24,
  pages = {2111-2128},
  annote = {missing data;hot deck;multiple
imputation;model-based imputation;predictive mean matching;LOCF and
available-case method had poor coverage;imputation under a
multivariate normal model did not produce correct coverage with highly
skewed distributions;hot deck consistently have good nominal coverage
and had CL widths 7 per cent larger on average than using multivariate
normal imputation;approximate Bayesian bootstrap;different imputation
methods used for item and for unit nonresponse;simulation setup;nice
graphics;errata 25:1095;2006}
}
@Article{obr05sem,
  author = {{O'Brien}, Peter C. and Zhang, David and Bailey,
Kent R.},
  title = {Semi-parametric and non-parametric methods for
clinical trials with incomplete data},
  journal = {Stat in Med},
  year = 2005,
  volume = 24,
  pages = {341-358},
  annote = {missing data;clinical
trials;semi-parametric;non-parametric;``LOCF was observed to produce
markedly biased estimates and markedly inflated type I error rates
when censoring was unequal in the two treatment arms'';last rank
carried forward;LRCF;``mixed model repeated measures performed
similarly to cumulative change and LRCF and makes somewhat less
restrictive assumptions about missingness mechanisms'';cumulative
change model similar to Kaplan-Meier piecing together of
intervals;cumulative change and LRCF assume that ``censoring mechanism
may differ between treatment groups, but with treatment group the
distribution of the change in the endpoint from baseline to last
scheduled visit is assumed to be the same for completers and
non-completers'';errata 24:3385}
}
@Article{coo04mar,
  author = {Cook, Richard J. and Zeng, Leilei and Yi, Grace Y.},
  title = {Marginal analysis of incomplete longitudinal binary
data: {A} cautionary note on {LOCF} imputation},
  journal = {Biometrics},
  year = 2004,
  volume = 60,
  pages = {820-828},
  annote = {dropout;GEE;imputation;longitudinal data
analysis;serial data;missing data;misspecified data;LOCF leads to
large biases in treatment effects, inflation of type I error, poor
coverage probability; Analyses based on all available data can result
in relatively small bias;``probability weighted analyses yield
consistent estimators subject to correct specification of the missing
data process''}
}
@Article{eng03imp,
  author = {Engels, Jean Mundahl and Diehr, Paula},
  title = {Imputation of missing longitudinal data: a
comparison of methods},
  journal = {J Clin Epi},
  year = 2003,
  volume = 56,
  pages = {968-976},
  annote = {longitudinal data;repeated measures;within-subject
imputation vs. using baseline data vs. population group;natural
experiment that solved problems of simulated data because used real
data with real missingness pattern with known true value;true value
was a value observed after a missing response at a certain time, which
was made to be artificially missing;most subjects had such
measurements really missing;gold standard was ability to reproduce the
known value, not performance in the final response model (or group
comparison);LOCF;longitudinal imputation;next observation carried backward}
}

John Whittington

May 3, 2013, 1:43:59 PM
to meds...@googlegroups.com, harr...@gmail.com
At 10:14 03/05/2013 -0700, Frank Harrell wrote:
>Some of the papers discussing this in detail point out that the bias can
>sometimes be anti-conservative when intuitively you'd think it's
>conservative. I highly recommend the following.

Many thanks, Frank. As you say, it's probably fairly counter-intuitive, so
I'll be interested to read some of them. I suppose some of it may come
down to the definition of 'conservative' being used in various situations.

roland andersson

May 4, 2013, 12:46:42 PM
to medstats
Thank you for input on this thread

May I explain my situation. I am studying the clinical diagnosis of appendicitis. Patients are examined at arrival, some of them again after 4-8 h of observation, and some after a prolonged observation when a decision is made for surgery. I have data on 7 clinical variables plus duration of symptoms. Unfortunately there are some missing values. Most of them are neutrophil counts (a kind of white blood cell). I have the WBC count as well as other inflammatory variables, which are all associated with the neutrophil count.
I have previously used data from the last examination, as this comes closest to the operation and therefore best describes the true status of the appendix. I have imputed the missing values (mainly neutrophils) from all available data of all examinations, i.e. the data are in wide format with up to three sets of all the variables. However, I feel that such an imputation does not really take into consideration the dynamic effect of time, i.e. the results at one specific observation are not directly associated with the duration of symptoms at that observation. There can be up to three durations and three sets of variable results, with one value missing from the last examination.

So I feel that it would be better to impute the missing values from a dataset in long format. That would give the time variable a higher status, but then I would lose the information from all the preceding examinations for that patient.
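
One possible way to keep both, offered only as a sketch and not as something proposed elsewhere in this thread: reshape to long format but carry each patient's preceding results along as extra columns. The column names follow the wide layout described at the start of the thread (`time1`, `var1_1`, ...); the function name `wide_to_long`, the `prev_` prefix and the example values are hypothetical.

```python
def wide_to_long(record, n_exams=3, repeated=("var1", "var2", "var3")):
    """Turn one wide patient record into one row per examination performed.

    Each long row keeps the baseline variables, the examination's own time
    and results, and the results of the preceding examination (prefix
    "prev_"), so an imputation model fitted on the long data can still use
    the patient's earlier values.
    """
    baseline = {k: record.get(k) for k in ("age", "sex", "diagnosis")}
    rows, previous = [], None
    for k in range(1, n_exams + 1):
        time = record.get(f"time{k}")
        if time is None:  # examination k was never performed
            continue
        current = {v: record.get(f"{v}_{k}") for v in repeated}
        row = dict(baseline, exam=k, time=time, **current)
        for v in repeated:
            row[f"prev_{v}"] = None if previous is None else previous[v]
        rows.append(row)
        previous = current
    return rows


# Made-up patient with two examinations and one missing result.
patient = {
    "age": 34, "sex": "F", "diagnosis": 1,
    "time1": 6.0, "var1_1": 11.2, "var2_1": None, "var3_1": 37.4,
    "time2": 12.0, "var1_2": 13.5, "var2_2": 82.0, "var3_2": 37.9,
}
long_rows = wide_to_long(patient)
```

Note that the `prev_` columns are empty by design for every first examination, so an imputation model would have to treat that as structural rather than as ordinary missingness.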

I am not sure that the reference that Munya Dimairo gave does what I want.

Roland E Andersson




John Whittington

May 4, 2013, 1:00:51 PM
to meds...@googlegroups.com
At 18:46 04/05/2013 +0200, roland andersson wrote:
>Unfortunately there are some missing values. Most of them are
>neutrophil counts (a kind of white blood cell). I have the WBC count as well
>as other inflammatory variables, which are all associated with the
>neutrophil count.

Do I take it that you mean that you have no differential white count at all
for those with missing neutrophil counts? If you did have the rest of the
differential count (notably lymphocytes), you obviously could 'infer',
rather than impute the neutrophil counts (from WBC and lymphocyte count)
with a tolerable degree of accuracy.

roland andersson

May 5, 2013, 1:24:21 PM
to medstats
John

Correct. I do not have the differential count, only WBC and neutrophil count. And missing values are most common for neutrophils. The problem is how to take account of both the dynamics (I can do that if I use multiple imputation in long format, where all variables at one specific observation are related to the duration of symptoms at that time) _and_ the results of the preceding observations for each patient (I could do that using multiple imputation in wide format).

Roland 



John Whittington

May 6, 2013, 12:20:19 PM
to meds...@googlegroups.com
At 19:24 05/05/2013 +0200, roland andersson wrote:
>Correct. I do not have the differential count, only WBC and neutrophil
>count. And missing values are most common for neutrophils. The problem is how to
>take account of both the dynamics (I can do that if I use multiple
>imputation in long format, where all variables at one specific
>observation are related to the duration of symptoms at that time) _and_ the
>results of the preceding observations for each patient (I could do that
>using multiple imputation in wide format).

Moving away from the statistical issues, do you have any reason to think
that neutrophil count might be of independent value in the differential
diagnosis of appendicitis? In relation to most of the differential
diagnoses, I would have thought that changes in total WBC would nearly
always be a reflection of changes in neutrophil count.

roland andersson

May 6, 2013, 1:28:28 PM
to medstats
John

We have found that the proportion of neutrophils is a strong discriminator. This is because appendicitis is often associated with lymphopenia. With a WBC count of 10.0-10.9 (which is normal) the risk of appendicitis is 10% at <65% neutrophils and 35% at >69% neutrophils. So proportion of neutrophils is an independent predictor. 

Roland

 



Thomas Keller

May 6, 2013, 1:32:47 PM
to meds...@googlegroups.com

Dear Roland,
 
some years ago I had to analyse similar data.
The main problem might be to obtain the true disease state: appendicitis y/n.
Especially for those patients who were suspected of having appendicitis but were not operated on (watchful waiting). This is a considerable number of patients.
But even when an operation was done, deciding the true disease state is less simple than one might assume.
 
Another point: You should check for which patients neutrophil count is missing. I would assume MNAR.
 
Kind regards Thomas
 

John Whittington

May 6, 2013, 1:52:33 PM
to meds...@googlegroups.com
At 19:28 06/05/2013 +0200, roland andersson wrote:
>John We have found that the proportion of neutrophils is a strong
>discriminator. This is because appendicitis is often associated with
>lymphopenia. With a WBC count of 10.0-10.9 (which is normal) the risk of
>appendicitis is 10% at <65% neutrophils and 35% at >69% neutrophils. So
>proportion of neutrophils is an independent predictor.

Interesting - although, from what you say, it sounds as if it's probably
the lymphocyte count, rather than the neutrophil count, that is the true
underlying independent predictor. However, I must be getting 'rusty',
because I thought that lymphopenia (or lymphocytopenia, as I usually call
it!) was also often associated with mesenteric adenitis (not to mention
urinary tract and other infections), which presumably is one of the main
differential diagnoses.

roland andersson

May 7, 2013, 12:56:54 AM
to medstats
John

In fact the neutrophil/lymphocyte ratio has been proposed as a diagnostic variable for appendicitis. The problem is that you need yet another lab test to get it. We have trouble getting acceptance for the neutrophils. At our lab the neutrophil count was done automatically for each WBC count, but you had to ask for it to have the result delivered.

Here is one reference, there are many - http://www.ncbi.nlm.nih.gov/pubmed/23294069

Roland




roland andersson

May 7, 2013, 1:06:49 AM
to medstats
Thomas

I agree partly about the true state. The histopathological criteria for appendicitis are controversial. We have used transmural inflammation, but others accept mucosal inflammation or even pus in the lumen, which are commonly seen in asymptomatic patients (so are a normal state). All our specimens have been re-examined by one pathologist in order to get a uniform diagnosis.

And spontaneous resolution is common (not everybody knows that), so some of the non-operated patients may have had appendicitis. We are therefore focused on diagnosing advanced appendicitis (gangrenous and perforated), which probably needs surgical treatment.

We have no indication that the neutrophil count is not MAR. But in the end you can never be perfect and have to use the data you have at hand. In the choice between multiple imputation and complete case analysis I think MI is most supported. In fact we have done both analyses and the MI gave the most conservative result which I guess is least biased. 

Roland


  

