Non-parametric analysis with multiple imputations


Sam R

Sep 29, 2014, 11:11:12 AM
to missin...@googlegroups.com
Hi,

I am very new to missing data approaches and multiple imputation, so I apologise if this is a daft question!

I have a longitudinal dataset with monotone missing data. The variable of interest is not normally distributed; however, its log transformation roughly is. I am planning to impute the missing data after log-transforming the variable of interest (and/or to use the non-transformed data with a hot-deck approach).

I want to analyse the multiply imputed data sets in SAS using a non-parametric Wilcoxon-Mann-Whitney test and then combine the results, but I don't know whether this is even theoretically possible, as I can't find any documentation on using this type of analysis with multiple imputation.

Any help would be much appreciated....

Thanks in advance.

Sam


Jonathan Bartlett

Sep 29, 2014, 9:25:45 PM
to missin...@googlegroups.com
Hi Sam

Not a daft question at all. Before I come to your question: you mentioned possibly transforming your non-normal variable prior to imputation. One thing to be aware of when you do this (and you may well be aware of it already, given your second option of a hot-deck type approach) is that when you impute using the transformed variable, you change the assumed functional relationship between it and the other variables. For more on this, see the following paper by von Hippel: http://smr.sagepub.com/content/42/1/105.abstract

As you have probably discovered, Rubin's combination rules are for estimation of a parameter, and for hypothesis testing / confidence intervals for a parameter (or multiple parameters). There have been a few approaches developed for combining p-values directly (see http://missingdata.lshtm.ac.uk/index.php?option=com_content&view=article&id=164:combining-p-values-from-multiple-imputations&catid=57:multiple-imputation&Itemid=98), but I think the approaches described in the two papers I listed on that page start from the assumption that there is a target parameter being estimated, so I'm not sure whether they would be applicable in your case. I don't have Schafer's book to hand (also referred to on that page), but I recall that he has some discussion of applying rank-based methods to multiply imputed datasets.
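To illustrate, here is a rough sketch in R of one such z-transformation approach (I'm going from memory, so do check it against the papers; the function name pool_pvalues is just mine). It converts the one-sided p-value from each imputed-data analysis to a z-score and then applies Rubin's rules, taking the within-imputation variance of each z-score to be 1:

pool_pvalues <- function(p) {
  m <- length(p)
  z <- qnorm(p)                  # one-sided p-values -> z-scores
  zbar <- mean(z)                # pooled z
  B <- var(z)                    # between-imputation variance
  Tvar <- 1 + (1 + 1/m) * B      # total variance; within-variance taken as 1
  pnorm(zbar / sqrt(Tvar))      # pooled one-sided p-value
}

pool_pvalues(c(0.03, 0.08, 0.02, 0.11, 0.05))   # made-up p-values from m = 5 imputations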

An alternative 'answer', which you might disagree with (!), is that arguably we should never perform analyses where we only report a p-value. We are almost always interested (or should be) in quantifying the direction and magnitude of some parameter in a model (possibly one with minimal assumptions), in which case you can make progress. For example, let's say you wanted to compare the medians of your two groups. Then, so long as the sample size is not small, the sampling distribution of the sample median is approximately normal, with an approximate analytical formula for the standard error. Consequently you can apply Rubin's rules to the difference in group medians, and combine the standard errors etc. as usual. If you happen to be a Stata user, Stata will fit a median regression (using the qreg command) to multiply imputed datasets, and thus do this all for you. I'm not sure whether mice in R will (yet).
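To make that concrete, here is a rough sketch in base R (the column names y and group and the list imp_list of imputed data frames are made-up names, not from any package). It bootstraps a standard error for the difference in group medians within each imputed dataset and then pools with Rubin's rules:

med_diff <- function(dat) {
  median(dat$y[dat$group == 1]) - median(dat$y[dat$group == 0])
}

boot_se <- function(dat, B = 500) {   # bootstrap SE within one imputed dataset
  sd(replicate(B, med_diff(dat[sample(nrow(dat), replace = TRUE), ])))
}

pool_rubin <- function(est, se) {     # Rubin's rules for a scalar estimate
  m <- length(est)
  qbar <- mean(est)                   # pooled point estimate
  W <- mean(se^2)                     # within-imputation variance
  B <- var(est)                       # between-imputation variance
  Tvar <- W + (1 + 1/m) * B           # total variance
  c(estimate = qbar, se = sqrt(Tvar))
}

# est <- sapply(imp_list, med_diff)   # imp_list: list of m imputed data frames
# se  <- sapply(imp_list, boot_se)
# pool_rubin(est, se)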

Best wishes

Jonathan

Sam R

Sep 30, 2014, 6:05:29 AM
to missin...@googlegroups.com
Hi Jonathan,

First of all thank you for such a quick response - I really appreciate it!

Very good point about log-transforming the data and changing the assumed relationship. I also totally agree about quantifying a magnitude/direction. I want to quantify the difference between the two treatment medians and, if possible, the associated confidence interval of the median difference (I'm using the Hodges-Lehmann estimator to do this) - this is what I meant by "combining the results"... sorry I wasn't clearer about that. When you say the sample median is approximately normal so long as the sample size is not small - am I correct in thinking that this means making a greater number of imputations, to get a decent number of approximations and sample medians to work with? Sadly I don't have access to Stata, which by the sounds of it is a shame!
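For concreteness, on a single complete dataset the calculation I mean looks like this (sketched in R for brevity, even though my analysis is in SAS; x and y stand in for the two treatment groups):

x <- rnorm(30, mean = 1)   # placeholder data for treatment group 1
y <- rnorm(30)             # placeholder data for treatment group 2
wt <- wilcox.test(x, y, conf.int = TRUE)
wt$estimate   # Hodges-Lehmann estimate of the shift between groups
wt$conf.int   # the associated confidence interval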

Thanks again
Sam

Jonathan Bartlett

Oct 2, 2014, 10:52:12 PM
to missin...@googlegroups.com
Hi Sam

No, I meant the number of observations in your dataset when I said sample size, rather than the number of imputations. Rubin's rules essentially assume that if there were complete data, your estimator would be normally distributed, which for almost every estimator is true provided the sample size isn't small.

But I was wrongly thinking/remembering that there was a simple analytical variance estimator for the sample median, and upon looking I'm not sure that there is: the sampling distribution (in large samples) has variance approximately 1/(4*n*f(theta)^2), where f is the density of the distribution, theta is the median, and n is the sample size. Therefore (I think) you need some estimate of the underlying density function in order to construct a standard error. There are probably quite a few different approaches to this (Stata uses some, for example, in its quantile regression command, and I'm sure R has similar methods), but it's not something I know anything about (yet, at least!).
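For what it's worth, one crude route is to plug a kernel density estimate into that formula; here is a base-R sketch (x is just a placeholder sample):

x <- rexp(200)                                # placeholder sample
n <- length(x)
m_hat <- median(x)
d <- density(x)                               # kernel density estimate
f_hat <- approx(d$x, d$y, xout = m_hat)$y     # estimated density at the median
se_median <- 1 / (2 * sqrt(n) * f_hat)        # from Var ~ 1/(4*n*f(theta)^2)
se_median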

Best wishes
Jonathan

Katherine Ogurtsova

Jun 18, 2018, 11:50:13 AM
to Missing Data
Dear Jonathan,

I found this discussion thread while looking for an answer to whether non-parametric tests, for example comparing medians, work together with multiple imputation. Quite a lot of time has passed since this discussion was started, so perhaps you now know more about whether it is feasible to apply non-parametric tests to MI data. Have you found anything in the literature since?

Bests,
Katherine

Vahe Panossian

Jan 2, 2019, 10:09:19 AM
to Missing Data
Hello Katherine and Jonathan,

Having the same problem myself. Any solutions found?

Best,
Vahe