Missing data - longitudinal dataset with varying timepoints

Claire Davies

Nov 15, 2020, 9:29:02 AM
to Missing Data

Hi Jonathan,

I was wondering if you have any resources on R code for analysing missing data in longitudinal datasets? My dataset has the following complexities, which I am unsure how to deal with:
- Missingness in covariates (for example Tanner Stage), which can only take integer values and can only increase, from Stage 1 to 5, as the child progresses through puberty
- Missingness in the dependent variable: each child has multiple clinic visits at different times, and the outcome variable is not always measured.

The data will subsequently be analysed using mixed models, potentially with a non-linear component, as the dependent variable does not increase linearly with age (used here as the time variable) but rather increases and then drops as the child nears the end of puberty.

Just wondering if you can point me in the right direction?

Thanks!

Assefa Legesse

Nov 16, 2020, 11:29:09 AM
to missin...@googlegroups.com
I can give you R code for multiple imputation for this type of dataset; I worked on similar cases last year.


Claire Davies

Nov 17, 2020, 7:54:50 AM
to Missing Data
That would be great! Thanks so much. Could you attach it as a .txt file?

Assefa Legesse

Nov 18, 2020, 3:17:21 AM
to missin...@googlegroups.com
Sorry, what I have for this case is SAS code; the full code is attached below. In R, we can use the Amelia package to handle this missingness with multiple imputation.



--
With Kind Regards,

Mr. Assefa Legesse(MSc, in Bio-statistics)
Lecturer at Department of Epidemiology and Bio-statistics, Public Health Faculty
 
Institute of Health Science, Jimma University.
Jimma, Ethiopia
Alternative E-mail  assefal70@gmail.com
multipile_imputation.txt

Jonathan Bartlett

Nov 18, 2020, 3:57:05 AM
to missin...@googlegroups.com
Just to add, two broad approaches to this are:
- reshape the data into 'wide' format, with a distinct variable for each variable at each time point, and then impute using 'standard' MI software such as mice in R. Because this approach does not know that the different variables are the same quantity measured at different times, the imputation models will not assume, for example, a linear effect of time/age on the variables.
- use a multi-level hierarchical imputation method, such as the jomo package in R. 
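The reshape behind the first approach can be sketched as follows. This is a minimal illustration in plain Python rather than R package code; the record layout `(child, visit, y)` and the column names `y_1`, `y_2`, ... are hypothetical, and it assumes visits can be aligned to a common visit numbering. Visits with no measurement become `None`, which is what the imputation software would then fill in.

```python
def long_to_wide(rows, visits):
    """Pivot long-format records into one row per child.

    rows   : iterable of (child_id, visit_number, value) tuples
             (hypothetical layout, for illustration only)
    visits : the visit numbers that should become columns
    Any (child, visit) combination with no record is stored as None.
    """
    wide = {}
    for child, visit, value in rows:
        if visit not in visits:
            raise ValueError(f"unexpected visit number: {visit}")
        # Create the child's row on first sight, with every cell missing.
        wide.setdefault(child, {f"y_{v}": None for v in visits})
        wide[child][f"y_{visit}"] = value
    return wide


long_rows = [
    ("A", 1, 10.2), ("A", 2, 11.0), ("A", 3, 11.9),
    ("B", 1, 9.8), ("B", 3, 12.1),  # child B has no visit 2
]
wide = long_to_wide(long_rows, visits=[1, 2, 3])
print(wide["B"])  # {'y_1': 9.8, 'y_2': None, 'y_3': 12.1}
```

Each `y_k` column is then treated by the imputation model as a separate variable, which is why no particular shape of the time trend is imposed.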

To handle a variable whose value can, by definition, only increase over time, one possible approach is to convert it into 'increment' variables, each measuring the increase in the original variable between consecutive visits. These increment variables can be imputed using a method that only imputes values >= 0, and the original variable can then be reconstructed from them in each imputed dataset.
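The transformation itself, leaving the imputation step aside, can be sketched as below. This is only an illustration in plain Python; the function names are hypothetical, and it assumes a fixed sequence of visits per child. The point is that any set of non-negative increments, observed or imputed, maps back to a non-decreasing sequence. Note that the sketch does not enforce the upper bound of Stage 5.

```python
from itertools import accumulate


def to_increments(stages):
    """Split a sequence into its first value and visit-to-visit increases."""
    first = stages[0]
    increments = [later - earlier for earlier, later in zip(stages, stages[1:])]
    return first, increments


def from_increments(first, increments):
    """Rebuild the original scale by cumulatively summing the increments."""
    return list(accumulate(increments, initial=first))


# Round trip on a fully observed child.
tanner = [1, 2, 2, 4]
first, inc = to_increments(tanner)
print(first, inc)                   # 1 [1, 0, 2]
print(from_increments(first, inc))  # [1, 2, 2, 4]

# Hypothetical imputed increments (all >= 0) give a non-decreasing sequence.
print(from_increments(2, [0, 1, 1]))  # [2, 2, 3, 4]
```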

I hope that helps.

Jonathan

Assefa Legesse

Nov 19, 2020, 1:51:09 AM
to missin...@googlegroups.com

Claire Davies

Nov 21, 2020, 9:58:14 AM
to Missing Data
Thanks so much Jonathan - that's really helpful.

I've managed to get the imputation to work using the jomo and mitml packages. I have a quick question, though, regarding the variable that can only increase over time. You mentioned I should convert it into variables that measure the increase between visits. However, my data are unbalanced, and each child comes to the clinic at different times, so I wouldn't be able to create distinct variables that record the difference in Tanner Stage between each visit. Unless I'm getting confused somewhere? Are there other methods for handling variables that can only increase over time?

Thanks so much!

Jonathan Bartlett

Nov 24, 2020, 8:52:07 AM
to missin...@googlegroups.com
Hi Claire

OK. The different timing of visits between children would indeed make my proposal difficult to use. I'm afraid I'm not aware of any approaches or literature that have looked at missing data in variables that can only increase over time. Unfortunately, it's not clear to me that there is an obvious solution given current methods. Sorry!

Best wishes
Jonathan
