Calculate Duration Omitting Overlapping Dates


Tom Oommen

Aug 15, 2014, 12:44:22 PM
to lubr...@googlegroups.com
Hello Lubridate team,
Sorry, I'm very new to R and I'm not a data expert. I'm trying to calculate a total duration for each patient without counting overlapping dates twice. I suspect lubridate is the answer.
My data set looks like this:

patientnumber  rxnumber  startdate  stopdate
100            1         1/1/2014   1/5/2014
100            2         1/1/2014   1/5/2014
100            3         1/20/2014  1/22/2014
200            4         2/14/2014  2/14/2014
200            5         2/15/2014  2/20/2014


I'd like to obtain a total exposure of 8 days for patient 100 (5 + 3, because rx 2 covers the same days as rx 1) and 7 days for patient 200 (1 + 6), counting both the start and stop date as days of exposure.

I think the approach is to find the minimum start date and maximum stop date for each patient, then step a counter one day at a time from the minimum start date.
If the current day falls within one of the prescription intervals, add one to the total and move on; if it doesn't, just move on. Stop when the maximum stop date is reached.
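The day-by-day counter described above can be written without an explicit loop. A minimal base R sketch, assuming both dates are inclusive and stored as month/day/year text as in the table; `exposure_days` is a hypothetical helper name. Instead of stepping a counter, it expands each prescription into the calendar days it covers and counts the distinct days per patient.

```r
# Tom's first example, with dates stored as month/day/year text
rx <- data.frame(
  patientnumber = c(100, 100, 100, 200, 200),
  rxnumber      = 1:5,
  startdate     = c("1/1/2014", "1/1/2014", "1/20/2014", "2/14/2014", "2/15/2014"),
  stopdate      = c("1/5/2014", "1/5/2014", "1/22/2014", "2/14/2014", "2/20/2014"),
  stringsAsFactors = FALSE
)
rx$startdate <- as.Date(rx$startdate, format = "%m/%d/%Y")
rx$stopdate  <- as.Date(rx$stopdate,  format = "%m/%d/%Y")

# Hypothetical helper: number of distinct calendar days covered by any
# prescription. Each row is expanded to every day from start to stop
# (both ends included), so a day covered twice is still counted once.
exposure_days <- function(start, stop) {
  days <- unlist(lapply(seq_along(start),
                        function(i) seq(start[i], stop[i], by = "day")))
  length(unique(days))
}

# Apply the helper to each patient's rows separately
exposure <- sapply(split(rx, rx$patientnumber),
                   function(d) exposure_days(d$startdate, d$stopdate))
print(exposure)
```

This gives 8 for patient 100 and 7 for patient 200. Expanding to one value per day keeps the logic simple, but memory use grows with the total length of all prescriptions.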

I just don't know how to code this.  This would be the most complex coding I've done in R and the first time I'd use a loop.

Any help would be appreciated,
Tom

Garrett Grolemund

Aug 15, 2014, 3:37:47 PM
to lubr...@googlegroups.com
Tom,

I'm not sure I understand what you want to do. For example, I do not know where the 5 + 3 comes from for patient 100.

If one step in your analysis is to determine the min start and max stop date for each patient, then I suggest that you learn how to use the dplyr package (or the plyr package) to do group-wise operations. dplyr will save you a lot of headaches and time compared with using loops, and it has a very detailed vignette (tutorial) here: http://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html
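For readers without dplyr installed, the same group-wise step can be sketched in base R with `split()`, `lapply()` and `rbind()`; with dplyr it is usually written as `group_by(patientnumber)` followed by `summarise(first_start = min(startdate), last_stop = max(stopdate))`. A minimal sketch; the column names follow Tom's first table, and the output names `first_start` and `last_stop` are hypothetical.

```r
rx <- data.frame(
  patientnumber = c(100, 100, 100, 200, 200),
  startdate = as.Date(c("2014-01-01", "2014-01-01", "2014-01-20",
                        "2014-02-14", "2014-02-15")),
  stopdate  = as.Date(c("2014-01-05", "2014-01-05", "2014-01-22",
                        "2014-02-14", "2014-02-20"))
)

# Split the rows by patient, summarise each group, then stack the results
by_patient <- split(rx, rx$patientnumber)
span <- do.call(rbind, lapply(by_patient, function(d) {
  data.frame(patientnumber = d$patientnumber[1],
             first_start   = min(d$startdate),  # earliest start date
             last_stop     = max(d$stopdate))   # latest stop date
}))
print(span)
```

Note that the span from `first_start` to `last_stop` includes gaps between prescriptions (1/1 to 1/22 is 22 days for patient 100), so on its own it does not give the 8 days Tom is after.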

Good luck,
Garrett



--
You received this message because you are subscribed to the Google Groups "lubridate" group.

Tom Oommen

Aug 15, 2014, 9:55:00 PM
to lubr...@googlegroups.com
Hi Garrett,
Appreciate the insight. I suspect if I transpose the data this may be easier. I'll take a look at the vignette.

I hope the example below makes my question clearer (sorry for switching to new data).



All rows belong to the same patient.

rx#  rxstart     rxend       duration (days)  overlap (days)
1    2013-03-26  2013-03-26  1                0
2    2013-03-27  2013-03-27  1                0
3    2013-03-27  2013-03-27  1                1
4    2013-03-27  2013-03-30  4                1
5    2013-03-28  2013-03-28  1                1

Total duration = 8 days
Total overlapping days = 3
Total days excluding overlap = 5
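The totals in this table can be checked in base R by treating both `rxstart` and `rxend` as inclusive, expanding each prescription into its calendar days, and comparing the summed durations with the number of distinct days. A minimal sketch; the variable names are illustrative.

```r
# Tom's second example: five prescriptions for one patient (inclusive dates)
rxstart <- as.Date(c("2013-03-26", "2013-03-27", "2013-03-27", "2013-03-27", "2013-03-28"))
rxend   <- as.Date(c("2013-03-26", "2013-03-27", "2013-03-27", "2013-03-30", "2013-03-28"))

# Days per prescription, counting both endpoints: 1 1 1 4 1
duration <- as.numeric(rxend - rxstart) + 1

# Every calendar day that each prescription covers
covered <- unlist(lapply(seq_along(rxstart),
                         function(i) seq(rxstart[i], rxend[i], by = "day")))

total_duration <- sum(duration)                 # days summed across prescriptions
days_exposed   <- length(unique(covered))       # distinct days on any drug
overlap_days   <- total_duration - days_exposed # days counted more than once
```

This reproduces the table: 8 days in total, 5 distinct days of exposure, and 3 days counted more than once.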


Does this make a bit more sense?

Garrett Grolemund

Aug 15, 2014, 11:50:40 PM
to lubr...@googlegroups.com
Tom,

Thanks for the clarification. I might be a little dense because I'm still not sure I get it. Do you want to calculate the number of days that pass from when a patient first enters the dataset until they leave the data set?

In that case, you can use dplyr to extract the start and stop date for each patient and use lubridate to count the days. 

You can return the days as a duration object with
as.duration(interval(startdate, enddate))

You can return the days as a number with
interval(startdate, enddate) / ddays(1)
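Dividing an interval by `ddays(1)` measures elapsed time, so an interval from 2014-01-01 to 2014-01-05 comes out as 4 days, whereas the counts in Tom's tables include both the start and the stop date (5 days). A minimal base R sketch of the difference, using `difftime()` instead of lubridate so it runs without extra packages:

```r
start <- as.Date("2014-01-01")
stop  <- as.Date("2014-01-05")

# Elapsed days between the two dates (what interval(start, stop) / ddays(1) reports)
elapsed <- as.numeric(difftime(stop, start, units = "days"))

# Calendar days on therapy, counting both the start and the stop date
inclusive <- elapsed + 1

# A same-day prescription is 0 elapsed days but 1 day of exposure
same_day <- as.numeric(difftime(as.Date("2014-02-14"), as.Date("2014-02-14"),
                                units = "days")) + 1
```

Adding one day is what turns Tom's single-day prescriptions (start equal to stop) into 1 day of exposure rather than 0.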

If you need to avoid counting gaps between the start date and end date for each patient, things will be much more complicated, and the approach that you suggest might work best if you have the time to let it run.
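An alternative to stepping through every day is to sort the prescriptions by start date, merge any that share at least one day, and add up the lengths of the merged blocks. This sort-and-merge technique is not from the thread; the sketch below is base R, assumes inclusive dates with start <= stop on every row, and `covered_days` is a hypothetical name.

```r
# Hypothetical helper: merge overlapping [start, stop] intervals (inclusive
# dates) and return the number of distinct days they cover. Gaps between
# prescriptions are not counted.
covered_days <- function(start, stop) {
  o <- order(start)
  start <- as.numeric(start[o])  # days since 1970-01-01
  stop  <- as.numeric(stop[o])

  total     <- 0
  cur_start <- start[1]
  cur_stop  <- stop[1]
  for (i in seq_along(start)[-1]) {
    if (start[i] <= cur_stop) {
      # Shares at least one day with the current block: extend it if needed
      cur_stop <- max(cur_stop, stop[i])
    } else {
      # No shared day: close the current block and start a new one
      total     <- total + (cur_stop - cur_start + 1)
      cur_start <- start[i]
      cur_stop  <- stop[i]
    }
  }
  total + (cur_stop - cur_start + 1)
}

rxstart <- as.Date(c("2013-03-26", "2013-03-27", "2013-03-27", "2013-03-27", "2013-03-28"))
rxend   <- as.Date(c("2013-03-26", "2013-03-27", "2013-03-27", "2013-03-30", "2013-03-28"))
covered_days(rxstart, rxend)
```

For Tom's second example this returns 5; applied to each patient in his first table it gives 8 and 7. The loop runs once per prescription rather than once per calendar day, so it stays fast even when prescriptions last for months.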

Garrett

p.s. You probably know this already, but lubridate has a vignette that can help too, http://cran.r-project.org/web/packages/lubridate/vignettes/lubridate.html. It highlights some functions that work with intervals that you might find useful: int_overlaps, int_start, int_end, int_flip, int_shift, int_aligns, union, intersect, setdiff, and %within%

Tom Oommen

Aug 16, 2014, 12:34:34 PM
to lubr...@googlegroups.com
Hi Garrett,
I apologize; I'm not doing the best job of explaining what I want. The lubridate vignette was helpful; I actually hadn't seen it.
I want to calculate the total number of days of drug exposure. I can count the total days, but if a patient receives 2 therapies concurrently for 5 days and I sum both durations, it looks like they got 10 days instead of 5.

I actually found a SAS paper on exactly what I want to do last night. Now I just need to get an understanding of SAS.

Thanks again for your help.
Best,
Tom