dat2<- read.table(text="
patient_id t scores
1 0 1.6
1 1 2.6
1 2 2.2
1 3 1.8
2 0 2.3
2 2 2.5
2 4 2.6
2 5 1.5
",sep="",header=TRUE)
library(plyr)
dat2New<-ddply(dat2,.(patient_id),summarize,t=seq(min(t),max(t)))
res<-join(dat2New,dat2,type="full")
res1<-do.call(rbind,lapply(split(res,res$patient_id),function(x) {x1<-x[x$t!=0,];do.call(rbind,lapply(split(x1,((x1$t-1)%/%3)+1),function(y) {y1<-if(any(y$t==1)) rbind(x[x$t==0,],y) else y; data.frame(patient_id=unique(y1$patient_id),scores=mean(y1$scores,na.rm=TRUE))}) ) }))
row.names(res1)<-1:nrow(res1)
res1$period<-with(res1,ave(patient_id,patient_id,FUN=seq))
res1
# patient_id scores period
#1 1 2.05 1
#2 2 2.40 1
#3 2 2.05 2
A.K.
________________________________
From: GUANGUAN LUO <guang...@gmail.com>
To: arun <smartp...@yahoo.com>
Sent: Tuesday, May 7, 2013 11:29 AM
Subject: Re: how to calculate the mean in a period of time?
Yes , as you have said, probably , it's not continuous.
2013/5/7 arun <smartp...@yahoo.com>
Hi,
>Your question is not clear. You mentioned to calculate the mean of 3 months, but infact you added the scores for t=0,1,2,3 as first 3 months, then possibly 4,5,6 as the next. So, it is not exactly three months. Isn't it?
>
>
>Dear R experts,
>sorry to trouble you again.
>My data is like this now :
>patient_id t scores
>1 0 1.6
>1 1 2.6
>1 2 2.2
>1 3 1.8
>2 0 2.3
>2 2 2.5
>2 4 2.6
>2 5 1.5
>
>I want to calculate the mean of period of 3 months, just get a table like this
>
>patient_id period scores
>1 1 2.05 (1.6+2.6+2.2+1.8)/4
>2 1 2.4 (2.3+2.5)/2
>2 2 2.05 (2.6+1.5)/2
>
>thank you in avance
>
______________________________________________
R-h...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
This is what I got:
dat1<- read.table(text="
patient_id number responsed_at delais scores1 scores2 scores3
1 1 2010-05-26 NA 2.6 0.5 0.7
1 2 2010-07-07 42 2.5 NA NA
1 3 2010-08-14 38 2.3 NA NA
1 4 2010-10-01 48 2.5 0.7 0.6
1 5 2010-12-01 61 2.5 NA NA
2 1 2011-07-19 NA 2.5 0.8 0.5
2 2 2011-09-22 65 2.6 NA NA
2 3 2011-10-26 34 2.7 NA NA
3 1 2011-07-17 NA 2.8 0.5 0.6
3 2 2011-10-30 103 2.6 NA NA
3 3 2011-12-23 54 2.5 NA NA
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat1$idx<-with(dat1,ifelse(is.na(delais)|delais<45 & delais>20, 1,ifelse(delais<60 & delais>=45,2,ifelse(delais<=90 & delais>=60,3,NA))))
library(zoo)
res<-do.call(rbind,lapply(split(dat1,dat1$patient_id),function(x) {x$idx[as.logical(cumsum(is.na(x$idx)))]<-NA; x1<-x[!is.na(x$idx),]; x1[,6:8]<-na.locf(x1[,6:8]);x2<-x1[rep(seq_len(nrow(x1)),x1$idx),]; x2$delais[duplicated(x2$delais,fromLast=TRUE)]<-0; x2$t<-seq(0,nrow(x2)-1,1);x2}))
row.names(res)<- 1:nrow(res)
res1<- res[,c(1:3,9,4:7)]
res1
# patient_id number responsed_at t delais scores1 scores2 scores3
#1 1 1 2010-05-26 0 NA 2.6 0.5 0.7
#2 1 2 2010-07-07 1 42 2.5 0.5 0.7
#3 1 3 2010-08-14 2 38 2.3 0.5 0.7
#4 1 4 2010-10-01 3 0 2.5 0.7 0.6
#5 1 4 2010-10-01 4 48 2.5 0.7 0.6
#6 1 5 2010-12-01 5 0 2.5 0.7 0.6
#7 1 5 2010-12-01 6 0 2.5 0.7 0.6
#8 1 5 2010-12-01 7 61 2.5 0.7 0.6
#9 2 1 2011-07-19 0 NA 2.5 0.8 0.5
#10 2 2 2011-09-22 1 0 2.6 0.8 0.5
#11 2 2 2011-09-22 2 0 2.6 0.8 0.5
#12 2 2 2011-09-22 3 65 2.6 0.8 0.5
#13 2 3 2011-10-26 4 34 2.7 0.8 0.5
#14 3 1 2011-07-17 0 NA 2.8 0.5 0.6
A.K.
________________________________
From: GUANGUAN LUO <guang...@gmail.com>
To: arun <smartp...@yahoo.com>
Sent: Thursday, May 16, 2013 12:05 PM
Subject: Re: how to calculate the mean in a period of time?
Hello, AK,
now i have a problem really complicated for me,
Now my table is like this:
patient_id number responsed_at delais scores1 scores2 scores3
1 1 2010-05-26 NA 2.6 0.5 0.7
1 2 2010-07-07 42 2.5 NA NA
1 3 2010-08-14 38 2.3 NA NA
1 4 2010-10-01 48 2.5 0.7 0.6
1 5 2010-12-01 61 2.5 NA NA
2 1 2011-07-19 NA 2.5 0.8 0.5
2 2 2011-09-22 65 2.6 NA NA
2 3 2011-10-26 34 2.7 NA NA
3 1 2011-07-17 NA 2.8 0.5 0.6
3 2 2011-10-30 103 2.6 NA NA
3 3 2011-12-23 54 2.5 NA NA
explications: delais = the date of "responsed_at" - the date of precedent "responsed_at"
scores1 is measured every month
scores2 and 3 are measured every three months
first thing is : if the 20<delais <45, this count one month of delais
if the 45<=delais <60, this count two month of delais,so add one line with the copy of the precedent line,and change the delais to 0
if the 60 <= delais <=90, this count three month of delais, so add two lines with the copy of the precedent line,and change these two delais to 0
if the delais >90, delete all the following lines
and add a column "t", "t" means the month, "t" is in order
second thing is :
I want to replace NA of scores2 and scores3 with the precedent scores
and finally get a table like this:
patient_id number responsed_at t delais scores1 scores2 scores3
1 1 2010-05-26 0 NA 2.6 0.5 0.7 # scores2 and 3 are mesured every 3
1 2 2010-07-07 1 42 2.5 0.5 0.7 ## months,replace the following with precedent numbers
1 3 2010-08-14 2 38 2.3 0.5 0.7 # copy this line
1 3 2010-08-14 3 0 2.3 0.5 0.7 # add one line here and change delais to 0
1 4 2010-10-01 4 48 2.5 0.7 0.6
1 4 2010-10-01 5 0 2.5 0.7 0.6
1 4 2010-10-01 6 0 2.5 0.7 0.6
1 5 2010-12-01 7 61 2.5 0.7 0.6
2 1 2011-07-19 0 NA 2.5 0.8 0.5
2 1 2011-07-19 1 0 2.5 0.8 0.5
2 1 2011-07-19 2 0 2.5 0.8 0.5
2 2 2011-09-22 3 65 2.6 0.8 0.8
2 3 2011-10-26 4 34 2.7 0.8 0.8
3 1 2011-07-17 0 NA 2.8 0.5 0.6
(3 2 2011-10-30 103 2.6 0.5 0.6)# delete these 2 line
(3 3 2011-12-23 54 2.5 0.5 0.6) ## and delete the following lines of this patient because the delais is 103 which superior to 90
Do you know how can i get this?
Thank you so much
GG
2013/5/9 GUANGUAN LUO <guang...@gmail.com>
Hello,
>Because according to the database, the first time of collecting the data is just for an entry, an inclusion of the patients. What i should focus on is the difference between two month. If i don't include the t0, i think i will lose the information. I don't know if this is pertinent.
>Thank you very much for your help, AK
>
>GG
>
>
>
>2013/5/9 arun <smartp...@yahoo.com>
>
>HI GG,
>>I did the code according the calculation shown by you. For example, in the first case, you had scores of (0+1+2+3)/4, which is not 3 months. It is 3+ initial month. After that it is only 3 months that is repeating. So, my code is doing this:
>> 1st group (the scores)
>>(0+1+2+3)/4
>>2nd group( scores)
>>(5+6+7)/3
>>3rd group
>>(8+9+10)/3
>>
>>Here, the months are used only for illustration. I am taking the mean of the scores.
>>
>>So, if you are only doing every 3 months, why do you need to combine the initial scores with the first 3 months? Anyway, I did what you asked for.
>>
>>
>>Regards,
>>Arun
>>
>>
>>________________________________
>>From: GUANGUAN LUO <guang...@gmail.com>
>>To: arun <smartp...@yahoo.com>
>>Sent: Thursday, May 9, 2013 4:19 AM
>>
>>Subject: Re: how to calculate the mean in a period of time?
>>
>>
>>
>>Hi,Thank you so much for your help. I have seen your code. It's very complicated. The time 0 is the beginning of collecting all the information.
>>
>>The time 1 is the first month when follow the patients, time 2 is the second month ect. If i want to calculate the means of every 3 month do you think i should do like this?
>>
>>GG
Hi,
Try this:
dat1$idx<-with(dat1,ifelse(is.na(delais)|delais<45 &
delais>20, 1,ifelse(delais<60 &
delais>=45,2,ifelse(delais<=90 & delais>=60,3,NA))))
dat1$idx1<-c(dat1$idx[-head(dat1$idx,1)],1)
library(zoo)
res1<-do.call(rbind,lapply(split(dat1,dat1$patient_id),function(x) {x$idx[as.logical(cumsum(is.na(x$idx)))]<-NA; x1<-x[!is.na(x$idx),]; x1[,6:8]<-na.locf(x1[,6:8]);x1$idx1[is.na(x1$idx1)]<-1; x2<-x1[rep(seq_len(nrow(x1)),x1$idx1),]; x2$delais[duplicated(x2$delais,fromLast=FALSE)]<-0; x2$t<-seq(0,nrow(x2)-1,1);x2[,-c(8:9)]}))
row.names(res1)<- 1:nrow(res1)
res2<- res1[,c(1:3,8,4:7)]
res2
# patient_id number responsed_at t delais scores1 scores2 scores3
#1 1 1 2010-05-26 0 NA 2.6 0.5 0.7
#2 1 2 2010-07-07 1 42 2.5 0.5 0.7
#3 1 3 2010-08-14 2 38 2.3 0.5 0.7
#4 1 3 2010-08-14 3 0 2.3 0.5 0.7
#5 1 4 2010-10-01 4 48 2.5 0.7 0.6
#6 1 4 2010-10-01 5 0 2.5 0.7 0.6
#7 1 4 2010-10-01 6 0 2.5 0.7 0.6
#8 1 5 2010-12-01 7 61 2.5 0.7 0.6
#9 2 1 2011-07-19 0 NA 2.5 0.8 0.5
#10 2 1 2011-07-19 1 0 2.5 0.8 0.5
#11 2 1 2011-07-19 2 0 2.5 0.8 0.5
#12 2 2 2011-09-22 3 65 2.6 0.8 0.5
#13 2 3 2011-10-26 4 34 2.7 0.8 0.5
#14 3 1 2011-07-17 0 NA 2.8 0.5 0.6
A.K.
________________________________
From: GUANGUAN LUO <guang...@gmail.com>
To: arun <smartp...@yahoo.com>
Sent: Friday, May 17, 2013 9:33 AM
Subject: Re: how to calculate the mean in a period of time?
Hello,
Thank you for your help
the lines added to the tables are the precedent lines but not the followed lines, if i just change x2<-x1[rep(seq_len(nrow(x1)-1), is that ok? and so the delais should be changed too, isn't it?
GG
2013/5/17 arun <smartp...@yahoo.com>
Hi,
>No problem.
>
>Arun
>
>
>
>
>
>
>________________________________
>From: GUANGUAN LUO <guang...@gmail.com>
>To: arun <smartp...@yahoo.com>
>Sent: Friday, May 17, 2013 4:19 AM
>
>Subject: Re: how to calculate the mean in a period of time?
>
>
>
>Ah, yes, that is the wrong thing i have written. Thank you so much. the output which you have got is right.
>Thanks a lot.
>GG
>
>
>2013/5/16 arun <smartp...@yahoo.com>
dat1<- read.table(text="
patient_id number responsed_at delais scores1 scores2 scores3
1 1 2010-05-26 NA 2.6 0.5 0.7
1 2 2010-07-07 42 2.5 NA NA
1 3 2010-08-14 38 2.3 NA NA
1 4 2010-10-01 48 2.5 0.7 0.6
1 5 2010-12-01 61 2.5 NA NA
2 1 2011-07-19 NA 2.5 0.8 0.5
2 2 2011-09-22 65 2.6 NA NA
2 3 2011-10-26 34 2.7 NA NA
3 1 2011-07-17 NA 2.8 0.5 0.6
3 2 2011-10-30 103 2.6 NA NA
3 3 2011-12-23 54 2.5 NA NA
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat1$idx<-with(dat1,ifelse(is.na(delais)|delais<45 &
delais>20, 1,ifelse(delais<60 &
delais>=45,2,ifelse(delais<=90 & delais>=60,3,NA))))
#deleted lines
dat1[as.logical(with(dat1,ave(idx,patient_id,FUN=function(x) cumsum(is.na(x))))),-8]
# patient_id number responsed_at delais scores1 scores2 scores3
#10 3 2 2011-10-30 103 2.6 NA NA
#11 3 3 2011-12-23 54 2.5 NA NA
nrow(dat1[as.logical(with(dat1,ave(idx,patient_id,FUN=function(x) cumsum(is.na(x))))),-8])
#[1] 2
#or
table(with(dat1,ave(idx,patient_id,FUN=function(x) cumsum(is.na(x)))))[2]
#1
#2
#added lines
datSub<-dat1[!as.logical(with(dat1,ave(idx,patient_id,FUN=function(x) cumsum(is.na(x))))),]
#count
sum(with(datSub,idx[idx>1 & !is.na(idx)])-1)
#[1] 5
A.K.
________________________________
From: GUANGUAN LUO <guang...@gmail.com>
To: arun <smartp...@yahoo.com>
Sent: Wednesday, May 22, 2013 11:30 AM
Subject: Re: how to calculate the mean in a period of time?
Hi Arun,
I meant to say "number of observations you have deleted".
In the table, you have deleted some lines which have a delais > 90 days, and added some lines copied from the precedent lines. Can I calculate the number of lines that you have deleted and the number of lines that you have added?
GG
2013/5/22 arun <smartp...@yahoo.com>
Hi GG,
>It is not clear what you meant by "number of observations you have detected". Is it the number of observations in the initial dataset for each patient_id? Number of observations added also means what? In the final result, we are taking the means for each 3 observations keeping the t="0" along with the first cluster for each Patient_id.
>
>
>
>
>
>________________________________
>From: GUANGUAN LUO <guang...@gmail.com>
>To: arun <smartp...@yahoo.com>
>Sent: Wednesday, May 22, 2013 11:13 AM
>
>Subject: Re: how to calculate the mean in a period of time?
>
>
>
>Hello A.K,
>for this reconstructed table, can i calculate the number of the observations you have delected and the number of the observations that you have added?
>Thanks