Re: [R] how to calculate the mean in a period of time?

2 views
Skip to first unread message

arun

unread,
May 7, 2013, 3:08:53 PM5/7/13
to GUANGUAN LUO, R help
Hi,
Your question is still not clear.
May be this helps:

dat2<- read.table(text="
patient_id      t         scores
1                      0                1.6
1                      1                2.6
1                      2                 2.2
1                      3                 1.8
2                      0                  2.3
2                       2                 2.5
2                      4                  2.6
2                       5                 1.5
",sep="",header=TRUE)

library(plyr)
 dat2New<-ddply(dat2,.(patient_id),summarize,t=seq(min(t),max(t)))
 res<-join(dat2New,dat2,type="full")
res1<-do.call(rbind,lapply(split(res,res$patient_id),function(x) {x1<-x[x$t!=0,];do.call(rbind,lapply(split(x1,((x1$t-1)%/%3)+1),function(y) {y1<-if(any(y$t==1)) rbind(x[x$t==0,],y) else y; data.frame(patient_id=unique(y1$patient_id),scores=mean(y1$scores,na.rm=TRUE))}) ) }))
 row.names(res1)<-1:nrow(res1)
res1$period<-with(res1,ave(patient_id,patient_id,FUN=seq))
 res1
#  patient_id scores period
#1          1   2.05      1
#2          2   2.40      1
#3          2   2.05      2


A.K.

________________________________
From: GUANGUAN LUO <guang...@gmail.com>
To: arun <smartp...@yahoo.com>
Sent: Tuesday, May 7, 2013 11:29 AM
Subject: Re: how to calculate the mean in a period of time?

Yes , as you have said, probably , it's not continuous.


2013/5/7 arun <smartp...@yahoo.com>

Hi,
>Your question is not clear.  You mentioned to calculate the mean of 3 months, but infact you added the scores for t=0,1,2,3 as first 3 months, then possibly 4,5,6 as the next.  So, it is not exactly three months.  Isn't it?
>
>
>Dear R experts,
>sorry to trouble you again.
>My data is like this now :
>patient_id      t         scores
>1                      0                1.6
>1                      1                2.6
>1                      2                 2.2
>1                      3                 1.8
>2                      0                  2.3
>2                       2                 2.5
>2                      4                  2.6
>2                       5                 1.5
>
>I want to calculate the mean of period of 3 months, just get a table like this
>
>patient_id     period     scores
>1                            1           2.05                      (1.6+2.6+2.2+1.8)/4
>2                            1               2.4                     (2.3+2.5)/2
>2                            2               2.05                    (2.6+1.5)/2
>
>thank you in avance
>

______________________________________________
R-h...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

arun

unread,
May 16, 2013, 2:38:32 PM5/16/13
to GUANGUAN LUO, R help
Hi,
The output you showed is not clear especially the for the scores3,
  2                         2           2011-09-22    3     65            2.6       0.8            0.8
2                        3             2011-10-26   4      34            2.7     0.8            0.8
3                        1             2011-07-17    0     NA           2.8      0.5             0.6
In the input data, the scores3 column didn't had 0.8.


This is what I got:
dat1<- read.table(text="
patient_id      number      responsed_at      delais      scores1   scores2   scores3
 1                      1            2010-05-26         NA           2.6       0.5            0.7  
 1                      2             2010-07-07         42            2.5       NA           NA       
 1                       3            2010-08-14         38            2.3       NA           NA  
 1                       4            2010-10-01          48            2.5       0.7           0.6
 1                       5            2010-12-01          61            2.5       NA           NA
2                        1            2011-07-19          NA           2.5       0.8            0.5
2                         2           2011-09-22          65            2.6       NA            NA
2                        3             2011-10-26          34            2.7      NA            NA
3                        1             2011-07-17         NA           2.8      0.5             0.6
3                        2            2011-10-30          103           2.6      NA             NA
3                         3              2011-12-23        54            2.5      NA             NA
",sep="",header=TRUE,stringsAsFactors=FALSE)

 dat1$idx<-with(dat1,ifelse(is.na(delais)|delais<45 & delais>20, 1,ifelse(delais<60 & delais>=45,2,ifelse(delais<=90 & delais>=60,3,NA))))
library(zoo)
res<-do.call(rbind,lapply(split(dat1,dat1$patient_id),function(x) {x$idx[as.logical(cumsum(is.na(x$idx)))]<-NA; x1<-x[!is.na(x$idx),]; x1[,6:8]<-na.locf(x1[,6:8]);x2<-x1[rep(seq_len(nrow(x1)),x1$idx),]; x2$delais[duplicated(x2$delais,fromLast=TRUE)]<-0; x2$t<-seq(0,nrow(x2)-1,1);x2}))


row.names(res)<- 1:nrow(res)
 res1<- res[,c(1:3,9,4:7)]
res1
#   patient_id number responsed_at t delais scores1 scores2 scores3
#1           1      1   2010-05-26 0     NA     2.6     0.5     0.7
#2           1      2   2010-07-07 1     42     2.5     0.5     0.7
#3           1      3   2010-08-14 2     38     2.3     0.5     0.7
#4           1      4   2010-10-01 3      0     2.5     0.7     0.6
#5           1      4   2010-10-01 4     48     2.5     0.7     0.6
#6           1      5   2010-12-01 5      0     2.5     0.7     0.6
#7           1      5   2010-12-01 6      0     2.5     0.7     0.6
#8           1      5   2010-12-01 7     61     2.5     0.7     0.6
#9           2      1   2011-07-19 0     NA     2.5     0.8     0.5
#10          2      2   2011-09-22 1      0     2.6     0.8     0.5
#11          2      2   2011-09-22 2      0     2.6     0.8     0.5
#12          2      2   2011-09-22 3     65     2.6     0.8     0.5
#13          2      3   2011-10-26 4     34     2.7     0.8     0.5
#14          3      1   2011-07-17 0     NA     2.8     0.5     0.6
A.K.

________________________________
From: GUANGUAN LUO <guang...@gmail.com>
To: arun <smartp...@yahoo.com>

Sent: Thursday, May 16, 2013 12:05 PM


Subject: Re: how to calculate the mean in a period of time?

Hello, AK,
now i have a problem really complicated for me,

Now my table is like this:

patient_id      number      responsed_at      delais      scores1   scores2   scores3
 1                      1            2010-05-26         NA           2.6       0.5            0.7   
 1                      2             2010-07-07         42            2.5       NA           NA        
 1                       3            2010-08-14         38            2.3       NA           NA   
 1                       4            2010-10-01          48            2.5       0.7           0.6
 1                       5            2010-12-01          61            2.5       NA           NA
2                        1            2011-07-19          NA           2.5       0.8            0.5
2                         2           2011-09-22          65            2.6       NA            NA
2                        3             2011-10-26          34            2.7      NA            NA
3                        1             2011-07-17         NA           2.8      0.5             0.6
3                        2            2011-10-30          103           2.6      NA             NA
3                         3              2011-12-23        54            2.5      NA             NA

explications: delais = the date of "responsed_at"  - the date of precedent "responsed_at"
scores1 is measured every month
scores2 and 3 are measured every three months

first thing is :   if the 20<delais <45, this count one month of delais
if the 45<=delais <60, this count two month of delais,so add one line with the copy of the precedent line,and change the delais to 0

if the 60 <= delais <=90, this count three month of delais, so add two lines with the copy of the precedent line,and change these two delais to 0

if the delais >90, delete all the following lines

and add a column "t", "t" means the month, "t" is in order

second thing is :
I want to replace NA of scores2 and scores3  with the precedent scores

and finally get a table like this:

patient_id      number      responsed_at  t    delais      scores1   scores2   scores3
 1                      1            2010-05-26    0     NA           2.6       0.5            0.7    # scores2 and 3 are mesured every 3
 1                      2             2010-07-07    1     42            2.5       0.5           0.7     ## months,replace the following with       precedent  numbers
 1                       3            2010-08-14     2    38            2.3       0.5           0.7    # copy this line
1                        3             2010-08-14    3    0             2.3        0.5           0.7   # add one line here and change delais to 0
 1                       4            2010-10-01    4     48            2.5       0.7           0.6
1                        4            2010-10-01    5     0              2.5       0.7           0.6
1                        4            2010-10-01    6     0               2.5       0.7           0.6
 1                       5            2010-12-01    7      61            2.5       0.7           0.6
2                        1            2011-07-19    0      NA           2.5       0.8            0.5
2                        1            2011-07-19    1      0           2.5       0.8            0.5
2                        1            2011-07-19    2      0           2.5       0.8            0.5
2                         2           2011-09-22    3     65            2.6       0.8            0.8
2                        3             2011-10-26   4      34            2.7     0.8            0.8
3                        1             2011-07-17    0     NA           2.8      0.5             0.6
(3                        2            2011-10-30          103           2.6      0.5            0.6)# delete these 2 line                              
(3                         3              2011-12-23        54            2.5      0.5             0.6) ## and delete the following lines of this patient because the delais is 103 which superior to 90

Do you know how can i get this?
Thank you so much

GG


2013/5/9 GUANGUAN LUO <guang...@gmail.com>

Hello,
>Because according to the database, the first time of collecting the data is just for an entry, an inclusion of the patients. What i should focus on is the difference between two month. If i don't include the t0, i think i will lose the information. I don't know if this is pertinent.
>Thank you very much for your help, AK

>GG
>
>

>2013/5/9 arun <smartp...@yahoo.com>
>
>HI GG,
>>I did the code according the calculation shown by you.  For example, in the first case, you had scores of  (0+1+2+3)/4, which is not 3 months.  It is 3+ initial month.  After that it is only 3 months that is repeating.  So, my code is doing this:
>>   1st group (the scores)
>>(0+1+2+3)/4
>>2nd group( scores)
>>(5+6+7)/3
>>3rd group
>>(8+9+10)/3
>>
>>Here, the months are used only for illustration.  I am taking the mean of the scores.
>>
>>So, if you are only doing every 3 months, why do you need to combine the initial scores with the first 3 months?  Anyway, I did what you asked for.
>>
>>
>>Regards,
>>Arun


>>
>>
>>________________________________
>>From: GUANGUAN LUO <guang...@gmail.com>
>>To: arun <smartp...@yahoo.com>

>>Sent: Thursday, May 9, 2013 4:19 AM


>>
>>Subject: Re: how to calculate the mean in a period of time?
>>
>>
>>

>>Hi,Thank you so much for your help. I have seen your code. It's very complicated. The time 0 is the beginning of collecting all the information.
>>
>>The time 1 is the first month when follow the patients, time 2 is the second month ect. If i want to calculate the means of every 3 month do you think i should do like this?
>>
>>GG

arun

unread,
May 17, 2013, 10:36:52 AM5/17/13
to GUANGUAN LUO, R help

Hi,
Try this:


 dat1$idx<-with(dat1,ifelse(is.na(delais)|delais<45 &
delais>20, 1,ifelse(delais<60 &
delais>=45,2,ifelse(delais<=90 & delais>=60,3,NA))))

dat1$idx1<-c(dat1$idx[-head(dat1$idx,1)],1)

library(zoo)
res1<-do.call(rbind,lapply(split(dat1,dat1$patient_id),function(x) {x$idx[as.logical(cumsum(is.na(x$idx)))]<-NA; x1<-x[!is.na(x$idx),]; x1[,6:8]<-na.locf(x1[,6:8]);x1$idx1[is.na(x1$idx1)]<-1; x2<-x1[rep(seq_len(nrow(x1)),x1$idx1),]; x2$delais[duplicated(x2$delais,fromLast=FALSE)]<-0; x2$t<-seq(0,nrow(x2)-1,1);x2[,-c(8:9)]}))
 row.names(res1)<- 1:nrow(res1)
 res2<- res1[,c(1:3,8,4:7)]
 res2


#   patient_id number responsed_at t delais scores1 scores2 scores3
#1           1      1   2010-05-26 0     NA     2.6     0.5     0.7
#2           1      2   2010-07-07 1     42     2.5     0.5     0.7
#3           1      3   2010-08-14 2     38     2.3     0.5     0.7

#4           1      3   2010-08-14 3      0     2.3     0.5     0.7


#5           1      4   2010-10-01 4     48     2.5     0.7     0.6

#6           1      4   2010-10-01 5      0     2.5     0.7     0.6
#7           1      4   2010-10-01 6      0     2.5     0.7     0.6


#8           1      5   2010-12-01 7     61     2.5     0.7     0.6
#9           2      1   2011-07-19 0     NA     2.5     0.8     0.5

#10          2      1   2011-07-19 1      0     2.5     0.8     0.5
#11          2      1   2011-07-19 2      0     2.5     0.8     0.5


#12          2      2   2011-09-22 3     65     2.6     0.8     0.5
#13          2      3   2011-10-26 4     34     2.7     0.8     0.5
#14          3      1   2011-07-17 0     NA     2.8     0.5     0.6
A.K.

________________________________
From: GUANGUAN LUO <guang...@gmail.com>
To: arun <smartp...@yahoo.com>

Sent: Friday, May 17, 2013 9:33 AM


Subject: Re: how to calculate the mean in a period of time?

Hello,
Thank you for your help
the lines added to the tables are the precedent lines but not the followed lines, if i just change x2<-x1[rep(seq_len(nrow(x1)-1), is that ok? and so the delais should be changed too, isn't it?


GG


2013/5/17 arun <smartp...@yahoo.com>

Hi,
>No problem.


>
>Arun
>
>
>
>
>
>
>________________________________
>From: GUANGUAN LUO <guang...@gmail.com>
>To: arun <smartp...@yahoo.com>

>Sent: Friday, May 17, 2013 4:19 AM


>
>Subject: Re: how to calculate the mean in a period of time?
>
>
>

>Ah, yes, that is the wrong thing i have written. Thank you so much. the output which you have got is right.
>Thanks a lot.
>GG
>
>
>2013/5/16 arun <smartp...@yahoo.com>

arun

unread,
May 22, 2013, 12:55:08 PM5/22/13
to GUANGUAN LUO, R help
HI GG,
I thought you were referring about the solution that I sent today.  Sorry, I didn't check it.
TRy this:


dat1<- read.table(text="
patient_id      number      responsed_at      delais      scores1   scores2   scores3
 1                      1            2010-05-26         NA           2.6       0.5            0.7 
 1                      2             2010-07-07         42            2.5       NA           NA      
 1                       3            2010-08-14         38            2.3       NA           NA 
 1                       4            2010-10-01          48            2.5       0.7           0.6
 1                       5            2010-12-01          61            2.5       NA           NA
2                        1            2011-07-19          NA           2.5       0.8            0.5
2                         2           2011-09-22          65            2.6       NA            NA
2                        3             2011-10-26          34            2.7      NA            NA
3                        1             2011-07-17         NA           2.8      0.5             0.6
3                        2            2011-10-30          103           2.6      NA             NA
3                         3              2011-12-23        54            2.5      NA             NA
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat1$idx<-with(dat1,ifelse(is.na(delais)|delais<45 &
delais>20, 1,ifelse(delais<60 &
delais>=45,2,ifelse(delais<=90 & delais>=60,3,NA))))

#deleted lines
dat1[as.logical(with(dat1,ave(idx,patient_id,FUN=function(x) cumsum(is.na(x))))),-8]
#   patient_id number responsed_at delais scores1 scores2 scores3
#10          3      2   2011-10-30    103     2.6      NA      NA
#11          3      3   2011-12-23     54     2.5      NA      NA

 nrow(dat1[as.logical(with(dat1,ave(idx,patient_id,FUN=function(x) cumsum(is.na(x))))),-8])
#[1] 2
#or
table(with(dat1,ave(idx,patient_id,FUN=function(x) cumsum(is.na(x)))))[2]
#1
#2

#added lines
datSub<-dat1[!as.logical(with(dat1,ave(idx,patient_id,FUN=function(x) cumsum(is.na(x))))),]
#count
 sum(with(datSub,idx[idx>1 & !is.na(idx)])-1)
#[1] 5

A.K.

________________________________
From: GUANGUAN LUO <guang...@gmail.com>
To: arun <smartp...@yahoo.com>

Sent: Wednesday, May 22, 2013 11:30 AM


Subject: Re: how to calculate the mean in a period of time?

Hi Arun,
I meant to say "number of observations you have deleted".
In the table, you have deleted some lines which have a delais > 90 days, and added some lines copied from the precedent lines. Can I calculate the number of lines that you have deleted and the number of lines that you have added?

GG


2013/5/22 arun <smartp...@yahoo.com>

Hi GG,
>It is not clear what you meant by "number of observations you have detected".  Is it the number of observations in the initial dataset for each patient_id? Number of observations added also means what?  In the final result, we are taking the means for each 3 observations keeping the t="0"  along with the first cluster for each Patient_id.


>
>
>
>
>
>________________________________
>From: GUANGUAN LUO <guang...@gmail.com>
>To: arun <smartp...@yahoo.com>

>Sent: Wednesday, May 22, 2013 11:13 AM


>
>Subject: Re: how to calculate the mean in a period of time?
>
>
>

>Hello A.K,
>for this reconstructed table, can i calculate the number of the observations you have delected and the number of the observations that you have added?
>Thanks

arun

unread,
May 23, 2013, 11:34:38 AM5/23/13
to GUANGUAN LUO, R help
HI GG,
I should had checked with multiple t=0 only rows.
Apologies!
Check if this work: (Changed the thread name as the solution applies to that problem)

dat2<- read.csv("dat6.csv",header=TRUE,sep="\t",row.names=1)
str(dat2)
#'data.frame':    3896 obs. of  3 variables:
# $ patient_id: int  2 2 2 2 2 2 2 2 2 2 ...
# $ t         : int  0 1 2 3 4 5 6 7 8 9 ...
# $ basdai    : num  2.83 4.05 3.12 3.12 2.42 ...
 
library(plyr)
 dat2New<-ddply(dat2,.(patient_id),summarize,t=seq(min(t),max(t)))
 res<-join(dat2New,dat2,type="full")


 lst1<-lapply(split(res,res$patient_id),function(x) {x1<-x[x$t!=0,];do.call(rbind,lapply(split(x1,((x1$t-1)%/%3)+1),function(y) {y1<-if(any(y$t==1)) rbind(x[x$t==0,],y) else y; data.frame(patient_id=unique(y1$patient_id),t=head(y1$t,1),basdai=mean(y1$basdai,na.rm=TRUE))}) ) })

dat3<-dat2[unlist(with(dat2,tapply(t,patient_id,FUN=function(x) x==0 & length(x)==1)),use.names=FALSE),]
 head(dat3,3)
#    patient_id t basdai
#143         10 0  5.225
#555         37 0  2.450
#627         42 0  6.950

 lst2<-split(dat3,seq_len(nrow(dat3)))
 lst1[lapply(lst1,length)==0]<-mapply(rbind,lst1[lapply(lst1,length)==0],lst2,SIMPLIFY=FALSE)
res1<-do.call(rbind,lst1)
 row.names(res1)<- 1:nrow(res1)
 res2<- res1[,-2]
res2$period<-with(res2,ave(patient_id,patient_id,FUN=seq_along))
 #res2
#selected rows
res2[c(48:51,189:192,210:215),]
#    patient_id   basdai period
#48           9 3.625000      8
#49          10 5.225000      1 #t=0 only row
#50          11 6.018750      1
#51          11 6.000000      2
#189         36 6.166667      1
#190         37 2.450000      1 #t=0 only row
#191         38 3.100000      1
#192         38 3.575000      2
#210         41 1.918750      1
#211         41 4.025000      2
#212         41 2.975000      3
#213         41 1.725000      4
#214         42 6.950000      1 #t=0 only row
#215         44 4.300000      1

A.K.






________________________________
From: GUANGUAN LUO <guang...@gmail.com>
To: arun <smartp...@yahoo.com>
Sent: Thursday, May 23, 2013 9:50 AM
Subject: Re: how to calculate the mean in a period of time?



Hello, Arun, sorry to trouble you again,
I tried your method and i found that for patient_id==10 et patient_id==37 ect, the scores are repeated 51 times, I don't understand why this occured.

Thank you so much.

GG
Reply all
Reply to author
Forward
0 new messages