Re: [R] random effects model

93 views
Skip to first unread message

rex2013

unread,
Jan 6, 2013, 4:55:53 AM1/6/13
to r-h...@r-project.org
Hi A.K

Regarding my question on comparing normal/ obese/overweight with blood
pressure change, I did finally as per the first suggestion of stacking the
data and creating a normal category . This only gives me a obese not obese
14, but when I did with the wide format hoping to get a
obese14,normal14,overweight 14 Vs hibp 21, i could not complete any of the
models.
This time I classified obese=1 & overweight=1 as obese itself.

Can you tell me if I can use the geese or geeglm function with this data
eg: : HIBP~ time* Age
Here age is a factor with 3 levels, time: 2 levels, HIBP = yes/no.

It says geese/geeglm: contrast can be applied only with factor with 2 or
more levels. What is the way to overcome this. Can I manipulate the data to
make it work.

I need to know if the demogrphic variables affect change in blood pressure
status over time?

How to get the p values with gee model?

Thanks
On Thu, Jan 3, 2013 at 5:06 AM, arun kirshna [via R] <
ml-node+s789...@n4.nabble.com> wrote:

> HI Rex,
> If I take a small subset from your whole dataset, and go through your
> codes:
> BP_2b<-read.csv("BP_2b.csv",sep="\t")
> BP.sub<-BP_2b[410:418,c(1,8:11,13)] #deleted the columns that are not
> needed
> BP.stacknormal<- subset(BP.subnew,Obese14==0 & Overweight14==0)
> BP.stackObese <- subset(BP.subnew,Obese14==1)
> BP.stackOverweight <- subset(BP.subnew,Overweight14==1)
> BP.stacknormal$Categ<-"Normal14"
> BP.stackObese$Categ<-"Obese14"
> BP.stackOverweight$Categ <- "Overweight14"
> BP.newObeseOverweightNormal<-rbind(BP.stacknormal,BP.stackObese,BP.stackOverweight)
>
> BP.newObeseOverweightNormal
> # CODEA Obese14 Overweight14 Overweight21 Obese21 hibp21 Categ
> #411 541 0 0 0 0 0 Normal14
> #415 545 0 0 1 1 1 Normal14
> #418 549 0 0 1 0 0 Normal14
> #413 543 1 0 1 1 0 Obese14
> #417 548 0 1 1 0 0 Overweight14
> BP.newObeseOverweightNormal$Categ<-
> factor(BP.newObeseOverweightNormal$Categ)
> BP.stack3 <-
> reshape(BP.newObeseOverweightNormal,idvar="CODEA",timevar="time",sep="_",varying=list(c("Obese14","Obese21"),c("Overweight14","Overweight21")),v.names=c("Obese","Overweight"),direction="long")
>
> library(car)
> BP.stack3$time<-recode(BP.stack3$time,"1=14;2=21")
> BP.stack3 #Here Normal14 gets repeated even at time==21. Given that you
> are using the "Categ" and "time" #columns in the analysis, it will give
> incorrect results.
> # CODEA hibp21 Categ time Obese Overweight
> #541.1 541 0 Normal14 14 0 0
> #545.1 545 1 Normal14 14 0 0
> #549.1 549 0 Normal14 14 0 0
> #543.1 543 0 Obese14 14 1 0
> #548.1 548 0 Overweight14 14 0 1
> #541.2 541 0 Normal14 21 0 0
> #545.2 545 1 Normal14 21 1 1
> #549.2 549 0 Normal14 21 0 1
> #543.2 543 0 Obese14 21 1 1
> #548.2 548 0 Overweight14 21 0 1
> #Even if I correct the above codes, this will give incorrect
> results/(error as you shown) because the response variable (hibp21) gets
> #repeated when you reshape it from wide to long.
>
> The correct classification might be:
> BP_2b<-read.csv("BP_2b.csv",sep="\t")
> BP.sub<-BP_2b[410:418,c(1,8:11,13)]
> BP.subnew<-reshape(BP.sub,idvar="CODEA",timevar="time",sep="",varying=list(c("Obese14","Obese21"),c("Overweight14","Overweight21")),v.names=c("Obese","Overweight"),direction="long")
>
> BP.subnew$time<-recode(BP.subnew$time,"1=14;2=21")
> BP.subnew<-na.omit(BP.subnew)
>
> BP.subnew$Categ[BP.subnew$Overweight==1 & BP.subnew$time==14 &
> BP.subnew$Obese==0]<-"Overweight14"
> BP.subnew$Categ[BP.subnew$Overweight==1 & BP.subnew$time==21 &
> BP.subnew$Obese==0]<-"Overweight21"
> BP.subnew$Categ[BP.subnew$Obese==1 & BP.subnew$time==14 &
> BP.subnew$Overweight==0]<-"Obese14"
> BP.subnew$Categ[BP.subnew$Obese==1 & BP.subnew$time==21 &
> BP.subnew$Overweight==0]<-"Obese21"
> BP.subnew$Categ[BP.subnew$Overweight==1 & BP.subnew$time==21&
> BP.subnew$Obese==1]<-"ObeseOverweight21"
> BP.subnew$Categ[BP.subnew$Overweight==1 & BP.subnew$time==14&
> BP.subnew$Obese==1]<-"ObeseOverweight14"
> BP.subnew$Categ[BP.subnew$Overweight==0 & BP.subnew$Obese==0
> &BP.subnew$time==14]<-"Normal14"
> BP.subnew$Categ[BP.subnew$Overweight==0 & BP.subnew$Obese==0
> &BP.subnew$time==21]<-"Normal21"
>
> BP.subnew$Categ<-factor(BP.subnew$Categ)
> BP.subnew$time<-factor(BP.subnew$time)
> BP.subnew
> # CODEA hibp21 time Obese Overweight Categ
> #541.1 541 0 14 0 0 Normal14
> #543.1 543 0 14 1 0 Obese14
> #545.1 545 1 14 0 0 Normal14
> #548.1 548 0 14 0 1 Overweight14
> #549.1 549 0 14 0 0 Normal14
> #541.2 541 0 21 0 0 Normal21
> #543.2 543 0 21 1 1 ObeseOverweight21
> #545.2 545 1 21 1 1 ObeseOverweight21
> #548.2 548 0 21 0 1 Overweight21
> #549.2 549 0 21 0 1 Overweight21
>
> #NOw with the whole dataset:
> BP.sub<-BP_2b[,c(1,8:11,13)] #change here and paste the above lines:
> head(BP.subnew)
> # CODEA hibp21 time Obese Overweight Categ
> #3.1 3 0 14 0 0 Normal14
> #7.1 7 0 14 0 0 Normal14
> #8.1 8 0 14 0 0 Normal14
> #9.1 9 0 14 0 0 Normal14
> #14.1 14 1 14 0 0 Normal14
> #21.1 21 0 14 0 0 Normal14
>
> tail(BP.subnew)
> # CODEA hibp21 time Obese Overweight Categ
> #8485.2 8485 0 21 1 1 ObeseOverweight21
> #8506.2 8506 0 21 0 1 Overweight21
> #8520.2 8520 0 21 0 0 Normal21
> #8529.2 8529 1 21 1 1 ObeseOverweight21
> #8550.2 8550 0 21 1 1 ObeseOverweight21
> #8554.2 8554 0 21 0 0 Normal21
>
> summary(lme.1 <- lme(hibp21~time+Categ+ time*Categ,
> data=BP.subnew,random=~1|CODEA, na.action=na.omit))
> #Error in MEEM(object, conLin, control$niterEM) :
> #Singularity in backsolve at level 0, block 1
> #May be because of the reasons I mentioned above.
>
> #YOu didn't mention the library(gee)
> BP.gee8 <- gee(hibp21~time+Categ+time*Categ,
> data=BP.subnew,id=CODEA,family=binomial,
> corstr="exchangeable",na.action=na.omit)
> #Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
> #Error in gee(hibp21 ~ time + Categ + time * Categ, data = BP.subnew, id =
> CODEA, :
> #rank-deficient model matrix
> With your codes, it might have worked, but the results may be inaccurate
> # After running your whole codes:
> BP.gee8 <- gee(hibp21~time+Categ+time*Categ,
> data=BP.stack3,id=CODEA,family=binomial,
> corstr="exchangeable",na.action=na.omit)
> #Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
> #running glm to get initial regression estimate
> # (Intercept) time CategObese14
> # -2.456607e+01 9.940875e-15 2.087584e-13
> # CategOverweight14 time:CategObese14 time:CategOverweight14
> # 2.087584e-13 -9.940875e-15 -9.940875e-15
> #Error in gee(hibp21 ~ time + Categ + time * Categ, data = BP.stack3, id =
> CODEA, :
> # Cgee: error: logistic model for probability has fitted value very close
> to 1.
> #estimates diverging; iteration terminated.
>
> In short, I think it would be better to go with the suggestion in my
> previous email with adequate changes in "Categ" variable (adding
> ObeseOverweight14, ObeseOverweight21 etc) as I showed here.
>
> A.K.
>
>
>
>
>
>
>
>
> ------------------------------
> If you reply to this email, your message will be added to the discussion
> below:
> http://r.789695.n4.nabble.com/random-effects-model-tp4654346p4654438.html
> To unsubscribe from random effects model, click here<http://r.789695.n4.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=4654346&code=dXNoYS5uYXRoYW5AZ21haWwuY29tfDQ2NTQzNDZ8MjAyMjE1NDI0>
> .
> NAML<http://r.789695.n4.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>




--
View this message in context: http://r.789695.n4.nabble.com/random-effects-model-tp4654346p4654776.html
Sent from the R help mailing list archive at Nabble.com.
[[alternative HTML version deleted]]

______________________________________________
R-h...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

arun

unread,
Jan 6, 2013, 2:23:05 PM1/6/13
to rex2013, R help
Hi,

I am  not very familiar with the geese/geeglm().  Is it from library(geepack)?
Regarding your question:
"
Can you tell me if I can use the geese or geeglm function with this data
eg: : HIBP~ time* Age
Here age is a factor with 3 levels, time: 2 levels, HIBP = yes/no.

From your original data:
BP_2b<-read.csv("BP_2b.csv",sep="\t")
head(BP_2b,2)
#  CODEA Sex MaternalAge Education Birthplace AggScore IntScore Obese14
#1     1  NA           3         4          1       NA       NA      NA
#2     3   2           3         3          1        0        0       0
 # Overweight14 Overweight21 Obese21 hibp14 hibp21
#1           NA           NA      NA     NA     NA
#2            0            1       0      0      0

If I understand your new classification:
BP.stacknormal<- subset(BP_2b,Obese14==0 & Overweight14==0 & Obese21==0 & Overweight21==0)
BP.stackObese <- subset(BP_2b,(Obese14==1& Overweight14==0 & Obese14==1&Overweight14==1)|(Obese14==1&Overweight14==1 & Obese21==1 & Overweight21==0)|(Obese14==1&Overweight14==0 & Obese21==0 & Overweight21==0)|(Obese14==0 & Overweight14==0 & Obese21==1 & Overweight21==0)|(Obese14==0 & Overweight14==0 & Obese21==1 & Overweight21==1)|(Obese14==0 & Overweight14==1 & Obese21==1 &Overweight21==1)|(Obese14==1& Overweight14==1 & Obese21==1& Overweight21==1)) #check whether there are more classification that fits to #Obese
 BP.stackOverweight <- subset(BP_2b,(Obese14==0 & Overweight14==1 & Obese21==0 & Overweight21==1)|(Obese14==0 &Overweight14==1 & Obese21==0 & Overweight21==0)|(Obese14==0 & Overweight14==0 & Obese21==0 & Overweight21==1))
BP.stacknormal$Categ<-"Normal"
BP.stackObese$Categ<-"Obese"
BP.stackOverweight$Categ <- "Overweight"
 BP.newObeseOverweightNormal<-na.omit(rbind(BP.stacknormal,BP.stackObese,BP.stackOverweight))
 nrow(BP.newObeseOverweightNormal)
#[1] 1581
BP.stack3 <- reshape(BP.newObeseOverweightNormal,idvar="CODEA",timevar="time",sep="_",varying=list(c("Obese14","Obese21"),c("Overweight14","Overweight21"),c("hibp14","hibp21")),v.names=c("Obese","Overweight","hibp"),direction="long")
library(car)
BP.stack3$time<-recode(BP.stack3$time,"1=14;2=21")
head(BP.stack3,2)
  #  CODEA Sex MaternalAge Education Birthplace AggScore IntScore  Categ time
#8.1     8   2           4         4          1        0        0 Normal   14
#9.1     9   1           3         6          2        0        0 Normal   14
  #  Obese Overweight hibp
#8.1     0          0    0

Now, your formula: (HIBP~time*Age), is it MaternalAge?
If it is, it has three values
unique(BP.stack3$MaternalAge)
#[1] 4 3 5
and for time (14,21) # If it says that geese/geeglm, contrasts could be applied with factors>=2 levels, what is the problem?
If you take "Categ" variable, it also has 3 levels (Normal, Obese, Overweight).

 BP.stack3$MaternalAge<-factor(BP.stack3$MaternalAge)
 BP.stack3$time<-factor(BP.stack3$time)

library(geepack)
For your last question about how to get the p-values:
# Using one of the example datasets:
data(seizure)
     seiz.l <- reshape(seizure,
                       varying=list(c("base","y1", "y2", "y3", "y4")),
                       v.names="y", times=0:4, direction="long")
     seiz.l <- seiz.l[order(seiz.l$id, seiz.l$time),]
     seiz.l$t <- ifelse(seiz.l$time == 0, 8, 2)
     seiz.l$x <- ifelse(seiz.l$time == 0, 0, 1)
     m1 <- geese(y ~ offset(log(t)) + x + trt + x:trt, id = id,
                 data=seiz.l, corstr="exch", family=poisson)
     summary(m1)
    
 summary(m1)$mean["p"]
#                    p
#(Intercept) 0.0000000
#x           0.3347040
#trt         0.9011982
#x:trt       0.6236769


#If you need the p-values of the scale
   summary(m1)$scale["p"]
 #                   p
#(Intercept) 0.0254634

Hope it helps.

A.K.

rex2013

unread,
Jan 7, 2013, 6:15:23 AM1/7/13
to r-h...@r-project.org
Hi A.K

Below is the comment I get, not sure why.

BP.sub3 is the stacked data without the missing values.

BP.geese3 <- geese(HiBP~time*MaternalAge,data=BP.sub3,id=CODEA,
family=binomial, corstr="unstructured", na.action=na.omit)Error in
`contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels

Even though age has 3 levels; time has 14 years & 21 years; HIBP is a
binary response outcome.

2) When you mentioned summary(m1)$mean["p"] what did the p mean? i
used this in one of the gee command, it produced NA as answer?

Many thanks
> From: rex2013 <[hidden email]<http://user/SendEmail.jtp?type=node&node=4654795&i=0>>
>
> To: [hidden email] <http://user/SendEmail.jtp?type=node&node=4654795&i=1>
> Cc:
> Sent: Sunday, January 6, 2013 4:55 AM
> Subject: Re: [R] random effects model
>
> Hi A.K
>
> Regarding my question on comparing normal/ obese/overweight with blood
> pressure change, I did finally as per the first suggestion of stacking the
> data and creating a normal category . This only gives me a obese not obese
> 14, but when I did with the wide format hoping to get a
> obese14,normal14,overweight 14 Vs hibp 21, i could not complete any of the
> models.
> This time I classified obese=1 & overweight=1 as obese itself.
>
> Can you tell me if I can use the geese or geeglm function with this data
> eg: : HIBP~ time* Age
> Here age is a factor with 3 levels, time: 2 levels, HIBP = yes/no.
>
> It says geese/geeglm: contrast can be applied only with factor with 2 or
> more levels. What is the way to overcome this. Can I manipulate the data
> to
> make it work.
>
> I need to know if the demogrphic variables affect change in blood pressure
> status over time?
>
> How to get the p values with gee model?
>
> Thanks
> On Thu, Jan 3, 2013 at 5:06 AM, arun kirshna [via R] <
> [hidden email] <http://user/SendEmail.jtp?type=node&node=4654795&i=2>>
> [hidden email] <http://user/SendEmail.jtp?type=node&node=4654795&i=3>mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
> ______________________________________________
> [hidden email] <http://user/SendEmail.jtp?type=node&node=4654795&i=4>mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
> ------------------------------
> If you reply to this email, your message will be added to the discussion
> below:
> http://r.789695.n4.nabble.com/random-effects-model-tp4654346p4654795.html
View this message in context: http://r.789695.n4.nabble.com/random-effects-model-tp4654346p4654826.html

arun

unread,
Jan 7, 2013, 2:14:44 PM1/7/13
to rex2013, R help
HI,

Regarding question:2) Have you checked summary(m1)?
data(seizure)
     ## Diggle, Liang, and Zeger (1994) pp166-168, compare Table 8.10
     seiz.l <- reshape(seizure,
                       varying=list(c("base","y1", "y2", "y3", "y4")),
                       v.names="y", times=0:4, direction="long")
     seiz.l <- seiz.l[order(seiz.l$id, seiz.l$time),]
     seiz.l$t <- ifelse(seiz.l$time == 0, 8, 2)
     seiz.l$x <- ifelse(seiz.l$time == 0, 0, 1)
     m1 <- geese(y ~ offset(log(t)) + x + trt + x:trt, id = id,
                 data=seiz.l, corstr="exch", family=poisson)
     summary(m1)
#Call:
#geese(formula = y ~ offset(log(t)) + x + trt + x:trt, id = id,
 #   data = seiz.l, family = poisson, corstr = "exch")
#
#Mean Model:
# Mean Link:                 log
# Variance to Mean Relation: poisson

 #Coefficients:
  #             estimate    san.se       wald         p                  
#(Intercept)  1.34760922 0.1573571 73.3423807 0.0000000
#x            0.11183602 0.1159304  0.9306116 0.3347040
#trt          0.02753449 0.2217878  0.0154127 0.9011982
#x:trt       -0.10472579 0.2134448  0.2407334 0.6236769
--------------------------------------------------------------------
#p is the colname with the p values for the Coefficients
summary(m1)$mean["p"]
#                    p
#(Intercept) 0.0000000
#x           0.3347040
#trt         0.9011982
#x:trt       0.6236769


You didn't say specifically whether you got NA's in example data or your actual data.  I am getting the p-values in R 2.15.


1) I think it should work with binary response variable.
Another example in the same documentation:
 data(ohio)
str(ohio)
#'data.frame':    2148 obs. of  4 variables:
# $ resp : int  0 0 0 0 0 0 0 0 0 0 ...
# $ id   : int  0 0 0 0 1 1 1 1 2 2 ...
# $ age  : int  -2 -1 0 1 -2 -1 0 1 -2 -1 ...
# $ smoke: int  0 0 0 0 0 0 0 0 0 0 ...
It is not even factors


     fit <- geese(resp ~ age + smoke + age:smoke, id=id, data=ohio,
                  family=binomial, corstr="exch", scale.fix=TRUE)
 summary(fit)$mean["p"]
 #                    p
#(Intercept) 0.00000000
#age         0.01523698
#smoke       0.09478252
#age:smoke   0.42234200


# also tested after converting to factors

ohio1<-ohio
ohio1$smoke<-factor(ohio1$smoke)
 ohio1$age<-factor(ohio1$age)


str(ohio1)
#'data.frame':    2148 obs. of  4 variables:
# $ resp : int  0 0 0 0 0 0 0 0 0 0 ...
# $ id   : int  0 0 0 0 1 1 1 1 2 2 ...
# $ age  : Factor w/ 4 levels "-2","-1","0",..: 1 2 3 4 1 2 3 4 1 2 ...
# $ smoke: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
fit1<-geese(resp~age+smoke+age:smoke,id=id,data=ohio1,family=binomial,corstr="exch",scale.fix=TRUE)
 summary(fit1)$mean["p"]
#                      p
#(Intercept)  0.00000000
#age-1        0.60555454
#age0         0.45322698
#age1         0.01187725
#smoke1       0.86262269
#age-1:smoke1 0.17239050
#age0:smoke1  0.32223942
#age1:smoke1  0.36686706


A.K.

----- Original Message -----
From: rex2013 <usha....@gmail.com>
To: r-h...@r-project.org
Cc:

arun

unread,
Jan 7, 2013, 4:28:36 PM1/7/13
to rex2013, R help
HI,

Just to add:
fit3<-geese(hibp~MaternalAge*time,id=CODEA,data=BP.stack5,family=binomial,corstr="exch",scale.fix=TRUE) #works
 summary(fit3)$mean["p"]
#                             p
#(Intercept)         0.00000000
#MaternalAge4        0.49099242
#MaternalAge5        0.04686295
#time21              0.86164351
#MaternalAge4:time21 0.59258221
#MaternalAge5:time21 0.79909832

fit4<-geese(hibp~MaternalAge*time,id=CODEA,data=BP.stack5,family=binomial,corstr="unstructured",scale.fix=TRUE)  #when the correlation structure is changed to "unstructured"
#Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
 # contrasts can be applied only to factors with 2 or more levels
#In addition: Warning message:
#In is.na(rows) : is.na() applied to non-(list or vector) of type 'NULL'


Though, it works with data(Ohio)

fit1<-geese(resp~age+smoke+age:smoke,id=id,data=ohio1,family=binomial,corstr="unstructured",scale.fix=TRUE)
 summary(fit1)$mean["p"]
#                      p
#(Intercept)  0.00000000
#age-1        0.60555454
#age0         0.45322698
#age1         0.01187725
#smoke1       0.86262269
#age-1:smoke1 0.17239050
#age0:smoke1  0.32223942
#age1:smoke1  0.36686706



By checking:
 with(BP.stack5,table(MaternalAge,time))
#           time
#MaternalAge   14   21
  #        3 1104  864
   #       4  875  667
    #     5   67   53 #less number of observations


 BP.stack6 <- BP.stack5[order(BP.stack5$CODEA, BP.stack5$time),]
 head(BP.stack6)  # very few IDs with  MaternalAge==5
#       X CODEA Sex MaternalAge Education Birthplace AggScore IntScore
#1493 3.1     3   2           3         3          1        0        0
#3202 3.2     3   2           3         3          1        0        0
#1306 7.1     7   2           4         6          1        0        0
#3064 7.2     7   2           4         6          1        0        0
#1    8.1     8   2           4         4          1        0        0
#2047 8.2     8   2           4         4          1        0        0
 #         Categ time Obese Overweight hibp
#1493 Overweight   14     0          0    0
#3202 Overweight   21     0          1    0
#1306      Obese   14     0          0    0
#3064      Obese   21     1          1    0
#1        Normal   14     0          0    0
#2047     Normal   21     0          0    0
BP.stack7<-BP.stack6[BP.stack6$MaternalAge!=5,]
 BP.stack7$MaternalAge<-factor(as.numeric(as.character(BP.stack7$MaternalAge)
 fit5<-geese(hibp~MaternalAge*time,id=CODEA,data=BP.stack7,family=binomial,corstr="unstructured",scale.fix=TRUE)
#Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
 # contrasts can be applied only to factors with 2 or more levels

 with(BP.stack7,table(MaternalAge,time))  #It looks like the combinations are still there
#           time
#MaternalAge   14   21
 #         3 1104  864
   #       4  875  667

It works also with corstr="ar1".   Why do you gave the option "unstructured"?
A.K.






----- Original Message -----
From: rex2013 <usha....@gmail.com>
To: r-h...@r-project.org
Cc:

arun

unread,
Jan 7, 2013, 6:40:01 PM1/7/13
to Usha Gurunathan, R help
HI,
BP.stack5 is the one without missing values.
na.omit(....).  Otherwise, I have to use the option na.action=.. in the ?geese() statement

You need to read about the correlation structures.  IN unstructured option, more number of parameters needs to be estimated,  In repeated measures design, when the underlying structure is not known, it would be better to compare using different options (exchangeable is similar to compound symmetry) and select the one which provide the least value for AIC or BIC.  Have a look at

http://stats.stackexchange.com/questions/21771/how-to-perform-model-selection-in-gee-in-r
It's not clear to me  "reference to write about missing values".   
A.K.




----- Original Message -----
From: Usha Gurunathan <usha....@gmail.com>
To: arun <smartp...@yahoo.com>
Cc:
Sent: Monday, January 7, 2013 6:12 PM
Subject: Re: [R] random effects model

Hi AK

2)I shall try putting exch. and check when I get home. Btw, what is
BP.stack5? is it with missing values or only complete cases?

I guess I am still not clear about the unstructured and exchangeable
options, as in which one is better.

1)Rgding the summary(p): NA thing, I tried putting one of my gee equation.

Can you suggest me a reference to write about" missing values and the
implications for my results"

Thanks.

rex2013

unread,
Jan 8, 2013, 5:29:07 PM1/8/13
to r-h...@r-project.org
Hi

Thanks a lot, the corstr "exchangeable"does work. Didn't strike to me
for so long. Does the AIC value come out with the gee output?

By reference, I meant reference to a easy-read paper or web address
that can give me knowledge about implications of missing data.

Ta.

On 1/8/13, arun kirshna [via R]
> _______________________________________________
> If you reply to this email, your message will be added to the discussion
> below:
> http://r.789695.n4.nabble.com/random-effects-model-tp4654346p4654902.html
>
> To unsubscribe from random effects model, visit
> http://r.789695.n4.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=4654346&code=dXNoYS5uYXRoYW5AZ21haWwuY29tfDQ2NTQzNDZ8MjAyMjE1NDI0




--
View this message in context: http://r.789695.n4.nabble.com/random-effects-model-tp4654346p4654986.html

arun

unread,
Jan 8, 2013, 9:18:47 PM1/8/13
to rex2013, R help
HI,

In your dataset, the "exchangeable" or "compound symmetry" may work as there are only two levels for time.  In experimental data analysis involving a factor time with more than 2 levels, randomization of combination of levels of factors applied to the subject/plot etc. gets affected as time is unidirectional.  I guess your data is observational, and with two time levels, it may not hurt to use "CS" as option, though, it would help if you check different options. 

In the link I sent previously, QIC was used.  http://stats.stackexchange.com/questions/577/is-there-any-reason-to-prefer-the-aic-or-bic-over-the-other
I am not sure whether AIC/BIC is better than QIC or viceversa.

You could sent email to the maintainer of geepack (Jun Yan <jun...@uconn.edu>).
Regarding the reference links,
You can check this link "www.jstatsoft.org/v15/i02/paper" .  Other references are in the paper.
"
4.3. Missing values (waves)
In case of missing values, the GEE estimates are consistent if the values are missing com-
pletely at random (Rubin 1976). The geeglm function assumes by default that observations
are equally separated in time. Therefore, one has to inform the function about different sep-
arations if there are missing values and other correlation structures than the independence or
exchangeable structures are used. The waves arguments takes an integer vector that indicates
that two observations of the same cluster with the values of the vector of k respectively l have
a correlation of rkl ."

rex2013

unread,
Jan 11, 2013, 2:16:41 AM1/11/13
to r-h...@r-project.org
Hi AK

Regarding the missing values, I would like to find out the patterns of
missing values in my data set. I know the overall values for each variable.

using

colSums(is.na(df))

but what I wanted is to find out the percentages
with each level of the variable with my dataset, as in if there is more
missing data in females or males etc?.

I installed "mi" package, but unable to produce a plot with it( i would
also like to produce a plot). I searched the responses in the relevant
sections in r but could n't find an answer.

Thanks,





On Wed, Jan 9, 2013 at 12:31 PM, arun kirshna [via R] <
ml-node+s789...@n4.nabble.com> wrote:

> HI,
>
> In your dataset, the "exchangeable" or "compound symmetry" may work as
> there are only two levels for time. In experimental data analysis
> involving a factor time with more than 2 levels, randomization of
> combination of levels of factors applied to the subject/plot etc. gets
> affected as time is unidirectional. I guess your data is observational,
> and with two time levels, it may not hurt to use "CS" as option, though, it
> would help if you check different options.
>
> In the link I sent previously, QIC was used.
> http://stats.stackexchange.com/questions/577/is-there-any-reason-to-prefer-the-aic-or-bic-over-the-other
>
> I am not sure whether AIC/BIC is better than QIC or viceversa.
>
> You could sent email to the maintainer of geepack (Jun Yan <[hidden email]<http://user/SendEmail.jtp?type=node&node=4654996&i=0>>).
>
> Regarding the reference links,
> You can check this link "www.jstatsoft.org/v15/i02/paper" . Other
> references are in the paper.
> "
> 4.3. Missing values (waves)
> In case of missing values, the GEE estimates are consistent if the values
> are missing com-
> pletely at random (Rubin 1976). The geeglm function assumes by default
> that observations
> are equally separated in time. Therefore, one has to inform the function
> about different sep-
> arations if there are missing values and other correlation structures than
> the independence or
> exchangeable structures are used. The waves arguments takes an integer
> vector that indicates
> that two observations of the same cluster with the values of the vector of
> k respectively l have
> a correlation of rkl ."
>
> Hope it helps.
> A.K.
>
>
>
>
> ----- Original Message -----
> From: rex2013 <[hidden email]<http://user/SendEmail.jtp?type=node&node=4654996&i=1>>
>
> To: [hidden email] <http://user/SendEmail.jtp?type=node&node=4654996&i=2>
> Cc:
> Sent: Tuesday, January 8, 2013 5:29 PM
> Subject: Re: [R] random effects model
>
> Hi
>
> Thanks a lot, the corstr "exchangeable"does work. Didn't strike to me
> for so long. Does the AIC value come out with the gee output?
>
> By reference, I meant reference to a easy-read paper or web address
> that can give me knowledge about implications of missing data.
>
> Ta.
>
> On 1/8/13, arun kirshna [via R]
> <[hidden email] <http://user/SendEmail.jtp?type=node&node=4654996&i=3>>
> wrote:
>
> >
> >
> > HI,
> > BP.stack5 is the one without missing values.
> > na.omit(....). Otherwise, I have to use the option na.action=.. in the
> > ?geese() statement
> >
> > You need to read about the correlation structures. IN unstructured
> option,
> > more number of parameters needs to be estimated, In repeated measures
> > design, when the underlying structure is not known, it would be better
> to
> > compare using different options (exchangeable is similar to compound
> > symmetry) and select the one which provide the least value for AIC or
> BIC.
> > Have a look at
> >
> >
> http://stats.stackexchange.com/questions/21771/how-to-perform-model-selection-in-gee-in-r
> > It's not clear to me "reference to write about missing values".
> > A.K.
> >
> >
> >
> >
> > ----- Original Message -----
> > From: Usha Gurunathan <[hidden email]<http://user/SendEmail.jtp?type=node&node=4654996&i=4>>
>
> > To: arun <[hidden email]<http://user/SendEmail.jtp?type=node&node=4654996&i=5>>
>
> > Cc:
> > Sent: Monday, January 7, 2013 6:12 PM
> > Subject: Re: [R] random effects model
> >
> > Hi AK
> >
> > 2)I shall try putting exch. and check when I get home. Btw, what is
> > BP.stack5? is it with missing values or only complete cases?
> >
> > I guess I am still not clear about the unstructured and exchangeable
> > options, as in which one is better.
> >
> > 1)Rgding the summary(p): NA thing, I tried putting one of my gee
> equation.
> >
> > Can you suggest me a reference to write about" missing values and the
> > implications for my results"
> >
> > Thanks.
> >
> >
> >
> > On 1/8/13, arun <[hidden email]<http://user/SendEmail.jtp?type=node&node=4654996&i=6>>
> >> From: rex2013 <[hidden email]<http://user/SendEmail.jtp?type=node&node=4654996&i=7>>
>
> >> To: [hidden email]<http://user/SendEmail.jtp?type=node&node=4654996&i=8>
> >> Cc:
> >> Sent: Monday, January 7, 2013 6:15 AM
> >> Subject: Re: [R] random effects model
> >>
> >> Hi A.K
> >>
> >> Below is the comment I get, not sure why.
> >>
> >> BP.sub3 is the stacked data without the missing values.
> >>
> >> BP.geese3 <- geese(HiBP~time*MaternalAge,data=BP.sub3,id=CODEA,
> >> family=binomial, corstr="unstructured", na.action=na.omit)Error in
> >> `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
> >> contrasts can be applied only to factors with 2 or more levels
> >>
> >> Even though age has 3 levels; time has 14 years & 21 years; HIBP is a
> >> binary response outcome.
> >>
> >> 2) When you mentioned summary(m1)$mean["p"] what did the p mean? i
> >> used this in one of the gee command, it produced NA as answer?
> >>
> >> Many thanks
> >>
> >>
> >>
> >> On Mon, Jan 7, 2013 at 5:26 AM, arun kirshna [via R] <
> >> [hidden email] <http://user/SendEmail.jtp?type=node&node=4654996&i=9>>
> >> [hidden email] <http://user/SendEmail.jtp?type=node&node=4654996&i=10>mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >>
> >
> >
> > ______________________________________________
> > [hidden email] <http://user/SendEmail.jtp?type=node&node=4654996&i=11>mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
> >
> > _______________________________________________
> > If you reply to this email, your message will be added to the discussion
> > below:
> >
> http://r.789695.n4.nabble.com/random-effects-model-tp4654346p4654902.html
> >
> > To unsubscribe from random effects model, visit
> >
>
> View this message in context:
> http://r.789695.n4.nabble.com/random-effects-model-tp4654346p4654986.html
>
> Sent from the R help mailing list archive at Nabble.com.
> [[alternative HTML version deleted]]
>
> ______________________________________________
> [hidden email] <http://user/SendEmail.jtp?type=node&node=4654996&i=12>mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
> ______________________________________________
> [hidden email] <http://user/SendEmail.jtp?type=node&node=4654996&i=13>mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
> ------------------------------
> If you reply to this email, your message will be added to the discussion
> below:
> http://r.789695.n4.nabble.com/random-effects-model-tp4654346p4654996.html
View this message in context: http://r.789695.n4.nabble.com/random-effects-model-tp4654346p4655206.html

arun

unread,
Jan 11, 2013, 9:29:27 AM1/11/13
to rex2013, R help
Hi,

To get the missing values, you could use:
set.seed(15)
dat1<-data.frame(Gender=rep(c("M","F"),each=10), V1=sample(c(1:20,NA),20,replace=TRUE), V2=sample(c(20:30,NA),20,replace=TRUE))
library(reshape2)
 dcast(melt(dat1,id="Gender"),Gender~variable,function(x) sum(is.na(x)))
#  Gender V1 V2
#1      F  1  0
#2      M  2  1

#or
do.call(rbind,lapply(split(dat1,dat1$Gender),function(x) colSums(is.na(x[-1]))))
 # V1 V2
#F  1  0
#M  2  1

Regarding the "mi" package, I never used it before.  BTW, you didn't mention what you are going to plot.
Hope it helps.
A.K.



----- Original Message -----
From: rex2013 <usha....@gmail.com>
To: r-h...@r-project.org
Cc:

rex2013

unread,
Jan 11, 2013, 8:09:19 PM1/11/13
to r-h...@r-project.org
Can you please explain the syntax?

What I am trying to achieve is something of the sort( below is just an
example):

eg: Males hv 32% missing data as compared to 48% missing data amongst
females. 60% missing data from primary educated compared to 5% in those
completed uni.etc.

Thanks a lot

On Sat, Jan 12, 2013 at 5:09 AM, arun kirshna [via R] <
ml-node+s7896...@n4.nabble.com> wrote:

> Hi,
>
> To get the missing values, you could use:
> set.seed(15)
> dat1<-data.frame(Gender=rep(c("M","F"),each=10),
> V1=sample(c(1:20,NA),20,replace=TRUE),
> V2=sample(c(20:30,NA),20,replace=TRUE))
> library(reshape2)
> dcast(melt(dat1,id="Gender"),Gender~variable,function(x) sum(is.na(x)))
> # Gender V1 V2
> #1 F 1 0
> #2 M 2 1
>
> #or
> do.call(rbind,lapply(split(dat1,dat1$Gender),function(x) colSums(is.na(x[-1]))))
>
> # V1 V2
> #F 1 0
> #M 2 1
>
> Regarding the "mi" package, I never used it before. BTW, you didn't
> mention what you are going to plot.
> Hope it helps.
> A.K.
>
>
>
> ----- Original Message -----
> From: rex2013 <[hidden email]<http://user/SendEmail.jtp?type=node&node=4655274&i=0>>
>
> To: [hidden email] <http://user/SendEmail.jtp?type=node&node=4655274&i=1>
> Cc:
> Sent: Friday, January 11, 2013 2:16 AM
> Subject: Re: [R] random effects model
>
> Hi AK
>
> Regarding the missing values, I would like to find out the patterns of
> missing values in my data set. I know the overall values for each
> variable.
>
> using
>
> colSums(is.na(df))
>
> but what I wanted is to find out the percentages
> with each level of the variable with my dataset, as in if there is more
> missing data in females or males etc?.
>
> I installed "mi" package, but unable to produce a plot with it( i would
> also like to produce a plot). I searched the responses in the relevant
> sections in r but could n't find an answer.
>
> Thanks,
>
>
>
>
>
> On Wed, Jan 9, 2013 at 12:31 PM, arun kirshna [via R] <
> [hidden email] <http://user/SendEmail.jtp?type=node&node=4655274&i=2>>
> [hidden email] <http://user/SendEmail.jtp?type=node&node=4655274&i=3>mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
> ______________________________________________
> [hidden email] <http://user/SendEmail.jtp?type=node&node=4655274&i=4>mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
> ------------------------------
> If you reply to this email, your message will be added to the discussion
> below:
> http://r.789695.n4.nabble.com/random-effects-model-tp4654346p4655274.html
View this message in context: http://r.789695.n4.nabble.com/random-effects-model-tp4654346p4655313.html

Usha Gurunathan

unread,
Jan 12, 2013, 1:42:27 AM1/12/13
to arun, R help
Hi

I do want percentages of the categories inthe whole data set. But, I am a
bit unclear about this syntax. Can you explain please. This is the error
message I got with your script?

Error: could not find function "Copy.of.BP_2". Not sure what, because
the data frame was already loaded.


Also

I was trying out package: vmv( after installing)

as data(df,package, package="vmv")

In data(Copy.of.BP_2c, package = "vmv") :
data set ‘Copy.of.BP_2c’ not found

tablemissing(df, sort by="variable")

Error in tablemissing(Copy.of.BP_2, sortby = "Sex") :
object 'tabfinal' not found

## Same problem with "vim" package.

## What mistake could I have done?

Thanks.




On Sat, Jan 12, 2013 at 3:11 PM, arun <smartp...@yahoo.com> wrote:

> HI,
>
> If you want to find out the percentage of missing values in the whole
> dataset in females and males:
> set.seed(51)
>
> dat1<-data.frame(Gender=rep(c("M","F"),each=10),V1=sample(c(1:3,NA),20,replace=TRUE),V2=sample(c(21:24,NA),20,replace=TRUE))
> unlist(lapply(lapply(split(dat1,dat1$Gender),function(x)
> (nrow(x[!complete.cases(x[,-1]),])/nrow(x))*100),function(x)
> paste(x,"%",sep="")))
> # F M
> #"20%" "70%"
>
> #If it is to find the percentage of missing values for each variable in
> females and males:
> res<-do.call(rbind,lapply(split(dat1,dat1$Gender),function(x)
> paste((colSums(is.na(x[,-1]))/nrow(x))*100,"%",sep="")))
> colnames(res)<-colnames(dat1)[-1]
> res
> # V1 V2
> #F "0%" "20%"
> #M "50%" "20%"
> A.K.
>
>
>
>
>
> ----- Original Message -----
> From: rex2013 <usha....@gmail.com>
> To: r-h...@r-project.org
> Cc:
[[alternative HTML version deleted]]

arun

unread,
Jan 12, 2013, 12:11:41 AM1/12/13
to rex2013, R help
HI,

If you want to find out the percentage of missing values in the whole dataset in females and males:
 set.seed(51)
 dat1<-data.frame(Gender=rep(c("M","F"),each=10),V1=sample(c(1:3,NA),20,replace=TRUE),V2=sample(c(21:24,NA),20,replace=TRUE))
 unlist(lapply(lapply(split(dat1,dat1$Gender),function(x) (nrow(x[!complete.cases(x[,-1]),])/nrow(x))*100),function(x) paste(x,"%",sep="")))
#    F     M
#"20%" "70%"

#If it is to find the percentage of missing values for each variable in females and males:
 res<-do.call(rbind,lapply(split(dat1,dat1$Gender),function(x) paste((colSums(is.na(x[,-1]))/nrow(x))*100,"%",sep="")))
 colnames(res)<-colnames(dat1)[-1]
 res
#  V1    V2  
#F "0%"  "20%"
#M "50%" "20%"
A.K.





----- Original Message -----
From: rex2013 <usha....@gmail.com>
To: r-h...@r-project.org
Cc:

arun

unread,
Jan 12, 2013, 12:28:37 PM1/12/13
to Usha Gurunathan, R help
HI,

Regarding the script I sent to you earlier, it should be modified a bit according to your dataset.  For example, the variable "Sex" was in column 2.  Also, your data had some missing values for the "Sex" column, which I removed before running the code. 
With respect to the Errors that you showed here, you were using different package and I don't have a clue what you were trying to do.

I applied the same code to your dataset.  The results are as follows:

BP_2b<-read.csv("BP_2b.csv",sep="\t")

nrow(BP_2b)
#[1] 6898
BP_2bSexNoMV<-BP_2b[!is.na(BP_2b$Sex),]
 nrow(BP_2bSexNoMV)
#[1] 6260
 unique(BP_2bSexNoMV$Sex)
#[1] 2 1
library(car)
BP_2bSexNoMV$Sex<-recode(BP_2bSexNoMV$Sex,"1='Male';2='Female'") #assuming that 1=Male and 2=Female
 unique(BP_2bSexNoMV$Sex)
#[1] "Female" "Male" 

#whole dataset
 unlist(lapply(lapply(split(BP_2bSexNoMV,BP_2bSexNoMV$Sex),function(x) (nrow(x[!complete.cases(x[,-2]),])/nrow(x))*100),function(x) paste(x,"%",sep="")))
 #            Female                Male
#"72.6552179656539%" "74.4740099009901%"
#splitting the above into components:

 lapply(split(BP_2bSexNoMV,BP_2bSexNoMV$Sex),function(x) head(x,2))
#$Female


#  CODEA    Sex MaternalAge Education Birthplace AggScore IntScore Obese14

#2     3 Female           3         3          1        0        0       0
#3     4 Female           3         6          1       NA       NA      NA


#  Overweight14 Overweight21 Obese21 hibp14 hibp21

#2            0            1       0      0      0

#3           NA           NA      NA     NA     NA

#$Male


#  CODEA  Sex MaternalAge Education Birthplace AggScore IntScore Obese14

#6     9 Male           3         6          2        0        0       0
#8    11 Male           3         4          1        0        0      NA


 # Overweight14 Overweight21 Obese21 hibp14 hibp21

#6            0            0       0      0      0
#8           NA            0       0     NA      0

 lapply(split(BP_2bSexNoMV,BP_2bSexNoMV$Sex),function(x)nrow(x))
#$Female
#[1] 3028
#
#$Male
#[1] 3232
lapply(split(BP_2bSexNoMV,BP_2bSexNoMV$Sex),function(x) nrow(x[!complete.cases(x[,-2]),])) #rows which have at least one NA
#$Female
#[1] 2200
#
#$Male
#[1] 2407

 lapply(split(BP_2bSexNoMV,BP_2bSexNoMV$Sex),function(x) head(x[!complete.cases(x[,-2]),],2))
#$Female


#   CODEA    Sex MaternalAge Education Birthplace AggScore IntScore Obese14

#3      4 Female           3         6          1       NA       NA      NA
#10    13 Female           3         4          2       NA       NA      NA


 #  Overweight14 Overweight21 Obese21 hibp14 hibp21

#3            NA           NA      NA     NA     NA
#10           NA           NA      NA     NA     NA

#$Male


#  CODEA  Sex MaternalAge Education Birthplace AggScore IntScore Obese14

#8    11 Male           3         4          1        0        0      NA
#9    12 Male           3         4          2       NA       NA      NA


#  Overweight14 Overweight21 Obese21 hibp14 hibp21

#8           NA            0       0     NA      0
#9           NA           NA      NA     NA     NA

lapply(split(BP_2bSexNoMV,BP_2bSexNoMV$Sex),function(x) (nrow(x[!complete.cases(x[,-2]),])/nrow(x))*100) #gives the percentage of rows of missing #values from the overall rows for Males and Females
#$Female
#[1] 72.65522
#
#$Male
#[1] 74.47401

#iF you want the percentage from the total number rows in Males and Females (without NA's in the the Sex column)
 lapply(split(BP_2bSexNoMV,BP_2bSexNoMV$Sex),function(x) (nrow(x[!complete.cases(x[,-2]),])/nrow(BP_2bSexNoMV))*100)
#$Female
#[1] 35.14377
#
#$Male
#[1] 38.45048
unlist(lapply(lapply(split(BP_2bSexNoMV,BP_2bSexNoMV$Sex),function(x) (nrow(x[!complete.cases(x[,-2]),])/nrow(BP_2bSexNoMV))*100),function(x) paste(x,"%",sep="")))#paste the "%" to the numbers
#             Female                Male
#"35.1437699680511%" "38.4504792332268%"


#If it is to find the percentage of missing values for each variable in females and males:

res<-do.call(rbind,lapply(split(BP_2bSexNoMV,BP_2bSexNoMV$Sex),function(x) paste((colSums(is.na(x[,-2]))/nrow(BP_2bSexNoMV))*100,"%",sep=""))) #If #you want percentage of missing values per category, replace by nrow(x)
 colnames(res)<-colnames(BP_2bSexNoMV)[-2]
res
#       CODEA MaternalAge Education           Birthplace        
#Female "0%"  "0%"        "1.03833865814696%" "1.18210862619808%"
#Male   "0%"  "0%"        "1.10223642172524%" "1.3258785942492%"
 #      AggScore            IntScore            Obese14           
#Female "15.2076677316294%" "15.2076677316294%" "24.0894568690096%"
#Male   "16.5015974440895%" "16.5015974440895%" "25.814696485623%"
#       Overweight14        Overweight21        Obese21           
#Female "24.0894568690096%" "23.3865814696486%" "23.3865814696486%"
#Male   "25.814696485623%"  "29.1533546325879%" "29.1533546325879%"
#       hibp14              hibp21            
#Female "24.1693290734824%" "31.3418530351438%"
#Male   "25.8466453674121%" "35.223642172524%"

A.K.

________________________________


From: Usha Gurunathan <usha....@gmail.com>
To: arun <smartp...@yahoo.com>

Cc: R help <r-h...@r-project.org>
Sent: Saturday, January 12, 2013 1:42 AM


Subject: Re: [R] random effects model


 Hi

 I do want  percentages of the categories inthe whole data set. But, I am a bit unclear about this syntax. Can you explain please. This is the error message I got with your script?

Error: could not find function "Copy.of.BP_2". Not sure what, because the data frame was already loaded.
Also

I was trying out package: vmv( after installing)

as data(df,package, package="vmv")

In data(Copy.of.BP_2c, package = "vmv") : data set ‘Copy.of.BP_2c’ not foundtablemissing(df, sort by="variable")

arun

unread,
Jan 12, 2013, 11:36:15 AM1/12/13
to Usha Gurunathan, R help
Hello,

Are you sure that you got the error message with#


unlist(lapply(lapply(split(dat1,dat1$Gender),function(x)
(nrow(x[!complete.cases(x[,-1]),])/nrow(x))*100),function(x)
paste(x,"%",sep="")))

#or


do.call(rbind,lapply(split(dat1,dat1$Gender),function(x) paste((colSums(is.na(x[,-1]))/nrow(x))*100,"%",sep="")))

From ur reply, it seemed like you were trying different codes:


as data(df,package, package="vmv")


A.K.


________________________________


From: Usha Gurunathan <usha....@gmail.com>
To: arun <smartp...@yahoo.com>

Cc: R help <r-h...@r-project.org>
Sent: Saturday, January 12, 2013 1:42 AM


Subject: Re: [R] random effects model


 Hi

 I do want  percentages of the categories inthe whole data set. But, I am a bit unclear about this syntax. Can you explain please. This is the error message I got with your script?

Error: could not find function "Copy.of.BP_2". Not sure what, because the data frame was already loaded.
Also

I was trying out package: vmv( after installing)

as data(df,package, package="vmv")

In data(Copy.of.BP_2c, package = "vmv") : data set ‘Copy.of.BP_2c’ not foundtablemissing(df, sort by="variable")

arun

unread,
Jan 12, 2013, 2:47:51 PM1/12/13
to Usha Gurunathan, R help

Hi,
I tried ur dataset with vmv.
library(vmv)
tablemissing(BP_2bSexNoMV[BP_2bSexNoMV$Sex=="Male",],sortby="variable")
tablemissing(BP_2bSexNoMV[BP_2bSexNoMV$Sex=="Female",],sortby="variable")

I am attaching the plots i got with it.  I didn't look at the details to change the font labels in the plot.
Hope this helps.

A.K.

________________________________


From: Usha Gurunathan <usha....@gmail.com>
To: arun <smartp...@yahoo.com>

Cc: R help <r-h...@r-project.org>
Sent: Saturday, January 12, 2013 1:42 AM


Subject: Re: [R] random effects model


 Hi

 I do want  percentages of the categories inthe whole data set. But, I am a bit unclear about this syntax. Can you explain please. This is the error message I got with your script?

Error: could not find function "Copy.of.BP_2". Not sure what, because the data frame was already loaded.
Also

I was trying out package: vmv( after installing)

as data(df,package, package="vmv")

In data(Copy.of.BP_2c, package = "vmv") : data set ‘Copy.of.BP_2c’ not foundtablemissing(df, sort by="variable")

Rex2013MVplot.pdf

Usha Gurunathan

unread,
Jan 12, 2013, 5:59:40 PM1/12/13
to arun, R help
Hi AK
That works. I was trying to get similar results from any other package.
Being a beginner, I was not sure how to modify the syntax to get my output.

lapply(split(BP_2bSexNoMV,BP_
2bSexNoMV$Sex),function(x) (nrow(x[!complete.cases(x[,-2]),])/nrow(x))*100)
#gives the percentage of rows of missing #values from the overall rows for
Males and Females
#$Female
#[1] 72.65522
#
#$Male
#[1] 74.47401

#iF you want the percentage from the total number rows in Males and Females
(without NA's in the the Sex column)
lapply(split(BP_2bSexNoMV,BP_2bSexNoMV$Sex),function(x)
(nrow(x[!complete.cases(x[,-2]),])/nrow(BP_2bSexNoMV))*100)
#$Female
#[1] 35.14377
#
#$Male
#[1] 38.45048

How do I interpret the above 2 difft results? 72.66% of values were missing
among female participants?? Can you pl. clarify.

Many thanks.

On Sun, Jan 13, 2013 at 3:28 AM, arun <smartp...@yahoo.com> wrote:

> lapply(split(BP_2bSexNoMV,BP_2bSexNoMV$Sex),function(x)
> (nrow(x[!complete.cases(x[,-2]),])/nrow(x))*100) #gives the percentage of
> rows of missing #values from the overall rows for Males and Females
> #$Female
> #[1] 72.65522
> #
> #$Male
> #[1] 74.47401
>
> #iF you want the percentage from the total number rows in Males and
> Females (without NA's in the the Sex column)
> lapply(split(BP_2bSexNoMV,BP_2bSexNoMV$Sex),function(x)
> (nrow(x[!complete.cases(x[,-2]),])/nrow(BP_2bSexNoMV))*100)
> #$Female
> #[1] 35.14377
> #
> #$Male
> #[1] 38.45048
>

arun

unread,
Jan 12, 2013, 6:17:57 PM1/12/13
to Usha Gurunathan, R help
Hi,
Yes, you are right.  72.655222% was those missing among females.  35.14377% of values in females are missing from among the whole dataset (combined total of Males+Females data after removing the NAs from the variable "Sex"). 
A.K.



________________________________
From: Usha Gurunathan <usha....@gmail.com>
To: arun <smartp...@yahoo.com>
Cc: R help <r-h...@r-project.org>
Sent: Saturday, January 12, 2013 5:59 PM
Subject: Re: [R] random effects model


arun

unread,
Jan 13, 2013, 11:54:59 AM1/13/13
to Usha Gurunathan, R help
Hi,

For your first question,
M1<-as.table(rbind(c(825,2407),c(828,2200)))
 dimnames(M1)<- list(gender=c("Male","Female"), MV=c("Study","NonStudy/missing"))
M1
#        MV
#gender   Study NonStudy/missing
 # Male     825             2407
 # Female   828             2200


Xsq<-chisq.test(M1)
Xsq

#    Pearson's Chi-squared test with Yates' continuity correction

#data:  M1
#X-squared = 2.5684, df = 1, p-value = 0.109

I will take a look at your second question later.
A.K.






________________________________
From: Usha Gurunathan <usha....@gmail.com>
To: arun <smartp...@yahoo.com>
Sent: Sunday, January 13, 2013 1:51 AM
Subject: Re: [R] random effects model


HI AK

Thanks a lot  for explaining that.

1. With the chi sq. ( in order to find out if the diffce is significant between groups) do I have create a separate excel file and make a dataframe.How do I go about it?





On Sun, Jan 13, 2013 at 1:22 PM, arun <smartp...@yahoo.com> wrote:

HI,
>    
>table(BP_2b$Sex) #original dataset
>#   1    2
>#3232 3028
> nrow(BP_2b)
>#[1] 6898
> nrow(BP_2bSexNoMV)
>#[1] 6260
> 6898-6260
>#[1] 638 #these rows were removed from the BP_2b to create BP_2bSexNoMV
>BP_2bSexMale<-BP_2bSexNoMV[BP_2bSexNoMV$Sex=="Male",]
> nrow(BP_2bSexMale)
>#[1] 3232
> nrow(BP_2bSexMale[!complete.cases(BP_2bSexMale),]) #Missing rows with Male
>#[1] 2407
> nrow(BP_2bSexMale[complete.cases(BP_2bSexMale),]) #Non missing rows with Male
>#[1] 825
>
>
>You did the chisquare test on the new dataset with 6260 rows, right.
>I removed those 638 rows because these doesn't belong to either male or female, but you want the % of missing value per male or female.  So, I thought this will bias the results.  If you want to include the missing values, you could do it, but I don't know where you would put that missing values as it cannot be classified as belonging specifically to males or females.  I hope you understand it.
>
>Sometimes, the maintainer's respond a bit slow.  You have to sent an email reminding him again.
>
>Regarding the vmv package, you could email Waqas Ahmed Malik (ma...@math.uni-augsburg.de) regarding options for changing the title and the the font etc.
>You could also use this link (http://www.r-bloggers.com/visualizing-missing-data-2/ ) to plot missing value (?plot.missing()).  I never used that package, but you could try.  Looks like it gives more information.
>
>A.K.
>
>
>
>
>
>
>
>
>________________________________
>From: Usha Gurunathan <usha....@gmail.com>
>To: arun <smartp...@yahoo.com>
>Sent: Saturday, January 12, 2013 9:05 PM
>
>Subject: Re: [R] random effects model
>
>
>Hi A.K
>
>So it is number of females missing/total female participants enrolled: 72.65%
>Number of females missing/total (of males+ females)  participants enrolled : 35.14%
>
>The total no. with the master data: Males: 3232, females: 3028 ( I got this before removing any missing values)
>
>with table(Copy.of.BP_2$ Sex)  ## BP
>
>
>If I were to write a table (  and do a chi sq. later), 
>
>as Gender            Study                    Non study/missing     Total
>      Male              825 (25.53%)             2407 (74.47%)       3232 (100%)
>    Female           828 (27.35%)             2200 (72.65%)       3028 ( 100%)
>     Total              1653                          4607                      6260    
>
>
>The problem is when I did 
>>colSums(is.na(Copy.of.BP_2), the sex category showed N=638.
>
>I cannot understand the discrepancy.Also, when you have mentioned to remove NA, is that not a missing value that needs to be included in the total number missing. I am a bit confused. Can you help?
>
>## I tried sending email to gee pack maintainer at the ID with R site, mail didn't go through??
>
>Many thanks

arun

unread,
Jan 13, 2013, 1:33:43 PM1/13/13
to Usha Gurunathan, R help



HI,

I think I mentioned to you before that when you reshape the
columns excluding the response variable, response variable gets repeated
(in this case hibp14 or hibp21) and creates the error"


I run your code, there are obvious problems in the code so I didn't reach up to BP.gee

BP_2b<-read.csv("BP_2b.csv",sep="\t")
BP.stack3 <- reshape(BP_2b,idvar="CODEA",timevar="time",sep="_",varying=list(c("Obese14","Obese21"),c("Overweight14","Overweight21")),v.names=c("Obese","Overweight"),times=factor(c(1,2)),direction="long")

BP.stack3 <- transform(BP.stack3,CODEA=factor(CODEA),Sex=factor(Sex,labels=c("Male","Female")),MaternalAge=factor(MaternalAge,labels=c("39years or less","40-49 years","50 years or older")),Education=factor(Education,labels=c("Primary/special","Started secondary","Completed grade10", "Completed grade12", "College","University")),Birthplace=factor(Birthplace,labels=c("Australia","Other English-speaking","Other")))
 BP.stack3$Sex <- factor(BP.stack3$Sex,levels=levels(BP.stack3$Sex)[c(2,1)])
 BP.sub3a <-  subset(BP.stack3,subset=!(is.na(Sex)| is.na(Education)|is.na(Birthplace)|is.na(Education)|is.na(hibp14)| is.na(hibp21)))  
 nrow(BP.sub3a)
#[1] 3364
 BP.sub5a <- BP.sub3a[order(BP.sub3a$CODEA),] # your code was BP.sub5a <- BP.sub3a[order(BP.sub5a$CODEA),] 
                                                                                                                                                                         ^^^^^ was not defined before
#Next line
BPsub3$Categ[BPsub6$Overweight==1&BPsub3$time==1&BPsub3$Obese==0]<- "Overweight14"  #It should be BP.sub3 and what is BPsub6, it was not defined previously.
#Error in BPsub3$Categ[BPsub6$Overweight == 1 & BPsub3$time == 1 & BPsub3$Obese ==  :
  #object 'BPsub3' not found






A.K.


________________________________
From: Usha Gurunathan <usha....@gmail.com>
To: arun <smartp...@yahoo.com>
Sent: Sunday, January 13, 2013 1:51 AM
Subject: Re: [R] random effects model


HI AK

Thanks a lot  for explaining that.

1. With the chi sq. ( in order to find out if the diffce is significant between groups) do I have create a separate excel file and make a dataframe.How do I go about it?

I have resent a mail to Jun Yan at a difft email ad( first add.didn't work, mail not delivered).

2. With my previous query ( reg. Obese/Overweight/ Normal at age 14 Vs change of blood pressure status at 21), even though I had compromised without the age-specific regression, but I am still keen to explore why the age-specific regression didn't work despite it looking okay. I have given below the syntax. If you get time, could you kindly look at it and see if it could work by any chance? Apologies for persisting with this query.


BP.stack3 <-
reshape(Copy.of.BP_2,idvar="CODEA",timevar="time",sep="_",varying=list(c("Obese14","Obese21"),c("Overweight14","Overweight21")),v.names=c("Obese","Overweight"),times=factor(c(1,2)),direction="long
BP.stack3
head(BP.stack3)
tail(BP.stack3)

 names(BP.stack3)[c(2,3,4,5,6,7)] <-
c("Sex","MaternalAge","Education","Birthplace","AggScore","IntScore")

BP.stack3 <-
transform(BP.stack3,CODEA=factor(CODEA),Sex=factor(Sex,labels=c("Male","Female")),MaternalAge=factor(MaternalAge,labels=c("39years
or less","40-49 years","50 years or
older")),Education=factor(Education,labels=c("Primary/special","Started
secondary","Completed grade10", "Completed grade12",
"College","University")),Birthplace=factor(Birthplace,labels=c("Australia","Other
English-speaking","Other")))

table(BP.stack3$Sex)  
BP.stack3$Sex <-
factor(BP.stack3$Sex,levels=levels(BP.stack3$Sex)[c(2,1)])

levels(BP.stack3$Sex)
BP.sub3a <-  subset(BP.stack3,subset=!(is.na(Sex)| is.na(Education)|is.na(Birthplace)|is.na(Education)|is.na(hibp14)| is.na(hibp21)))   
summary(BP.sub3a)
BP.sub5a <- BP.sub3a[order(BP.sub5a$CODEA),] 
 BPsub3$Categ[BPsub6$Overweight==1&BPsub3$time==1&BPsub3$Obese==0]
<- "Overweight14"
BPsub3$Categ[BPsub6$Overweight==1&BPsub3$time==2&BPsub3$Obese==0]
<- "Overweight21"
BPsub3$Categ[BPsub3$Obese==1&BPsub3$time==1&BPsub3$Overweight==0|BPsub3$Obese==1&BPsub3$time==1&BPsub3$Overweight==1
] <- "Obese14"
BPsub3$Categ[BPsub3$Obese==0&BPsub3$time==1&BPsub3 BPsub3$Categ[BPsub6$Overweight==1&BPsub3$time==1&BPsub3$Obese==0]
<- "Overweight14"$Overweight==0]
<- "Normal14"
BPsub3$Categ[BPsub3$Obese==0&BPsub3$time==2&BPsub3$Overweight==0]
<- "Normal21"
BPsub3$Categ[BPsub3$Obese==1&BPsub3$time==2&BPsub3$Overweight==0|BPsub3$Obese==1&BPsub3$time==2&BPsub3$Overweight==1]
<- "Obese21"



BPsub3$Categ <- factor(BPsub3$Categ)
BPsub3$time <- factor(BPsub3$time)
summary(BPsub3$Categ)
BPsub7 <- subset(BPsub6,subset=!(is.na(Categ)))
BPsub7$time <-
recode(BPsub7$time,"1=14;2=21")
BPsub7$hibp14 <- factor(BPsub7$hibp14)
levels(BPsub7$hibp14)
levels(BPsub7$Categ)
names(BPsub7)
head(BPsub7)    ### this was looking quite okay.

tail(BPsub7)
str(BPsub7)

library(gee)

BP.gee <- geese(hibp14~ time*Categ,
data=BPsub7,id=CODEA,family=binomial,
corstr="exchangeable",na.action=na.omit)

Thanks again.



On Sun, Jan 13, 2013 at 1:22 PM, arun <smartp...@yahoo.com> wrote:

HI,
>    
>table(BP_2b$Sex) #original dataset
>#   1    2
>#3232 3028
> nrow(BP_2b)
>#[1] 6898
> nrow(BP_2bSexNoMV)
>#[1] 6260
> 6898-6260
>#[1] 638 #these rows were removed from the BP_2b to create BP_2bSexNoMV
>BP_2bSexMale<-BP_2bSexNoMV[BP_2bSexNoMV$Sex=="Male",]
> nrow(BP_2bSexMale)
>#[1] 3232
> nrow(BP_2bSexMale[!complete.cases(BP_2bSexMale),]) #Missing rows with Male
>#[1] 2407
> nrow(BP_2bSexMale[complete.cases(BP_2bSexMale),]) #Non missing rows with Male
>#[1] 825
>
>
>You did the chisquare test on the new dataset with 6260 rows, right.
>I removed those 638 rows because these doesn't belong to either male or female, but you want the % of missing value per male or female.  So, I thought this will bias the results.  If you want to include the missing values, you could do it, but I don't know where you would put that missing values as it cannot be classified as belonging specifically to males or females.  I hope you understand it.
>
>Sometimes, the maintainer's respond a bit slow.  You have to sent an email reminding him again.
>
>Regarding the vmv package, you could email Waqas Ahmed Malik (ma...@math.uni-augsburg.de) regarding options for changing the title and the the font etc.
>You could also use this link (http://www.r-bloggers.com/visualizing-missing-data-2/ ) to plot missing value (?plot.missing()).  I never used that package, but you could try.  Looks like it gives more information.
>
>A.K.
>
>
>
>
>
>
>
>
>________________________________
>From: Usha Gurunathan <usha....@gmail.com>
>To: arun <smartp...@yahoo.com>
>Sent: Saturday, January 12, 2013 9:05 PM
>
>Subject: Re: [R] random effects model
>
>
>Hi A.K
>
>So it is number of females missing/total female participants enrolled: 72.65%
>Number of females missing/total (of males+ females)  participants enrolled : 35.14%
>
>The total no. with the master data: Males: 3232, females: 3028 ( I got this before removing any missing values)
>
>with table(Copy.of.BP_2$ Sex)  ## BP
>
>
>If I were to write a table (  and do a chi sq. later), 
>
>as Gender            Study                    Non study/missing     Total
>      Male              825 (25.53%)             2407 (74.47%)       3232 (100%)
>    Female           828 (27.35%)             2200 (72.65%)       3028 ( 100%)
>     Total              1653                          4607                      6260    
>
>
>The problem is when I did 
>>colSums(is.na(Copy.of.BP_2), the sex category showed N=638.
>
>I cannot understand the discrepancy.Also, when you have mentioned to remove NA, is that not a missing value that needs to be included in the total number missing. I am a bit confused. Can you help?
>
>## I tried sending email to gee pack maintainer at the ID with R site, mail didn't go through??
>
>Many thanks
>
>
>
>
>
>
>On Sun, Jan 13, 2013 at 9:17 AM, arun <smartp...@yahoo.com> wrote:
>

rex2013

unread,
Jan 14, 2013, 3:04:30 AM1/14/13
to r-h...@r-project.org
Sorry

I have corrected the mistakes:

BP.stack3 <-
reshape(Copy.of.BP_2,idvar="CODEA",timevar="time",sep="_",varying=list(c("Obese14","Obese21"),c("Overweight14","Overweight21")),v.names=c("Obese","Overweight"),times=factor(c(1,2)),direction="long")
BP.stack3

head(BP.stack3)
tail(BP.stack3)
names(BP.stack3)[c(2,3,4,5,6,7)] <-
c("Sex","MaternalAge","Education","Birthplace","AggScore","IntScore")
names(BP.stack3)[1:7]
BP.stack3 <-
transform(BP.stack3,CODEA=factor(CODEA),Sex=factor(Sex,labels=c("Male","Female")),MaternalAge=factor(MaternalAge,labels=c("39years
or less","40-49 years","50 years or
older")),Education=factor(Education,labels=c("Primary/special","Started
secondary","Completed grade10", "Completed grade12",
"College","University")),Birthplace=factor(Birthplace,labels=c("Australia","Other
English-speaking","Other")))
str(BP.stack3)

table(BP.stack3$Sex)
BP.stack3$Sex <- factor(BP.stack3$Sex,levels=levels(BP.stack3$Sex)[c(2,1)])
levels(BP.stack3$Sex)

BPsub6 <- subset(BP.stack3,subset=!(is.na(Sex)| is.na(Education)|is.na
(Birthplace)|is.na(Education)|is.na(hibp14)| is.na(hibp21)))
summary(BPsub6)
BPsub6$Categ[BPsub6$Overweight==1&BPsub6$time==1&BPsub6$Obese==0] <-
"Overweight14"
BPsub6$Categ[BPsub6$Overweight==1&BPsub6$time==2&BPsub6$Obese==0] <-
"Overweight21"
BPsub6$Categ[BPsub6$Obese==1&BPsub6$time==1&BPsub6$Overweight==0|BPsub6$Obese==1&BPsub6$time==1&BPsub6$Overweight==1
] <- "Obese14"
BPsub6$Categ[BPsub6$Obese==0&BPsub6$time==1&BPsub6$Overweight==0] <-
"Normal14"
BPsub6$Categ[BPsub6$Obese==0&BPsub6$time==2&BPsub6$Overweight==0] <-
"Normal21"
BPsub6$Categ[BPsub6$Obese==1&BPsub6$time==2&BPsub6$Overweight==0|BPsub6$Obese==1&BPsub6$time==2&BPsub6$Overweight==1]
<- "Obese21"

BPsub6$Categ <- factor(BPsub6$Categ)
BPsub6$time <- factor(BPsub6$time)
summary(BPsub6$Categ)
BPsub7 <- subset(BPsub6,subset=!(is.na(Categ)))
BPsub7 <- BPsub7[order(BPsub7$CODEA),]

BPsub7$hibp14 <- factor(BPsub7$hibp14)
levels(BPsub7$hibp14)
levels(BPsub7$Categ)
names(BPsub7)
head(BPsub7)
tail(BPsub7)
str(BPsub7)

library(gee)
BP.gee8 <- gee(hibp14~ time*Categ, data=BPsub7,id=CODEA,family=binomial,
corstr="exchangeable",na.action=na.omit)
summary(BP.gee8)


## Can you try this out please? I am not clear where the defect is with
model? One other previous model had no correlation between obese 14 and
time. With this one, i cannot find anything wrong as such, but still wont
work.

Thanks
> From: Usha Gurunathan <[hidden email]<http://user/SendEmail.jtp?type=node&node=4655440&i=0>>
>
> To: arun <[hidden email]<http://user/SendEmail.jtp?type=node&node=4655440&i=1>>
> On Sun, Jan 13, 2013 at 1:22 PM, arun <[hidden email]<http://user/SendEmail.jtp?type=node&node=4655440&i=2>>
> >Regarding the vmv package, you could email Waqas Ahmed Malik ([hidden
> email] <http://user/SendEmail.jtp?type=node&node=4655440&i=3>) regarding
> options for changing the title and the the font etc.
> >You could also use this link (
> http://www.r-bloggers.com/visualizing-missing-data-2/ ) to plot missing
> value (?plot.missing()). I never used that package, but you could try.
> Looks like it gives more information.
> >
> >A.K.
> >
> >
> >
> >
> >
> >
> >
> >
> >________________________________
> >From: Usha Gurunathan <[hidden email]<http://user/SendEmail.jtp?type=node&node=4655440&i=4>>
>
> >To: arun <[hidden email]<http://user/SendEmail.jtp?type=node&node=4655440&i=5>>
> >On Sun, Jan 13, 2013 at 9:17 AM, arun <[hidden email]<http://user/SendEmail.jtp?type=node&node=4655440&i=6>>
> wrote:
> >
> >Hi,
> >>Yes, you are right. 72.655222% was those missing among females.
> 35.14377% of values in females are missing from among the whole dataset
> (combined total of Males+Females data after removing the NAs from the
> variable "Sex").
> >>
> >>A.K.
> >>
> >>
> >>
> >>________________________________
> >>From: Usha Gurunathan <[hidden email]<http://user/SendEmail.jtp?type=node&node=4655440&i=7>>
>
> >>To: arun <[hidden email]<http://user/SendEmail.jtp?type=node&node=4655440&i=8>>
>
> >>Cc: R help <[hidden email]<http://user/SendEmail.jtp?type=node&node=4655440&i=9>>
> >>On Sun, Jan 13, 2013 at 3:28 AM, arun <[hidden email]<http://user/SendEmail.jtp?type=node&node=4655440&i=10>>
> wrote:
> >>
> >>lapply(split(BP_2bSexNoMV,BP_2bSexNoMV$Sex),function(x)
> (nrow(x[!complete.cases(x[,-2]),])/nrow(x))*100) #gives the percentage of
> rows of missing #values from the overall rows for Males and Females
> >>>#$Female
> >>>#[1] 72.65522
> >>>#
> >>>#$Male
> >>>#[1] 74.47401
> >>>
> >>>#iF you want the percentage from the total number rows in Males and
> Females (without NA's in the the Sex column)
> >>> lapply(split(BP_2bSexNoMV,BP_2bSexNoMV$Sex),function(x)
> (nrow(x[!complete.cases(x[,-2]),])/nrow(BP_2bSexNoMV))*100)
> >>>#$Female
> >>>#[1] 35.14377
> >>>#
> >>>#$Male
> >>>#[1] 38.45048
> >>
> >
>
> ______________________________________________
> [hidden email] <http://user/SendEmail.jtp?type=node&node=4655440&i=11>mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
> ------------------------------
> If you reply to this email, your message will be added to the discussion
> below:
> http://r.789695.n4.nabble.com/random-effects-model-tp4654346p4655440.html
View this message in context: http://r.789695.n4.nabble.com/random-effects-model-tp4654346p4655456.html
Sent from the R help mailing list archive at Nabble.com.
[[alternative HTML version deleted]]

Usha Gurunathan

unread,
Jan 14, 2013, 6:32:27 AM1/14/13
to arun, R help
Hi AK

I have been trying to create some plots. All being categorical variables, I
am not getting any luck with plots. The few ones that have worked are below:

barchart(~table(HiBP)|Obese,data=BP.sub3) ## BP.sub3 is the stacked data
without missing values

barchart(~table(HiBP)|Overweight,data=BP.sub3)

plot(jitter(hibp14,factor=2)~jitter(Obese14,factor=2),col="gray",cex=0.7,
data=Copy.of.BP_2) ## Copy.of.BP_2 is the original wide format

## not producing any good plots with mixed models as well.
summary(lme.3 <- lme(HiBP~time, data=BP.sub3,random=~1|CODEA,
na.action=na.omit))
anova(lme.3)
head(ranef(lme.3))
print(plot(ranef(lme.3))) ##

Thanks for any help.
[[alternative HTML version deleted]]

arun

unread,
Jan 14, 2013, 2:24:41 PM1/14/13
to rex2013, R help
Hi,

I get the error message.
BP.gee8<- gee(hibp14~time*Categ,data=BPsub7,id=CODEA,family=binomial,corstr="exchangeable",na.action=na.omit)
#Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
#Error in gee(hibp14 ~ time * Categ, data = BPsub7, id = CODEA, family = binomial,  :
 # rank-deficient model matrix
#Reason I mentioned before.

If you want to see what is happening, you could check this:
dat1<-structure(list(CODEA = c(3L, 7L, 8L), IntScore = c(0L, 0L, 0L
), Obese14 = c(1L, 0L, 1L), Overweight14 = c(1L, 1L, 1L), Obese21 = c(1L,
1L, 0L), Overweight21 = c(1L, 1L, 0L), hibp14 = c(1L, 0L, 1L),
    hibp21 = c(0L, 1L, 0L)), .Names = c("CODEA", "IntScore",
"Obese14", "Overweight14", "Obese21", "Overweight21", "hibp14",
"hibp21"), class = "data.frame", row.names = c(NA, -3L))
reshape(dat1,idvar="CODEA",timevar="time",sep="_",varying=list(c("Obese14","Obese21"),c("Overweight14","Overweight21")),v.names=c("Obese","Overweight"),times=factor(c(1,2)),direction="long") #your code
#    CODEA IntScore hibp14 hibp21 time Obese Overweight   #here hibp14/hibp21 (response variable) gets replicated i.e. hibp14 values are the same for time==1 and time==2 whereas Obese, Overweigh #values change
#3.1     3        0      1      0    1     1          1
#7.1     7        0      0      1    1     0          1
#8.1     8        0      1      0    1     1          1
#3.2     3        0      1      0    2     1          1
#7.2     7        0      0      1    2     1          1
#8.2     8        0      1      0    2     0          0
A.K.




----- Original Message -----
From: rex2013 <usha....@gmail.com>
To: r-h...@r-project.org
Cc:

arun

unread,
Jan 14, 2013, 5:54:10 PM1/14/13
to Usha Gurunathan, R help
HI,

BP_2b<-read.csv("BP_2b.csv",sep="\t")
BP_2bNM<-na.omit(BP_2b)

BP.stack3 <- reshape(BP_2bNM,idvar="CODEA",timevar="time",sep="",varying=list(c("Obese14","Obese21"),c("Overweight14","Overweight21"),c("hibp14","hibp21")),v.names=c("Obese","Overweight","HiBP"),times=factor(c(1,2)),direction="long")
library(car)
BP.stack3$Obese<- recode(BP.stack3$Obese,"1='Obese';0='Not Obese'")
BP.stack3$Overweight<- recode(BP.stack3$Overweight,"1='Overweight';0='Not Overweight'")

library(ggplot2)
ggplot(BP.stack3,aes(x=factor(HiBP),fill=Obese))+geom_bar(position="fill")
ggplot(BP.stack3,aes(x=factor(HiBP),fill=Overweight))+geom_bar(position="fill")

You could try lmer() from lme4. 
library(lme4)
fm1<-lmer(HiBP~time+(1|CODEA), family=binomial,data=BP.stack3) #check codes, not sure
print(dotplot(ranef(fm1,post=TRUE),
              scales = list(x = list(relation = "free")))[[1]])
qmt1<- qqmath(ranef(fm1, postVar=TRUE))
print(qmt1[[1]])

A.K.





________________________________
From: Usha Gurunathan <usha....@gmail.com>
To: arun <smartp...@yahoo.com>
Cc: R help <r-h...@r-project.org>
Sent: Monday, January 14, 2013 6:32 AM
Subject: Re: [R] random effects model


Hi AK

I have been trying to create some plots. All being categorical variables, I am not getting any luck with plots. The few ones that have worked are below:

barchart(~table(HiBP)|Obese,data=BP.sub3) ## BP.sub3 is the stacked data without missing values

barchart(~table(HiBP)|Overweight,data=BP.sub3)

plot(jitter(hibp14,factor=2)~jitter(Obese14,factor=2),col="gray",cex=0.7, data=Copy.of.BP_2)  ## Copy.of.BP_2 is the original wide format

## not producing any good plots with mixed models as well.
summary(lme.3 <- lme(HiBP~time, data=BP.sub3,random=~1|CODEA, na.action=na.omit))
anova(lme.3)
head(ranef(lme.3))
print(plot(ranef(lme.3))) ##

Thanks for any help.





On Mon, Jan 14, 2013 at 4:33 AM, arun <smartp...@yahoo.com> wrote:


>
>
>HI,
>
>I think I mentioned to you before that when you reshape the
>columns excluding the response variable, response variable gets repeated
>(in this case hibp14 or hibp21) and creates the error"
>
>
>I run your code, there are obvious problems in the code so I didn't reach up to BP.gee
>
>
>BP_2b<-read.csv("BP_2b.csv",sep="\t")
>BP.stack3 <- reshape(BP_2b,idvar="CODEA",timevar="time",sep="_",varying=list(c("Obese14","Obese21"),c("Overweight14","Overweight21")),v.names=c("Obese","Overweight"),times=factor(c(1,2)),direction="long")
>
>
>BP.stack3 <- transform(BP.stack3,CODEA=factor(CODEA),Sex=factor(Sex,labels=c("Male","Female")),MaternalAge=factor(MaternalAge,labels=c("39years or less","40-49 years","50 years or older")),Education=factor(Education,labels=c("Primary/special","Started secondary","Completed grade10", "Completed grade12", "College","University")),Birthplace=factor(Birthplace,labels=c("Australia","Other English-speaking","Other")))
>
> BP.stack3$Sex <- factor(BP.stack3$Sex,levels=levels(BP.stack3$Sex)[c(2,1)])
>
> BP.sub3a <-  subset(BP.stack3,subset=!(is.na(Sex)| is.na(Education)|is.na(Birthplace)|is.na(Education)|is.na(hibp14)| is.na(hibp21)))  
>BP.sub3a <-  subset(BP.stack3,subset=!(is.na(Sex)| is.na(Education)|is.na(Birthplace)|is.na(Education)|is.na(hibp14)| is.na(hibp21)))   
>>Regarding the vmv package, you could email Waqas Ahmed Malik (ma...@math.uni-augsburg.de) regarding options for changing the title and the the font etc.
>>You could also use this link (http://www.r-bloggers.com/visualizing-missing-data-2/ ) to plot missing value (?plot.missing()).  I never used that package, but you could try.  Looks like it gives more information.

Usha Gurunathan

unread,
Jan 15, 2013, 6:31:55 AM1/15/13
to arun, R help
Hi AK

Got an error message with

library(ggplot2)>
ggplot(BP.stack1,aes(x=factor(HiBP),fill=Obese))+geom_bar(position="fill")Error
in rename(x, .base_to_ggplot, warn_missing = FALSE) :
could not find function "revalue">
ggplot(BP.stack1,aes(x=factor(HiBP),fill=Overweight))+geom_bar(position="fill")Error
in rename(x, .base_to_ggplot, warn_missing = FALSE) :
could not find function "revalue"
I got the dot plot, thanks for that.

I have attached some plots, not sure how to interpret, they had
unusual patterns.Is it because of missing data? I tried removing the
missing data too. They still appeared the same. Do I need to transform
the data?


Thanks in advance.
QQ plot HIBP ~Overweight.png
Fitted vs residuals HiBP ~time+Sex+Maternal Age, random intercept.png
QQ plot hibp21 BP.sub4.png
HiBP ~Overweight qq plots residuals,random effects.png

arun

unread,
Jan 15, 2013, 8:22:09 AM1/15/13
to Usha Gurunathan, R help


Hi,
Check these links:
http://comments.gmane.org/gmane.comp.lang.r.ggplot2/6527
https://groups.google.com/forum/#!msg/ggplot2/nfVjxL0DXnY/5zf50zCeZuMJ
A.K.

________________________________
From: Usha Gurunathan <usha....@gmail.com>
To: arun <smartp...@yahoo.com>
Cc: R help <r-h...@r-project.org>
Sent: Tuesday, January 15, 2013 6:31 AM
Subject: Re: [R] random effects model


Hi AK

Got an error message with
library(ggplot2) > ggplot(BP.stack1,aes(x=factor(HiBP),fill=Obese))+geom_bar(position="fill") Error in rename(x, .base_to_ggplot, warn_missing = FALSE) :  could not find function "revalue" > ggplot(BP.stack1,aes(x=factor(HiBP),fill=Overweight))+geom_bar(position="fill") Error in rename(x, .base_to_ggplot, warn_missing = FALSE) :  could not find function "revalue"
>>BP.sub3a <-  subset(BP.stack3,subset=!(is.na(Sex)| is.na(Education)|is.na(Birthplace)|is.na(Education)|is.na(hibp14)| is.na(hibp21)))   
>>>Regarding the vmv package, you could email Waqas Ahmed Malik (ma...@math.uni-augsburg.de) regarding options for changing the title and the the font etc.
>>>You could also use this link (http://www.r-bloggers.com/visualizing-missing-data-2/ ) to plot missing value (?plot.missing()).  I never used that package, but you could try.  Looks like it gives more information.

rex2013

unread,
Jan 16, 2013, 5:06:54 AM1/16/13
to r-h...@r-project.org
Hi

I tried removing the missing values and installing "plyr". Still error
message appears with ggplot2

Btw, did you get the attachments with my earlier mail?

Ta.

On Wed, Jan 16, 2013 at 3:16 AM, arun kirshna [via R] <
ml-node+s7896...@n4.nabble.com> wrote:

>
>
> Hi,
> Check these links:
> http://comments.gmane.org/gmane.comp.lang.r.ggplot2/6527
> https://groups.google.com/forum/#!msg/ggplot2/nfVjxL0DXnY/5zf50zCeZuMJ
> A.K.
>
> ________________________________
> From: Usha Gurunathan <[hidden email]<http://user/SendEmail.jtp?type=node&node=4655612&i=0>>
>
> To: arun <[hidden email]<http://user/SendEmail.jtp?type=node&node=4655612&i=1>>
>
> Cc: R help <[hidden email]<http://user/SendEmail.jtp?type=node&node=4655612&i=2>>
>
> Sent: Tuesday, January 15, 2013 6:31 AM
> Subject: Re: [R] random effects model
>
>
> Hi AK
>
> Got an error message with
> library(ggplot2) >
> ggplot(BP.stack1,aes(x=factor(HiBP),fill=Obese))+geom_bar(position="fill")
> Error in rename(x, .base_to_ggplot, warn_missing = FALSE) : could not find
> function "revalue" >
> ggplot(BP.stack1,aes(x=factor(HiBP),fill=Overweight))+geom_bar(position="fill")
> Error in rename(x, .base_to_ggplot, warn_missing = FALSE) : could not find
> function "revalue"
> I got the dot plot, thanks for that.
>
> I have attached some plots, not sure how to interpret, they had unusual
> patterns.Is it because of missing data? I tried removing the missing data
> too. They still appeared the same. Do I need to transform the data?
>
>
> Thanks in advance.
>
>
>
>
>
> On Tue, Jan 15, 2013 at 8:54 AM, arun <[hidden email]<http://user/SendEmail.jtp?type=node&node=4655612&i=3>>
> wrote:
>
> HI,
>
> >
> >
> >BP_2b<-read.csv("BP_2b.csv",sep="\t")
> >BP_2bNM<-na.omit(BP_2b)
> >
> >BP.stack3 <-
> reshape(BP_2bNM,idvar="CODEA",timevar="time",sep="",varying=list(c("Obese14","Obese21"),c("Overweight14","Overweight21"),c("hibp14","hibp21")),v.names=c("Obese","Overweight","HiBP"),times=factor(c(1,2)),direction="long")
>
> >library(car)
> >BP.stack3$Obese<- recode(BP.stack3$Obese,"1='Obese';0='Not Obese'")
> >BP.stack3$Overweight<- recode(BP.stack3$Overweight,"1='Overweight';0='Not
> Overweight'")
> >
> >library(ggplot2)
> >ggplot(BP.stack3,aes(x=factor(HiBP),fill=Obese))+geom_bar(position="fill")
>
> >ggplot(BP.stack3,aes(x=factor(HiBP),fill=Overweight))+geom_bar(position="fill")
>
> >
> >You could try lmer() from lme4.
> >library(lme4)
> >fm1<-lmer(HiBP~time+(1|CODEA), family=binomial,data=BP.stack3) #check
> codes, not sure
> >print(dotplot(ranef(fm1,post=TRUE),
> > scales = list(x = list(relation = "free")))[[1]])
> >qmt1<- qqmath(ranef(fm1, postVar=TRUE))
> >print(qmt1[[1]])
> >
> >
> >A.K.
> >
> >
> >
> >
> >
> >________________________________
> >From: Usha Gurunathan <[hidden email]<http://user/SendEmail.jtp?type=node&node=4655612&i=4>>
>
> >To: arun <[hidden email]<http://user/SendEmail.jtp?type=node&node=4655612&i=5>>
>
> >Cc: R help <[hidden email]<http://user/SendEmail.jtp?type=node&node=4655612&i=6>>
>
> >Sent: Monday, January 14, 2013 6:32 AM
> >
> >Subject: Re: [R] random effects model
> >
> >
> >Hi AK
> >
> >I have been trying to create some plots. All being categorical variables,
> I am not getting any luck with plots. The few ones that have worked are
> below:
> >
> >barchart(~table(HiBP)|Obese,data=BP.sub3) ## BP.sub3 is the stacked data
> without missing values
> >
> >barchart(~table(HiBP)|Overweight,data=BP.sub3)
> >
> >plot(jitter(hibp14,factor=2)~jitter(Obese14,factor=2),col="gray",cex=0.7,
> data=Copy.of.BP_2) ## Copy.of.BP_2 is the original wide format
> >
> >## not producing any good plots with mixed models as well.
> >summary(lme.3 <- lme(HiBP~time, data=BP.sub3,random=~1|CODEA,
> na.action=na.omit))
> >anova(lme.3)
> >head(ranef(lme.3))
> >print(plot(ranef(lme.3))) ##
> >
> >Thanks for any help.
> >
> >
> >
> >
> >
> >On Mon, Jan 14, 2013 at 4:33 AM, arun <[hidden email]<http://user/SendEmail.jtp?type=node&node=4655612&i=7>>
> >>From: Usha Gurunathan <[hidden email]<http://user/SendEmail.jtp?type=node&node=4655612&i=8>>
>
> >>To: arun <[hidden email]<http://user/SendEmail.jtp?type=node&node=4655612&i=9>>
> >>On Sun, Jan 13, 2013 at 1:22 PM, arun <[hidden email]<http://user/SendEmail.jtp?type=node&node=4655612&i=10>>
> >>>Regarding the vmv package, you could email Waqas Ahmed Malik ([hidden
> email] <http://user/SendEmail.jtp?type=node&node=4655612&i=11>) regarding
> options for changing the title and the the font etc.
> >>>You could also use this link (
> http://www.r-bloggers.com/visualizing-missing-data-2/ ) to plot missing
> value (?plot.missing()). I never used that package, but you could try.
> Looks like it gives more information.
> >>>
> >>>A.K.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>________________________________
> >>>From: Usha Gurunathan <[hidden email]<http://user/SendEmail.jtp?type=node&node=4655612&i=12>>
>
> >>>To: arun <[hidden email]<http://user/SendEmail.jtp?type=node&node=4655612&i=13>>
> >>>On Sun, Jan 13, 2013 at 9:17 AM, arun <[hidden email]<http://user/SendEmail.jtp?type=node&node=4655612&i=14>>
> wrote:
> >>>
> >>>Hi,
> >>>>Yes, you are right. 72.655222% was those missing among females.
> 35.14377% of values in females are missing from among the whole dataset
> (combined total of Males+Females data after removing the NAs from the
> variable "Sex").
> >>>>
> >>>>A.K.
> >>>>
> >>>>
> >>>>
> >>>>________________________________
> >>>>From: Usha Gurunathan <[hidden email]<http://user/SendEmail.jtp?type=node&node=4655612&i=15>>
>
> >>>>To: arun <[hidden email]<http://user/SendEmail.jtp?type=node&node=4655612&i=16>>
>
> >>>>Cc: R help <[hidden email]<http://user/SendEmail.jtp?type=node&node=4655612&i=17>>
> >>>>On Sun, Jan 13, 2013 at 3:28 AM, arun <[hidden email]<http://user/SendEmail.jtp?type=node&node=4655612&i=18>>
> wrote:
> >>>>
> >>>>lapply(split(BP_2bSexNoMV,BP_2bSexNoMV$Sex),function(x)
> (nrow(x[!complete.cases(x[,-2]),])/nrow(x))*100) #gives the percentage of
> rows of missing #values from the overall rows for Males and Females
> >>>>>#$Female
> >>>>>#[1] 72.65522
> >>>>>#
> >>>>>#$Male
> >>>>>#[1] 74.47401
> >>>>>
> >>>>>#iF you want the percentage from the total number rows in Males and
> Females (without NA's in the the Sex column)
> >>>>> lapply(split(BP_2bSexNoMV,BP_2bSexNoMV$Sex),function(x)
> (nrow(x[!complete.cases(x[,-2]),])/nrow(BP_2bSexNoMV))*100)
> >>>>>#$Female
> >>>>>#[1] 35.14377
> >>>>>#
> >>>>>#$Male
> >>>>>#[1] 38.45048
> >>>>
> >>>
> >>
> >
>
> ______________________________________________
> [hidden email] <http://user/SendEmail.jtp?type=node&node=4655612&i=19>mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
> ------------------------------
> If you reply to this email, your message will be added to the discussion
> below:
> http://r.789695.n4.nabble.com/random-effects-model-tp4654346p4655612.html
View this message in context: http://r.789695.n4.nabble.com/random-effects-model-tp4654346p4655704.html
Sent from the R help mailing list archive at Nabble.com.
[[alternative HTML version deleted]]

arun

unread,
Jan 16, 2013, 9:04:50 PM1/16/13
to rex2013, R help, R mixed models
HI,

In the same link at the bottom of the page,

"

All is well now after updating all packages with the following:
update.packages()"

It may or may not solve your problem.

I got your attachments. You should post those questions in (r-sig-mix...@r-project.org). I suggest you to read lme4 book (http://lme4.r-forge.r-project.org/lMMwR/)
#lrgprt.pdf

A.K.







----- Original Message -----
From: rex2013 <usha....@gmail.com>
To: r-h...@r-project.org
Cc:

rex2013

unread,
Jan 18, 2013, 10:16:09 PM1/18/13
to r-h...@r-project.org
Hi AK

I got the gg plots , thanks. All of a sudden I just got it. I didn't even
update packages. Not sure what worked.

Just a basic confusion, to find out if in the association between HiBP and
Obese/Overweight status, to quantify if obese males and obese females run a
different risk of becoming hypertensive, will the gee output of

BP.gee <- gee(HiBP~Obese*
Sex ,data=BPsub3,id=CODEA,family=binomial,
corstr="exchangeable",na.action=na.omit)

suffice?

Cheers.





> >>
On Thu, Jan 17, 2013 at 2:01 PM, arun kirshna [via R] <
ml-node+s7896...@n4.nabble.com> wrote:

> HI,
>
> In the same link at the bottom of the page,
>
> "
>
> All is well now after updating all packages with the following:
> update.packages()"
>
> It may or may not solve your problem.
>
> I got your attachments. You should post those questions in ([hidden
> email] <http://user/SendEmail.jtp?type=node&node=4655802&i=0>). I
> suggest you to read lme4 book (http://lme4.r-forge.r-project.org/lMMwR/)
> #lrgprt.pdf
>
> A.K.
>
>
>
>
>
>
>
> ----- Original Message -----
> From: rex2013 <[hidden email]<http://user/SendEmail.jtp?type=node&node=4655802&i=1>>
>
> To: [hidden email] <http://user/SendEmail.jtp?type=node&node=4655802&i=2>
> Cc:
> Sent: Wednesday, January 16, 2013 5:06 AM
> Subject: Re: [R] random effects model
>
> Hi
>
> I tried removing the missing values and installing "plyr". Still error
> message appears with ggplot2
>
> Btw, did you get the attachments with my earlier mail?
>
> Ta.
>
> On Wed, Jan 16, 2013 at 3:16 AM, arun kirshna [via R] <
> [hidden email] <http://user/SendEmail.jtp?type=node&node=4655802&i=3>>
> [hidden email] <http://user/SendEmail.jtp?type=node&node=4655802&i=4>mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
> ______________________________________________
> [hidden email] <http://user/SendEmail.jtp?type=node&node=4655802&i=5>mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
> ------------------------------
> If you reply to this email, your message will be added to the discussion
> below:
> http://r.789695.n4.nabble.com/random-effects-model-tp4654346p4655802.html
View this message in context: http://r.789695.n4.nabble.com/random-effects-model-tp4654346p4656033.html
Reply all
Reply to author
Forward
0 new messages