I am hoping to find someone who uses both R and program Stata for GLMs.
I am a beginner R user, finding my own way through; learning code etc. at the same time as learning the statistics I need to complete my project.
What I have is the code from Stata and am trying to reproduce the same analysis in R - my program of choice.
. glm count md ms rf sg, family(poisson)
exposure(effort) eform
I am lost at the point of finding the equivalent code for 'exposure'.
Having looked at a few forums and 'googled'. I thought 'offset', used as offset=(log(Eff)) or the equivalent +offset(log(Eff)) would produce the desired effect.
Incidentally my code was: glm(Count~md+ms+rf+sg+offset(Eff),family=poisson,data=DepthHabGen)
(Making use of glm{stats})
However, offset does not seem to be equivalent to 'exposure' in Stata. As coefficients and log likelhood estimates differ.
So I asked the following questions:
1. Do both programs produce the same results without 'exposure' i.e. glm models
Yes, log likelihoods and coefficients are the same.
2. How about using the unintuitive non logged " offset=Eff" ?
Coefficients and log likelihoods still differ.
So now I defer to those with more experience, does anyone know how to produce the equivalent of Stata's glm 'exposure' in R?
Thank you in advance,
Columbine
[[alternative HTML version deleted]]
______________________________________________
R-h...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
> I am hoping to find someone who uses both R and program Stata for GLMs.
[snip]
> What I have is the code from Stata and am trying to reproduce the same
analysis in R - my program of choice.
>
> . glm count md ms rf sg, family(poisson)
> exposure(effort) eform
>
> I am lost at the point of finding the equivalent code for 'exposure'.
>
> Having looked at a few forums and 'googled'. I thought 'offset', used as
offset=(log(Eff)) or the
> equivalent +offset(log(Eff)) would produce the desired effect.
>
> Incidentally my code was:
glm(Count~md+ms+rf+sg+offset(Eff),family=poisson,data=DepthHabGen)
>
> (Making use of glm{stats})
>
Based on your discussion above did you mean
glm(count~md+ms+rf+sg+offset(log(Eff)),
family=poisson,data=DepthHabGen)
? is this just a typo ?
according to http://www.stata.com/help.cgi?glm
exposure(varname) include ln(varname) in model with coefficient
constrained to 1
offset(varname) include varname in model with coefficient
constrained to 1
> However, offset does not seem to be equivalent to 'exposure' in Stata. As
coefficients and log likelhood
> estimates differ.
>
> So I asked the following questions:
>
> 1. Do both programs produce the same results without
> 'exposure' i.e. glm models
>
> Yes, log likelihoods and coefficients are the same.
>
> 2. How about using the unintuitive non logged " offset=Eff" ?
>
> Coefficients and log likelihoods still differ.
You're not doing anything obviously wrong. Are you sure your
"effort" and "Eff" variables are the same, i.e. nothing got mangled
moving to R?
I don't use Stata, perhaps someone else can try. Posting a small
reproducible example would be helpful.
Officially I tried:
> glm(count~md+ms+rf+sg+offset(log(Eff)), family=poisson,data=DepthHabGen)
> glm(count~md+ms+rf+sg, offset=(log(Eff)), family=poisson,data=DepthHabGen)
(which of course are the same as eachother)
> glm(count~md+ms+rf+sg, offset=(Eff), family=poisson,data=DepthHabGen)
> glm(count~md+ms+rf+sg+offset(Eff), family=poisson,data=DepthHabGen)
(which are also the same between themselves, yet wrong compared to the STATA model)
Additionally, given the text you found on stata website, which I am familiar with, I also tried:
> glm(count~md+ms+rf+sg, offset=(exp(Eff)), family=poisson,data=DepthHabGen)
> glm(count~md+ms+rf+sg+offset(exp(Eff)), family=poisson,data=DepthHabGen)
(which still might be the solution however R issues the following response:
Error: no valid set of coefficients has been found: please supply starting value)
Which, according to other help postings might be able to be forced to function. And I quote from 2008 message
[R] glm error message when using family Gamma(link="inverse")"You might be able to calm glm()'s frayed nerves by supplying some decent
start values as it is asking. Possibly starting with a subset of
variables, using the coef()s for the subset and setting the others to zero
when you attempt to fit all together will be enough to push it past its
sticking point."
I'm not sure how to go about employing this:
Using, coef() will bring up a list of all coeff, which at this point having taken the inverse log or exponential of the variable "Eff", are all very large!
(Eff: is a set of real numbers representing the mean of effort in days 1-209, for each count
Also, combined with other postings and I gather it may be possible to use the start() or etastart() functions
>From [R] documentation, Fitting Generalized linear Models
start
starting values for the parameters in the linear predictor.
etastart
starting values for the linear predictor.
Which I don't understand how to do?
Or perhaps change the link? What would be recommended?
Thank you.
> To: r-h...@stat.math.ethz.ch
> From: bbo...@gmail.com
> Date: Tue, 16 Nov 2010 13:16:20 +0000
> Subject: Re: [R] Offset in glm poisson using R vs Exposure in Stata
[[alternative HTML version deleted]]
On 11/16/2010 03:08 PM, Columbine Caroline Waring wrote:
> Officially I tried:
**A**
>> glm(count~md+ms+rf+sg+offset(log(Eff)), family=poisson,data=DepthHabGen)
>> glm(count~md+ms+rf+sg, offset=(log(Eff)), family=poisson,data=DepthHabGen)
> (which of course are the same as eachother)
>
**B**
>> glm(count~md+ms+rf+sg, offset=(Eff), family=poisson,data=DepthHabGen)
>> glm(count~md+ms+rf+sg+offset(Eff), family=poisson,data=DepthHabGen)
> (which are also the same between themselves, yet wrong compared to the
> STATA model)
>
> Additionally, given the text you found on stata website, which I am
> familiar with, I also tried:
**C**
>> glm(count~md+ms+rf+sg, offset=(exp(Eff)), family=poisson,data=DepthHabGen)
>> glm(count~md+ms+rf+sg+offset(exp(Eff)), family=poisson,data=DepthHabGen)
> (which still might be the solution however R issues the following response:
> Error: no valid set of coefficients has been found: please supply
> starting value)
In my opinion, B and C are just wrong (C is in the wrong direction,
and it's not surprising that glm has hiccups when adding a
doubly-exponentiated version of the Eff variable to the linear predictor).
So I think all the other stuff about specifying starting values is
essentially a red herring.
I still don't know what Stata is doing but in your position I would
make up some data where I knew the answer and try it in both R and
Stata. For example:
set.seed(1001)
md <- runif(100)
ms <- runif(100)
dat <- expand.grid(md=md,ms=ms)
dat$eff <- runif(nrow(dat))+2*dat$md
dat$eta <- with(dat,2*md-2*ms+log(eff))
dat$y <- with(dat,rpois(nrow(dat),exp(eta)))
m1 <- glm(y~md+ms+offset(log(eff)),data=dat, family="poisson")
summary(m1)
I have purposely set up the offset here so that it is strongly
correlated with md, and will screw things up if it is not accounted for
properly. I made the data set quite large so that it is clear that the
model is accurately retrieving the coefficients (2 and -2) assigned to
the predictors.
cheers
Ben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAkzjCfsACgkQc5UpGjwzenOc+wCfTMK8AdbiFkraQeDTd1LMcqOf
1dAAmgP/bR72ELMHsmAYHcPM2IX0AWLN
=bnkm
-----END PGP SIGNATURE-----
Thank you for your assistance.
Going back to basics and using the data set as you suggested has resulted in a win.
Set A works!
using +offset(log(variable)) or ,offest=(log(Eff)) is the same as using exposure(variable) program stata.
I went back and isloated a problem with code before this point.
Thank you again.
Columbine
> Date: Tue, 16 Nov 2010 17:47:23 -0500
> From: bbo...@gmail.com
> To: caqui...@hotmail.com
> CC: r-h...@r-project.org
> Subject: Re: [R] Offset in glm poisson using R vs Exposure in Stata
>
[[alternative HTML version deleted]]