[R] lme function in R


Hongwei Dong

Aug 3, 2009, 1:15:46 PM
to r-h...@r-project.org
Hi, R users,
I'm using the "lme" function in R to estimate a 2-level mixed-effects
model in which the subject groups differ in size. It turns out
that it takes forever for R to converge. I also tried the same thing in SPSS,
and SPSS gives the results within 20 minutes. Can anyone give me some
advice on the lme function in R, especially on why R does not converge? Thanks.


Harry

[[alternative HTML version deleted]]

______________________________________________
R-h...@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Chuck Cleland

Aug 3, 2009, 1:26:22 PM
to Hongwei Dong, r-h...@r-project.org

Harry:
You are much more likely to get helpful advice if you include the code
you used to attempt to fit the model and a brief description of the
data. For example, something along these lines but for your data and model:

library(nlme)

fm2 <- lme(distance ~ age + Sex, data = Orthodont, random = ~ 1)

str(Orthodont)

Classes ‘nfnGroupedData’, ‘nfGroupedData’, ‘groupedData’ and 'data.frame': 108 obs. of 4 variables:
 $ distance: num 26 25 29 31 21.5 22.5 23 26.5 23 22.5 ...
 $ age     : num 8 10 12 14 8 10 12 14 8 10 ...
 $ Subject : Ord.factor w/ 27 levels "M16"<"M05"<"M02"<..: 15 15 15 15 3 3 3 3 7 7 ...
 $ Sex     : Factor w/ 2 levels "Male","Female": 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "outer")=Class 'formula' length 2 ~Sex
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
 - attr(*, "formula")=Class 'formula' length 3 distance ~ age | Subject
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
 - attr(*, "labels")=List of 2
  ..$ x: chr "Age"
  ..$ y: chr "Distance from pituitary to pterygomaxillary fissure"
 - attr(*, "units")=List of 2
  ..$ x: chr "(yr)"
  ..$ y: chr "(mm)"
 - attr(*, "FUN")=function (x)
  ..- attr(*, "source")= chr "function (x) max(x, na.rm = TRUE)"
 - attr(*, "order.groups")= logi TRUE

hope this helps,

Chuck


--
Chuck Cleland, Ph.D.
NDRI, Inc. (www.ndri.org)
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894

Jason Morgan

Aug 3, 2009, 1:36:11 PM
to Hongwei Dong, r-h...@r-project.org

Hello Harry,

As Chuck mentions, providing some more information on the model and
the data you are using would be helpful. Also, be sure to compare the
optimization methods used in SPSS with those used in R. You can change
the optimization method in R if the default seems to be causing
issues. See help(lmeControl) for the numerous settings available.
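
As a rough illustration of what changing the optimizer looks like (a minimal sketch, not a recommendation for Harry's model): the data set Orthodont and the model below are borrowed from Chuck's example as stand-ins, and the iteration limits are arbitrary values chosen only for illustration.

```r
library(nlme)

# Assumption: Orthodont stands in for the poster's data, which is not available.
# Switch from the default optimizer to optim() and allow more iterations.
ctrl <- lmeControl(opt = "optim", maxIter = 100, msMaxIter = 100)

fm_optim <- lme(distance ~ age + Sex, data = Orthodont,
                random = ~ 1 | Subject, control = ctrl)

# Fixed-effect estimates from the fit that used optim()
fixef(fm_optim)
```

The control object is passed through the control argument of lme, so the model formula itself does not change when trying a different optimizer.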

~Jason

--
Jason W. Morgan
Graduate Student
Department of Political Science
*The Ohio State University*
154 North Oval Mall
Columbus, Ohio 43210

Hongwei Dong

Aug 3, 2009, 1:44:51 PM
to r-h...@r-project.org
Thanks for the replies above. Here are my script and data structure:
library(nlme)
tlevel <- lme(fixed = LN_unitlandval ~ MH_D + APT_D + ResOth_D + NonRes_D +
                Vacant_D + access_emp1 + pct_vacant + transit_D + park_dum,
              data = lusdrdata,
              random = ~ MH_D + APT_D + ResOth_D + NonRes_D + Vacant_D | TAZ)

str:

 $ TAZ         : int 100 100 100 100 100 100 100 100 100 100 ...
 $ MH_D        : num 0 0 0 0 0 0 0 0 0 0 ...
 $ APT_D       : num 0 0 0 0 0 0 0 0 0 0 ...
 $ ResOth_D    : num 0 0 0 0 0 0 0 0 0 0 ...
 $ NonRes_D    : num 0 0 0 0 0 0 0 0 0 1 ...
 $ Vacant_D    : num 1 1 1 0 0 1 1 1 1 0 ...
 $ access_emp1 : num 45.8 45.8 45.8 45.8 45.8 ...
 $ pct_vacant  : num 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 ...
 $ transit_D   : num 0 0 0 0 0 0 0 0 0 0 ...
 $ park_dum    : num 0 0 0 0 0 0 0 0 0 0 ...


Thanks.

Harry


Hongwei Dong

Aug 3, 2009, 10:33:14 PM
to Jason Morgan, r-h...@r-project.org
Thanks.
I tried loosening the convergence tolerances, changing tolerance from
1e-6 to 1e-5 and msTol from 1e-7 to 1e-6, but R still
does not converge. Any more suggestions?
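
For reference, a minimal sketch of how such tolerances are passed to lme, using the Orthodont data as a stand-in for the real data. The tolerance values are the ones mentioned above; the iteration limits, msVerbose and returnObject are additional settings assumed here only to show how one might watch the optimizer and still get an object back if it stops early.

```r
library(nlme)

# Assumption: Orthodont replaces the poster's data set.
ctrl <- lmeControl(tolerance = 1e-5,   # looser convergence tolerance
                   msTol = 1e-6,       # looser tolerance for the optimizer step
                   maxIter = 200,      # illustrative iteration limits
                   msMaxIter = 200,
                   msVerbose = TRUE,   # print optimizer progress
                   returnObject = TRUE) # return the fit even without convergence

fm_loose <- lme(distance ~ age, data = Orthodont,
                random = ~ age | Subject, control = ctrl)
summary(fm_loose)
```

Printing the optimizer trace can help tell a slow but progressing fit from one that is stuck.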

Harry



ONKELINX, Thierry

Aug 4, 2009, 4:28:12 AM
to Hongwei Dong, r-h...@r-project.org
Dear Harry,

Your model seems rather complex. Do you have enough data to support it?
Did you check for multicollinearity between the variables?
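
One quick way to look for multicollinearity is to inspect the fixed-effects design matrix directly. The sketch below uses the Orthodont data and the formula from Chuck's example as placeholders; with the real data one would substitute the actual fixed-effects formula.

```r
# Assumption: Orthodont and this formula stand in for the real data and model.
X <- model.matrix(~ age + Sex, data = nlme::Orthodont)

# Pairwise correlations between the predictor columns (intercept dropped)
round(cor(X[, -1]), 3)

# Condition number of the design matrix; very large values suggest
# near-linear dependence among the columns
cond_number <- kappa(X, exact = TRUE)
cond_number
```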

HTH,

Thierry


------------------------------------------------------------------------
----
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry....@inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey


The views expressed in this message
and any annex are purely those of the writer and may not be regarded as stating
an official position of INBO, as long as the message is not confirmed by a duly
signed document.

Hongwei Dong

Aug 4, 2009, 7:48:51 PM
to r-h...@r-project.org
Yeah, I have a very large sample size, about 60,000 observations.
Multicollinearity should not be a problem here. The weird thing is that SPSS
converges very quickly and gives reasonable results.
The only problem I can think of is that my first-level (random) variables
are dummy variables: there are 6 housing types, and I used five dummies in the
model with one type as the reference. I also tried combining them into two
groups and using only one dummy at the random level, but that does not work either.

Does anyone here have similar experience with the lme function in R?

Thanks.

Harry


David Winsemius

Aug 4, 2009, 8:47:55 PM
to Hongwei Dong, r-h...@r-project.org


I have absolutely no experience with "LME" but I can predict with very
high probability that you would be getting more sensible results if you
modeled those housing types with a single factor variable rather than
creating 6 dummies. (Would one generally not create a reference dummy?)

?factor
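
A minimal sketch of collapsing a set of 0/1 dummy columns into a single factor. The toy data frame, its three housing types, and the column names are made up for illustration; a row with all dummies equal to 0 is assumed to belong to the reference type.

```r
set.seed(1)

# Hypothetical toy data: three housing types coded with two dummies
type <- sample(c("Ref", "MH", "APT"), 20, replace = TRUE)
d <- data.frame(MH_D  = as.numeric(type == "MH"),
                APT_D = as.numeric(type == "APT"))

dummies <- as.matrix(d[, c("MH_D", "APT_D")])

# Level 1 is the reference (all dummies 0); otherwise take the column that is 1
idx <- ifelse(rowSums(dummies) == 0, 1L,
              max.col(dummies, ties.method = "first") + 1L)

d$HousingType <- factor(idx, levels = 1:3,
                        labels = c("Reference", "MH_D", "APT_D"))
table(d$HousingType)
```

The resulting factor can then be used directly in a model formula in place of the separate dummy columns.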

--
David.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT

David Winsemius

Aug 4, 2009, 8:59:56 PM
to David Winsemius, r-h...@r-project.org

On Aug 4, 2009, at 8:47 PM, David Winsemius wrote:

> I have absolutely no experience with "LME" but I can predict with
> very high probability that you would be getting more sensible results
> if you modeled those housing types with a single factor variable
> rather than creating 6 dummies.
> (Would one generally not create a reference dummy?)

I meant to create a double negative there: would one generally not
avoid creating a reference dummy?

Hongwei Dong

Aug 4, 2009, 9:06:11 PM
to David Winsemius, r-h...@r-project.org
Thanks, David, you are right. If I use numeric codes such as 1, 2, ..., 6 to
represent those 6 housing types, the model works with the lme function in R.
The problem is that the 6 housing types are not on a continuous scale,
which is what we assume when we use 1, 2, ..., 6 to represent them.
Harry

Hongwei Dong

Aug 4, 2009, 9:08:09 PM
to r-h...@r-project.org
In addition, it seems the lme function in R has a lot of trouble with the dummy
variables used at the first (random) level.
Harry

David Winsemius

Aug 4, 2009, 10:11:31 PM
to Hongwei Dong, r-h...@r-project.org

On Aug 4, 2009, at 9:06 PM, Hongwei Dong wrote:

> Thanks, David, you are right. If I use continuous data such as 1,
> 2, ...6 to represent those 6 housing types, the model works with the
> lme function in R. The problem is, the relationship between the 6
> housing types are not continuous, which we assume when we use
> 1,2,..6 to represent them.

And that is why you use a factor variable rather than a numeric
variable.
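
The difference shows up in the design matrix. In this small sketch (the vector housing_code is a made-up example), numeric coding yields a single slope column, while the factor yields one indicator column per non-reference level:

```r
# Hypothetical codes for 6 housing types, two observations each
housing_code <- c(1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6)

# Numeric coding: intercept plus ONE slope, i.e. a linear trend across types
X_num <- model.matrix(~ housing_code)

# Factor coding: intercept plus FIVE indicator columns (type 1 is the reference)
X_fac <- model.matrix(~ factor(housing_code))

ncol(X_num)  # 2
ncol(X_fac)  # 6
colnames(X_fac)
```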


Hongwei Dong

Aug 4, 2009, 10:50:17 PM
to r-h...@r-project.org
Hi, the problem has been solved by using "lmer" rather than the "lme"
function. It seems the "lme" function does not cope well with my large data
set; the "lmer" function handles large data much better. Thanks for the replies
above.
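
For readers comparing the two functions, here is a minimal sketch of the same random-intercept-and-slope model written for lme (package nlme) and for lmer (package lme4). It assumes lme4 is installed and uses the Orthodont data as a stand-in; it does not reproduce the model from this thread.

```r
# Assumption: Orthodont replaces the poster's data; lme4 must be installed.
ortho <- as.data.frame(nlme::Orthodont)

# nlme: random effects go in a separate 'random' argument
fm_lme <- nlme::lme(distance ~ age, random = ~ age | Subject, data = ortho)

# lme4: random effects are written inside the formula as (terms | group)
fm_lmer <- lme4::lmer(distance ~ age + (age | Subject), data = ortho)

# The fixed-effect estimates should agree closely
nlme::fixef(fm_lme)
lme4::fixef(fm_lmer)
```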
Harry


ONKELINX, Thierry

Aug 5, 2009, 4:22:30 AM
to Hongwei Dong, r-h...@r-project.org
Harry,

If you use dummy variables, then you can only use (n-1) dummy variables
if your variable has n levels. Otherwise you introduce
multicollinearity! If you use n dummy variables, then you can express one
dummy variable as a linear combination of the others.

Make use of a factor variable. That is much easier to work with than
dummy variables. The model itself will create the necessary dummy
variables.

lusdrdata$HousingType <- factor(lusdrdata$HousingType, levels = 1:6,
  labels = c("Reference", "MH_D", "APT_D", "ResOth_D", "NonRes_D",
             "Vacant_D"))
lme(fixed = LN_unitlandval ~ HousingType + access_emp1 + pct_vacant +
      transit_D + park_dum,
    data = lusdrdata, random = ~ HousingType | TAZ)
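
The point about n versus (n-1) dummies can be checked numerically. In this sketch (a made-up factor with three levels), an intercept plus all three dummies gives a rank-deficient design matrix, while the matrix R builds from the factor has full column rank:

```r
# Hypothetical factor with n = 3 levels, four observations each
type <- factor(rep(c("A", "B", "C"), each = 4))

# One 0/1 dummy per level (n dummies)
D <- sapply(levels(type), function(l) as.numeric(type == l))

# Intercept + n dummies: the dummies sum to the intercept column
X_all <- cbind(Intercept = 1, D)

# What R builds from the factor: intercept + (n - 1) dummies
X_ok <- model.matrix(~ type)

qr(X_all)$rank  # 3, although X_all has 4 columns
qr(X_ok)$rank   # 3, equal to the number of columns
```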

HTH,

Thierry


Hongwei Dong

Aug 5, 2009, 3:11:51 PM
to ONKELINX, Thierry, r-h...@r-project.org
Thanks, Thierry and other R users.
I estimated the model using the factor rather than the dummy variables I used
previously. It still takes forever for the function "lme" to run, but "lmer"
copes much better with my large data set (about 60,000 observations).
The interesting part is that the results from the model using the factor are
slightly different from what I got from the model using dummy variables,
especially for the variables at the random level.

The estimated random effects using dummy variables look like this (each
dummy gets its own intercept):

Random effects:
 Groups   Name            Variance Std.Dev. Corr
 TAZ      (Intercept)     0.059160 0.24323
          MH_D            0.215210 0.46391  -0.583
 TAZ      (Intercept)     0.212061 0.46050
          APT_D           0.205028 0.45280  -0.992
 TAZ      (Intercept)     0.086223 0.29364
          ResOth_D        0.305678 0.55288   0.665
 TAZ      (Intercept)     0.161892 0.40236
          NonRes_D        0.537284 0.73300  -0.874
 TAZ      (Intercept)     0.088684 0.29780
          Vacant_noimp_D  0.501495 0.70816  -0.570
 TAZ      (Intercept)     0.136630 0.36964
          Vacant_imp_D    0.368722 0.60722  -0.850
 Residual                 0.382439 0.61842
Number of obs: 55762, groups: TAZ, 739

The estimated random effects using the factor look like this (one intercept
for all):

Random effects:
 Groups   Name                        Variance Std.Dev. Corr
 TAZ      (Intercept)                 0.83894  0.91594
          HousingType1MH_D            0.23214  0.48181  -0.375
          HousingType1APT_D           0.28850  0.53712  -0.827  0.630
          HousingType1ResOth_D        0.29392  0.54214   0.156 -0.251 -0.165
          HousingType1NonRes_D        0.58169  0.76269  -0.572  0.155  0.656 -0.030
          HousingType1Vacant_imp_D    0.45349  0.67342  -0.522  0.203  0.265  0.101  0.611
          HousingType1Vacant_noimp_D  0.54146  0.73584  -0.286  0.251  0.265  0.390  0.313  0.475
 Residual                             0.38228  0.61829
Number of obs: 55762, groups: TAZ, 739

The fixed coefficients for each group are also slightly different. I'm
wondering which one makes more sense.


Thanks.

Harry



ONKELINX, Thierry

Aug 6, 2009, 4:13:24 AM
to Hongwei Dong, r-h...@r-project.org
Dear Harry,

You get different results because your model specification is
different! The specification of your first model seems wrong to me.
Having an intercept for each level is nonsense. You probably defined
the random effects as (MH_D |TAZ) + (APT_D|TAZ) + (ResOth_D|TAZ) +
(NonRes_D|TAZ) + (Vacant_noimp_D| TAZ ) + (Vacant_imp_D|TAZ).
You should either use
(1|TAZ) + (MH_D -1 |TAZ) + (APT_D - 1|TAZ) + (ResOth_D - 1|TAZ) +
(NonRes_D - 1|TAZ) + (Vacant_noimp_D - 1| TAZ ) + (Vacant_imp_D - 1|TAZ)

or
(MH_D + APT_D + ResOth_D + NonRes_D + Vacant_noimp_D +
Vacant_imp_D|TAZ)

The last model is equivalent to (HousingType|TAZ).

The difference between the two models is the specification of the random
effects. The first model assumes that the levels of HousingType are
independent. The last model allows for correlation between those levels.
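
The same distinction can be seen on a small example. This sketch assumes lme4 is installed and uses its sleepstudy data with a numeric covariate (Days) in place of the housing dummies; it only illustrates the two ways of writing the random effects, not the model from this thread.

```r
library(lme4)

# Correlated random intercept and slope: one term, correlation is estimated
m_corr <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Independent random intercept and slope: separate terms, no correlation
m_ind <- lmer(Reaction ~ Days + (1 | Subject) + (0 + Days | Subject),
              data = sleepstudy)

VarCorr(m_corr)  # shows a Corr column
VarCorr(m_ind)   # two separate Subject blocks, no correlation

# Number of variance-covariance parameters in each specification
length(getME(m_corr, "theta"))  # 3
length(getME(m_ind, "theta"))   # 2
```

Note that writing (x - 1 | g) or (0 + x | g) drops the intercept only from that term, which is why the independent specification needs a separate (1 | g) term.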

HTH,

Thierry
