clogit command in Stata

Ashish Awasthi

Apr 16, 2013, 1:40:52 AM
to meds...@googlegroups.com
Dear All,
I am confused about a basic problem.
I am running a conditional logistic regression on a large data set with 33,000 controls and 1521 cases. When I run the clogit model in Stata as

stepwise, pe(.05): clogit im b4 v025 v106 v190 v212 v426 v445 v457 v701 v705 v714 v717 , group(bord) or
it returns the following error:

9163 (group size) take 8906 (# positives) combinations results in numeric overflow; computations
cannot proceed
r(1400);
I have tried different combinations of variables, but the same error occurs every time.
Is there any way to tackle this error?
Thanks


Ashish Awasthi


Neil Shephard

Apr 16, 2013, 4:02:35 AM
to meds...@googlegroups.com
You'll likely get better support for Stata questions on Statalist (see
http://www.stata.com/statalist/ and the FAQ at
http://www.stata.com/support/faqs/resources/statalist-faq/).

In this instance I suspect you may have to contact Stata Tech Support,
as it sounds as though you are encountering memory overflows.

You could try running your command on a small subset of your data; if
that runs, then it is almost certainly a memory issue.

Tech Support will want to know which version you have in terms of
release (v11/v12) and type of Stata (Intercooled/SE/MP), and
whether it is up to date (check with -update query-; if it is not up to
date, do as advised and then try again). They may wish to know your hardware
specification too (and the OS you are running).

Good luck,

Neil


--
"To kill an error is as good a service as, and sometimes even better
than, the establishing of a new truth or fact" - Charles Darwin

Neil Shephard
Clinical Trials Research Unit
University of Sheffield

roland andersson

Apr 16, 2013, 6:44:40 AM
to medstats
Have you tried running a simpler model without stepwise? I am wondering about the group size (9163), given that you only have 1521 cases. Is there an error in the variable "bord"?

Roland 


Ashish Awasthi

Apr 16, 2013, 7:30:45 AM
to meds...@googlegroups.com
Yes, I have tried clogit without stepwise, and it gives the same error:

clogit im b4 v025 v106 v190 v212 v426 v445 v457 v701 v705 v714 v717 , group(bord) or
note: multiple positive outcomes within groups encountered.
9163 (group size) take 8906 (# positives) combinations results in numeric overflow; computations
cannot proceed
r(1400);

bord is a categorical variable with three categories, with frequencies 10,901, 9,376 and 12,765.


--
    Regards
Ashish Awasthi
PhD Scholar
Dept of Biostatistics & Health Informatics
SGPGIMS, Lucknow-226014

Swank, Paul R

Apr 16, 2013, 11:51:59 AM
to meds...@googlegroups.com

Have you tried with just 1 or 2 variables?

 

Dr. Paul R. Swank, Professor

Health Promotion and Behavioral Sciences

School of Public Health

University of Texas Health Science Center Houston

roland andersson

Apr 16, 2013, 12:16:08 PM
to meds...@googlegroups.com

I am not sure (I do not have the help files at hand on my phone), but I assume that the group variable should identify the case-control associations, i.e. you should have the same number of groups as you have cases.
Roland

(from my phone)

Ashish Awasthi

Apr 16, 2013, 1:23:20 PM
to meds...@googlegroups.com

Yes, I have tried with only one or two variables, and with different combinations of variables, but I still get the same error.

Ashish Awasthi

Stas Kolenikov

Apr 16, 2013, 1:49:15 PM
to meds...@googlegroups.com
Stata complains that your group sizes are very large. The conditional
logit model is designed to work with many short panels, as is
typically the case in longitudinal studies, where you may have single
or low double digits of observations per level of the group variable.
Estimation conditions on the number of positive outcomes within a panel
and, essentially, estimates the coefficients by utilizing the differences
between groups that have the same number of positive outcomes. So (1) with
just three groups, it cannot estimate much, as there is not enough
variability between groups for the model to utilize to identify the
coefficients. Also, the asymptotics are in the number of groups, so (2)
normality is dubious at best with essentially three observations. Finally,
(3) there are some combinatoric calculations within a group (all possible
combinations for a given number of cases and controls), which are
feasible with 20-some observations per group but totally blow up
with several thousand, as the sum that you have to compute has
9163 choose 8906 terms, which is roughly 10^507. I don't think you will
have the patience to wait until this is finished...
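The size of that sum can be checked directly. The Python sketch below is for illustration only. The two numbers come from the Stata error message. It computes 9163 choose 8906 exactly and compares its magnitude with the largest finite double-precision number, about 1.8 x 10^308. Presumably this is the kind of overflow Stata's message reports, although the sketch makes no claim about how clogit organizes the computation internally.

```python
import math
import sys

GROUP_SIZE = 9163  # "group size" from the Stata error message
POSITIVES = 8906   # "# positives" from the Stata error message


def log10_comb(n: int, k: int) -> float:
    """log10 of (n choose k), computed via lgamma without forming the integer."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1)
            - math.lgamma(n - k + 1)) / math.log(10)


# Python integers are unbounded, so the exact number of subsets is available.
n_terms = math.comb(GROUP_SIZE, POSITIVES)
digits = len(str(n_terms))
magnitude = log10_comb(GROUP_SIZE, POSITIVES)

# The largest finite IEEE double is about 1.8e308.
double_limit = math.log10(sys.float_info.max)

print(f"{GROUP_SIZE} choose {POSITIVES} has {digits} digits (~10^{magnitude:.1f})")
print(f"largest double is ~10^{double_limit:.1f}")

# For contrast: a matched set of 5 subjects with 1 case has only 5 choices.
print("1 case among 5 subjects:", math.comb(5, 1))

# Converting the exact count to a double is impossible: it overflows.
try:
    float(n_terms)
except OverflowError as err:
    print("converting the count to a double fails:", err)
```

The count has more than 500 digits. This is far past the double limit, and the number of terms in the sum is equally far beyond what could ever be enumerated.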

What is the bord variable that you used to group your observations, in
the context of your study?

-- Stas Kolenikov, PhD, PStat (SSC)
-- Senior Survey Statistician, Abt SRBI
-- Opinions stated in this email are mine only, and do not reflect the
position of my employer
-- http://stas.kolenikov.name

Ashish Awasthi

Apr 16, 2013, 2:53:42 PM
to meds...@googlegroups.com
Dear Mr. Kolenikov,
Your input has helped me a lot in troubleshooting my problem. I am now able to run the clogit command by using an "if" condition to restrict the data set, but I have to analyse the entire data set at once. Should I increase the maximum memory in Stata beyond 2048 MB?
The bord variable is the birth order of the child.

Thanks a lot for your help

roland andersson

Apr 16, 2013, 3:10:30 PM
to medstats
As I understand it, the group variable is an identifier that links the controls to the cases they are matched to: the matched controls should have the same identification number as their case. If you have 1521 cases, you should have 1521 groups.

Roland
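A quick way to see whether a grouping variable identifies matched sets is to tally each group's size and number of positive outcomes. The Python sketch below does this on made-up toy data. The function and variable names are hypothetical. In Stata, a cross-tabulation such as -tabulate bord im- would show the same counts for the real data.

```python
from collections import Counter


def summarize_groups(group_ids, outcomes):
    """Return {group: (size, positives)} for a grouping variable and a 0/1 outcome."""
    sizes = Counter(group_ids)
    positives = Counter(g for g, y in zip(group_ids, outcomes) if y == 1)
    # Counter returns 0 for groups that have no positives at all.
    return {g: (sizes[g], positives[g]) for g in sizes}


def describe(summary, label):
    """Print the features that matter for a conditional logit fit."""
    n_groups = len(summary)
    n_positive = sum(p for _, p in summary.values())
    largest = max(size for size, _ in summary.values())
    print(f"{label}: {n_groups} groups, {n_positive} positives, "
          f"largest group {largest}")


# Toy matched design: 3 matched sets, each with 1 case and 3 controls.
matched_ids = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
matched_y = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]

# Toy stratified design: subjects grouped by a 3-level covariate (a stand-in
# for something like birth order), so each group can hold several positives.
strata_ids = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
strata_y = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0]

matched = summarize_groups(matched_ids, matched_y)
strata = summarize_groups(strata_ids, strata_y)
describe(matched, "matched sets")
describe(strata, "grouped by a covariate")
```

In a 1:k matched design, every group has exactly one positive, so the number of groups equals the number of cases. Groups with many positives, or a handful of very large groups, point to a stratified grouping rather than matched sets.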


BXC (Bendix Carstensen)

Apr 16, 2013, 4:41:45 PM
to meds...@googlegroups.com

Not necessarily; you can have a conditional logistic regression with more than one case per stratum. But THAT really pushes up the complexity of the computations.

b.r.

Bendix Carstensen
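Several cases per stratum make the computation heavier for a simple reason. In a stratum of n subjects containing m cases, the conditional likelihood divides by a sum over every way of choosing m of the n subjects, which is n choose m terms. The Python sketch below uses one covariate, hypothetical names and toy values. It computes one stratum's contribution in two ways: by brute-force enumeration, and by a standard recursion over elementary symmetric sums that needs only about n x m steps. It illustrates the technique and makes no claim about how Stata's clogit is implemented.

```python
import itertools
import math


def denominator_bruteforce(scores, m):
    """Sum of exp(total score) over every subset of size m (n choose m terms)."""
    return sum(math.exp(sum(subset))
               for subset in itertools.combinations(scores, m))


def denominator_recursive(scores, m):
    """Same quantity via the elementary-symmetric-sum recursion, O(n * m) steps.

    After processing the first i subjects, sums[j] holds the sum, over all
    size-j subsets of those i subjects, of the product of exp(score).
    """
    sums = [1.0] + [0.0] * m
    for s in scores:
        w = math.exp(s)
        # Walk j downwards so each subject enters a given subset at most once.
        for j in range(m, 0, -1):
            sums[j] += w * sums[j - 1]
    return sums[m]


def stratum_loglik(x, y, beta):
    """Conditional log-likelihood contribution of one stratum (single covariate)."""
    scores = [beta * xi for xi in x]
    m = sum(y)
    numerator = sum(s for s, yi in zip(scores, y) if yi == 1)
    return numerator - math.log(denominator_recursive(scores, m))


# Toy stratum: 6 subjects, 2 of them cases.
x = [0.5, -1.0, 2.0, 0.3, 1.1, -0.4]
y = [1, 0, 1, 0, 0, 0]
beta = 0.8
scores = [beta * xi for xi in x]

print("subsets enumerated:", math.comb(len(x), sum(y)))
print("brute force:", denominator_bruteforce(scores, 2))
print("recursion:  ", denominator_recursive(scores, 2))
print("stratum log-likelihood:", stratum_loglik(x, y, beta))
```

The recursion removes the cost of enumeration, but it does not remove the magnitude problem. When all scores are zero, the denominator is exactly n choose m. For the group in this thread, that value is far beyond the range of a double, so a practical implementation would also need rescaling or log-space arithmetic.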

 

Neil Shephard

Apr 17, 2013, 3:40:15 AM
to meds...@googlegroups.com
On 16 April 2013 19:53, Ashish Awasthi <ashish...@gmail.com> wrote:
> should I increase max memory more than 2048
> M bytes or more in stata?

If you're using Stata 12, then you do not need to -set memory 2048M-,
as Stata now handles memory on the fly; see
http://www.stata.com/stata12/automatic-memory-management/

Ashish Awasthi

Apr 17, 2013, 5:39:19 AM
to meds...@googlegroups.com
I don't know what is causing my problem. I can run the same analysis with the same variables for 3000 cases, but when I use more than 3000 it does not work.


--
    Regards
Ashish Awasthi
PhD Scholar
Dept of Biostatistics & Health Informatics
SGPGIMS, Lucknow-226014

Neil Shephard

Apr 17, 2013, 5:46:54 AM
to meds...@googlegroups.com
On 17 April 2013 10:39, Ashish Awasthi <ashish...@gmail.com> wrote:
> I dont know whats the cause of my problem, I can do same analysis using same
> variables for 3000 cases but when I use more than 3000 it does not work

In that case, as per my original suggestion, try contacting Stata Tech
Support, because the message you first posted suggests to me a memory
overflow. This cannot be addressed by making more memory available
to Stata, because it is often an error or bug in the underlying code (I
think Stata is probably written in C or C++ or similar).

roland andersson

Apr 17, 2013, 7:35:18 AM
to medstats
Ashish
Have you posed the question on Statalist? I think you will get better help there. Or contact Stata Technical Support.

Roland


Ashish Awasthi

Apr 17, 2013, 7:51:30 AM
to meds...@googlegroups.com

Yes, I have emailed them and am waiting for a reply.
Ashish Awasthi

Steve Simon, P.Mean Consulting

Apr 17, 2013, 8:32:00 AM
to meds...@googlegroups.com, Ashish Awasthi
My first thought was the same as many of the others': that conditional
logistic regression (CLR) is only used for a matched case-control design
and not for a stratified design like yours. But I went to Google and found
a nice resource explaining that it is indeed used for both.

--> http://www.cceb.upenn.edu/pages/localio/EPI521/2006/v2part4.pdf

It is a way to avoid having to model separate intercepts for each
matched pair or for each stratum. This is especially critical when your
sample size is small and you have lots of matched pairs or lots of strata.
That apparently is not the case in your example. Furthermore, the article
notes that CLR cannot handle large strata well and suggests estimating an
intercept for each stratum in an ordinary, plain-vanilla logistic
regression model.
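The alternative of an ordinary logistic regression with its own intercept for each stratum can be sketched as follows. This is plain Python with a hand-written Newton-Raphson fit on simulated data. The three strata, their baseline log-odds and the true log odds ratio of 0.7 are made-up assumptions, not values from this thread. In Stata the analogous model would presumably be fitted with -logit- and factor-variable notation for the strata, e.g. -logit im b4 v025 ... i.bord, or-. That version uses a constant plus indicators rather than one intercept per stratum, but it is the same model.

```python
import math
import random


def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z


def score_and_info(rows, y, beta):
    """Score vector X'(y - mu) and information matrix X'WX of a logistic model."""
    p = len(beta)
    score = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for row, yi in zip(rows, y):
        mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row))))
        w = mu * (1.0 - mu)
        for i in range(p):
            score[i] += (yi - mu) * row[i]
            for j in range(p):
                info[i][j] += w * row[i] * row[j]
    return score, info


def fit_logit(rows, y, max_iter=25, tol=1e-10):
    """Newton-Raphson: beta <- beta + info^{-1} score, until the step is tiny."""
    beta = [0.0] * len(rows[0])
    for _ in range(max_iter):
        score, info = score_and_info(rows, y, beta)
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Simulated data (assumption): 3 strata with different baseline log-odds and
# one continuous covariate x with a true log odds ratio of 0.7.
random.seed(2013)
strata = [0, 1, 2]
baseline = [-1.5, -0.5, 0.3]
rows, y = [], []
for _ in range(3000):
    s = random.choice(strata)
    x = random.gauss(0.0, 1.0)
    prob = 1.0 / (1.0 + math.exp(-(baseline[s] + 0.7 * x)))
    # Design row: one indicator per stratum (no common constant), then x.
    rows.append([1.0 if s == k else 0.0 for k in strata] + [x])
    y.append(1 if random.random() < prob else 0)

beta = fit_logit(rows, y)
print("stratum intercepts:", [round(b, 2) for b in beta[:3]])
print("log odds ratio for x:", round(beta[3], 3),
      "OR:", round(math.exp(beta[3]), 3))
```

With only a few large strata, each intercept is estimated from thousands of observations. The incidental-parameter problem that motivates conditioning on many small matched sets therefore does not arise, and no combinatorial sum is needed.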

There's a lesson to be learned here. If your model fails to converge,
consider the possibility that you are using the wrong model for your
data. I know that I have the temptation to try to use brute force and
fit the model no matter what the computer tells me. Increase the
available memory! Upgrade to a new version! Find a different computer!
Try a different piece of software!

When your program complains about your model, maybe you need to sidestep
the issue by fitting a different but comparable model.

Steve Simon, n...@pmean.com, Standard Disclaimer.
Sign up for the Monthly Mean, the newsletter that
dares to call itself average at www.pmean.com/news