"Warnings: There are 73 (50.0%) cells (i.e., dependent variable levels
by subpopulations) with zero frequencies."
All data are included in the model but in the Case Processing Summary
it footnotes that "The DV has only 1 value observed in 73
subpopulations."
Suggestions,
WMB
You could try duplicating a couple of records to be sure,
but it seems likely to me that the message means this: that
each of the 73 observations has its own 'subpopulation'
as described or defined by the 5 continuous covariates.
Logistic regression is defined that way, where the first
approximation to the solution makes use of the 'groups'
that have been defined. Then it does something else.
That's another facet that makes me nervous about
logistic regression -- I don't know exactly what it does.
--
Rich Ulrich, wpi...@pitt.edu
I have seen that warning too. I have never taken it too seriously where
continuous predictors are concerned--and I've wondered why SPSS includes
continuous variables when figuring out the proportion of empty cells. But
if the crosstabulation of a bunch of categorical variables is too sparse,
then I do worry about building a model on a bunch of empty cells.
--
Bruce Weaver
E-mail: wea...@mcmaster.ca
Homepage: http://www.angelfire.com/wv/bwhomedir/
If you get that warning, I think it means that you can't
use -- or trust, anyway -- the goodness-of-fit test that
was, 15 years ago, the preferred one: the likelihood-ratio chi-squared.
So you use the other one.
Do the programs warn you about which test to use, and when?
--
Rich Ulrich, wpi...@pitt.edu
http://www.pitt.edu/~wpilib/index.html
I haven't been using NOMREG lately, so I had to go back and dig out a file
to check. Here's the exact wording of the warning from a model I ran:
"There are 17 (14.9%) cells (i.e., dependent variable levels by
subpopulations) with zero frequencies."
The model had a binary outcome, 2 categorical (binary) predictors, and 1
continuous predictor. The likelihood ratio test for the model:
chi-square = 40.518, df = 3, p < .001. There is no advice given
concerning goodness of fit tests.
If I drop the continuous predictor, there is no warning (i.e., no empty
cells), and the likelihood ratio test is: chi-square = 25.172, df = 2,
p < .001.
As I said before, I have never taken this warning too seriously if it is
due mainly to continuous variables in the model. Including continuous
variables in the crosstabulation of all variables to determine the
percentage of empty cells doesn't make a lot of sense to me. After all, if
continuous variables are measured with sufficient precision, you could
easily have a situation where no 2 cases have exactly the same value--and
that would result in a huge percentage of empty cells.
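To see why continuous predictors inflate the count, here is a minimal Python sketch of the bookkeeping the warning text itself describes: a cell is one dependent-variable level within one subpopulation, and a subpopulation is one distinct combination of predictor values. The function name `empty_cell_summary` is hypothetical and this is not SPSS code; that NOMREG counts in exactly this way is an assumption based on the wording of the warning.

```python
def empty_cell_summary(covariates, outcome):
    """Count 'cells' as (distinct predictor pattern) x (outcome level).

    Hypothetical helper, assumed to mirror the warning's definition:
    returns (subpopulations, total cells, empty cells, percent empty).
    """
    # Each distinct tuple of predictor values is one subpopulation.
    patterns = [tuple(row) for row in covariates]
    subpops = set(patterns)
    levels = set(outcome)
    # (pattern, level) combinations that actually contain at least one case.
    observed = set(zip(patterns, outcome))
    total_cells = len(subpops) * len(levels)
    empty = total_cells - len(observed)
    return len(subpops), total_cells, empty, 100.0 * empty / total_cells


# Synthetic data: 73 cases, one continuous covariate with no tied values,
# and a binary outcome, so every case is its own subpopulation.
x = [[0.01 * i] for i in range(73)]
y = [i % 2 for i in range(73)]
print(empty_cell_summary(x, y))  # (73, 146, 73, 50.0)
```

With a binary outcome and 73 cases that all have distinct covariate values, each subpopulation has one observed and one empty cell, so 73 of 146 cells (50%) are empty. That matches the figures in WMB's warning if his dependent variable was binary, which the thread does not actually say.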
So, is the presence of empty "cells" due to continuous variables REALLY a
problem for multinomial logistic regression? I'm curious to hear what
others have to say about this. Brendan Halpin--if you're reading this,
please jump in!
Cheers,
> So, is the presence of empty "cells" due to continuous variables REALLY a
> problem for multinomial logistic regression? I'm curious to hear what
> others have to say about this. Brendan Halpin--if you're reading this,
> please jump in!
I think Rich Ulrich has already given the core of the answer: when
the warning is triggered, the deviance (-2 times the log of the
likelihood ratio) can no longer be assumed to have a chi-sq
distribution, so it can no longer be used as an overall goodness of
fit indicator.
More strictly speaking, if the number of "settings" (combinations
of values of explanatory variables) is large (or dependent on the
sample, as it will be if continuous variables are present) you
can't use the -2 log LR test for goodness of fit. If so, the
Hosmer-Lemeshow statistic is an apparently effective fudge: as I
recall, it splits the sample into deciles according to predicted
probability, and makes a calculation based on residuals in these
groups. Details should be available in SPSS, H&L's own book, and
Agresti's _An Introduction to Categorical Data Analysis_, none of
which I have to hand at the moment.
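The Hosmer-Lemeshow procedure Brendan recalls can be sketched in a few lines of Python. This is a minimal illustration, not SPSS's implementation: the name `hosmer_lemeshow` is hypothetical, and the grouping rule (sort by predicted probability, cut into ten roughly equal-sized groups, ties kept in input order) is an assumption, since packages differ in how they form the groups. The statistic is referred to a chi-square on (groups - 2) degrees of freedom.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow statistic for a binary outcome.

    y: observed 0/1 outcomes; p: fitted probabilities, assumed strictly
    between 0 and 1. Returns (statistic, degrees of freedom).
    """
    n = len(y)
    if n != len(p) or n < groups:
        raise ValueError("need len(y) == len(p) >= groups")
    # Sort cases by predicted probability; Python's sort is stable,
    # so tied probabilities keep their input order (an assumption).
    order = sorted(range(n), key=lambda i: p[i])
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        n_g = len(idx)
        observed = sum(y[i] for i in idx)   # events seen in this group
        expected = sum(p[i] for i in idx)   # events the model predicts
        p_bar = expected / n_g
        # Pearson-type contribution comparing observed and expected events.
        stat += (observed - expected) ** 2 / (n_g * p_bar * (1.0 - p_bar))
    return stat, groups - 2
```

If SciPy is available, `scipy.stats.chi2.sf(stat, df)` turns the statistic into a p-value.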
Logistic regression with grouped data has a fixed number of
settings (the number of cells in the implied crosstabulation), so as
long as there are few cells with low expected values, the asymptotics
are satisfied.
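For grouped data the deviance and Pearson statistics can be written out directly. The Python sketch below is a hypothetical helper (`grouped_gof` is not an SPSS or library function) that computes both from the binomial counts at each covariate setting, uses df = number of settings minus number of estimated parameters, and reports the share of cells with expected counts below 5. The cutoff of 5 is a common rule of thumb and is assumed here; the post above only says "few cells with low expected values".

```python
import math


def grouped_gof(trials, events, fitted_p, n_params):
    """Deviance and Pearson X^2 for binomial data grouped by covariate setting.

    trials[j], events[j]: cases and events at setting j; fitted_p[j]: the
    model's predicted probability there (assumed 0 < p < 1); n_params:
    number of estimated coefficients. Returns
    (deviance, pearson, df, share of cells with expected count < 5).
    """
    deviance = pearson = 0.0
    small = 0
    for n, y, p in zip(trials, events, fitted_p):
        mu = n * p  # expected number of events at this setting
        # y * log(y / mu) is taken as 0 when y == 0 (likewise for non-events)
        if y > 0:
            deviance += 2.0 * y * math.log(y / mu)
        if n - y > 0:
            deviance += 2.0 * (n - y) * math.log((n - y) / (n - mu))
        pearson += (y - mu) ** 2 / (n * p * (1.0 - p))
        # Each setting contributes two cells: events and non-events.
        small += (mu < 5) + (n - mu < 5)
    df = len(trials) - n_params
    return deviance, pearson, df, small / (2 * len(trials))
```

When every setting holds a single case, as with distinct continuous covariate values, the number of settings grows with the sample and every expected count is below 1, which is the situation the warning flags.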
In any case, the parameter estimates and their SEs, and a chi-sq
test on nested pairs of models (X2 = Dev1 - Dev2, df = df1 - df2;
H0 that none of the added variables improve the model) are not
affected by this problem.
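The nested-model test needs only the two deviances and their residual degrees of freedom. Below is a minimal Python sketch; `nested_lr_test` and `chi2_sf` are hypothetical helpers, the latter using the standard closed forms for the chi-square upper tail with integer df (`scipy.stats.chi2.sf` would do the same job). The usage lines apply it to Bruce's two models from earlier in the thread: each model chi-square was computed against the same intercept-only model, so their difference equals Dev_reduced - Dev_full for the continuous predictor, assuming both models were fit to the same cases.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{i=0}^{df/2 - 1} h^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{i=1}^{(df-1)/2} h^(i-1/2) / Gamma(i+1/2)
    total = math.erfc(math.sqrt(h))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * h ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def nested_lr_test(dev_reduced, df_reduced, dev_full, df_full):
    """X2 = Dev1 - Dev2 on df1 - df2 (residual df), model 1 nested in model 2."""
    x2 = dev_reduced - dev_full
    df = df_reduced - df_full
    return x2, df, chi2_sf(x2, df)


# Bruce's models: 40.518 on 3 df (with the continuous predictor) and
# 25.172 on 2 df (without it), both relative to the intercept-only model.
x2 = 40.518 - 25.172  # about 15.35
print(x2, 3 - 2, chi2_sf(x2, 3 - 2))
```

The resulting p-value for adding the continuous predictor is well below .001, and, as the post above says, this comparison does not rely on the sparse-cell asymptotics.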
Brendan
--
Brendan Halpin, Department of Sociology, University of Limerick, Ireland
Tel: w +353-61-213147 f +353-61-202569 h +353-61-390476; Room F2-025 x 3147
<mailto:brendan...@ul.ie> <http://wivenhoe.staff8.ul.ie/~brendan>
> Bruce Weaver <wea...@mcmail.cis.mcmaster.ca> writes:
>
> > So, is the presence of empty "cells" due to continuous variables REALLY a
> > problem for multinomial logistic regression? I'm curious to hear what
> > others have to say about this. Brendan Halpin--if you're reading this,
> > please jump in!
>
> I think Rich Ulrich has already given the core of the answer: when
> the warning is triggered, the deviance (-2 times the log of the
> likelihood ratio) can no longer be assumed to have a chi-sq
> distribution, so it can no longer be used as an overall goodness of
> fit indicator.
Okay, I think I'm with you now. I thought we were talking about the
chi-square test for the change in -2LL that is used to compare nested
models (under "Model Fitting Information" and "Likelihood Ratio Tests" in
the SPSS output). But I see now (at least I think I do!) that you're
talking about the "Deviance" SPSS shows under "Goodness of Fit". A
"Pearson" chi-square is shown in the same box.
> More strictly speaking, if the number of "settings" (combinations
> of values of explanatory variables) is large (or dependent on the
> sample, as it will be if continuous variables are present) you
> can't use the -2 log LR test for goodness of fit. If so, the
> Hosmer-Lemeshow statistic is an apparently effective fudge: as I
> recall, it splits the sample into deciles according to predicted
> probability, and makes a calculation based on residuals in these
> groups. Details should be available in SPSS, H&L's own book, and
> Agresti's _An Introduction to Categorical Data Analysis_, none of
> which I have to hand at the moment.
>
> Logistic regression with grouped data has a fixed number of
> settings (the number of cells in the implied crosstabulation), so as
> long as there are few cells with low expected values, the asymptotics
> are satisfied.
>
> In any case, the parameter estimates and their SEs, and a chi-sq
> test on nested pairs of models (X2 = Dev1 - Dev2, df = df1 - df2;
> H0 that none of the added variables improve the model) are not
> affected by this problem.
As I said above, I thought the earlier posts were saying there was a
problem with ~these~ likelihood ratio chi-square tests. Thanks for
clarifying, Brendan.