The essential issue underlying "multicollinearity" in regression
is whether you have redundant (or nearly redundant) independent
variables, i.e. one (or more) are linearly predictable from the
others.
To investigate that issue, I would analyse the "design matrix"
of the regression. This consists of columns of 0s and 1s for
the categorical variables (your 3-level factor "regimen" would
need 2 columns, your binary variables only 1 each), and a column
for each continuous variable (in your case only 1 column); and
as many rows as there are cases.
You can then look at all columns together, or at a subset of
columns to investigate collinearity between a subset of your
variables.
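In R, for instance, the function model.matrix() constructs exactly
this matrix from a model formula. A minimal sketch follows; apart
from "regimen" I do not know what your variables are called, so the
data frame and the other names below are invented for illustration:

  ## Invented data: a 3-level factor, two binary variables, one
  ## continuous variable, 100 cases
  set.seed(42)
  dat <- data.frame(
    regimen = factor(sample(c("A", "B", "C"), 100, replace = TRUE)),
    bin1    = rbinom(100, 1, 0.4),
    bin2    = rbinom(100, 1, 0.5),
    xcont   = rnorm(100)
  )
  ## Design matrix: an intercept column, 2 dummy columns for "regimen",
  ## 1 column for each binary variable, 1 for the continuous variable
  X <- model.matrix(~ regimen + bin1 + bin2 + xcont, data = dat)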
The mathematical criterion for collinearity is that the rank
of the matrix (using the set of columns you are interested in)
is less than the number of columns of the matrix. This would
be reflected in the occurrence of "zero" values in the "singular
value" component of a Singular Value Decomposition of the matrix.
However, in practice exact collinearity may not occur, while
the variables may be sufficiently nearly "collinear" to cause
trouble in the regression. This can be assessed by looking
at the relative magnitudes of the singular values: If one or
more is small in magnitude compared with the others, then
you are probably facing that situation.
I would know how to set this up in a statistical package like R
(see http://www.r-project.org ) or equivalently in S-Plus, and
also in matrix-oriented numerical software like Matlab or Octave
(see http://www.octave.org ), but am not familiar enough with
other statistical software (say Stata or SAS) to know how to do
it there. However, any halfway decent package should make it
possible.
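In R, continuing the sketch above, the check on the singular values
might look like this:

  sv <- svd(X)$d      # singular values, in decreasing order
  sv / max(sv)        # relative magnitudes
  ## Exact collinearity: some singular values are (numerically) zero.
  ## Near-collinearity: some are very small relative to the largest
  ## (equivalently, max(sv)/min(sv), the condition number, is large).
  ## To look at a subset of the variables, apply svd() to the
  ## corresponding subset of columns of X.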
I see that you have apparently carried out statistical significance
tests to check for collinearity. I would suggest that this is
not the right approach at all. The existence (or near-existence)
of collinearity is simply a numerical fact, and its importance
in regression lies in the fact that, when it is present, it makes
some of the estimated coefficients aliases of others -- i.e. the
variation in the dependent variable which can be explained by some
of the variables can equally well be explained by others, leaving
out the first, and the data will offer you no information with
which to choose between these equally well supported possibilities.
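To see what aliasing looks like in practice, here is a toy R
example (invented data) in which one variable is an exact linear
combination of two others; the aliased coefficient comes back as NA:

  set.seed(1)
  x1 <- rnorm(50)
  x2 <- rnorm(50)
  x3 <- x1 + x2              # exactly collinear with x1 and x2
  y  <- rbinom(50, 1, 0.5)   # arbitrary 0/1 response, for illustration
  fit <- glm(y ~ x1 + x2 + x3, family = binomial)
  summary(fit)               # the coefficient of x3 is reported as NA
                             # (aliased with those of x1 and x2)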
By the same token, the effect of collinearity will be manifested
in the fact that the matrix of variances and covariances for
the estimated coefficients (and all decent regression software
will provide this output) is singular or nearly singular, which
can be examined in the same way as above (or, since it is now
a square matrix, by finding the eigenvalues of this matrix and
applying the same criterion to the eigenvalues). Many statistical
packages will in fact warn that there is singularity.
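In R this check might look like the following (again an invented
sketch; near-collinearity shows up as a very wide spread of
eigenvalues, i.e. some that are tiny relative to the largest):

  set.seed(2)
  x1 <- rnorm(50)
  x2 <- x1 + rnorm(50, sd = 0.01)   # nearly collinear with x1
  y  <- rbinom(50, 1, 0.5)          # arbitrary 0/1 response
  fit <- glm(y ~ x1 + x2, family = binomial)
  ev  <- eigen(vcov(fit), symmetric = TRUE)$values
  ev / max(ev)   # some values are tiny relative to the largest,
                 # i.e. the matrix is close to singular in the
                 # relative sense, reflecting the near-collinearity
                 # of x1 and x2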
But, as for a "statistical test for collinearity", what can this
mean? Does it mean that you have set up a Null Hypothesis of
collinearity and rejected it at, say, a 5% significance level? And
what happens if the data give a non-significant result? Do you then
infer that you can't "reject collinearity"? This is not relevant
to the underlying issue!
Since you are doing a logistic regression, a related phenomenon
that you need to watch out for is what is called "perfect separation",
in which the set of covariate values associated with the cases
where Response=1, and the set of covariate values associated with
cases where Response=0, can be linearly separated in covariate
space, i.e. you can find coefficients a1, a2, ... such that the
value of
a1*X1 + a2*X2 + ...
for every "0" case is less than its value for every "1" case.
In statistical multivariate analysis terms, this amounts to
saying that a linear discriminant analysis of the independent
variables, grouped according to the 0/1 value of the dependent
variable, gives complete discrimination (i.e. there is a linear
discriminant which perfectly predicts which group a case belongs
to). The importance of this in logistic regression is that such
a linear combination allows perfect prediction of outcome
(according to the data): the maximum likelihood fit will have
a scale parameter with value 0, so that negative values of the
linear function will predict P=0 for those cases with Response=0,
and positive values will predict P=1 for those cases with
Response=1. Since real life doesn't behave like that, if it
happens you know that your results are unrealistic! Again,
decent software will warn when this is happening.
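A toy R illustration (invented data): with a completely separated
covariate, glm() still returns a "fit", but the coefficient estimate
is huge with an enormous standard error, and R issues warnings along
the lines of "fitted probabilities numerically 0 or 1 occurred":

  x <- c(1, 2, 3, 4, 5, 6)
  y <- c(0, 0, 0, 1, 1, 1)   # every y=1 case has a larger x than
                             # every y=0 case: perfect separation
  fit <- glm(y ~ x, family = binomial)
  summary(fit)
  ## The estimate for x is very large and its standard error larger
  ## still, and the fitted probabilities are numerically 0 or 1 --
  ## the tell-tale signs of perfect separation.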
Hoping this helps,
Ted.