My study involves predicting the probability of a high or low score (DV) from built environment variables (IDV1, IDV2, IDV3, IDV4). The DV is categorical, with '0' indicating a low score and '1' indicating a high score. All IDVs are metric (ratio scale). From your videos, I learned that a bivariate analysis of the DV and one IDV (IDV1, land-use mix) calls for logistic regression, but that the relationship between the two variables should be checked first. So I computed the correlation and also ran an independent-samples t-test. Both results indicated a significant association (Pearson correlation = 0.523, p-value = 0.000; t = -7.319, p-value = 0.000, equal variances assumed).
Next, I ran a bivariate logistic regression (Enter method, no control variables included). The results are as follows:
a) -2LL was initially 191.522; after the 6th iteration, the value became 130.821
b) Omnibus test of model coefficients: the result is significant (chi-square = 60.701, p-value = 0.000)
c) Cox and Snell R square = 0.344 and Nagelkerke R square = 0.468
d) Hosmer and Lemeshow test: p-value = 0.65
e) Classification table: overall prediction is 81.9% correct
f) The Variables in the Equation results are as follows:
Variables in the Equation
                                                                          95% C.I. for EXP(B)
                B         S.E.    Wald     df   Sig.   Exp(B)             Lower         Upper
Step 1a
Land-use mix    22.958    3.932   34.088   1    .000   9348643578.839     4203275.002   20792628777285.350
Constant        -17.996   3.040   35.052   1    .000   .000
a. Variable(s) entered on step 1: Land-use mix.
g) The correlation matrix is as follows:
Correlation Matrix
                        Constant   Land-use mix
Step 1   Constant       1.000      -.997
         Land-use mix   -.997      1.000
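The fit statistics in (a) to (c) can be cross-checked against each other. The following is a minimal sketch in Python that uses only the values reported above. The sample size is not stated in this post, so `n_implied` is backed out from the Cox and Snell formula and should be read as a rough implied estimate, not a reported figure.

```python
import math

neg2ll_null = 191.522   # initial -2LL, as reported in (a)
neg2ll_model = 130.821  # -2LL after the 6th iteration, as reported in (a)
cox_snell = 0.344       # Cox and Snell R square, as reported in (c)

# The model chi-square is the drop in -2LL; it should match the omnibus test in (b).
chi_square = neg2ll_null - neg2ll_model

# Cox and Snell R^2 = 1 - exp(-chi_square / n), so n can be backed out.
# ASSUMPTION: n is not given in the post; this is only an implied estimate.
n_implied = -chi_square / math.log(1.0 - cox_snell)

# Nagelkerke R^2 rescales Cox and Snell R^2 by its maximum attainable value.
max_cox_snell = 1.0 - math.exp(-neg2ll_null / n_implied)
nagelkerke = cox_snell / max_cox_snell

print(f"chi-square    = {chi_square:.3f}")
print(f"implied n     = {n_implied:.1f}")
print(f"Nagelkerke R2 = {nagelkerke:.3f}")
```

If the printed chi-square and Nagelkerke values agree with (b) and (c), the reported fit statistics are at least internally consistent.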
I have encountered the same problem in other cases as well. The odds ratio is extremely large, and I am not able to work out why.
Kindly let me know if I am doing something wrong here or if my process is inappropriate somewhere.
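For reference, Exp(B) is simply e raised to the power B, and it describes the change in odds for a one-unit increase in the predictor. The sketch below recomputes the odds ratio and its 95% interval from the reported B and S.E., and then repeats the calculation for a 0.1-unit increase. The helper `odds_ratio` and the 0.1 step are illustrative only, and the smaller step is meaningful only under the assumption (not stated above) that land-use mix is an index running roughly from 0 to 1, so that a full one-unit change spans its entire range.

```python
import math

b = 22.958   # reported coefficient for land-use mix
se = 3.932   # reported standard error
z = 1.96     # normal quantile for a 95% interval

def odds_ratio(b, se, step=1.0):
    """Odds ratio and 95% CI for an increase of `step` units in the predictor."""
    point = math.exp(b * step)
    lower = math.exp((b - z * se) * step)
    upper = math.exp((b + z * se) * step)
    return point, lower, upper

# Per full unit: matches the very large Exp(B) in the table, up to rounding of B and S.E.
or_unit = odds_ratio(b, se, step=1.0)

# Per 0.1 unit: hypothetical rescaling, assuming the index spans about 0 to 1.
or_tenth = odds_ratio(b, se, step=0.1)

print("per 1.0 unit:", [f"{v:.3e}" for v in or_unit])
print("per 0.1 unit:", [f"{v:.2f}" for v in or_tenth])
```

Under that assumption, the odds ratio per 0.1-unit increase comes out at about 9.9, which is the same model expressed on a more interpretable scale.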
Thanking you
Megha Tyagi
PhD student
Department of Architecture and Planning
IITR