Which class is the `predict` for a binary logistic regression model showing?


Jordan Howell

Feb 11, 2020, 10:28:57 AM
to pystatsmodels
Hello,

I have a logistic regression model for a binary classifier (0, 1). I get a single probability back for each observation when I predict on my test set. Is that the probability for a 0 or for a 1? The model is `sm.Logit(y, x)`.

josef...@gmail.com

Feb 11, 2020, 10:35:57 AM
to pystatsmodels
For endog = 1:

predict(x) = P(y = 1 | x)

 

--
You received this message because you are subscribed to the Google Groups "pystatsmodels" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pystatsmodel...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pystatsmodels/f23355bf-f7e7-4d2a-9fd1-4fbf706907e8%40googlegroups.com.

josef...@gmail.com

Feb 11, 2020, 10:38:48 AM
to pystatsmodels
In general:

  predict(x) = E(y | x)

  E(y | x) = 1 * P(y = 1 | x) + 0 * P(y = 0 | x) = P(y = 1 | x)

Josef

Jordan Howell

Feb 11, 2020, 11:44:26 AM
to pystatsmodels
Thanks. If the predictions are the probability of endog = 1, then I seem to be getting the opposite, which makes me worried about the model.


josef...@gmail.com

Feb 11, 2020, 5:00:26 PM
to pystatsmodels
Are you using GLM or Logit?

Check `model.endog` to see how the endog is encoded if you are using formulas.

For a categorical endog, patsy produces an encoding that is "reversed" relative to the way GLM interprets endog.

One way to check this is to create a numeric 0, 1 endog yourself and compare it with the `model.endog` that you currently have.

Josef
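
The check described above could look roughly like the following sketch, assuming `pandas` and `patsy` are installed. The toy frame and the column names `flag_num`, `flag_str` and `x` are hypothetical. It builds the endog once from a numeric 0/1 column and once from a string copy of the same column, so the two encodings can be compared.

```python
import numpy as np
import pandas as pd
import patsy

# Toy data (illustrative only); flag_str is a string copy of flag_num.
df = pd.DataFrame({
    "flag_num": [0.0, 1.0, 1.0, 0.0, 1.0, 0.0],
    "x": [0.1, 1.2, 0.8, -0.5, 1.5, -1.0],
})
df["flag_str"] = pd.Series(["no", "yes", "yes", "no", "yes", "no"], dtype=object)

y_num, _ = patsy.dmatrices("flag_num ~ x", df, return_type="dataframe")
y_cat, _ = patsy.dmatrices("flag_str ~ x", df, return_type="dataframe")

# Numeric target: a single column whose values are passed through unchanged.
print(y_num.shape, list(y_num.columns))
same = np.array_equal(y_num.iloc[:, 0].to_numpy(), df["flag_num"].to_numpy())
print("numeric endog identical to the raw column:", same)

# String target: one indicator column per level, with the first level first.
print(y_cat.shape, list(y_cat.columns))
```

With a numeric target the endog is one column identical to the raw data. With a string target the first column is the indicator for the first level, which is where a "reversed" reading can come from.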

 

 


Jordan Howell

Feb 11, 2020, 5:26:31 PM
to pystat...@googlegroups.com
Logit

Jordan




Jordan Howell

Feb 13, 2020, 11:53:45 AM
to pystatsmodels
Thanks.  

I'm turning the data set into 'x' and 'y' matrices using `patsy.dmatrices`, and the 'y' matches my target variable.


Jordan Howell

Feb 13, 2020, 12:19:19 PM
to pystatsmodels
Excuse the last post.  After doing `model.endog`, I get a 1d array which matches my 0s and 1s.  I still have a sinking feeling the model is predicting the first class, in this case 0.  I'm just not sure how to double check.  

josef...@gmail.com

Feb 13, 2020, 12:26:05 PM
to pystatsmodels
Can you make an example, so I can check?

You can also check the predictions themselves, e.g. with a confusion table at a fixed threshold, which should show what is predicted unless the data is uninformative.
AFAIR, we have it only for discrete binary models.
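
A confusion table at a fixed threshold can be sketched with plain `numpy`, as below. The helper name `confusion_table` and the hand-made values are hypothetical, used only for illustration. If the probabilities are P(y = 1 | x) and the data is informative, most counts should land on the diagonal; if they were P(y = 0 | x), most counts would land off the diagonal.

```python
import numpy as np


def confusion_table(y_true, prob, threshold=0.5):
    """2x2 table: rows are observed classes 0/1, columns are predicted 0/1."""
    y_true = np.asarray(y_true).astype(int)
    y_hat = (np.asarray(prob) > threshold).astype(int)
    table = np.zeros((2, 2), dtype=int)
    for obs, hat in zip(y_true, y_hat):
        table[obs, hat] += 1
    return table


# Hand-made values (illustrative only).
y_obs = [0, 0, 0, 1, 1, 1, 1, 0]
prob = [0.1, 0.3, 0.6, 0.8, 0.7, 0.4, 0.9, 0.2]

table = confusion_table(y_obs, prob)
print(table)
# [[3 1]
#  [1 3]]
```

For a fitted `Logit` model, the results object also offers `pred_table(threshold=0.5)`, which returns the same kind of table with observed classes in rows and predicted classes in columns.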



 

Jordan Howell

Feb 17, 2020, 9:21:50 AM
to pystatsmodels
So I swapped the classes (0 for 1 and 1 for 0) and the results were reversed, so I feel like patsy or statsmodels or something is predicting the first class, `0` in this case.



josef...@gmail.com

Feb 17, 2020, 9:27:12 AM
to pystatsmodels
You didn't provide any information about what you are actually doing.

Josef

 

Jordan Howell

Feb 17, 2020, 9:47:03 AM
to pystatsmodels
My apologies.  I'm trying to predict a binary outcome via logistic regression.

Code below:
```
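import patsy
import statsmodels.api as sm


def run_model(formula, df, model_set):
    '''
    Fit a Logit model from a patsy formula.

    formula = patsy formula string
    df = dataframe to be used
    model_set = name of df (used only in the progress messages)
    '''
    print("Turning data into matrix...")
    # patsy builds the design matrices: y is the endog, x is the exog
    y, x = patsy.dmatrices(formula, df, return_type='dataframe')
    #model_set_dict.update({model_set+'_x': x, model_set+'_y': y})
    print("Developing model...")
    model = sm.Logit(y, x)
    print(f"Running Model on {model_set} model set...")
    model_results = model.fit()
    # Returns a tuple: (fitted results, exog dataframe, endog dataframe)
    return model_results, x, y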

formula2 = 'target_flag ~ current_age_buckets + marital_status_model_S \
+ SEX_F + terr_cat + state_codes + state_codes:terr_cat + eff_year +  RATING_CLASS_CODE_A   +  RATING_CLASS_CODE_AP \
+  RATING_CLASS_CODE_AU +  RATING_CLASS_CODE_B  +  RATING_CLASS_CODE_C +   RATING_CLASS_CODE_D  \
+  RATING_CLASS_CODE_E +  RATING_CLASS_CODE_F +  RATING_CLASS_CODE_G  +  RATING_CLASS_CODE_GV +  RATING_CLASS_CODE_H \
+  RATING_CLASS_CODE_HD  +  RATING_CLASS_CODE_HE +  RATING_CLASS_CODE_HF +  RATING_CLASS_CODE_HG  +  RATING_CLASS_CODE_I \
+  RATING_CLASS_CODE_J +  RATING_CLASS_CODE_K  +  RATING_CLASS_CODE_L +  RATING_CLASS_CODE_LA   \
+  RATING_CLASS_CODE_M  +  RATING_CLASS_CODE_N  +  RATING_CLASS_CODE_NE +  RATING_CLASS_CODE_O \
+  RATING_CLASS_CODE_P  +  RATING_CLASS_CODE_Q  +  RATING_CLASS_CODE_R \
+  RATING_CLASS_CODE_RT +  RATING_CLASS_CODE_S  +  RATING_CLASS_CODE_T    \
+  RATING_CLASS_CODE_UP +  RATING_CLASS_CODE_V +  RATING_CLASS_CODE_VN  +  RATING_CLASS_CODE_W +  RATING_CLASS_CODE_X \
+  RATING_CLASS_CODE_Y  +  RATING_CLASS_CODE_Z +  risk_codes - 1'


second_model = run_model(formula2, train_df, 'train_df')
```





josef...@gmail.com

Feb 17, 2020, 10:07:17 AM
to pystatsmodels
If `target_flag` is categorical, e.g. string, then this should raise an exception in Logit, because patsy creates a 2-dim, 2-column endog, which is not allowed in Logit.
GLM Binomial interprets a 2-column endog as (success, failure) counts. In that case "0", or the first level of the categorical variable, is treated as success.

So I don't see where you would get a reversed endog coding in this case.

Josef



 

Jordan Howell

Feb 17, 2020, 10:15:24 AM
to pystatsmodels
The target flag is a float (0 or 1). After I run the model with:

second_model = run_model(formula2, train_df, 'train_df')

I do the following:

second_model[0].model.endog

I get a 1-D array like the following:

array([0., 1., 1., ..., 1., 0., 0.])

Again, this is the model:

model = sm.Logit(y, x)




josef...@gmail.com

Feb 17, 2020, 10:23:37 AM
to pystatsmodels
If `target_flag` is numeric, float or int, then neither patsy nor statsmodels should change it.
Your `second_model[0].model.endog` should be identical to your `target_flag` in this case.
You can verify that; if they are not the same, then something strange is going on.

Josef
 

Jordan Howell

Feb 17, 2020, 10:29:49 AM
to pystatsmodels
Yes. endog is identical to the target flag. When I run the model, I get one probability back per observation. I want to determine whether that probability is for class '0' or class '1'.

I did just change the target flag to categorical, and that did return a 2D array. When I ran `.fit()` on the model, I got the following error:

ValueError: operands could not be broadcast together with shapes (647274,2) (647274,) 

Not sure why it's erroring out.  
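
The shapes in the error, (647274, 2) against (647274,), are consistent with a two-column endog reaching Logit, which is the case described earlier in the thread as not allowed for a categorical target. One possible way around it, sketched here with `pandas` only, is to convert the categorical column back to a numeric 0/1 column before building the formula. The toy frame and the new column name `target_flag_num` are hypothetical.

```python
import pandas as pd

# Toy frame (illustrative only); target_flag stored as a categorical column.
df = pd.DataFrame({"target_flag": pd.Categorical([0.0, 1.0, 1.0, 0.0])})

# Convert back to a plain float column so that a formula
# yields a single-column endog again.
df["target_flag_num"] = df["target_flag"].astype(float)

print(df.dtypes)
print(df["target_flag_num"].tolist())
```

Using `target_flag_num` on the left-hand side of the formula would then give a 1-D endog, where the prediction refers to the rows coded 1.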
