Which class is the `predict` for a binary logistic regression model showing?

Jordan Howell

Feb 11, 2020, 10:28:57 AM

to pystatsmodels

Hello,

I have a logistic regression model for a binary classifier (0, 1). I get a single probability back when I predict on my test set. Is that the probability for a 0 or for a 1? The model is an `sm.Logit(y, x)`.

josef...@gmail.com

Feb 11, 2020, 10:35:57 AM

to pystatsmodels

On Tue, Feb 11, 2020 at 10:29 AM Jordan Howell <jordan....@gmail.com> wrote:

Hello,

I have a logistic regression model for a binary classifier (0, 1). I get the single probability back when I predict my test set. Is that the probability for a 0 or for a 1? The model is an sm.logit(y,x)

for endog=1

predict(x) = P(y = 1 | x)

--
You received this message because you are subscribed to the Google Groups "pystatsmodels" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pystatsmodel...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pystatsmodels/f23355bf-f7e7-4d2a-9fd1-4fbf706907e8%40googlegroups.com.

josef...@gmail.com

Feb 11, 2020, 10:38:48 AM

to pystatsmodels

On Tue, Feb 11, 2020 at 10:35 AM <josef...@gmail.com> wrote:

for endog=1

predict(x) = P(y = 1 | x)

in general

predict(x) = E(y | x)

E(y | x) = 1 * P(y = 1 | x) + 0 * P(y = 0 | x)

Josef

Jordan Howell

Feb 11, 2020, 11:44:26 AM

to pystatsmodels

Thanks. If the predictions are the probability of endog = 1, then I seem to be getting the opposite, which concerns me about the model.

josef...@gmail.com

Feb 11, 2020, 5:00:26 PM

to pystatsmodels

On Tue, Feb 11, 2020 at 11:44 AM Jordan Howell <jordan....@gmail.com> wrote:

Thanks. I seem to be getting the opposite if the predictions are the probability of endog = 1 which concerns me with the model.

Are you using GLM or Logit?

Check `model.endog` to see how the endog is encoded if you are using formulas.

https://github.com/statsmodels/statsmodels/issues/2181

Patsy produces the "reversed" encoding relative to the way endog is interpreted in GLM.

One way to check this is to create a numeric 0, 1 endog yourself and compare it with the `model.endog` that you currently have.

Josef

Jordan Howell

Feb 11, 2020, 5:26:31 PM

to pystat...@googlegroups.com

Logit

Jordan

Jordan Howell

Feb 13, 2020, 11:53:45 AM

to pystatsmodels

Thanks.

I'm turning the data set into an 'x' and 'y' matrix using patsy.dmatrices and the 'y' is matching my target variable.

Jordan Howell

Feb 13, 2020, 12:19:19 PM

to pystatsmodels

Excuse the last post. After doing `model.endog`, I get a 1d array which matches my 0s and 1s. I still have a sinking feeling the model is predicting the first class, in this case 0. I'm just not sure how to double check.

josef...@gmail.com

Feb 13, 2020, 12:26:05 PM

to pystatsmodels

On Thu, Feb 13, 2020 at 12:19 PM Jordan Howell <jordan....@gmail.com> wrote:

Excuse the last post. After doing `model.endog`, I get a 1d array which matches my 0s and 1s. I still have a sinking feeling the model is predicting the first class, in this case 0. I'm just not sure how to double check.

Can you make an example, so I can check?

You can also check the predictions themselves, e.g. with a confusion table at a fixed threshold, which should show what is predicted unless the data is uninformative.

AFAIR, we have a built-in prediction table only for the discrete binary models.

Jordan Howell

Feb 17, 2020, 9:21:50 AM

to pystatsmodels

So I changed the classes around (0 for 1 and 1 for 0) and the results were reversed so I feel like patsy or statsmodels or something is predicting the first class `0` in this case.

josef...@gmail.com

Feb 17, 2020, 9:27:12 AM

to pystatsmodels

On Mon, Feb 17, 2020 at 9:22 AM Jordan Howell <jordan....@gmail.com> wrote:

So I changed the classes around (0 for 1 and 1 for 0) and the results were reversed so I feel like patsy or statsmodels or something is predicting the first class `0` in this case.

Is there anything beyond https://github.com/statsmodels/statsmodels/issues/2181 ?

You didn't provide any information about what you are actually doing.

Josef

Jordan Howell

Feb 17, 2020, 9:47:03 AM

to pystatsmodels

My apologies. I'm trying to predict a binary outcome via logistic regression.

Code below:

```
import patsy
import statsmodels.api as sm


def run_model(formula, df, model_set):
    '''
    formula = formula in GLM
    df = dataframe to be used
    model_set = name of df
    '''
    print("Turning data into matrix...")
    # Build the endog (y) and exog (x) design matrices from the formula.
    y, x = patsy.dmatrices(formula, df, return_type='dataframe')
    #model_set_dict.update({model_set+'_x': x, model_set+'_y': y})
    print("Developing model...")
    model = sm.Logit(y, x)
    print(f"Running Model on {model_set} model set...")
    model_results = model.fit()
    return model_results, x, y


formula2 = 'target_flag ~ current_age_buckets + marital_status_model_S \
    + SEX_F + terr_cat + state_codes + state_codes:terr_cat + eff_year + RATING_CLASS_CODE_A + RATING_CLASS_CODE_AP \
    + RATING_CLASS_CODE_AU + RATING_CLASS_CODE_B + RATING_CLASS_CODE_C + RATING_CLASS_CODE_D \
    + RATING_CLASS_CODE_E + RATING_CLASS_CODE_F + RATING_CLASS_CODE_G + RATING_CLASS_CODE_GV + RATING_CLASS_CODE_H \
    + RATING_CLASS_CODE_HD + RATING_CLASS_CODE_HE + RATING_CLASS_CODE_HF + RATING_CLASS_CODE_HG + RATING_CLASS_CODE_I \
    + RATING_CLASS_CODE_J + RATING_CLASS_CODE_K + RATING_CLASS_CODE_L + RATING_CLASS_CODE_LA \
    + RATING_CLASS_CODE_M + RATING_CLASS_CODE_N + RATING_CLASS_CODE_NE + RATING_CLASS_CODE_O \
    + RATING_CLASS_CODE_P + RATING_CLASS_CODE_Q + RATING_CLASS_CODE_R \
    + RATING_CLASS_CODE_RT + RATING_CLASS_CODE_S + RATING_CLASS_CODE_T \
    + RATING_CLASS_CODE_UP + RATING_CLASS_CODE_V + RATING_CLASS_CODE_VN + RATING_CLASS_CODE_W + RATING_CLASS_CODE_X \
    + RATING_CLASS_CODE_Y + RATING_CLASS_CODE_Z + risk_codes - 1'

# run_model returns a tuple: (fitted results, x, y).
second_model = run_model(formula2, train_df, 'train_df')
```

josef...@gmail.com

Feb 17, 2020, 10:07:17 AM

to pystatsmodels

If `target_flag` is categorical, e.g. string, then this should raise an exception in Logit, because patsy creates a 2-dim, 2-column endog, which is not allowed in Logit.

GLM Binomial interprets a 2-column endog as (success, failure) counts. In that case "0", or the first level of the categorical variable, is treated as success.

So I don't see where you would get a reversed endog coding in this case.

Josef

Jordan Howell

Feb 17, 2020, 10:15:24 AM

to pystatsmodels

The target flag is a float (0 or 1). After I run the model with:

second_model = run_model(formula2, train_df, 'train_df')

I do the following:

second_model[0].model.endog

I get a 1-D array like the following:

array([0., 1., 1., ..., 1., 0., 0.])

Again, this is the model:

model = sm.Logit(y, x)

josef...@gmail.com

Feb 17, 2020, 10:23:37 AM

to pystatsmodels

On Mon, Feb 17, 2020 at 10:15 AM Jordan Howell <jordan....@gmail.com> wrote:

The target flag is a float (0 or 1). After I run the model with:

If `target_flag` is numeric, float or int, then neither patsy nor statsmodels should change it.

Your `second_model[0].model.endog` should be identical to your `target_flag` in this case.

You can verify that; if they are not the same, then something strange is going on.

Josef

Jordan Howell

Feb 17, 2020, 10:29:49 AM

to pystatsmodels

Yes, endog is identical to the target flag. When I run the model, I get one probability number back. I want to know whether that probability is for class '0' or class '1'.

I did just change the target flag to categorical, and that did return a 2D array. When I ran `.fit()` on the model, I got the following error:

ValueError: operands could not be broadcast together with shapes (647274,2) (647274,)

Not sure why it's erroring out.
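
One plausible reading of this error, which is an assumption and not something traced through the statsmodels source: the categorical target becomes a two-column endog, while the predicted probabilities are one-dimensional, and NumPy cannot combine an `(n, 2)` array with an `(n,)` array. The sketch below reproduces just that shape mismatch with toy arrays and shows one way to get back to a numeric 0/1 target; the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy arrays (assumption): a two-column endog and one-dimensional predictions.
endog_2col = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
pred_1d = np.array([0.2, 0.7, 0.9])

try:
    endog_2col - pred_1d
    message = ""
except ValueError as err:
    message = str(err)
print(message)

# Converting the categorical labels back to numbers gives a 1-D 0/1 target,
# which is the form Logit expects.
flag_cat = pd.Series(pd.Categorical(["0", "1", "1"]))
flag_num = flag_cat.astype(str).astype(float)
print(flag_num.tolist())
```

Keeping the target numeric, as in the original code, avoids the two-column endog altogether.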
