Micro average in ROC/AUC


Nathan George

Mar 29, 2021, 12:55:57 PM
to Yellowbrick
I'm confused by the example in the documentation: https://www.scikit-yb.org/en/latest/api/classifier/rocauc.html#quick-method

It has the ROC curves for the two classes and the macro average, which make sense (they are all similar). However, the micro average has a much bigger AUC score, which makes no sense to me. Shouldn't the micro average be about the same as the macro average? I started working on some code to test this out but haven't finished it (here it is in case it helps):


import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# e.g. roc_auc_score(y_test, lr_model.predict_proba(x_test)[:, 1])

# predicted probabilities for class 0 and class 1, and ground truth
preds_0 = np.array([0.2, 0.3, 0.7, 0.2])
preds_1 = np.array([0.8, 0.7, 0.3, 0.8])
targets = np.array([0, 1, 0, 0])

roc_auc_score(targets, preds_1)
roc_auc_score(1 - targets, preds_0)  # class 0 treated as the positive class

# roc_curve returns fpr, tpr, thresholds
# tpr = TP/(TP+FN)
# fpr = FP/(FP+TN)
p1_roc = roc_curve(targets, preds_1)
p0_roc = roc_curve(1 - targets, preds_0)

all_thresholds = np.hstack((p1_roc[2], p0_roc[2]))
# sort greatest to least
all_thresholds.sort()
all_thresholds = all_thresholds[::-1]

micro_tpr = []
micro_fpr = []
for t in all_thresholds:
    p0_preds = preds_0 < t
    p1_preds = preds_1 >= t
    # TODO (unfinished): pool TP/FN/FP/TN across both classes and append the rates

macro_avg_tpr = np.vstack((p1_roc[1], p0_roc[1])).mean(axis=0)
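For what it's worth, I believe the micro average in that docs plot is computed the way scikit-learn's multiclass ROC example does it: flatten the one-hot targets and the full probability matrix, so every (sample, class) pair counts as one binary decision. On an imbalanced dataset where one class is predicted confidently, that pooled curve can sit well above either per-class curve. A small self-contained sketch with made-up numbers (mine, not the credit data):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve, auc

# imbalanced toy problem: 4 negatives, 2 positives
y = np.array([0, 0, 0, 0, 1, 1])
p1 = np.array([0.1, 0.2, 0.3, 0.6, 0.55, 0.9])  # P(class 1)
p0 = 1 - p1                                     # P(class 0)

# per-class AUC (class 0 mirrors class 1, so both are 0.875)
auc_class1 = roc_auc_score(y, p1)

# micro average: flatten the one-hot targets and the score matrix,
# treating every (sample, class) pair as a single binary decision
y_onehot = np.column_stack([1 - y, y])
scores = np.column_stack([p0, p1])
fpr_mi, tpr_mi, _ = roc_curve(y_onehot.ravel(), scores.ravel())
auc_micro = auc(fpr_mi, tpr_mi)

print(auc_class1)  # 0.875
print(auc_micro)   # ~0.917 -- the pooled curve is higher
```

Here the per-class AUC is 0.875 but the pooled (micro) AUC is about 0.917, even though the class-0 scores are exactly 1 minus the class-1 scores. The imbalance (4 negatives to 2 positives) plus confident scores on the majority class is what inflates the micro curve, which would explain what you see on the credit data.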

Nathan George

Mar 29, 2021, 5:49:24 PM
to Yellowbrick
I did some more work on this, and came to the conclusion that the macro and micro averages for ROC/AUC don't really make sense for a binary problem. A SO post talks about it here:

I also wrote some more code, and I think the macro calculation in Yellowbrick is wrong, since it only takes the average of the TPR and not the FPR as well. Here is the code I came up with to try things out:

from yellowbrick.classifier.rocauc import roc_auc
from yellowbrick.datasets import load_credit
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the classification dataset
X, y = load_credit()

# Create the train and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.98, stratify=y, random_state=42)

# Instantiate the visualizer with the classification model
model = LogisticRegression()
roc_auc(model, X_train, y_train, X_test=X_test, y_test=y_test, classes=['not_defaulted', 'defaulted'])

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.metrics import confusion_matrix
from tqdm import tqdm

proba_predictions = model.predict_proba(X_test)
preds_0 = proba_predictions[:, 0]
preds_1 = proba_predictions[:, 1]
targets = y_test

# roc_curve returns fpr, tpr, thresholds
p0_roc = roc_curve(1-targets, preds_0)
p1_roc = roc_curve(targets, preds_1)

# np.unique returns the thresholds sorted ascending; reverse for greatest to least
all_thresholds = np.unique(np.hstack((p0_roc[2], p1_roc[2])))[::-1]

micro_tpr = []
micro_fpr = []
macro_tpr = []
macro_fpr = []
macro_tpr2 = []
macro_fpr2 = []
p0_tpr_list = []
p0_fpr_list = []
p1_tpr_list = []
p1_fpr_list = []
for t in tqdm(all_thresholds):
    p0_preds = (preds_0 < t).astype(int)   # 1 means "predicted not class 0"
    p1_preds = (preds_1 >= t).astype(int)  # 1 means "predicted class 1"
    tn0, fp0, fn0, tp0 = confusion_matrix(1 - targets, 1 - p0_preds).ravel()
    tn1, fp1, fn1, tp1 = confusion_matrix(targets, p1_preds).ravel()
    # tpr = TP/(TP+FN)
    # fpr = FP/(FP+TN)
    micro_tpr.append((tp0 + tp1) / (tp0 + tp1 + fn0 + fn1))
    micro_fpr.append((fp0 + fp1) / (fp0 + fp1 + tn0 + tn1))
    p0_tpr = tp0 / (tp0 + fn0)
    p1_tpr = tp1 / (tp1 + fn1)
    p0_fpr = fp0 / (fp0 + tn0)
    p1_fpr = fp1 / (fp1 + tn1)
    p0_tpr_list.append(p0_tpr)
    p0_fpr_list.append(p0_fpr)
    p1_tpr_list.append(p1_tpr)
    p1_fpr_list.append(p1_fpr)
    macro_tpr.append((p0_tpr + p1_tpr) / 2)
    macro_fpr.append((p0_fpr + p1_fpr) / 2)
    
    # averaging the raw counts first gives the same result as the micro avg (the /2 cancels)
    macro_tp = (tp0 + tp1) / 2
    macro_fn = (fn0 + fn1) / 2
    macro_fp = (fp0 + fp1) / 2
    macro_tn = (tn0 + tn1) / 2
    macro_tpr2.append(macro_tp / (macro_tp + macro_fn))
    macro_fpr2.append(macro_fp / (macro_fp + macro_tn))

# fpr on x, tpr on y
plt.plot(p0_roc[0], p0_roc[1], label='p0_roc')
plt.plot(p1_roc[0], p1_roc[1], label='p1_roc')
plt.plot(macro_fpr, macro_tpr, label='macro_avg')
# plt.plot(macro_fpr2, macro_tpr2, label='macro_avg2')
plt.plot(micro_fpr, micro_tpr, label='micro_avg')
plt.legend()
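For comparison, here is (as far as I can tell) how the macro average is meant to be computed in the scikit-learn multiclass example, which I believe Yellowbrick follows: interpolate each class's TPR onto one shared FPR grid and average the TPRs. Only the TPRs get averaged because the FPR axis is common to both curves, not because the FPR is being ignored. A sketch with toy numbers (my own, not the credit data):

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# toy binary problem (made-up numbers): 4 negatives, 2 positives
y = np.array([0, 0, 0, 0, 1, 1])
p1 = np.array([0.1, 0.2, 0.3, 0.6, 0.55, 0.9])  # P(class 1)
p0 = 1 - p1                                     # P(class 0)

# per-class curves; roc_curve returns fpr, tpr, thresholds
fpr1, tpr1, _ = roc_curve(y, p1)
fpr0, tpr0, _ = roc_curve(1 - y, p0)

# macro average: interpolate each class's TPR onto one shared FPR grid,
# then average the TPRs -- the FPR axis is shared rather than averaged
all_fpr = np.unique(np.concatenate([fpr0, fpr1]))
mean_tpr = (np.interp(all_fpr, fpr0, tpr0) + np.interp(all_fpr, fpr1, tpr1)) / 2
auc_macro = auc(all_fpr, mean_tpr)
print(auc_macro)
```

Averaging the TPRs at each shared FPR value averages the curves vertically; pairing up (mean FPR, mean TPR) by threshold, as in the loop above, traces out a different parametric curve, which may be what makes the macro line look off.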

Nathan George

Mar 29, 2021, 5:50:51 PM
to Yellowbrick

Benjamin Bengfort

Apr 1, 2021, 8:24:19 PM
to Yellowbrick
Hi Nathan, 

Thank you for the note, and sorry about the confusion in the documentation. Yellowbrick does have a binary classification mode for ROC/AUC (it defaults to multiclass); try:

[snip]
roc_auc(model, X_train, y_train, X_test=X_test, y_test=y_test, classes=['not_defaulted', 'defaulted'], binary=True)

to see if that helps.

Best Regards,
Benjamin Bengfort