Micro average in ROC/AUC


Nathan George

Mar 29, 2021, 12:55:57 PM
to Yellowbrick
I'm confused by the example in the documentation: https://www.scikit-yb.org/en/latest/api/classifier/rocauc.html#quick-method

It has the ROC curves for the two classes and the macro average, which make sense (they are all similar). However, the micro average has a much bigger AUC score, which makes no sense to me. Shouldn't the micro average be about the same as the macro average? I started working on some code to test this out but haven't finished it (here it is in case it helps):


import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# e.g. roc_auc_score(y_test, lr_model.predict_proba(x_test)[:, 1])

# predicted probabilities for class 0 and class 1, and ground truth
preds_0 = np.array([0.2, 0.3, 0.7, 0.2])
preds_1 = np.array([0.8, 0.7, 0.3, 0.8])
targets = np.array([0, 1, 0, 0])

roc_auc_score(targets, preds_1)
roc_auc_score(1 - targets, preds_0)  # class 0 treated as the positive class

# roc_curve returns fpr, tpr, thresholds
# tpr = TP/(TP+FN)
# fpr = FP/(FP+TN)
p1_roc = roc_curve(targets, preds_1)
p0_roc = roc_curve(1 - targets, preds_0)

all_thresholds = np.hstack((p1_roc[2], p0_roc[2]))
# sort greatest to least
all_thresholds.sort()
all_thresholds = all_thresholds[::-1]

micro_tpr = []
micro_fpr = []
for t in all_thresholds:
    p0_preds = preds_0 < t
    p1_preds = preds_1 >= t
    # TODO (unfinished): pool TP/FN/FP/TN across both classes and append the rates

macro_avg_tpr = np.vstack((p1_roc[1], p0_roc[1])).mean(axis=0)
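For what it's worth, I believe the micro average in that docs plot is computed the way scikit-learn's multiclass ROC example does it: flatten the one-hot targets and the full probability matrix, so every (sample, class) pair counts as one binary decision. On an imbalanced dataset where one class is predicted confidently, that pooled curve can sit well above either per-class curve. A small self-contained sketch with made-up numbers (mine, not the credit data):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve, auc

# imbalanced toy problem: 4 negatives, 2 positives
y = np.array([0, 0, 0, 0, 1, 1])
p1 = np.array([0.1, 0.2, 0.3, 0.6, 0.55, 0.9])  # P(class 1)
p0 = 1 - p1                                     # P(class 0)

# per-class AUC (class 0 mirrors class 1, so both are 0.875)
auc_class1 = roc_auc_score(y, p1)

# micro average: flatten the one-hot targets and the score matrix,
# treating every (sample, class) pair as a single binary decision
y_onehot = np.column_stack([1 - y, y])
scores = np.column_stack([p0, p1])
fpr_mi, tpr_mi, _ = roc_curve(y_onehot.ravel(), scores.ravel())
auc_micro = auc(fpr_mi, tpr_mi)

print(auc_class1)  # 0.875
print(auc_micro)   # ~0.917 -- the pooled curve is higher
```

Here the per-class AUC is 0.875 but the pooled (micro) AUC is about 0.917, even though the class-0 scores are exactly 1 minus the class-1 scores. The imbalance (4 negatives to 2 positives) plus confident scores on the majority class is what inflates the micro curve, which would explain what you see on the credit data.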

Nathan George

Mar 29, 2021, 5:49:24 PM
to Yellowbrick
I did some more work on this, and came to the conclusion that the macro and micro averages for ROC/AUC don't really make sense for a binary problem. A SO post talks about it here:

I also wrote some more code, and I think the macro calculation in Yellowbrick is wrong, since it only takes the average of the TPR and not the FPR as well. Here is the code I came up with to try things out:

from yellowbrick.classifier.rocauc import roc_auc
from yellowbrick.datasets import load_credit
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the classification dataset
X, y = load_credit()

# Create the train and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.98, stratify=y, random_state=42)

# Instantiate the visualizer with the classification model
model = LogisticRegression()
roc_auc(model, X_train, y_train, X_test=X_test, y_test=y_test, classes=['not_defaulted', 'defaulted'])

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.metrics import confusion_matrix
from tqdm import tqdm

proba_predictions = model.predict_proba(X_test)
preds_0 = proba_predictions[:, 0]
preds_1 = proba_predictions[:, 1]
targets = y_test

# roc_curve returns fpr, tpr, thresholds
p0_roc = roc_curve(1-targets, preds_0)
p1_roc = roc_curve(targets, preds_1)

# np.unique returns the thresholds sorted ascending; reverse for greatest to least
all_thresholds = np.unique(np.hstack((p0_roc[2], p1_roc[2])))[::-1]

micro_tpr = []
micro_fpr = []
macro_tpr = []
macro_fpr = []
macro_tpr2 = []
macro_fpr2 = []
p0_tpr_list = []
p0_fpr_list = []
p1_tpr_list = []
p1_fpr_list = []
for t in tqdm(all_thresholds):
    p0_preds = (preds_0 < t).astype(int)   # 1 means "predicted not class 0"
    p1_preds = (preds_1 >= t).astype(int)  # 1 means "predicted class 1"
    tn0, fp0, fn0, tp0 = confusion_matrix(1 - targets, 1 - p0_preds).ravel()
    tn1, fp1, fn1, tp1 = confusion_matrix(targets, p1_preds).ravel()
    # tpr = TP/(TP+FN)
    # fpr = FP/(FP+TN)
    micro_tpr.append((tp0 + tp1) / (tp0 + tp1 + fn0 + fn1))
    micro_fpr.append((fp0 + fp1) / (fp0 + fp1 + tn0 + tn1))
    p0_tpr = tp0 / (tp0 + fn0)
    p1_tpr = tp1 / (tp1 + fn1)
    p0_fpr = fp0 / (fp0 + tn0)
    p1_fpr = fp1 / (fp1 + tn1)
    p0_tpr_list.append(p0_tpr)
    p0_fpr_list.append(p0_fpr)
    p1_tpr_list.append(p1_tpr)
    p1_fpr_list.append(p1_fpr)
    macro_tpr.append((p0_tpr + p1_tpr) / 2)
    macro_fpr.append((p0_fpr + p1_fpr) / 2)
    
    # averaging the raw counts first gives the same result as the micro avg (the /2 cancels)
    macro_tp = (tp0 + tp1) / 2
    macro_fn = (fn0 + fn1) / 2
    macro_fp = (fp0 + fp1) / 2
    macro_tn = (tn0 + tn1) / 2
    macro_tpr2.append(macro_tp / (macro_tp + macro_fn))
    macro_fpr2.append(macro_fp / (macro_fp + macro_tn))

# fpr on x, tpr on y
plt.plot(p0_roc[0], p0_roc[1], label='p0_roc')
plt.plot(p1_roc[0], p1_roc[1], label='p1_roc')
plt.plot(macro_fpr, macro_tpr, label='macro_avg')
# plt.plot(macro_fpr2, macro_tpr2, label='macro_avg2')
plt.plot(micro_fpr, micro_tpr, label='micro_avg')
plt.legend()
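For comparison, here is (as far as I can tell) how the macro average is meant to be computed in the scikit-learn multiclass example, which I believe Yellowbrick follows: interpolate each class's TPR onto one shared FPR grid and average the TPRs. Only the TPRs get averaged because the FPR axis is common to both curves, not because the FPR is being ignored. A sketch with toy numbers (my own, not the credit data):

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# toy binary problem (made-up numbers): 4 negatives, 2 positives
y = np.array([0, 0, 0, 0, 1, 1])
p1 = np.array([0.1, 0.2, 0.3, 0.6, 0.55, 0.9])  # P(class 1)
p0 = 1 - p1                                     # P(class 0)

# per-class curves; roc_curve returns fpr, tpr, thresholds
fpr1, tpr1, _ = roc_curve(y, p1)
fpr0, tpr0, _ = roc_curve(1 - y, p0)

# macro average: interpolate each class's TPR onto one shared FPR grid,
# then average the TPRs -- the FPR axis is shared rather than averaged
all_fpr = np.unique(np.concatenate([fpr0, fpr1]))
mean_tpr = (np.interp(all_fpr, fpr0, tpr0) + np.interp(all_fpr, fpr1, tpr1)) / 2
auc_macro = auc(all_fpr, mean_tpr)
print(auc_macro)
```

Averaging the TPRs at each shared FPR value averages the curves vertically; pairing up (mean FPR, mean TPR) by threshold, as in the loop above, traces out a different parametric curve, which may be what makes the macro line look off.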

Nathan George

Mar 29, 2021, 5:50:51 PM
to Yellowbrick

Benjamin Bengfort

Apr 1, 2021, 8:24:19 PM
to Yellowbrick
Hi Nathan, 

Thank you for the note, and sorry about the confusion in the documentation. Yellowbrick does have a binary classification mode for ROC/AUC (it defaults to multiclass); try:

[snip]
roc_auc(model, X_train, y_train, X_test=X_test, y_test=y_test, classes=['not_defaulted', 'defaulted'], binary=True)

to see if that helps.

Best Regards,
Benjamin Bengfort