AUC on validation set during training does not match sklearn AUC


bano...@gmail.com

Sep 8, 2018, 6:19:35 PM9/8/18
to Keras-users

Hi all, 


I am using my test set as a validation set. I used an approach similar to the one in "How to compute Receiver Operating Characteristic (ROC) and AUC in keras?"

The issue is that my val_auc during training is around 0.85; however, when I use

fpr, tpr, _ = roc_curve(test_label, test_prediction)
roc_auc = auc(fpr, tpr)

I get an AUC of 0.60. I understand that they use different formulations, and that a streaming AUC can differ from the one sklearn calculates, but the difference is very large and I can't figure out what causes it. Here is the code that I used.


# define roc_callback, inspired by https://github.com/keras-team/keras/issues/6050#issuecomment-329996505
def auc_roc(y_true, y_pred):
    # any tensorflow metric; note that streaming_auc takes (predictions,
    # labels), and its internal counters accumulate over every batch it
    # is evaluated on
    value, update_op = tf.contrib.metrics.streaming_auc(y_pred, y_true)

    # find all variables created for this metric
    metric_vars = [i for i in tf.local_variables() if 'auc_roc' in i.name.split('/')[1]]

    # Add metric variables to GLOBAL_VARIABLES collection.
    # They will be initialized for new session.
    for v in metric_vars:
        tf.add_to_collection(tf.GraphKeys.GLOBAL_VARIABLES, v)

    # force to update metric values
    with tf.control_dependencies([update_op]):
        value = tf.identity(value)
        return value

clf = Sequential()

clf.add(LSTM(units = 128, input_shape = (windowlength, trainX.shape[2]), return_sequences = True))#, kernel_regularizer=regularizers.l2(0.01)))

clf.add(Dropout(0.2))

clf.add(LSTM(units = 64, return_sequences = False))#, kernel_regularizer=regularizers.l2(0.01)))

clf.add(Dropout(0.2))

clf.add(Dense(units = 128, activation = 'relu'))
clf.add(Dropout(0.2))

clf.add(Dense(units = 128, activation = 'relu'))

clf.add(Dense(units = 1, activation = 'sigmoid'))
clf.compile(loss='binary_crossentropy', optimizer = 'adam', metrics = ['acc', auc_roc])

my_callbacks = [EarlyStopping(monitor='auc_roc', patience=50, verbose=1, mode='max')]
clf.fit(trainX, trainY, batch_size = 1000, epochs = 80, class_weight = class_weights, validation_data = (testX, testY), 
        verbose = 2, callbacks=my_callbacks)
y_pred_pro = clf.predict_proba(testX)
print(roc_auc_score(testY, y_pred_pro))
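
A common alternative, also discussed in the keras issue linked above (#6050), is to skip the streaming metric entirely and compute sklearn's exact AUC on the validation set once per epoch, e.g. from a callback's on_epoch_end. Here is a minimal framework-free sketch; the `validation_auc` helper and the stub model are hypothetical names for illustration only:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def validation_auc(model, x_val, y_val):
    """Exact (non-streaming) AUC over the whole validation set."""
    y_pred = np.asarray(model.predict(x_val)).ravel()
    return roc_auc_score(y_val, y_pred)

# Stand-in for the trained Keras classifier, so the helper can be
# exercised without a training run.
class StubModel:
    def predict(self, x):
        return x[:, :1]  # pretend the first feature is the sigmoid output

x_val = np.array([[0.9], [0.1], [0.8], [0.2]])
y_val = np.array([1, 0, 1, 0])
print(validation_auc(StubModel(), x_val, y_val))  # perfectly ranked -> 1.0
```

In a real run you would call `validation_auc(clf, testX, testY)` from a keras.callbacks.Callback subclass at the end of each epoch; because it rescores the whole validation set from scratch each time, it cannot inherit stale counters the way a streaming metric can.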

Sergey O.

Sep 8, 2018, 11:25:20 PM9/8/18
to bano...@gmail.com, Keras-users
Might be worth switching to tf.metrics.auc(), since tf.contrib.metrics.streaming_auc() is now deprecated and soon to be removed. Note that the argument order flips: tf.metrics.auc takes (labels, predictions), while streaming_auc takes (predictions, labels).


I'm getting the same answer with tf and sklearn. Can you try the following code (replace a and b with your data) and see if it matches?

That way we can tell whether the problem has something to do with Keras (likely the shape of the input) or with the TensorFlow function.

import numpy as np
import tensorflow as tf
from sklearn import metrics

tf.reset_default_graph()

a = np.random.randint(0, 2, size=100)  # labels
b = np.random.randint(0, 2, size=100)  # "predictions"

print(metrics.roc_auc_score(a, b))

tf_auc, tf_update_op = tf.metrics.auc(a, b)  # note: (labels, predictions)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())
sess.run(tf_update_op)   # accumulate the threshold counts first
print(sess.run(tf_auc))  # then read the metric value
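
For what it's worth, the bucketing itself is a small effect. Below is a NumPy-only sketch of a fixed-threshold AUC approximation; the `bucketed_auc` function is mine and only loosely mimics tf.metrics.auc's default 200-threshold counts. Against sklearn's exact roc_auc_score the gap is a few hundredths at most, so a 0.85-vs-0.60 discrepancy points to accumulated metric state rather than the formula:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=2000)
# scores loosely correlated with the labels, clipped into [0, 1]
y_score = np.clip(0.35 * y_true + rng.normal(0.35, 0.2, size=2000), 0.0, 1.0)

def bucketed_auc(y_true, y_score, num_thresholds=200):
    """AUC via a fixed grid of thresholds, loosely mimicking the
    bucketed true/false-positive counts a streaming metric keeps."""
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    pos, neg = (y_true == 1), (y_true == 0)
    tpr = np.array([(y_score[pos] >= t).mean() for t in thresholds])
    fpr = np.array([(y_score[neg] >= t).mean() for t in thresholds])
    order = np.argsort(fpr)  # integrate the ROC curve left to right
    fpr, tpr = fpr[order], tpr[order]
    # trapezoidal rule over the (fpr, tpr) curve
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

print(bucketed_auc(y_true, y_score))      # close to the exact value
print(roc_auc_score(y_true, y_score))     # sklearn's exact AUC
```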



