model.predict_generator not working with test images

rcam...@pwssc.org

Aug 29, 2018, 5:22:08 PM
to Keras-users
Hi all:

I am developing a CNN with Keras to identify plankton images taken with an in situ camera I have on a mooring in Prince William Sound.  I have several million images, amounting to tens of gigabytes, so loading everything into memory is not an option.
I have been having trouble getting sensible predictions on my test sets after building and validating a model. The model trains up well and evaluate_generator gives good scores, but when I use predict_generator to generate predictions (e.g. to make a confusion matrix) I get results that look no different from random.  To illustrate this I've put together an example using MNIST.

The MNIST data is available as .png files in subdirectories here.

Here is a model posted by Aditya Soni that trains up quite well; it should work as-is after adjusting train_data_dir to wherever you put the files:

import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPool2D
from keras.layers.normalization import BatchNormalization
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ReduceLROnPlateau


img_width, img_height = 28, 28
batch_size = 64
num_classes = 10
epochs = 20
input_shape = (img_width, img_height, 3)
train_data_dir = 'S:/mnist_png/training'


model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),activation='relu',kernel_initializer='he_normal',input_shape=input_shape))
model.add(Conv2D(32, kernel_size=(3, 3),activation='relu',kernel_initializer='he_normal'))
model.add(MaxPool2D((2, 2)))
model.add(Dropout(0.20))
model.add(Conv2D(64, (3, 3), activation='relu',padding='same',kernel_initializer='he_normal'))
model.add(Conv2D(64, (3, 3), activation='relu',padding='same',kernel_initializer='he_normal'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Conv2D(128, (3, 3), activation='relu',padding='same',kernel_initializer='he_normal'))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(BatchNormalization())
model.add(Dropout(0.25))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.RMSprop(),
              metrics=['accuracy'])

learning_rate_reduction = ReduceLROnPlateau(monitor='val_acc', 
                                            patience=3, 
                                            verbose=1, 
                                            factor=0.5, 
                                            min_lr=0.0001)

datagen = ImageDataGenerator(
    rescale=1. / 255,  # normalization
    featurewise_center=False,  # set input mean to 0 over the dataset
    samplewise_center=False,  # set each sample mean to 0
    featurewise_std_normalization=False,  # divide inputs by std of the dataset
    samplewise_std_normalization=False,  # divide each input by its std
    zca_whitening=False,  # apply ZCA whitening
    rotation_range=15,  # randomly rotate images in the range (degrees, 0 to 180)
    zoom_range=0.1,  # randomly zoom image
    width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
    horizontal_flip=False,  # randomly flip images
    vertical_flip=False,  # randomly flip images
    validation_split=0.1)
train_generator = datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical')

h = model.fit_generator(
    train_generator,
    steps_per_epoch=60000 // batch_size,
    epochs=epochs,
    verbose=1,
    callbacks=[learning_rate_reduction])


....the final epoch on that has loss: 0.0293 - acc: 0.9911

I then test it thusly:

test_data_dir = 'S:/mnist_png/testing'

test_datagen = ImageDataGenerator(
    rescale=1. / 255)

test_generator = test_datagen.flow_from_directory(
    test_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    shuffle='False',
    class_mode='categorical')

#Evaluate model on test set
scores = model.evaluate_generator(test_generator,workers=12)


...scores returns loss=0.01471, acc=0.9956, not bad.  Note that I've set shuffle to 'False' so it should be going through in order.  

But then if I look at the predictions it all falls apart:

test_generator.reset() #Necessary to force it to start from beginning
Y_pred = model.predict_generator(test_generator)
y_pred = np.argmax(Y_pred, axis=-1)
sum(y_pred==test_generator.classes)/10000

I got the first line above from Keras issue 3296.  The accuracy in that final line (# of correct predictions / number of images) is 0.1033, not different from random.  Further, if I generate a confusion matrix:

from sklearn.metrics import confusion_matrix
confusion_matrix(test_generator.classes,y_pred)

it returns the following:

array([[104, 117, 109, 100,  93, 105,  95,  88,  75,  94],
       [109, 134, 101, 122, 121, 100, 105, 106, 115, 122],
       [106, 116, 113,  82, 112, 105,  98,  92, 104, 104],
       [ 83, 124,  92, 118,  90,  71, 101, 132,  97, 102],
       [101, 105, 104, 107, 107,  76,  91, 109,  98,  84],
       [ 89,  92,  98,  76,  85,  71,  97,  93, 102,  89],
       [ 88, 111, 106, 105,  81,  89,  76, 104,  88, 110],
       [ 87, 116,  95, 103, 104,  88, 102, 122,  99, 112],
       [104, 113, 100,  92, 105,  96,  92,  87,  92,  93],
       [112, 104, 115,  96,  91,  95, 102,  95, 103,  96]], dtype=int64)

...which also looks random to me.

I have checked that test_generator.classes matches up with the directory names in test_generator.filenames.  I suspect something is going wrong with the way predict_generator is making its predictions, and I've been up and down the documentation and the web, but at this point I'm baffled about how to get it to work.  Anyone have any thoughts?
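
In case it's useful, the spot check I'm referring to is something like this (a minimal sketch; it just prints the class_indices mapping and the first few filename/label pairs):

print(test_generator.class_indices)  # maps subdirectory name -> label index, e.g. {'0': 0, '1': 1, ...}
for fname, cls in list(zip(test_generator.filenames, test_generator.classes))[:10]:
    print(fname, '->', cls)  # each filename starts with its class subdirectory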




Sergey O.

Aug 29, 2018, 6:06:55 PM
to rcam...@pwssc.org, Keras-users
Sounds like a pretty cool dataset!

I don't think test_generator.classes is what you want. 

Do the following instead:

# to get one batch of images and labels
y_img_batch, y_class_batch = test_generator[0] 
y_pred = np.argmax(model.predict(y_img_batch),-1)
y_true = np.argmax(y_class_batch,-1)
print(sum(y_pred==y_true)/batch_size)

PS, if you print y_class_batch, you'll see that the order is not the same as test_generator.classes!
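
For example (just a sketch), comparing the first batch's labels with the head of test_generator.classes makes the mismatch obvious:

print(np.argmax(y_class_batch, -1)[:10])  # labels in the order this batch was served
print(test_generator.classes[:10])        # labels in sorted-filename order
# with shuffling on, these two rows will generally not match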



Sergey O.

Aug 29, 2018, 6:25:22 PM
to rcam...@pwssc.org, Keras-users
I think I found a better solution (which doesn't require calling the batch generator):

test_generator.reset()
Y_pred = model.predict_generator(test_generator)
classes = test_generator.classes[test_generator.index_array]
y_pred = np.argmax(Y_pred, axis=-1)
sum(y_pred==classes)/10000

0.9922

from sklearn.metrics import confusion_matrix
confusion_matrix(test_generator.classes[test_generator.index_array],y_pred)

array([[ 980,    0,    0,    0,    0,    0,    0,    0,    0,    0],
       [   0, 1133,    2,    0,    0,    0,    0,    0,    0,    0],
       [   1,    0, 1031,    0,    0,    0,    0,    0,    0,    0],
       [   1,    1,   10,  987,    0,    6,    0,    4,    1,    0],
       [   0,    0,    0,    0,  976,    0,    0,    0,    1,    5],
       [   1,    0,    0,    1,    0,  885,    4,    0,    1,    0],
       [   5,    4,    0,    0,    0,    0,  948,    0,    1,    0],
       [   0,    1,    7,    0,    0,    0,    0, 1019,    0,    1],
       [   7,    0,    4,    0,    1,    0,    0,    1,  960,    1],
       [   0,    0,    0,    0,    4,    0,    0,    2,    0, 1003]])
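
If I understand the iterator correctly, this works because it records the shuffled order it served the images in as test_generator.index_array, so indexing classes by it puts the ground-truth labels into the same order as the rows of Y_pred. That is also why evaluate_generator looked fine all along: it gets (image, label) pairs together from the generator, so shuffling never breaks the pairing; only the external comparison against test_generator.classes does. A quick per-class sanity check (sketch, reusing y_pred from above):

reordered = test_generator.classes[test_generator.index_array]
for c in range(10):
    mask = reordered == c
    print(c, (y_pred[mask] == c).mean())  # per-class recall should now be high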

rcam...@pwssc.org

Aug 29, 2018, 8:04:37 PM
to Keras-users
Perfect, that's got it - thanks!  I figured I must be messing up the indexing somewhere.
Cheers, Rob
 

lamborg...@gmail.com

Oct 24, 2019, 10:15:34 AM
to Keras-users
Hi sokrypton,
This code works for my validation set, but for my test and training sets I still get what look like random predictions. I've checked the generators and they all use the same settings.


artimi...@gmail.com

Mar 22, 2020, 9:34:03 PM
to Keras-users
Setting shuffle=False on the generator passed to evaluate_generator and predict_generator fixed the issue for me.
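
Worth noting: the original post passed shuffle='False', a non-empty string, which Python treats as truthy, so that generator was in fact still shuffling. A minimal sketch of the unshuffled route, reusing the names from the original post:

test_generator = test_datagen.flow_from_directory(
    test_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    shuffle=False,  # a real boolean, not the string 'False'
    class_mode='categorical')

Y_pred = model.predict_generator(test_generator)
y_pred = np.argmax(Y_pred, axis=-1)

# with shuffling off, test_generator.classes is already in prediction order,
# so no reindexing with index_array is needed
from sklearn.metrics import confusion_matrix
print(confusion_matrix(test_generator.classes, y_pred))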


Choe Seonghun

Mar 31, 2020, 6:06:12 AM
to Keras-users
Thanks for your advice. 

Setting shuffle=True was the cause of the problem.

Lance Norskog

Mar 31, 2020, 4:14:21 PM
to Choe Seonghun, Keras-users
Accuracy of 0.99 is almost certainly bogus. The model is probably far overtrained.

You might get more valid results by monitoring the val_loss, and stopping when that is at its minimum.
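
Something along these lines, as a minimal sketch. It assumes a validation generator carved out of the validation_split=0.1 already set in the original post (for a clean split the training generator would also need subset='training'), and uses EarlyStopping to stop once val_loss stops improving:

from keras.callbacks import EarlyStopping

val_generator = datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical',
    subset='validation')  # the 10% held out by validation_split

early_stop = EarlyStopping(monitor='val_loss', patience=5,
                           restore_best_weights=True, verbose=1)

h = model.fit_generator(
    train_generator,
    steps_per_epoch=train_generator.samples // batch_size,
    validation_data=val_generator,
    validation_steps=val_generator.samples // batch_size,
    epochs=epochs,
    verbose=1,
    callbacks=[early_stop, learning_rate_reduction])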



--
Lance Norskog
lance....@gmail.com
Redwood City, CA

sathiya...@gmail.com

Apr 5, 2020, 9:46:50 AM
to Keras-users
eval_idg = ImageDataGenerator(rescale=1. / 255)
eval_g = eval_idg.flow_from_directory(
    directory=r'C:/Users/admin/Downloads/plantdisease_dataset/Testing',
    target_size=(100, 100),
    class_mode='binary',
    batch_size=5,
    shuffle=False)

# evaluate_generator returns [loss, accuracy] when the model is compiled with
# metrics=['accuracy'], so unpack both values
eval_loss, eval_acc = my_model.evaluate_generator(eval_g, steps=1)
print('evaluation Loss over never-before-seen images is: {:.4f}'.format(eval_loss))
print('evaluation Accuracy over never-before-seen images is: {:4.2f}%'.format(eval_acc*100), '\n')

# Individual Predictions
pred_idg = eval_idg
pred_g = eval_g
pred = my_model.predict_generator(pred_g, steps=1)
print(pred_g.filenames, '\n')
print(pred_g.class_indices, '\n')
print(pred[0:5], '\n')



Hi,
   I don't get any output. Can anyone help me?

brettbu...@gmail.com

Apr 15, 2020, 6:31:13 PM
to Keras-users
Thank you!!

souzap...@gmail.com

Jun 25, 2020, 1:48:44 AM
to Keras-users
Grateful!