datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=False,
    width_shift_range=0.1,
    height_shift_range=0.1)
datagen.fit(X_train)

datagen_test = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=False,
    width_shift_range=0.1,
    height_shift_range=0.1,
)
datagen_test.fit(X_train)
model.fit_generator(datagen.flow(X_train, Y_train,
                                 batch_size=batch_size),
                    samples_per_epoch=len(X_train),
                    nb_epoch=nb_epoch,
                    verbose=1,
                    validation_data=datagen_test.flow(X_test, Y_test,
                                                      batch_size=batch_size),
                    nb_val_samples=len(Y_test),
                    class_weight={0: 1.0/WEIGHT, 1: 1.0/1.0})

Hi Eric,
I've generally followed the advice from Stanford cs231n/Karpathy (you can find it in the link by searching for "Common pitfall"), which states:

"Common pitfall. An important point to make about the preprocessing is that any preprocessing statistics (e.g. the data mean) must only be computed on the training data, and then applied to the validation / test data. E.g. computing the mean and subtracting it from every image across the entire dataset and then splitting the data into train/val/test splits would be a mistake. Instead, the mean must be computed only over the training data and then subtracted equally from all splits (train/val/test)."
Thus I've always done something like:
train_datagen = ImageDataGenerator(
    featurewise_center=True,
    featurewise_std_normalization=False,
    width_shift_range=0.1,
    height_shift_range=0.1)
train_datagen.fit(X_train)

model.fit_generator(train_datagen.flow(X_train, Y_train,
                                       batch_size=batch_size),
                    samples_per_epoch=len(X_train),
                    nb_epoch=nb_epoch,
                    verbose=1,
                    validation_data=train_datagen.flow(X_test, Y_test,
                                                       batch_size=batch_size),
                    nb_val_samples=len(Y_test),
                    class_weight={0: 1.0/WEIGHT, 1: 1.0/1.0})
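Stripped of Keras, the cs231n advice quoted above can be sketched in plain NumPy. This is only an illustration: the random arrays and their shapes are made up, and the per-pixel mean (axis=0) is a choice for this sketch, not necessarily what ImageDataGenerator computes in every Keras version.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 100 training and 20 test "images" of shape 8x8x3.
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 8, 8, 3))
X_test = rng.normal(loc=5.0, scale=2.0, size=(20, 8, 8, 3))

# Compute the preprocessing statistic on the training split only.
train_mean = X_train.mean(axis=0)

# Subtract that same training mean from every split.
X_train_centered = X_train - train_mean
X_test_centered = X_test - train_mean
```

The training split ends up exactly zero-centered, while the test split is only approximately centered, which is expected since its own mean was never looked at.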
Hope this helps, and I'd be interested to know what others do as well.
Cheers
def flow(self, X, y, batch_size=32, shuffle=False, seed=None,
         save_to_dir=None, save_prefix='', save_format='jpeg'):
    assert len(X) == len(y)
    self.X = X
    self.y = y
    self.save_to_dir = save_to_dir
    self.save_prefix = save_prefix
    self.save_format = save_format
    self.reset()
    self.flow_generator = self._flow_index(X.shape[0], batch_size,
                                           shuffle, seed)
    return self
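Going only by the flow snippet above, flow stores the arrays and sets up batch indexing; nothing in it recomputes statistics, so whatever fit stored would be what gets applied. Here is a toy stand-in to illustrate that separation. ToyGenerator is a hypothetical class written for this sketch, not the Keras implementation, and it only does mean-centering.

```python
import numpy as np


class ToyGenerator:
    """Hypothetical stand-in for illustration; not the Keras class."""

    def __init__(self):
        self.mean = None

    def fit(self, X):
        # Statistics are computed here, and only here.
        self.mean = X.mean(axis=0)

    def flow(self, X, y, batch_size=32):
        assert len(X) == len(y)
        # Yield batches centered with the mean stored by fit(),
        # regardless of which split X comes from.
        for start in range(0, len(X), batch_size):
            stop = start + batch_size
            yield X[start:stop] - self.mean, y[start:stop]


X_train = np.array([[0.0, 2.0], [4.0, 6.0]])
y_train = np.array([0, 1])
X_test = np.array([[10.0, 10.0]])
y_test = np.array([1])

gen = ToyGenerator()
gen.fit(X_train)  # stores the training mean
test_batch, test_labels = next(gen.flow(X_test, y_test))
print(test_batch)  # test data centered with the training mean
```

In this toy version, passing X_test to flow on a generator that was fit on X_train gives test batches shifted by the training mean, which is the behaviour the cs231n advice asks for.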