Using image transformations for validation

jo...@ericjonas.com

May 18, 2016, 4:11:05 PM
to Keras-users
Hello! Happy Keras user here. I'm experimenting with data augmentation and I couldn't find an example that applies it to both the train and test datasets. From poking around, it appears I need to create two separate ImageDataGenerators. Is this correct? In the example below I create both datagen and datagen_test, fit both on X_train, and then use datagen_test.flow for the validation data. Is there an easier way?

Thanks! 

...Eric

    datagen = ImageDataGenerator(
        featurewise_center=True,
        featurewise_std_normalization=False,
        width_shift_range=0.1,
        height_shift_range=0.1)

    datagen.fit(X_train)

    datagen_test = ImageDataGenerator(
        featurewise_center=True,
        featurewise_std_normalization=False,
        width_shift_range=0.1,
        height_shift_range=0.1)

    datagen_test.fit(X_train)

    model.fit_generator(datagen.flow(X_train, Y_train, batch_size=batch_size),
                        samples_per_epoch=len(X_train),
                        nb_epoch=nb_epoch,
                        verbose=1,
                        validation_data=datagen_test.flow(X_test, Y_test,
                                                          batch_size=batch_size),
                        nb_val_samples=len(Y_test),
                        class_weight={0: 1.0/WEIGHT, 1: 1.0/1.0})



asmith26

May 19, 2016, 6:24:30 AM
to Keras-users, jo...@ericjonas.com
Hi Eric,

I've generally followed the advice from Stanford cs231n/Karpathy, which states (you can find this in the course notes by searching for "Common pitfall"):
Common pitfall. An important point to make about the preprocessing is that any preprocessing statistics (e.g. the data mean) must only be computed on the training data, and then applied to the validation / test data. E.g. computing the mean and subtracting it from every image across the entire dataset and then splitting the data into train/val/test splits would be a mistake. Instead, the mean must be computed only over the training data and then subtracted equally from all splits (train/val/test).
Thus I've always done something like:

train_datagen = ImageDataGenerator(
        featurewise_center=True,
        featurewise_std_normalization=False,
        width_shift_range=0.1,
        height_shift_range=0.1)

train_datagen.fit(X_train)

model.fit_generator(train_datagen.flow(X_train, Y_train, batch_size=batch_size),
                    samples_per_epoch=len(X_train),
                    nb_epoch=nb_epoch,
                    verbose=1,
                    validation_data=train_datagen.flow(X_test, Y_test,
                                                       batch_size=batch_size),
                    nb_val_samples=len(Y_test),
                    class_weight={0: 1.0/WEIGHT, 1: 1.0/1.0})
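In plain NumPy terms (toy arrays with hypothetical shapes, nothing Keras-specific), the principle from that quote looks like this: fit the statistics on the training split only, then reuse them for the validation split.

```python
import numpy as np

# Hypothetical toy data standing in for image batches (not the thread's actual data).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 8, 8))
X_val = rng.normal(loc=5.0, scale=2.0, size=(20, 8, 8))

# Correct: compute the statistics on the training split only...
train_mean = X_train.mean()
train_std = X_train.std()

# ...then apply those same statistics to every split.
X_train_norm = (X_train - train_mean) / train_std
X_val_norm = (X_val - train_mean) / train_std

# The train split is exactly centered; the val split only approximately,
# because it was normalized with the *train* statistics.
print(abs(X_train_norm.mean()) < 1e-9)  # True
```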

Hope this helps, and I'd be interested to know what others do as well.
Cheers

jo...@ericjonas.com

May 19, 2016, 9:31:10 AM
to Keras-users, jo...@ericjonas.com
Hi, thank you for your response! That was my intent in having the two separate ImageDataGenerators: each one learns the image statistics (mean, variance, etc.) from the training data (X_train). My concern comes from looking at the ImageDataGenerator source, where flow is defined as:

def flow(self, X, y, batch_size=32, shuffle=False, seed=None,
         save_to_dir=None, save_prefix='', save_format='jpeg'):
    assert len(X) == len(y)
    self.X = X
    self.y = y
    self.save_to_dir = save_to_dir
    self.save_prefix = save_prefix
    self.save_format = save_format
    self.reset()
    self.flow_generator = self._flow_index(X.shape[0], batch_size,
                                           shuffle, seed)
    return self

This suggests that a single ImageDataGenerator object can only iterate over a single dataset at a time, since X and y are set as properties of the object (rather than being captured in the generator). In that case, your example would not work, but perhaps I'm not following the code path correctly?
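To make the concern concrete, here is a toy class (not the actual Keras code) with the same shape as that flow method: the dataset is stored on the object and self is returned, so a second flow call clobbers the first iterator.

```python
class SharedStateGen:
    """Toy stand-in for the flow() above: the dataset is stored on self,
    and flow() returns self rather than a fresh iterator."""

    def flow(self, data):
        self.data = list(data)
        self.i = 0
        return self

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= len(self.data):
            raise StopIteration
        item = self.data[self.i]
        self.i += 1
        return item

gen = SharedStateGen()
train_it = gen.flow([1, 2, 3])
val_it = gen.flow([10, 20])  # second call overwrites the data on the same object

print(train_it is val_it)  # True: both names refer to the one generator
first = next(train_it)
print(first)               # 10, not 1: the "training" iterator now yields val data
```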

Thanks again for the help, 

...Eric

1.gup...@gmail.com

Sep 19, 2019, 10:01:46 AM
to Keras-users
Hi, did you ever find your answer? I am doing the same thing while cross-validating. Should I make a different ImageDataGenerator for each validation set?