def _load_image_and_apply_transform(self, j):
    fname = self.filenames[j]
    img = load_img(os.path.join(self.directory, fname),
                   color_mode=self.color_mode,
                   target_size=self.target_size,
                   interpolation=self.interpolation)
    x = img_to_array(img, data_format=self.data_format)
    # Pillow images should be closed after `load_img`,
    # but not PIL images.
    if hasattr(img, 'close'):
        img.close()
    params = self.image_data_generator.get_random_transform(x.shape)
    x = self.image_data_generator.apply_transform(x, params)
    x = self.image_data_generator.standardize(x)
    return x
def _get_batches_of_transformed_samples(self, index_array):
    ...
    # build batch of image data
    results = self.pool.map(self._load_image_and_apply_transform, index_array)
    for i in range(len(index_array)):
        batch_x[i] = results[i]
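For context, the snippet above assumes `self.pool` is a thread pool created once when the iterator is constructed (e.g. `self.pool = ThreadPool(processes=n_threads)` in `__init__`). Here is a minimal, self-contained sketch of the same pattern; the names `load_and_transform` and `n_threads` are illustrative, not part of the Keras API:

```python
from multiprocessing.pool import ThreadPool

import numpy as np

# Illustrative stand-in for the per-image work done in
# _load_image_and_apply_transform (load + augment + standardize).
def load_and_transform(index):
    # Fabricate a deterministic "image" per index for demonstration.
    return np.full((2, 2, 3), float(index))

index_array = [0, 1, 2, 3]
batch_x = np.empty((len(index_array), 2, 2, 3))

# In the iterator this pool would be created once in __init__,
# e.g. self.pool = ThreadPool(processes=n_threads).
pool = ThreadPool(processes=4)

# ThreadPool.map preserves the order of index_array, so results[i]
# corresponds to index_array[i] just like in the serial loop.
results = pool.map(load_and_transform, index_array)
for i, x in enumerate(results):
    batch_x[i] = x

pool.close()
pool.join()
```

A thread pool (rather than a process pool) works well here because the heavy lifting (Pillow decoding, NumPy transforms) releases the GIL, so the threads genuinely overlap I/O and computation.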
It gives me good results so far. For instance, on a training dataset of 317k images, on a machine with 8 vCPUs and a 1080 Ti GPU, it reduced the epoch time from ~3h20 to ~53min. I can also see that the GPU is well utilized, as are all 8 vCPUs.
My questions:
- what do you think about this modification? (we could also add a parameter to flow_from_directory / DirectoryIterator to specify whether to use multi-threaded data loading/augmentation, and with how many threads)
- does anything speak against it?
- would you be interested in a pull request?
Feedback welcome!
Cheers,
Pat