Convolution2D on images with different sizes

deco...@gmail.com

Feb 4, 2016, 12:03:48 PM
to Keras-users
I would like to process the output of Convolution2D with an RNN (a similar approach to the image caption generation with attention paper, http://arxiv.org/pdf/1502.03044v2.pdf ), so I think I could use a Masking layer to deal with the different output sizes.

However, I can't feed inputs of different shapes to the network. What would be the best way to use Convolution2D on images of different sizes? So far, the only approach I could think of is padding the numpy arrays of the input images with zeros to make them all the same size. However, this can be very inefficient if there is considerable variation in image size (as can be the case).

Can anyone think of a way to work with images of different sizes other than padding them? I'm not an expert in Theano, but I suspect it can't be done. It would be great if it were possible, though, since it would avoid unnecessary memory usage and computation time.

If anyone thinks it's possible and can give me some hints, I would really appreciate it. (Likewise, if anyone is certain that it can't be done with current Theano and/or TensorFlow.)

Atlas

Feb 10, 2016, 5:34:22 PM
to Keras-users, deco...@gmail.com
Could you pick a small input shape for your Convolution2D stack and then downsample your images to it? If an image is smaller, use upsampling or zero-padding layers.
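
For the zero-padding part, something like this (a rough numpy sketch, channels-first layout as in the Theano dim ordering used elsewhere in this thread; the 64x64 target size and the images list are placeholders for your own data):

import numpy as np

def pad_to(img, target_h, target_w):
    # img has shape (channels, h, w); zero-pad along the bottom/right up to the target
    c, h, w = img.shape
    out = np.zeros((c, target_h, target_w), dtype=img.dtype)
    out[:, :h, :w] = img
    return out

# assumes every image already fits inside 64x64; larger ones would need downsampling first
batch = np.stack([pad_to(img, 64, 64) for img in images])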

François Chollet

Feb 10, 2016, 5:37:46 PM
to Atlas, Keras-users, deco...@gmail.com
> What would be the best way to use Convolution2D on images of different sizes?

I believe that is possible with Keras. Do you have a specific code example that doesn't work?

Klemen Grm

Feb 12, 2016, 3:40:45 AM
to Keras-users, deco...@gmail.com
In what use case does this not work for you? I've just tried it with a fully convolutional network; different image sizes can be used both in prediction and in training, provided the tensor shapes match.

>>> import numpy as np
>>> from keras.models import Sequential
>>> from keras.layers import Convolution2D
>>> m = Sequential()
>>> m.add(Convolution2D(8,3,3, input_shape=(1,10,10)))
>>> m.compile(loss="mae", optimizer="sgd")
>>> c = m.predict(np.random.rand(1,1,10,10))
>>> c.shape
(1, 8, 8, 8)
>>> c = m.predict(np.random.rand(1,1,20,20))
>>> c.shape
(1, 8, 18, 18)
>>> m.fit(np.random.rand(100,1,10,10), np.random.rand(100,8,8,8))
<keras.callbacks.History object at 0x7f349919fb10>
>>> m.fit(np.random.rand(100,1,12,12), np.random.rand(100,8,10,10))
<keras.callbacks.History object at 0x7f3489d07a90>

deco...@gmail.com

Feb 15, 2016, 5:08:38 AM
to Keras-users, deco...@gmail.com
From the constructor of Convolution2D, since it requires the input_shape parameter:

m.add(Convolution2D(8,3,3, input_shape=(1,10,10)))

I assumed it had to work only with 10x10 grayscale images. Now I see that this parameter is somehow ignored and that it actually works with images of different sizes. So I will be more specific in my question:

import numpy as np
from keras.models import Sequential
from keras.layers.convolutional import Convolution2D
from keras.layers.core import Permute, Reshape, TimeDistributedDense
from keras.layers.recurrent import SimpleRNN
from keras.optimizers import RMSprop

model = Sequential()
model.add(Convolution2D(8, 3, 3, input_shape=np.shape(X_train[np.newaxis, 0, :, :])))
model.add(Permute((3, 2, 1)))
model.add(Reshape((-1, np.prod(model.layers[-1].output_shape[-2:]))))

<------- Masking ? ------->

model.add(SimpleRNN(output_dim))
model.add(TimeDistributedDense(nb_classes, activation='softmax'))

optimizer = RMSprop(lr=learning_rate)
model.compile(loss='categorical_crossentropy', optimizer=optimizer)



In my training set, each image has a different shape. Since the output of the Convolution2D will have a different shape depending on the input image, after the reshape I will have sequences of different lengths. Is it possible to deal with those variable-length sequences generated inside the neural network?
The only idea I could come up with was to pad the images with zeros, so that the Conv2D output always produces a sequence of the same length, but then some elements of that sequence contain no information at all and I am wasting computation time.

It would be great if I could have a layer to put between the Reshape layer and the SimpleRNN layer that could pad/mask the sequences output by the Conv2D. Is that possible with Keras?
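
To illustrate the padding workaround I have in mind (a rough sketch; max_h/max_w are an assumed upper bound on image size, and it optimistically assumes the padded regions come out as all-zero timesteps after the convolution, which the conv biases may break):

import numpy as np
from keras.layers.core import Masking

max_h, max_w = 100, 100  # assumed upper bound on image size

def pad_image(img):
    # img has shape (1, h, w); zero-pad to (1, max_h, max_w)
    out = np.zeros((1, max_h, max_w), dtype=img.dtype)
    out[:, :img.shape[1], :img.shape[2]] = img
    return out

# ...and then, between the Reshape and the SimpleRNN:
model.add(Masking(mask_value=0.0))  # skip timesteps that are entirely zero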

Maybe, since it seems to ignore the input_shape parameter, I can just work with batch_size=1 and it should work. Am I right?

Klemen Grm

Feb 15, 2016, 5:13:01 AM
to Keras-users, deco...@gmail.com
No, that's not the case. The input_shape parameter is not ignored; it is used where the following layers depend on the shape of the layer's output. Therefore, different input sizes will only work when that's not the case, i.e., when you're working with a fully convolutional network and the training output sizes match. If you have non-convolutional layers, they will be initialised for the input size specified, and the network will only work for inputs of that size, in which case you may consider scaling, cropping, or padding your images to match it. A small illustration follows.
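
For example, extending the toy model from above with a Dense layer fixes the input size (a sketch; the exact error you get for a mismatched input depends on the backend):

>>> from keras.models import Sequential
>>> from keras.layers import Convolution2D, Flatten, Dense
>>> m = Sequential()
>>> m.add(Convolution2D(8, 3, 3, input_shape=(1, 10, 10)))
>>> m.add(Flatten())   # the flattened length now depends on the input size...
>>> m.add(Dense(4))    # ...so this weight matrix is built for 10x10 inputs only
>>> m.compile(loss="mae", optimizer="sgd")
>>> # m.predict(np.random.rand(1, 1, 20, 20)) would now fail with a shape mismatch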

asphalt

Jul 21, 2017, 6:52:35 AM
to Keras-users, deco...@gmail.com

Hey, 

I have a training set of images, each of which has a different size. I do not want to lose data by resizing the images. How can one feed the data to the model.fit() function, since it accepts only arrays, and a single array consisting of multiple arrays (of different dimensions) is not supported by numpy?

Thanks for all the help!!

Daπid

Jul 21, 2017, 8:34:59 AM
to asphalt, Keras-users, deco...@gmail.com
The simplest thing is to use fit_generator and feed same-sized batches. The alternative is to implement some sort of masking and padding, but how to do that correctly depends on exactly what you are doing inside the network.
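
A rough sketch of the generator approach (images_by_size is a hypothetical dict mapping an image size to the list of (image, target) pairs of that size; Keras 1-style fit_generator arguments):

import random
import numpy as np

def same_size_batches(images_by_size, batch_size):
    # every yielded batch contains images of a single size, so they stack cleanly;
    # assumes each size bucket holds at least batch_size pairs
    while True:
        size = random.choice(list(images_by_size.keys()))
        pairs = random.sample(images_by_size[size], batch_size)
        X = np.stack([x for x, y in pairs])
        Y = np.stack([y for x, y in pairs])
        yield X, Y

model.fit_generator(same_size_batches(images_by_size, 32),
                    samples_per_epoch=1000, nb_epoch=10)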

deco...@gmail.com

Jul 26, 2017, 7:18:37 AM
to Keras-users, asfi...@gmail.com, deco...@gmail.com
If your dataset is small and performance is not a major concern, you can train with batch_size=1; that way, all of the images in each (single-image) batch are trivially of the same size.
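
For example (a rough sketch; images, targets and nb_epochs are placeholders for your own data and settings):

import numpy as np

for epoch in range(nb_epochs):
    for img, target in zip(images, targets):
        # a one-image "batch": just add the leading batch dimension
        model.train_on_batch(img[np.newaxis], target[np.newaxis])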


stewart...@gmail.com

Oct 18, 2017, 10:41:26 AM
to Keras-users
No matter what you do, all of the images in a batch must have the same dimensions. The frameworks are simply built that way, and I am aware of no exceptions.
One of the best things you can do performance-wise is to batch the images into clusters of similar size, along the lines of the sketch below. However, this may have adverse consequences for the quality of your mini-batch gradients, particularly if the statistical properties of the smaller images do not match those of the larger ones, which may or may not be the case depending on the dataset.
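
A rough sketch of that bucketing (hypothetical helper; images and targets are per-sample arrays, and here I bucket by exact shape so every batch stacks cleanly; nearby sizes could instead be padded to a common bucket size):

from collections import defaultdict
import numpy as np

buckets = defaultdict(list)
for img, target in zip(images, targets):
    buckets[img.shape].append((img, target))

def bucketed_batches(buckets, batch_size):
    # yield same-shape batches, bucket by bucket
    for pairs in buckets.values():
        for i in range(0, len(pairs), batch_size):
            chunk = pairs[i:i + batch_size]
            yield (np.stack([x for x, _ in chunk]),
                   np.stack([y for _, y in chunk]))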