What is the best way to handle variable batch size?


마피아

Jan 21, 2016, 3:59:41 AM
to Discuss
Hi.

It seems like TensorFlow does not cope well with a variable batch size.
I am trying to use a CNN implemented in TensorFlow as a feature extractor, and the number of images I want to process is different every time, so it is essential to handle a variable batch size.

Is there any way to do this other than setting the batch size to 1 and iterating over the whole set of images every time, or setting a large batch size and padding the unused slots?

Thank you.
-Taeksoo 

Rafał Józefowicz

Jan 21, 2016, 7:10:12 AM
to Discuss
You don't have to specify the batch size during graph construction. The only change needed is to update the placeholders, replacing the batch dimension with None, e.g.:
inputs = tf.placeholder(tf.float32, [None, 32, 32, 3])
batch_size = tf.shape(inputs)[0]  # batch_size is now a tensor and we don't know its value until we execute the graph.

With this you can feed a batch of any size.
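For reference, here is a minimal, self-contained sketch of the idea (written against the TF 1.x-style API, which is newer than this thread; the dummy "feature extractor" and the shapes are only illustrative):

import numpy as np
import tensorflow as tf

# Batch dimension left as None, so any batch size can be fed at run time.
inputs = tf.placeholder(tf.float32, [None, 32, 32, 3])
batch_size = tf.shape(inputs)[0]                 # dynamic, known only when the graph runs
features = tf.reduce_mean(inputs, axis=[1, 2])   # stand-in for a real CNN feature extractor

with tf.Session() as sess:
    for n in (1, 7, 32):                         # three different batch sizes, same graph
        batch = np.random.rand(n, 32, 32, 3).astype(np.float32)
        feats, bs = sess.run([features, batch_size], feed_dict={inputs: batch})
        print(bs, feats.shape)                   # prints n and (n, 3)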

마피아

Jan 21, 2016, 9:55:26 AM
to Discuss
Thank you, that seems like the solution.
I will try that one.
-Taeksoo

On Thursday, January 21, 2016 at 9:10:12 PM UTC+9, Rafał Józefowicz wrote:

마피아

Jan 21, 2016, 9:04:22 PM
to Discuss
Is there a way of coping with variable sequence length when using an RNN as well?
The maximum length can differ a lot from one minibatch to the next, but I always set the sequence length to the overall maximum and iterate for that many steps, even though some minibatches contain much shorter sequences.

Is there also a smart way of treating variable sequence lengths?
Thank you.

-Taeksoo

On Thursday, January 21, 2016 at 9:10:12 PM UTC+9, Rafał Józefowicz wrote:
You don't have to specify the batch size during graph construction. The only change needed is to update the placeholders, replacing the batch dimension with None, e.g.:

Mike Schuster

Jan 21, 2016, 9:13:31 PM
to 마피아, Discuss
One way we are doing this is to organize the batches by length (so that roughly all the sequences in a single batch have the same length) and then pass in the sequence lengths, so that computation terminates early once the longest sequence in the batch is done. Of course the graph will have to be unrolled up to the longest sequence overall, but you will need that memory anyway when those sequences are processed.

If you go through all your data starting from the shortest sequences, it may also improve convergence, as it is often easier to learn the short sequences first and then generalize to longer ones.
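To illustrate the bucketing idea, here is a small sketch (the helper name, the zero padding value, and the use of integer token sequences are my own choices, not from Mike's post):

import numpy as np

def bucket_and_pad(sequences, batch_size, pad_value=0):
    """Group sequences of similar length into minibatches, shortest first,
    and pad each minibatch only up to its own longest sequence."""
    order = np.argsort([len(s) for s in sequences])
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        batch = [sequences[i] for i in idx]
        lengths = np.array([len(s) for s in batch], dtype=np.int32)
        padded = np.full((len(batch), lengths.max()), pad_value, dtype=np.int32)
        for row, seq in enumerate(batch):
            padded[row, :len(seq)] = seq
        yield padded, lengths

toy = [[1, 2], [3, 4, 5, 6], [7], [8, 9, 10]]
for padded, lengths in bucket_and_pad(toy, batch_size=2):
    print(padded, lengths)

The lengths array is what would be passed as the sequence_length argument of the RNN, so that computation stops early for the shorter sequences in each batch.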

Mike



마피아

Jan 21, 2016, 9:21:30 PM
to Discuss
Passing in the lengths for the minibatch and terminating early sounds like a good idea... I never thought of it that way. Thank you very much!

By the way, when you say "if you go through all your data starting from the shortest sequences it may also improve convergence", do you mean I should gather the short sequences and make them a minibatch?

Thank you
-Taeksoo

On Friday, January 22, 2016 at 11:13:31 AM UTC+9, Mike Schuster wrote:

Mike Schuster

Jan 21, 2016, 11:24:55 PM
to 마피아, Discuss


On Jan 21, 2016 6:21 PM, "마피아" <jazzsa...@gmail.com> wrote:
>
> Passing in the lengths for the minibatch and terminating early sounds like a good idea... I never thought of it that way. Thank you very much!
>

I said this because the rnn code will do the early termination for you (but only if you pass in the lengths).

> By the way, when you say "if you go through all your data starting from the shortest sequences it may also improve convergence", do you mean I should gather the short sequences and make them a minibatch?

Usually it is easier to learn from short sequences (depending on the application). Yes: gather the short sequences first and run them through as minibatches, then use longer sequences and run those through as minibatches, and so on.

Mike


Kenneth Tran

Jan 24, 2016, 4:06:35 AM
to Discuss
Hi Mike,

Does doing this break the randomness of data distribution and worsen the convergence rate? 

In another study on linear learning, we found that random shuffling (between epochs) is necessary to obtain a good convergence rate.

-Ken

Mike Schuster

Jan 24, 2016, 12:25:21 PM
to Kenneth Tran, Discuss
On Sun, Jan 24, 2016 at 1:06 AM, Kenneth Tran <o...@kentran.net> wrote:
Hi Mike,

Does doing this break the randomness of data distribution and worsen the convergence rate? 

In another study on linear learning, we found that random shuffling (between epochs) is necessary to obtain a good convergence rate.

It all depends on your data distribution. Yes, you need to randomize and shuffle, but for RNNs/LSTMs, where you share parameters across time, the random distribution of lengths is *not* important during training (in the few experiments I have seen). In fact, you can speed up learning (convergence) in these cases, because it is easier to learn from the short sequences first and then gradually move to longer ones (still randomly distributed among themselves, of course).

Mike
 

Kenneth Tran

Jan 24, 2016, 1:01:17 PM
to Mike Schuster, Discuss
I didn't mean the random distribution of lengths. When you order sequences by length, the order is fixed across epochs.

I found that for SGD and variants, even if the data is randomized, shuffling between epochs is still necessary for good convergence.
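One way to reconcile the two points (shuffle between epochs, yet still keep sequences of similar length together) is sketched below; this is my own illustration, not something proposed in the thread:

import numpy as np

def epoch_batches(sequences, batch_size, rng):
    """Yield minibatches of example indices, grouped by similar length,
    but reshuffled on every call so consecutive epochs do not see
    identical batches."""
    perm = rng.permutation(len(sequences))               # fresh shuffle each epoch
    # Stable sort (mergesort) by length: equal-length examples keep their
    # shuffled order, so batch composition still varies from epoch to epoch.
    order = perm[np.argsort([len(sequences[i]) for i in perm], kind='mergesort')]
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]

rng = np.random.RandomState(0)
toy = [list(range(n)) for n in rng.randint(2, 30, size=100)]
for epoch in range(3):
    for batch in epoch_batches(toy, batch_size=16, rng=rng):
        pass  # pad this batch to its own max length and feed it to the model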

--

Sent while mobile

dia...@gmail.com

May 27, 2016, 4:24:21 PM
to Discuss
Hey Taeksoo and everyone, thanks for this discussion posted here. It has been really productive for me since I'm just starting with TensorFlow. I don't know whether what I'm about to suggest is possible, but to cope with the sequence length, wouldn't it also be possible to replace the sequence-length dimension of the placeholder with None, the same as you did with the batch size?

I haven't tested that; however, if you say that the best option is to give the model the maximum sequence length of the batch, then I will keep it like that.

Thanks 

Nathaniel Tucker

May 27, 2016, 5:31:26 PM
to dia...@gmail.com, Discuss
@Taeksoo, what I have done in the past is to use the `sequence_length` param on `rnn` (see here), but be a bit careful: the code below will throw an error with `sequence_length`. If you remove the embedding wrapper, though, I think it does compile:

In [1]: import tensorflow as tf

In [2]: encoder_cell_size=1024

In [3]: encoder_cell = tf.nn.rnn_cell.GRUCell(encoder_cell_size)

In [4]: vocab_size = 10

In [5]: embedding_size = 1024

In [6]: embedding_encoder_cell = tf.nn.rnn_cell.EmbeddingWrapper(
   ...:             encoder_cell, embedding_classes=vocab_size,
   ...:             embedding_size=embedding_size)

In [7]:  max_sentence_len=40

In [8]: encoder_inputs = tf.placeholder(tf.int32, shape=[max_sentence_len])

In [9]: encoder_sequence_length = tf.placeholder(tf.int32)

In [10]:         _, encoder_state = tf.nn.rnn(embedding_encoder_cell,
   ....:             tf.unpack(tf.reshape(encoder_inputs, [max_sentence_len, 1])),
   ....:             dtype=tf.float32, sequence_length=encoder_sequence_length)


Eugene Brevdo

May 29, 2016, 1:35:28 AM
to Nathaniel Tucker, Discuss, dia...@gmail.com

Use tf.nn.dynamic_rnn. Then for each minibatch you only feed a 3-D tensor shaped (batch, time, depth), with time padded to the maximum length for that minibatch.
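A minimal sketch of that approach (using TF 1.x-style names, which differ slightly from what was current when this thread started; the cell size, shapes, and zero padding are just illustrative):

import numpy as np
import tensorflow as tf

depth, units = 3, 16

# Both the batch and the time dimension are left as None, so each minibatch
# can be padded only up to its own longest sequence.
inputs = tf.placeholder(tf.float32, [None, None, depth])   # (batch, time, depth)
seq_len = tf.placeholder(tf.int32, [None])                 # true length of each example

cell = tf.nn.rnn_cell.GRUCell(units)
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs,
                                         sequence_length=seq_len,
                                         dtype=tf.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    lengths = np.array([7, 5, 2, 6], dtype=np.int32)         # 4 sequences, max length 7
    x = np.zeros((4, 7, depth), dtype=np.float32)            # zero-padded minibatch
    for i, l in enumerate(lengths):
        x[i, :l] = np.random.rand(l, depth)
    out, state = sess.run([outputs, final_state],
                          feed_dict={inputs: x, seq_len: lengths})
    print(out.shape)  # (4, 7, 16); outputs past each true length are zeros

This also touches on the earlier question about setting the time dimension to None: dynamic_rnn handles a per-minibatch time dimension, and the sequence_length argument handles the per-example lengths within the minibatch.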


dia...@gmail.com

May 31, 2016, 3:11:23 AM
to Discuss, k...@google.com, dia...@gmail.com
Hello Eugene, thanks for your answer. It came just in time.

Greetings

Diego