Input data format for Recurrent Layers (PR #2033)

Ralph Aeschimann

May 31, 2016, 11:38:04 a.m.

to Caffe Users

Hi,

I am trying to figure out how to use the recurrent layer proposed in PR #2033 against the BVLC/caffe master branch.

My questions are theoretical, I haven't done any testing yet.

The first comment from the author in the PR describes the following:

RecurrentLayer requires 2 input (bottom) Blobs. The first -- the input data itself -- has shape T x N x ... and the second -- the "sequence continuation indicators" delta -- has shape T x N, each holding T timesteps of N independent "streams". delta_{t,n} should be a binary indicator (i.e., value in {0, 1}), where a value of 0 means that timestep t of stream n is the beginning of a new sequence, and a value of 1 means that timestep t of stream n is continuing the sequence from timestep t-1 of stream n.

I train a ConvNet with images from image sequences. Each image has dimensions C x H x W (channels, height, width).

In my previous experiments (ConvNet without any recurrent layers), I use an image batch of size N_batch for training, so my input data has shape N_batch x C x H x W.

Q1: In what shape should I pass my data to the first input of RecurrentLayer?

If I load only one image per iteration as input to the net (N_batch = 1), should this be T x N x C x H x W where T = N = 1?

If I load a batch of images in chronological order (e.g. N_batch = 10), should this be T x N x C x H x W, where T = 10 and N = 1?

Q2: If I want to use N > 1, then I need to make sure that every stream is one independent image sequence, right? So images of a certain image sequence A should always be passed to a certain stream n_A, right?

Q3: For testing, I want to load one image at a time and classify it. This corresponds to T = N = 1. Can I use a net trained with T = 10 and N = 1 for this testing, or does it have to be a net trained with T = N = 1?

I would be happy about any comments or references.

I avoid posting these questions on the PR itself since the amount of comments there is getting out of hand.

Thanks,

Ralph

Anirban Ray

Sep 19, 2016, 10:41:38 p.m.

to Caffe Users

Hey, I also have a similar query. Did you figure it out yet?

Ralph Aeschimann

Sep 26, 2016, 9:48:19 a.m.

to Caffe Users

I figured out the answers after testing:

I train a ConvNet with images from image sequences. Each image has dimensions C x H x W (channels, height, width).
In my previous experiments (ConvNet without any recurrent layers), I use an image batch of size N_batch for training, so my input data has shape N_batch x C x H x W.

Q1: In what shape should I pass my data to the first input of RecurrentLayer?
If I load only one image per iteration as input to the net (N_batch = 1), should this be T x N x C x H x W where T = N = 1?
If I load a batch of images in chronological order (e.g. N_batch = 10), should this be T x N x C x H x W, where T = 10 and N = 1?

Both questions can be answered with "yes".

You can load multiple contiguous frames as a batch and pass them to the recurrent layer in the "T" timestep dimension.

If a frame starts a new sequence, set its delta to 0; otherwise set it to 1.

For sequence learning to work, the frames of each sequence must be passed to the net in chronological order.
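
To make the shapes concrete, here is a minimal NumPy sketch of the single-stream case (N = 1). It is not code from the PR: the function name to_recurrent_input and its arguments are hypothetical, and it only prepares the two arrays that would then be copied into the layer's two bottom blobs.

```python
import numpy as np


def to_recurrent_input(frames, starts_new_sequence):
    """Reshape a chronological batch of frames for a single stream (N = 1).

    frames: array of shape (N_batch, C, H, W), in chronological order.
    starts_new_sequence: N_batch booleans, True where a frame begins a
        new sequence.
    Returns (data, delta) with shapes (T, 1, C, H, W) and (T, 1),
    where T = N_batch.
    """
    frames = np.asarray(frames, dtype=np.float32)
    starts = np.asarray(starts_new_sequence, dtype=bool)
    if frames.ndim != 4 or starts.shape != (frames.shape[0],):
        raise ValueError("expected (N_batch, C, H, W) frames and N_batch flags")

    # Insert the stream axis so the batch axis becomes the timestep axis T.
    data = frames[:, np.newaxis]
    # delta is 0 at the first frame of a sequence and 1 where it continues.
    delta = np.where(starts, 0.0, 1.0).astype(np.float32)[:, np.newaxis]
    return data, delta


# Ten chronological frames from one sequence: only the first frame gets 0.
frames = np.zeros((10, 3, 4, 4))
data, delta = to_recurrent_input(frames, [True] + [False] * 9)
print(data.shape, delta.shape)
```

If a batch continues a sequence that started in the previous batch, all of its flags would be False, so every delta in that batch is 1.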

Q2: If I want to use N > 1, then I need to make sure that every stream is one independent image sequence, right? So images of a certain image sequence A should always be passed to a certain stream n_A, right?

Correct.

Each stream is a series of clips concatenated one after another, e.g. stream A shows sequences A1, A2, etc.

Different streams show different sequences of clips, e.g. stream B shows sequences B1, B2, etc.
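
As an illustration of the stream layout, the following sketch builds the T x N delta array for several streams, each made of clips placed back to back. The helper stream_deltas and the clip lengths are made up for the example; only the 0/1 convention comes from the PR description quoted above.

```python
import numpy as np


def stream_deltas(clip_lengths_per_stream, num_timesteps):
    """Build the (T, N) delta array for N streams of concatenated clips.

    clip_lengths_per_stream: one list of clip lengths per stream, e.g.
        [[3, 2], [2, 3]] for stream A = A1 (3 frames), A2 (2 frames)
        and stream B = B1 (2 frames), B2 (3 frames).
    num_timesteps: T, the number of timesteps to fill in every stream.
    """
    num_streams = len(clip_lengths_per_stream)
    delta = np.ones((num_timesteps, num_streams), dtype=np.float32)
    for n, clip_lengths in enumerate(clip_lengths_per_stream):
        start = 0
        for length in clip_lengths:
            if start >= num_timesteps:
                break
            delta[start, n] = 0.0  # first frame of a clip resets stream n
            start += length
        if start < num_timesteps:
            raise ValueError("stream %d has fewer than T frames" % n)
    return delta


delta = stream_deltas([[3, 2], [2, 3]], num_timesteps=5)
print(delta[:, 0])  # stream A: new clips start at timesteps 0 and 3
print(delta[:, 1])  # stream B: new clips start at timesteps 0 and 2
```

Column n of the result belongs to stream n, so the frames of a given sequence always have to be written into the same column of the data array as well.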

Q3: For testing, I want to load one image at a time and classify it. This corresponds to T = N = 1. Can I use a net trained with T = 10 and N = 1 for this testing, or does it have to be a net trained with T = N = 1?

Yes, you can use the net trained with T = 10: the values of T and N at test time can differ from the values used at training time.
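
For the one-image-at-a-time case, a rough sketch of the inputs could look like the following. The generator single_frame_inputs is hypothetical, and the sketch assumes that the layer carries its hidden state from one forward pass to the next, which is worth checking against the PR's code before relying on it.

```python
import numpy as np


def single_frame_inputs(frames):
    """Yield (data, delta) pairs with T = N = 1 for one sequence of frames.

    frames: iterable of arrays of shape (C, H, W), in chronological order.
    """
    for index, frame in enumerate(frames):
        frame = np.asarray(frame, dtype=np.float32)
        data = frame[np.newaxis, np.newaxis]  # shape (1, 1, C, H, W)
        # Only the first frame of the sequence resets the state (delta = 0).
        value = 0.0 if index == 0 else 1.0
        delta = np.full((1, 1), value, dtype=np.float32)
        yield data, delta


for data, delta in single_frame_inputs(np.zeros((3, 3, 2, 2))):
    print(data.shape, delta[0, 0])
```

Each pair would be fed to the net in its own forward pass; starting a new sequence means calling the generator again so that the first delta is 0.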

Anirban Ray

Oct 17, 2016, 9:55:34 p.m.

to Caffe Users

Hey, thank you so much for sharing. I am still confused about when to set the delta value to 0 and when to set it to 1.

Could you please share a snippet of your code, if possible?

Thanks!