Hi,
I am figuring out how to use the recurrent layer proposed in PR #2033 against the BVLC/caffe master branch.
My questions are theoretical, I haven't done any testing yet.
The first comment from the author in the PR describes RecurrentLayer as follows:

RecurrentLayer requires 2 input (bottom) Blobs. The first -- the input data itself -- has shape T x N x ... and the second -- the "sequence continuation indicators" delta -- has shape T x N, each holding T timesteps of N independent "streams". delta_{t,n} should be a binary indicator (i.e., value in {0, 1}), where a value of 0 means that timestep t of stream n is the beginning of a new sequence, and a value of 1 means that timestep t of stream n is continuing the sequence from timestep t-1 of stream n.
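To check my understanding of the indicator blob, here is a minimal numpy sketch of delta for one stream containing two back-to-back sequences (toy sizes assumed; not taken from the PR):

```python
import numpy as np

T, N = 6, 1  # 6 timesteps, 1 stream (assumed toy sizes)
delta = np.ones((T, N), dtype=np.float32)
delta[0, 0] = 0  # timestep 0 starts the first sequence
delta[3, 0] = 0  # a second sequence begins at timestep 3
print(delta.ravel().tolist())  # [0.0, 1.0, 1.0, 0.0, 1.0, 1.0]
```

So within one stream, a 0 resets the recurrence and every 1 continues it from the previous timestep, if I read the description correctly.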
I train a ConvNet with images from image sequences. Each image has dimensions C x H x W (channels, height, width).
In my previous experiments (a ConvNet without any recurrent layers), I used an image batch of size N_batch for training, so my input data had shape N_batch x C x H x W.
Q1: In what shape should I pass my data to the first input of RecurrentLayer?
If I load only one image per iteration as input to the net (N_batch = 1), should this be T x N x C x H x W with T = N = 1?
If I load a batch of images in chronological order (e.g. N_batch = 10), should this be T x N x C x H x W with T = 10 and N = 1?
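To make the second case concrete, this is how I imagine reinterpreting my current batch as one stream of T timesteps (dimensions are my own assumptions, not from the PR):

```python
import numpy as np

N_batch, C, H, W = 10, 3, 224, 224  # assumed dimensions of my setup
batch = np.zeros((N_batch, C, H, W), dtype=np.float32)  # 10 chronological frames

# Interpreted as a single stream of 10 timesteps: T = 10, N = 1
recurrent_input = batch.reshape(N_batch, 1, C, H, W)
print(recurrent_input.shape)  # (10, 1, 3, 224, 224)
```

Is that the intended layout, i.e. the leading axis is time and the second axis indexes the independent streams?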
Q2: If I want to use N > 1, then I need to make sure that every stream is one independent image sequence, right? So images of a certain image sequence A should always be passed to a certain stream n_A, right?
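In other words, for N = 2 I would interleave two sequences time-major, so frame t of stream n sits at index [t, n]. A sketch of what I have in mind (sizes assumed):

```python
import numpy as np

T, N, C, H, W = 5, 2, 3, 224, 224  # assumed toy sizes
seq_A = np.random.rand(T, C, H, W).astype(np.float32)  # frames of sequence A
seq_B = np.random.rand(T, C, H, W).astype(np.float32)  # frames of sequence B

# Stack along a new stream axis: shape becomes (T, N, C, H, W)
data = np.stack([seq_A, seq_B], axis=1)

# Stream 0 always holds sequence A, stream 1 always sequence B
assert np.array_equal(data[:, 0], seq_A)
assert np.array_equal(data[:, 1], seq_B)
```

Is this fixed assignment of sequences to streams what the layer expects?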
Q3: For testing, I want to load one image at a time and classify it. This corresponds to T = N = 1. Can I use a net trained with T = 10 and N = 1 for this testing, or does it have to be a net trained with T = N = 1?
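My current understanding of how per-frame testing would work (this is my assumption, not something stated in the PR): each forward pass gets one frame plus a 1 x 1 indicator, which is 0 at the first frame of a new video and 1 afterwards so the hidden state carries over between iterations. A sketch of the indicator stream I would feed:

```python
import numpy as np

frames_in_video = 4  # assumed video length
indicators = []
for t in range(frames_in_video):
    # 0 resets the hidden state at the start of a video, 1 continues it
    delta = np.zeros((1, 1), np.float32) if t == 0 else np.ones((1, 1), np.float32)
    indicators.append(float(delta[0, 0]))
print(indicators)  # [0.0, 1.0, 1.0, 1.0]
```

Does that match the intended test-time usage?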
I would be happy about any comments or references.
I am avoiding posting these questions on the PR itself since the number of comments there is already getting out of hand.
Thanks,
Ralph