LSTM Caffe code for activity recognition by Lisa Anne - classification with smaller memory

dusa

May 13, 2016, 3:36:39 PM
to Caffe Users

Hi!
I am trying to run the LSTM code for activity recognition by Lisa Anne: http://www.eecs.berkeley.edu/~lisa_anne/LRCN_video

I trained the single-frame and LSTM models, and now I am at the classification stage. I have a Tesla with compute capability 2.0, so I did all the training with smaller batches; however, now at classification time I am not sure how to run with a smaller set of inputs.

--> So I assume it takes 16 frames at once, and according to this line - shape = (10*16, 3, 227, 227) - it also takes them in batches of 10. I don't know Python, but I have figured out that this should be the batch, and that we need to use deploy.prototxt, write the data layer in code, and so on.
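
To make sure I am reading it right, here is how I think the input blob gets filled (a numpy sketch of my own; the variable names are mine, not from the original script):

import numpy as np

clip_length = 16   # frames taken from each video clip
num_clips = 10     # my assumption: clips processed in parallel per forward pass

# the data blob the deploy net expects: (10*16, 3, 227, 227)
caffe_in = np.zeros((num_clips * clip_length, 3, 227, 227), dtype=np.float32)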

So, what exactly is this 10? Is it taking 10 samples to average? Is it keeping track of the sequence of frames? Or is it that we use the output of the CNNs as input to the LSTM, and the layer we use as input is 10*16 (frames)? It is also declared in the deploy:

name: "reshape-data"

type: "Reshape"

bottom: "fc6"

top: "fc6-reshape"

reshape_param{

shape{

dim: 16

dim: 10

dim: 4096

} (and a few similar layers like this)
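
If I understand it, this layer only reinterprets the fc6 blob; in numpy terms it would be something like the following (my own sketch, with my guess at what the axes mean):

import numpy as np

fc6 = np.zeros((160, 4096), dtype=np.float32)   # fc6 activations, one row per input frame
fc6_reshape = fc6.reshape(16, 10, 4096)         # 16 time steps x 10 sequences(?) x 4096 features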

--- What exactly is it, and how can I make the classification run with less memory?

I am also not sure about the purpose of the 10s in this line - is it for averaging?

output_predictions[i:i+batch_size] = np.mean(out['probs'].reshape(10, caffe_in.shape[0]/10, 3), 0)
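
Broken apart, as far as I can tell it does this (a self-contained sketch; the comments are my guesses, and note the original script is Python 2, so its / is integer division):

import numpy as np

batch_size = 16
num_classes = 3                                        # my net has 3 classes
probs = np.random.rand(10 * batch_size, num_classes)   # stand-in for out['probs']
groups = probs.reshape(10, batch_size, num_classes)    # split the batch into 10 groups
avg = groups.mean(axis=0)                              # average over the 10 - crops? clips? that is my question
# so output_predictions[i:i+batch_size] gets one averaged row per frame in the chunk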

Much appreciated. Thanks!

Edit: also see the input definition in the deploy:

name: "Hyb2Net-LSTM"

input: "data"

input_dim: 160

input_dim: 3

input_dim: 227

input_dim: 227

input: "clip_markers"

input_dim: 160

input_dim: 1

input_dim: 1

input_dim: 1

So again, why do we have 160 inputs? Given that 16 is the number of frames from the video, is the 10 the output of the layer we use as input to the LSTM, or something else? Thanks!
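
For what it is worth, this is what I am thinking of trying, assuming the 10 really is just a batch of independent clips (a sketch only; the dims come from my deploy above, and the clip-marker convention is my guess):

import numpy as np

clip_length = 16   # LSTM sequence length
num_clips = 2      # instead of 10, to use less memory

# in deploy.prototxt I would then set, for both "data" and "clip_markers":
#   input_dim: 32   (= clip_length * num_clips, instead of 160)
# and dim: 2 instead of dim: 10 in every Reshape layer such as "reshape-data"

caffe_in = np.zeros((clip_length * num_clips, 3, 227, 227), dtype=np.float32)
clip_markers = np.ones((clip_length * num_clips, 1, 1, 1), dtype=np.float32)
# my guess: markers are 0 on each clip's first frame and 1 afterwards; the
# (16, 10, ...) reshape suggests a time-major layout, so the first num_clips rows are t=0
clip_markers[0:num_clips] = 0

The Python script would then also have to feed and average correspondingly smaller batches - if that is even the right approach.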
