LSTM caffe code for activity recognition by lisa - classification with smaller memory


dusa

May 13, 2016, 3:37:49 PM
to Caffe Users

Hi!
I am trying to run the LSTM code for activity recognition by Lisa Anne - http://www.eecs.berkeley.edu/~lisa_anne/LRCN_video

I trained the singleframe and LSTM models and am now at the classification stage. I have a Tesla with compute capability 2.0, so I did all the training with smaller batches; now, at classification, I am not sure how to run with a smaller set of inputs.

--> So I assume it takes 16 frames at once and, according to this line - shape = (10*16, 3, 227, 227) - it takes them in batches of 10 as well. I don't know Python, but I have figured out that this should be the batch, and that we need to use deploy.prototxt, write a data layer in code, etc.

So! What exactly is this 10? Is it taking 10 samples to average? Is it keeping track of the sequence of frames? Or do we use the output of the CNN as input for the LSTM, and the layer we use as input is 10*16 (frames)? It is also declared in the deploy:

name: "reshape-data"
type: "Reshape"
bottom: "fc6"
top: "fc6-reshape"
reshape_param {
  shape {
    dim: 16
    dim: 10
    dim: 4096
  }
}

(and a few similar layers like this)

--- what exactly is it, and how can I make the classification run with less memory?

I am also not sure about the purpose of this line with the 10s - is it for averaging?
output_predictions[i:i+batch_size] = np.mean(out['probs'].reshape(10, caffe_in.shape[0]/10, 3), 0)
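Edit: to show what I mean, here is that line in isolation with dummy data. Mechanically it reshapes the (160, 3) probs into (10, 16, 3) and averages over the first axis; the shapes are taken from the deploy above, and using 3 as the number of classes (and crop-major ordering) is my assumption:

```python
import numpy as np

# Assumed shapes matching the deploy: 160 inputs = 10 crops x 16 frames,
# and 3 action classes. 'probs' stands in for the net's softmax output.
num_crops, num_frames, num_classes = 10, 16, 3
probs = np.random.rand(num_crops * num_frames, num_classes)  # shape (160, 3)

# The script's line reshapes to (10, 16, 3) and averages over the 10 crops,
# leaving one averaged probability vector per frame.
per_frame = probs.reshape(num_crops, num_frames, num_classes).mean(axis=0)
print(per_frame.shape)  # (16, 3)
```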

Much appreciated. Thanks!

edit: Also see the input in the deploy:

name: "Hyb2Net-LSTM"
input: "data"
input_dim: 160
input_dim: 3
input_dim: 227
input_dim: 227
input: "clip_markers"
input_dim: 160
input_dim: 1
input_dim: 1
input_dim: 1

So again, why do we have 160 inputs? Considering 16 is the number of frames from the video, is 10 the output of the layer we use as input to the LSTM, or what? Thx

auro tripathy

Jul 26, 2016, 1:40:02 PM
to Caffe Users
Glad to hear that someone else is actively working with this LSTM example. The classification can be done sequence by sequence (1*16); it may just take longer. The times-10 may have to do with the fact that oversampling takes 10 samples within the same image (I'm speaking from memory).
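If it helps, Caffe-style oversampling can be sketched as follows. This is a simplified stand-in, not the actual caffe.io.oversample implementation - and if I recall correctly the 10 crops are deterministic (four corners plus center, each with its horizontal mirror), not random:

```python
import numpy as np

def oversample(image, crop_h, crop_w):
    """Sketch of Caffe-style oversampling: 4 corner crops + 1 center crop,
    each also mirrored horizontally, giving 10 crops per image."""
    h, w = image.shape[:2]
    # Top-left corners of the 4 corner crops and the center crop.
    starts = [(0, 0), (0, w - crop_w), (h - crop_h, 0),
              (h - crop_h, w - crop_w), ((h - crop_h) // 2, (w - crop_w) // 2)]
    crops = [image[y:y + crop_h, x:x + crop_w] for y, x in starts]
    crops += [c[:, ::-1] for c in crops]  # horizontal mirrors of all 5
    return np.stack(crops)                # shape (10, crop_h, crop_w, channels)

frame = np.zeros((256, 256, 3))
print(oversample(frame, 227, 227).shape)  # (10, 227, 227, 3)
```

With 16 frames per clip, that is where the 10*16 = 160 batch dimension in the deploy comes from.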

dusa

Aug 1, 2016, 12:35:17 PM
to Caffe Users
@auro tripathy

Thank you for your answer. Yes, indeed - I changed the part where it takes 10 crops to use just one center crop, and was able to run it without the memory issue. My results were definitely overfitting, though; that may be due to the dataset as well.
I will try to run this on a new but not-so-good GPU; I am hoping to decrease the memory requirement by using cuDNN.
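For reference, the change amounts to something like this - a single center crop per frame instead of 10 oversampled crops, so the input batch shrinks from 10*16 = 160 to 16 (names here are illustrative, not from the actual classification script):

```python
import numpy as np

def center_crop(image, crop_h, crop_w):
    """Take one centered crop instead of Caffe's 10 oversampled crops."""
    h, w = image.shape[:2]
    y, x = (h - crop_h) // 2, (w - crop_w) // 2
    return image[y:y + crop_h, x:x + crop_w]

frames = [np.zeros((256, 340, 3)) for _ in range(16)]  # 16 video frames
caffe_in = np.stack([center_crop(f, 227, 227) for f in frames])
print(caffe_in.shape)  # (16, 227, 227, 3) -- batch of 16, not 160
```

Note that this also means changing input_dim: 160 to 16 and dim: 10 to 1 in the deploy, and the averaging-over-10-crops step drops out.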

dusa

Jan 9, 2017, 7:08:22 PM
to Caffe Users


Hi all, I was wondering if anyone has used this fork recently. I tried to merge upstream master Caffe into it, but there are quite a lot of conflicts and I don't know where to begin fixing them. Since I had problems with memory, I changed my GPU and came back to this fork to revisit a project I had started, but no luck making it run.

dusa

Jan 10, 2017, 11:28:33 AM
to Caffe Users

Just an update to my last post: it does build without cuDNN support. I turned that off and it built. I haven't started experimenting yet, but it seems fine so far.