Long-term Recurrent Convolutional Networks paper reproduction error in LSTM?


Ashwin Nair

Mar 24, 2016, 11:36:25 AM
to Caffe Users

I am trying to reproduce the Long-term Recurrent Convolutional Networks paper.

I used the authors' code, followed their instructions, and generated the single-frame model. But training the LSTM hybrid network fails, even though I have already made the changes mentioned in the instructions.

The command I run is caffe train -solver lstm_solver_flow.prototxt -weights singleframe_flow/snaps/snapshots_singleFrame_flow_v2_iter_50000.caffemodel, and the output I get is:


I0323 18:16:30.685951  9123 net.cpp:205] This network produces output loss
I0323 18:16:30.685967  9123 net.cpp:446] Collecting Learning Rate and Weight Decay.
I0323 18:16:30.685976  9123 net.cpp:218] Network initialization done.
I0323 18:16:30.685982  9123 net.cpp:219] Memory required for data: 817327112
I0323 18:16:30.686339  9123 solver.cpp:42] Solver scaffolding done.
I0323 18:16:30.686388  9123 caffe.cpp:86] Finetuning from singleframe_flow/snaps/snapshots_singleFrame_flow_v2_iter_50000.caffemodel
I0323 18:16:33.377488  9123 solver.cpp:247] Solving lstm_joints
I0323 18:16:33.377518  9123 solver.cpp:248] Learning Rate Policy: step
I0323 18:16:33.391726  9123 solver.cpp:291] Iteration 0, Testing net (#0)
Traceback (most recent call last):
  File "/home/anilil/projects/lstm/lisa-caffe-public/examples/LRCN_activity_recognition/sequence_input_layer.py", line 220, in forward
    new_result_data = [None]*len(self.batch_advancer.result['data']) 
KeyError: 'data'
terminate called after throwing an instance of 'boost::python::error_already_set'
*** Aborted at 1458753393 (unix time) try "date -d @1458753393" if you are using GNU date ***
PC: @     0x7f243731bcc9 (unknown)
*** SIGABRT (@0x23a3) received by PID 9123 (TID 0x7f24389077c0) from PID 9123; stack trace: ***
    @     0x7f243731bd40 (unknown)
    @     0x7f243731bcc9 (unknown)
    @     0x7f243731f0d8 (unknown)
    @     0x7f2437920535 (unknown)
    @     0x7f243791e6d6 (unknown)
    @     0x7f243791e703 (unknown)
    @     0x7f243791e976 (unknown)
    @     0x7f2397bb5bfd caffe::PythonLayer<>::Forward_cpu()
    @     0x7f243821d87f caffe::Net<>::ForwardFromTo()
    @     0x7f243821dca7 caffe::Net<>::ForwardPrefilled()
    @     0x7f243822fd77 caffe::Solver<>::Test()
    @     0x7f2438230636 caffe::Solver<>::TestAll()
    @     0x7f243823837b caffe::Solver<>::Step()
    @     0x7f2438238d5f caffe::Solver<>::Solve()
    @           0x4071c8 train()
    @           0x405701 main
    @     0x7f2437306ec5 (unknown)
    @           0x405cad (unknown)
    @                0x0 (unknown)
run_lstm_flow.sh: line 8:  9123 Aborted                 (core dumped) GLOG_logtostderr=1 $TOOLS/caffe train -solver lstm_solver_flow.prototxt -weights singleframe_flow/snaps/snapshots_singleFrame_flow_v2_iter_50000.caffemodel
Done.



These are my changed sequence_input_layer.py and prototxt files. My input train and test txt files for the network are in this format.

I think the main problem is the part marked ## rearrange the data: the LSTM takes inputs as [video0_frame0, video1_frame0, ...] but the data is currently arranged as [video0_frame0, video0_frame1, ...].
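As far as I understand it, that rearrangement is supposed to do something like this (my own sketch of the idea, not the layer's actual code):

import numpy as np

# Frames arrive grouped per video [v0_f0, v0_f1, ..., v1_f0, ...] and the LSTM
# wants them grouped per time step [v0_f0, v1_f0, ..., v0_f1, v1_f1, ...].
frames_per_video, videos_per_batch = 16, 10
per_video = np.arange(frames_per_video * videos_per_batch)   # stand-in for the loaded frames
per_step = per_video.reshape(videos_per_batch, frames_per_video).T.reshape(-1)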

I was not able to solve this and it confused me quite a bit, but I might be wrong.

Jacob

Mar 24, 2016, 9:54:54 PM
to Caffe Users
I am experiencing exactly the same issue.
Did you find a solution yet? :(

Ashwin Nair

Mar 26, 2016, 12:05:51 PM
to Caffe Users
Nope, not yet. Please let me know if you have any ideas.
I have now used pdb to debug the Python code.
The error occurs after setup() is called. During setup things seem to work fine, because self.thread_result has values.
The problem is that afterwards, when forward() is called, self.thread and self.thread_result are empty.

Camille Dupont

May 4, 2016, 3:58:36 AM
to Caffe Users
I removed the thread in sequence_input_layer.py:
result['data'] = pool.map(image_processor, im_info)
and replaced it with:
result['data'] = [None] * len(im_info)
j = 0
for im in im_info:
    result['data'][j] = image_processor(im)
    j+=1


It solves the problem for me.
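An equivalent, slightly more compact form of the same serial replacement (it should behave identically) is:

result['data'] = [image_processor(im) for im in im_info]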

dusa

May 13, 2016, 6:31:02 PM
to Caffe Users

Hi! Since you guys have been using this code, I thought I could ask a related question here.
I am trying to run the LSTM code for activity recognition by Lisa: http://www.eecs.berkeley.edu/~lisa_anne/LRCN_video

I trained the single-frame and LSTM models, and now I am at the classification stage. I have a Tesla with compute capability 2.0, so I did all the training with smaller batches; now, for classification, I am not sure how to run with a smaller set of inputs.

So I assume it takes 16 frames at once, and according to this line, shape = (10*16, 3, 227, 227), it also takes them in batches of 10. I don't know Python, but I have figured out that this should be the batch, that we need to use deploy.prototxt, write the data layer in code, etc.

So what exactly is this 10? Is it taking 10 samples to average? Is it keeping indices referring to the sequence of frames? Or do we use the output of the CNNs as input to the LSTM, with the layer we use as input being 10*16 (frames)? It is also declared in the deploy:

name: "reshape-data"

type: "Reshape"

bottom: "fc6"

top: "fc6-reshape"

reshape_param{

shape{

dim: 16

dim: 10

dim: 4096

} (and a few similar layers like this)

What exactly is it, and how can I make the classification run with less memory?

I am also not sure about the purpose of the 10s in this line, is it for averaging? output_predictions[i:i+batch_size] = np.mean(out['probs'].reshape(10,caffe_in.shape[0]/10,3),0)

Much appreciated. Thanks!

edit: Also see the input in deploy:

name: "Hyb2Net-LSTM"

input: "data"

input_dim: 160

input_dim: 3

input_dim: 227

input_dim: 227

input: "clip_markers"

input_dim: 160

input_dim: 1

input_dim: 1

input_dim: 1

So again, why do we have 160 inputs? Considering that 16 is the number of frames from the video, is 10 the output of the layer we use as input to the LSTM, or what? Thanks.

Lisa

May 13, 2016, 9:31:49 PM
to Caffe Users
Hey!

So if you see an error that looks like KeyError: 'data', that means that somewhere in the Python data layer there was an issue setting the data values. This could happen for a variety of reasons, but if I were you I would set a breakpoint before line 110 (this line: result['data'] = pool.map(image_processor, im_info)) and double-check that nothing is failing in the image_processor. A common failure is that the path pointing to your extracted video frames is not correct.
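For example, a quick sanity check just before that line could look like this (just a sketch; it assumes the frame path is the first element of each im_info entry, so adjust it to the actual structure):

import os

# Print any frame the image_processor would fail to load from disk.
for info in im_info:
    frame_path = info[0]
    if not os.path.isfile(frame_path):
        print('Missing frame: %s' % frame_path)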

Hopefully that makes sense and helps :-)

Lisa

May 13, 2016, 9:38:47 PM
to Caffe Users
The Caffe LSTM takes input blobs which are TxNxF, where T is the number of time steps, N is the batch size (in this case the number of videos), and F is the number of features (in this case 4096). So there are 160 inputs because I have 10 videos with 16 frames each. In the eval code I am taking 5 separate crops plus their flips, and my final number is the average over all of these.

All the reshapes are there because most layers in Caffe expect Nx... blobs, not TxN... blobs. So if you have 10 videos with 16 frames each and want to put these frames into a convnet, you need a 160x3x227x227 blob containing your images, not a TxNx3x227x227 one. To run with a different batch size, you will need to replace my batch size with your batch size everywhere.
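In numpy terms, the shape bookkeeping above looks roughly like this (an illustration of the shapes only, not the actual code):

import numpy as np

T, N, F = 16, 10, 4096                    # time steps, videos per batch, fc6 features
images = np.zeros((T * N, 3, 227, 227))   # 160 x 3 x 227 x 227 blob fed to the convnet
fc6 = np.zeros((T * N, F))                # convnet output: one 4096-d feature per frame
# For the reshape to line up, the 160 frames must already be ordered time-major,
# i.e. [v0_f0, v1_f0, ..., v9_f0, v0_f1, ...].
lstm_input = fc6.reshape(T, N, F)         # TxNxF blob expected by the Caffe LSTM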

dusa

May 17, 2016, 3:14:26 PM
to Caffe Users
Hi Lisa! Thanks for your reply and for clearing it up for me. This is actually what I have been trying to do: I use classify_video.py to test the LSTM model I trained on a single video, to see that it works.

In deploy.prototxt for the LSTM I change the inputs to:

input: "data"
input_dim: 16
input_dim: 3
input_dim: 227
input_dim: 227
input: "clip_markers"
input_dim: 16
input_dim: 1
input_dim: 1
input_dim: 1

and the reshape layers to:

dim: 16
dim: 1
dim: 4096

and

dim: 16
dim: 1

Then in the code, I change the transformer shape to: shape = (1*16, 3, 227, 227)

But then, when I run the LSTM classification on the video, I get this error:

File "classify_video.py", line 81, in LRCN_classify_video
    output_predictions[i:i+clip_length] = np.mean(out['probs'],1)
ValueError: could not broadcast input array from shape (160,3) into shape (16,3)

Is this because it still takes 5 crops and their mirrors (oversample)?

Do I need to force it to use only one crop? Or should I just resize the image to 227x227 and use it as input?

I am sorry if this is a trivial question, but I don't know Python at all (I have been using the MATLAB wrapper) and am trying to figure it out as I go. I have tried commenting out the oversampling part, but it didn't work (possibly I end up with the wrong set of dimensions).

This is the part I mentioned, where the error above occurs:

 for i in range(0,len(input_data),clip_length):
    clip_input = input_data[i:i+clip_length]
    clip_input = caffe.io.oversample(clip_input,[227,227])
    clip_clip_markers = np.ones((clip_input.shape[0],1,1,1))
    clip_clip_markers[0:10,:,:,:] = 0
#    if is_flow:  #need to negate the values when mirroring
#      clip_input[5:,:,:,0] = 1 - clip_input[5:,:,:,0]
    caffe_in = np.zeros(np.array(clip_input.shape)[[0,3,1,2]], dtype=np.float32)
    for ix, inputs in enumerate(clip_input):
      caffe_in[ix] = transformer.preprocess('data',inputs)
    out = net.forward_all(data=caffe_in, clip_markers=np.array(clip_clip_markers))
    output_predictions[i:i+clip_length] = np.mean(out['probs'],1)
  return np.mean(output_predictions,0).argmax(), output_predictions
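If I only want a single 227x227 input per frame instead of the 10 oversampled crops, I am guessing the relevant lines inside that loop could be changed to something like this (untested sketch):

    # replace caffe.io.oversample with a single resize per frame,
    # so a 16-frame clip yields 16 inputs instead of 160
    clip_input = np.asarray([caffe.io.resize_image(im, (227, 227)) for im in clip_input])
    clip_clip_markers = np.ones((clip_input.shape[0],1,1,1))
    clip_clip_markers[0,:,:,:] = 0   # one video per batch, so only the first frame starts a sequence

If that works, the (160,3) vs (16,3) mismatch in output_predictions should also go away, since the net would only see 16 inputs per clip.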


Thanks! Much appreciated

dusa

May 17, 2016, 3:21:46 PM
to Caffe Users


PS: I could potentially call predict with oversample=False to skip oversampling, but I don't think the Python code here ever makes that call. Anyway, I think oversampling is the problem, but I am not entirely sure.

dusa

May 18, 2016, 2:25:36 PM
to Caffe Users
Update:

I am trying to return only the center crop from Caffe's caffe.io.oversample function in Python (not classifier.py). I have tried to modify the code to return only the center crop, but it still returns 10 crops instead of 1. I have rebuilt caffe and pycaffe, but the situation is still the same. How can I get the Python code to return only one crop?

def oversample(images, crop_dims):
    """
    Take
    images: iterable of (H x W x K) ndarrays
    crop_dims: (height, width) tuple for the crops.

    Give
    crops: (N x H x W x K) ndarray of center crops for number of inputs N.
    """
    # Dimensions and center.
    im_shape = np.array(images[0].shape)
    crop_dims = np.array(crop_dims)
    center = im_shape[:2] / 2.0

    # Make crop coordinates: take only the center crop.
    crop = np.tile(center, (1, 2))[0] + np.concatenate([
        -crop_dims / 2.0,
        crop_dims / 2.0
    ])
    crop = crop.astype(int)
    crops = np.asarray(images)[:, crop[0]:crop[2], crop[1]:crop[3], :]

    return crops

boyihu...@gmail.com

Jun 29, 2016, 12:55:02 AM
to Caffe Users
Hey, I have the same problem right now. Have you solved it yet? Thank you.



promasterliuss

Jun 30, 2016, 5:29:20 AM
to Caffe Users
I had the same issue before.

frames[0].split('.')[0] + '.%04d.jpg' in "sequence_input_layer.py" should expand to the path and name of your data.

In this case, the frames should be saved in a form matching that pattern, e.g. "/path/to/data/0001.jpg".

You can also modify the code yourself.
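To illustrate with a made-up path: if the first frame of a video is listed as 'frames/v_clip01/v_clip01.0001.jpg', the pattern expands to

'frames/v_clip01/v_clip01.0001.jpg'.split('.')[0] + '.%04d.jpg'   # -> 'frames/v_clip01/v_clip01.%04d.jpg'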

lark...@gmail.com

Jul 5, 2016, 3:33:12 AM
to Caffe Users
The main reason is that the sequence images are not loaded successfully.

Please check the code in /python/caffe/io.py. In io.py, the function skimage.io.imread(filename) cannot read the image, so you must change the image-reading function.

The following code shows my changes:


import cv2

# original line in load_image():
# img = skimage.img_as_float(skimage.io.imread(filename)).astype(np.float32)
im = cv2.imread(filename)  # note: cv2.imread returns the image in BGR channel order
img = np.array(im, dtype=np.float32) / 255.0


Hope it helps.





Marcos Vinicius

Jul 6, 2016, 11:03:32 AM
to Caffe Users
This solved the KeyError: 'data' problem for me, but now I get:

new_result_data = [None] * len(self.thread_result['data'])
TypeError: object of type 'numpy.float64' has no len()

jhon.smi...@gmail.com

Jul 31, 2016, 7:03:19 AM
to Caffe Users
Hi All,



I would like to ask a related question.

How can I train and test the LSTM from a txt file?

I just downloaded the latest version of Caffe.

I am trying to replicate the activity classifier using a list of images per activity in a txt file.

I already extracted the videos into frames, and currently I have 3 frames per activity.

I downloaded the LSTM code and am trying to train the classifier using the ImageData layer, for instance:

layer {
  name: "activity_input"
  type: "ImageData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  image_data_param {
    source: "Training_Set.txt"
    batch_size: 16
    shuffle: false
    is_color: true
  }
  transform_param {
    mean_value: 104
    mean_value: 117
    mean_value: 123
  }
}

I would really thank anybody who can guide me on how to list the frames in Training_Set.txt.
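I am guessing the list needs one "image_path label" pair per line, something like this (made-up paths):

/data/frames/walking/frame_0001.jpg 0
/data/frames/walking/frame_0002.jpg 0
/data/frames/walking/frame_0003.jpg 0
/data/frames/running/frame_0001.jpg 1
/data/frames/running/frame_0002.jpg 1
/data/frames/running/frame_0003.jpg 1

but please correct me if the format is different.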


Thanks,

Jhon

dusa

Apr 3, 2017, 11:33:06 PM
to Caffe Users
Has anyone solved the error below for good? I had somehow made the threading work, but now my code has decided to stop working again, and I am at a very urgent stage!

abolfazl taghribi

Apr 22, 2017, 10:11:07 AM
to Caffe Users
I found the answer. Following what Lisa said, I started debugging the code and found that the problem is related to the flow_frames and RGB_frames paths. In line 157 (this line: video_dict[video]['frames'] = frames[0].split('.')[0] + '.%04d.jpg'), the code keeps only the part of the path before the first dot. So if the path starts with something like ../../frames, it returns nothing! To solve this, copy the frames and flow_images folders into "lisa-caffe-public/examples/LRCN_activity_recognition" so that the paths in the code do not need to change (flow_frames = 'flow_images/', RGB_frames = 'frames/'), and the problem will be solved.
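A quick check shows what goes wrong with a relative path (made-up name):

'../../frames/v_clip01/v_clip01.0001.jpg'.split('.')[0] + '.%04d.jpg'   # -> '.%04d.jpg', the path is gone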
Unfortunately, I now face an "out of memory" error. I have an NVIDIA 1080 GPU with 8 GB of memory! Does anyone know how much memory I need if I don't change the batch size?

Jude Larkin

Dec 6, 2019, 5:23:26 PM
to Caffe Users
Thanks for posting your answer. I am currently experiencing the same out-of-memory issue when I try to run 'run_lstm_flow.sh'. I have tried lowering the batch_size in 'evaluate_lstm.py', but it seemed to have no impact. I can't seem to find any other places in the code or prototxt files to adjust for this issue. Were you or anyone else able to find a solution?