I am trying to reproduce the Long-term Recurrent Convolutional Networks (LRCN) paper.
I used their published code and followed their instructions to generate the single-frame model, but training the hybrid LSTM network fails. I have already made the changes the instructions call for.
The command I run is: caffe train -solver lstm_solver_flow.prototxt -weights singleframe_flow/snaps/snapshots_singleFrame_flow_v2_iter_50000.caffemodel
The output I get is:
I0323 18:16:30.685951 9123 net.cpp:205] This network produces output loss
I0323 18:16:30.685967 9123 net.cpp:446] Collecting Learning Rate and Weight Decay.
I0323 18:16:30.685976 9123 net.cpp:218] Network initialization done.
I0323 18:16:30.685982 9123 net.cpp:219] Memory required for data: 817327112
I0323 18:16:30.686339 9123 solver.cpp:42] Solver scaffolding done.
I0323 18:16:30.686388 9123 caffe.cpp:86] Finetuning from singleframe_flow/snaps/snapshots_singleFrame_flow_v2_iter_50000.caffemodel
I0323 18:16:33.377488 9123 solver.cpp:247] Solving lstm_joints
I0323 18:16:33.377518 9123 solver.cpp:248] Learning Rate Policy: step
I0323 18:16:33.391726 9123 solver.cpp:291] Iteration 0, Testing net (#0)
Traceback (most recent call last):
File "/home/anilil/projects/lstm/lisa-caffe-public/examples/LRCN_activity_recognition/sequence_input_layer.py", line 220, in forward
new_result_data = [None]*len(self.batch_advancer.result['data'])
KeyError: 'data'
terminate called after throwing an instance of 'boost::python::error_already_set'
*** Aborted at 1458753393 (unix time) try "date -d @1458753393" if you are using GNU date ***
PC: @ 0x7f243731bcc9 (unknown)
*** SIGABRT (@0x23a3) received by PID 9123 (TID 0x7f24389077c0) from PID 9123; stack trace: ***
@ 0x7f243731bd40 (unknown)
@ 0x7f243731bcc9 (unknown)
@ 0x7f243731f0d8 (unknown)
@ 0x7f2437920535 (unknown)
@ 0x7f243791e6d6 (unknown)
@ 0x7f243791e703 (unknown)
@ 0x7f243791e976 (unknown)
@ 0x7f2397bb5bfd caffe::PythonLayer<>::Forward_cpu()
@ 0x7f243821d87f caffe::Net<>::ForwardFromTo()
@ 0x7f243821dca7 caffe::Net<>::ForwardPrefilled()
@ 0x7f243822fd77 caffe::Solver<>::Test()
@ 0x7f2438230636 caffe::Solver<>::TestAll()
@ 0x7f243823837b caffe::Solver<>::Step()
@ 0x7f2438238d5f caffe::Solver<>::Solve()
@ 0x4071c8 train()
@ 0x405701 main
@ 0x7f2437306ec5 (unknown)
@ 0x405cad (unknown)
@ 0x0 (unknown)
run_lstm_flow.sh: line 8: 9123 Aborted (core dumped) GLOG_logtostderr=1 $TOOLS/caffe train -solver lstm_solver_flow.prototxt -weights singleframe_flow/snaps/snapshots_singleFrame_flow_v2_iter_50000.caffemodel
Done.
Attached are my modified sequence_input_layer.py and prototxt files. My input train and test .txt files for the network follow this format.
I think the main problem is the part marked ## rearrange the data: the LSTM expects inputs ordered as [video0_frame0, video1_frame0, ...], but the data is currently arranged as [video0_frame0, video0_frame1, ...].
I was not able to solve this and it confused me quite a bit, but I might be wrong.
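For what it's worth, here is a minimal sketch of the rearrangement I believe that comment describes (pure NumPy, the variable names and toy sizes are my own, not from the repo):

```python
import numpy as np

# Toy example: 2 videos x 3 frames, currently grouped per video:
# [video0_frame0, video0_frame1, video0_frame2, video1_frame0, ...]
num_videos, frames_per_video = 2, 3
frames = np.arange(num_videos * frames_per_video)

# The LSTM wants frame-major (time-major) order instead:
# [video0_frame0, video1_frame0, video0_frame1, video1_frame1, ...]
rearranged = frames.reshape(num_videos, frames_per_video).T.reshape(-1)
print(rearranged.tolist())  # [0, 3, 1, 4, 2, 5]
```

If the data layer hands the frames over video-major while the LSTM expects time-major, a transpose like this (on the indices or on the batch itself) is what the ## rearrange the data block should be doing.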
Hi! Since you have been using this code, I thought I could ask a related question here.
I am trying to run the LSTM code for activity recognition by Lisa: http://www.eecs.berkeley.edu/~lisa_anne/LRCN_video
I trained the single-frame and LSTM models, and now I am at the classification stage. I have a Tesla with compute capability 2.0, so I did all the training with smaller batches; now, in classification, I am not sure how to run with a smaller set of inputs.
--> So I assume it takes 16 frames at once, and according to the line shape = (10*16, 3, 227, 227) it also takes them in batches of 10. I don't know Python, but I have figured out that this should be the batch, and that we need to use deploy.prototxt, write the data layer in code, and so on.
So, what exactly is this 10? Is it taking 10 samples to average? Is it keeping track of the sequence of frames? Or do we use the output of the CNNs as input to the LSTM, so that the input layer is 10*16 (frames)? It is also declared in the deploy:
name: "reshape-data"
type: "Reshape"
bottom: "fc6"
top: "fc6-reshape"
reshape_param {
  shape {
    dim: 16
    dim: 10
    dim: 4096
  }
}
(and a few similar layers like this)
--- What exactly is it, and how can I make classification run with less memory?
I am also not sure about the purpose of the 10s in this line. Is it for averaging? output_predictions[i:i+batch_size] = np.mean(out['probs'].reshape(10,caffe_in.shape[0]/10,3),0)
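My best reading so far is that the 10 is the number of oversampled crops per frame and that this line averages the predictions over them. Here is a toy sketch of what I think the reshape/mean does (dummy numbers, 3 classes, and it assumes the rows really are ordered crop-major; I am not certain of that):

```python
import numpy as np

# Suppose the batch holds 10 crops each of 2 frames -> 20 rows of 3 class scores.
num_crops, num_frames, num_classes = 10, 2, 3
probs = np.random.rand(num_crops * num_frames, num_classes)

# reshape(10, N/10, 3) groups the rows by crop; mean(axis=0) then
# averages the 10 per-crop predictions for each frame.
avg = probs.reshape(num_crops, num_frames, num_classes).mean(0)
print(avg.shape)  # (2, 3)
```

If that reading is right, the result is one averaged class distribution per frame, which would explain the division by 10.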
Much appreciated. Thanks!
edit: Also see the input in deploy:
name: "Hyb2Net-LSTM"
input: "data"
input_dim: 160
input_dim: 3
input_dim: 227
input_dim: 227
input: "clip_markers"
input_dim: 160
input_dim: 1
input_dim: 1
input_dim: 1
So again, why do we have 160 inputs? Given that 16 is the number of frames from the video, is the 10 related to the output of the layer we use as input to the LSTM, or is it something else? Thanks.
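My own guess, going by the Reshape layer quoted above: 160 = 16 frames × 10 crops, and the reshape recovers those two axes after fc6. A sketch of the bookkeeping (the 4096 comes from the prototxt; everything else is my assumption):

```python
import numpy as np

frames_per_clip, crops_per_frame, feat_dim = 16, 10, 4096

# Deploy input is (160, 3, 227, 227); after fc6 each of the 160 rows
# is a 4096-d feature vector.
fc6 = np.zeros((frames_per_clip * crops_per_frame, feat_dim))

# The "reshape-data" layer's dims {16, 10, 4096} would then mean:
# time steps x independent streams (one per crop) x features.
fc6_reshape = fc6.reshape(frames_per_clip, crops_per_frame, feat_dim)
print(fc6_reshape.shape)  # (16, 10, 4096)
```

If that is correct, shrinking memory would mean reducing the 10 (fewer crops) or the 16 (shorter clips) consistently in both the input dims and the Reshape layers, but I would like someone to confirm.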
Take
images: iterable of (H x W x K) ndarrays
crop_dims: (height, width) tuple for the crops.
Give
crops: (N x H x W x K) ndarray of center crops for number of inputs N.
"""
# Dimensions and center.
im_shape = np.array(images[0].shape)
crop_dims = np.array(crop_dims)
center = im_shape[:2] / 2.0
# Make crop coordinates: take only the center crop.
crop = np.tile(center, (1, 2))[0] + np.concatenate([
    -crop_dims / 2.0,
    crop_dims / 2.0
])
crop = crop.astype(int)  # slicing needs integer indices
crops = images[:, crop[0]:crop[2], crop[1]:crop[3], :]
return crops
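For anyone comparing against their own copy, here is a self-contained version of that center-crop logic with dummy data (the function name center_crop is mine, not from the repo):

```python
import numpy as np

def center_crop(images, crop_dims):
    """Take the crop_dims-sized center crop of every image in an (N,H,W,K) array."""
    im_shape = np.array(images[0].shape)
    crop_dims = np.array(crop_dims)
    center = im_shape[:2] / 2.0
    # Bounding box [y0, x0, y1, x1] of the center crop.
    crop = np.tile(center, (1, 2))[0] + np.concatenate([
        -crop_dims / 2.0,
        crop_dims / 2.0,
    ])
    crop = crop.astype(int)
    return images[:, crop[0]:crop[2], crop[1]:crop[3], :]

# 5 dummy 256x256 RGB frames -> 5 center crops of 227x227.
frames = np.zeros((5, 256, 256, 3))
print(center_crop(frames, (227, 227)).shape)  # (5, 227, 227, 3)
```

Note this returns N crops, not 10*N as the stock oversample does, since the mirrored and corner crops have been dropped.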
Has anyone solved the error below for good? I had somehow gotten the threading to work, but now my code has stopped working again, and I am at an urgent stage of my project!