LSTM video example in Caffe, why are the frames fed "vertically"?


auro tripathy

Jun 14, 2016, 5:36:30 PM
to Caffe Users
Hi,

I'm studying the activity-recognition paper/implementation by Jeff Donahue et al. that uses a CNN+LSTM network (http://arxiv.org/abs/1411.4389).

The good news is that the code works fine in the latest Caffe tree with RNN and LSTM support.

The input layer is interesting: it introduces the notion of clip markers to demarcate the beginning of each new sequence of labeled frames.

The intriguing part is that the sequences are fed into the network "vertically", not "horizontally". The comment in the Python input layer says so:

    comment:  #rearrange the data: The LSTM takes inputs as [video0_frame0, video1_frame0,...] but the data is currently arranged as [video0_frame0, video0_frame1, ...]

My question is: what's the intuition/reason behind feeding the clip sequences this way?
What would be the implication of feeding the sequences "horizontally", i.e., sequence 1 followed by sequence 2?

To add a bit more info on the input layer:
Each sequence consists of 16 consecutive frames with a random starting point in the clip.
The batch size is 24 such sequences.
Then we go through the transformation to create [video0_frame0, video1_frame0, ...].
The start of each sequence is denoted by a clip marker (which is also reshaped).
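To make that layout concrete, here is a minimal numpy sketch of the rearrangement (frame ids stand in for the actual image blobs; the 16-frame/24-sequence sizes are the ones from the post):

```python
import numpy as np

T, N = 16, 24  # frames per sequence, sequences per batch (as in the post)

# As loaded: clip-major order, [video0_frame0, video0_frame1, ...]
clip_major = np.arange(N * T).reshape(N, T)

# Rearranged: time-major order, [video0_frame0, video1_frame0, ...]
# Row t now holds frame t of every clip -- the slice the LSTM consumes
# at timestep t, so all 24 sequences advance in lockstep.
time_major = clip_major.T  # shape (T, N)

# Clip markers (the "cont" input of Caffe's recurrent layers): 0 at the
# first frame of each clip, telling the LSTM to reset its hidden state,
# and 1 everywhere else.
clip_markers = np.ones((T, N), dtype=np.float32)
clip_markers[0, :] = 0
```

In this time-major layout, `time_major[t]` is a single parallel LSTM step over all 24 clips at once, which matches the T x N shape Caffe's recurrent layers expect.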

If it helps to visualize the network, I've attached the draw_net output (top-bottom, left-right).


Thank you
Auro


lisa_lstm_RGB_TB.png
lisa_lstm_RGB.png

Mohammad Moradi

Aug 28, 2017, 7:14:55 AM
to Caffe Users
Hi Auro, I'm studying the same paper. I want to use this network on my own dataset, but I don't really understand the Python input layer. Do you know how the sequences are fed to the net?

Pedja

Sep 2, 2017, 10:49:36 PM
to Caffe Users
Hi,

I am also studying Jeff's LRCN paper and the Lisa example, and would like to understand the input layer as well. With the latest Caffe master, the old /examples/LRCN_activity_recognition/ from the Lisa fork, and the instructions from https://people.eecs.berkeley.edu/~lisa_anne/LRCN_video, there is a problem reading the network, train_test_singleFrame_RGB.prototxt. The error from caffe train, draw_net, or upgrade_net_proto_text always comes back as: "caffe.TransformationParameter" has no field named "flow". A CNN with a regular input layer from tutorials such as /models/bvlc_reference_caffenet/train_val.prototxt reads and trains fine, and make runtest also checks out.

So the question is: does train_test_singleFrame_RGB.prototxt need modification to run with the latest Caffe, given that upgrade_net_proto_text cannot even open it? Or does something else need to be borrowed from the Lisa branch? I'm using Ubuntu 16.04, gcc 5.4.0, libprotobuf-dev, protobuf and protoc all at 2.6.1, and Python 2.7.12. Per the recent Caffe master release notes, LSTM has been merged; unfortunately there are no samples akin to LRCN under /models/ or /examples/ to quickly confirm against.

Cheers,
Pedja


$ caffe train -solver singleFrame_solver_RGB.prototxt -weights caffe_imagenet_hyb2_wr_rc_solver_sqrt_iter_310000
[libprotobuf ERROR google/protobuf/text_format.cc:274] Error parsing text-format caffe.NetParameter: 16:9: Message type "caffe.TransformationParameter" has no field named "flow".
F0902 20:15:40.932231 32269 upgrade_proto.cpp:88] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: train_test_singleFrame_RGB.prototxt
    @     0x7f460313b5cd  google::LogMessage::Fail()
    @     0x7f460313d433  google::LogMessage::SendToLog()
    @     0x7f460313b15b  google::LogMessage::Flush()
    @     0x7f460313de1e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f46038dd311  caffe::ReadNetParamsFromTextFileOrDie()
    @     0x7f46039064bc  caffe::Solver<>::InitTrainNet()
    @     0x7f46039077b5  caffe::Solver<>::Init()
    @     0x7f4603907adf  caffe::Solver<>::Solver()
    @     0x7f46037568e1  caffe::Creator_SGDSolver<>()
    @     0x40ada8  train()
    @     0x4075a8  main
    @     0x7f4601f71830  __libc_start_main
    @     0x407e79  _start

$ python/draw_net.py train_test_singleFrame_RGB.prototxt drawing.png
  File "../../python/draw_net.py", line 58, in <module>
    main()
  File "../../python/draw_net.py", line 44, in main
    text_format.Merge(open(args.input_net_proto_file).read(), net)
  File "/home/username/.local/lib/python2.7/site-packages/google/protobuf/text_format.py", line 265, in Merge
    return MergeLines(text.split('\n'), message)
  File "/home/username/.local/lib/python2.7/site-packages/google/protobuf/text_format.py", line 298, in MergeLines
    _ParseOrMerge(lines, message, True)
  File "/home/username/.local/lib/python2.7/site-packages/google/protobuf/text_format.py", line 229, in _ParseOrMerge
    _MergeField(tokenizer, message, allow_multiple_scalars)
  File "/home/username/.local/lib/python2.7/site-packages/google/protobuf/text_format.py", line 382, in _MergeField
    _MergeField(tokenizer, sub_message, allow_multiple_scalars)
  File "/home/username/.local/lib/python2.7/site-packages/google/protobuf/text_format.py", line 382, in _MergeField
    _MergeField(tokenizer, sub_message, allow_multiple_scalars)
  File "/home/username/.local/lib/python2.7/site-packages/google/protobuf/text_format.py", line 356, in _MergeField
    message_descriptor.full_name, name))
google.protobuf.text_format.ParseError: 16:5 : Message type "caffe.TransformationParameter" has no field named "flow".

Sayeh Sharifi

Oct 23, 2017, 3:33:58 PM
to Caffe Users
Hi,

I have exactly the same problem. Could you find any solution?
 
Sayeh

Don Novkov

Sep 1, 2018, 2:43:53 PM
to Caffe Users
I had to tweak some code to get the tutorial https://people.eecs.berkeley.edu/~lisa_anne/LRCN_video to work on my system. It worked fine after I did this:

1. run_singleFrame_RGB.sh:
    modify second line: TOOLS=/caffe/build/tools

2. train_test_singleFrame_RGB.prototxt:
    image_data_param (2 places):  root_folder: "/home/don/Documents/CNN-LSTM/UCF101/ExtractedFrames/"  # replace this root folder path with your own

3. /caffe/src/caffe/proto/caffe.proto (add to your caffe.proto the lines present in lisa-caffe-public-lstm_video_deploy/src/caffe/proto/caffe.proto):
    cd /caffe/src/caffe/proto
    sudo gedit caffe.proto
    add to 'message TransformationParameter':
        // Will flip x flow if the input is a flow image.
        optional bool flow = 9 [default = false]; // change the field number as needed
    add to 'message ImageDataParameter':
      // Enforce a minimum height and width; reshape if smaller.
      optional uint32 min_height = 13 [default = 0]; // change the field number as needed
      optional uint32 min_width = 14 [default = 0];  // change the field number as needed
    save/exit
    cd /caffe/build  # Caffe must be rebuilt after caffe.proto changes; the next two lines do that:
    sudo cmake ../ -DCUDA_USE_STATIC_CUDA_RUNTIME=OFF
    sudo make --jobs=4

4. singleFrame_solver_flow.prototxt:
    change 'device_id: 1' to 'device_id: 0'  # device_id: 1 assumes a second GPU; with a single GPU it fails with 'invalid device ordinal'
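Pulling step 3 together, the two patched messages in caffe.proto would read roughly as follows (the field numbers 9, 13, and 14 are just the ones used above; any numbers unused in your tree work):

```protobuf
message TransformationParameter {
  // ... existing fields unchanged ...

  // Will flip x flow if the input is a flow image.
  optional bool flow = 9 [default = false];
}

message ImageDataParameter {
  // ... existing fields unchanged ...

  // Enforce a minimum height and width; reshape if smaller.
  optional uint32 min_height = 13 [default = 0];
  optional uint32 min_width = 14 [default = 0];
}
```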

Don Novkov

Sep 3, 2018, 12:40:36 PM
to Caffe Users
Also, do this for flow:

5. Modify run_singleFrame_flow.sh and train_test_singleFrame_flow.prototxt as in steps 1 and 2 above

6. Make additional modifications to train_test_singleFrame_flow.prototxt:
    image_data_param (2 places):    change min_height 227 to new_height 240
                                    change min_width 227 to new_width 320
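After step 6, each image_data_param block in train_test_singleFrame_flow.prototxt would contain roughly the following (source and root_folder paths omitted; set them as in step 2):

```protobuf
image_data_param {
  # source: and root_folder: as configured in step 2
  new_height: 240  # was min_height: 227
  new_width: 320   # was min_width: 227
}
```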

./run_singleFrame_flow.sh