LSTM video example in Caffe, why are the frames fed "vertically"?


auro tripathy

Jun 14, 2016, 5:36:30 PM
to Caffe Users
Hi,

I'm studying the activity-recognition paper/implementation by Jeff Donahue et al. that uses a CNN+LSTM network (http://arxiv.org/abs/1411.4389).

The good news is that the code works fine in the latest Caffe tree with RNN and LSTM support.

The input layer is interesting: it introduces the notion of clip markers to demarcate the beginning of each new sequence of labeled frames.

The intriguing part is that the sequences are fed into the network "vertically", not "horizontally". The comment in the Python input layer says so:

    comment:  #rearrange the data: The LSTM takes inputs as [video0_frame0, video1_frame0,...] but the data is currently arranged as [video0_frame0, video0_frame1, ...]

My question is: what's the intuition/reason behind feeding the clip sequences this way?
What would be the implication of feeding the sequences "horizontally", i.e., sequence 1 followed by sequence 2?

To add a bit more info on the input layer:
Each sequence consists of 16 consecutive frames with a random starting point in the clip.
The batch size is 24 such sequences.
Then we go through the transformation to create [video0_frame0, video1_frame0, ...].
The start of each sequence is denoted by a clip marker (which is also reshaped).
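To make that layout concrete, here is a minimal numpy sketch of the rearrangement (frame ids stand in for the actual image blobs; the 16-frame/24-sequence sizes are the ones from the post):

```python
import numpy as np

T, N = 16, 24  # frames per sequence, sequences per batch (as in the post)

# As loaded: clip-major order, [video0_frame0, video0_frame1, ...]
clip_major = np.arange(N * T).reshape(N, T)

# Rearranged: time-major order, [video0_frame0, video1_frame0, ...]
# Row t now holds frame t of every clip -- the slice the LSTM consumes
# at timestep t, so all 24 sequences advance in lockstep.
time_major = clip_major.T  # shape (T, N)

# Clip markers (the "cont" input of Caffe's recurrent layers): 0 at the
# first frame of each clip, telling the LSTM to reset its hidden state,
# and 1 everywhere else.
clip_markers = np.ones((T, N), dtype=np.float32)
clip_markers[0, :] = 0
```

In this time-major layout, `time_major[t]` is a single parallel LSTM step over all 24 clips at once, which matches the T x N shape Caffe's recurrent layers expect.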

If it helps to visualize the network, I've attached the draw_net output (top-bottom, left-right).


Thank you
Auro


lisa_lstm_RGB_TB.png
lisa_lstm_RGB.png

Mohammad Moradi

Aug 28, 2017, 7:14:55 AM
to Caffe Users
Hi Auro, I'm studying the same paper. I want to use this network on my own dataset, but I don't really understand the Python input layer. Do you know how the sequences are fed to the net?

Pedja

Sep 2, 2017, 10:49:36 PM
to Caffe Users
Hi,

I am also studying Jeff's LRCN paper and the Lisa example, and would like to understand the input layer as well. With the latest Caffe master, the old /examples/LRCN_activity_recognition/ from the Lisa fork, and the instructions from https://people.eecs.berkeley.edu/~lisa_anne/LRCN_video, there is a problem reading the network, train_test_singleFrame_RGB.prototxt. The error from caffe train, draw_net, or upgrade_net_proto_text always comes back as: "caffe.TransformationParameter" has no field named "flow". A CNN with a regular input layer from tutorials such as /models/bvlc_reference_caffenet/train_val.prototxt reads and trains fine, and make runtest also checks out.

So the question is: does train_test_singleFrame_RGB.prototxt need modification to run with the latest Caffe, given that upgrade_net_proto_text cannot even open it? Or does something else need to be borrowed from the Lisa branch? I'm using Ubuntu 16.04, gcc 5.4.0, libprotobuf-dev, protobuf and protoc all at 2.6.1, and Python 2.7.12. Per the recent Caffe master release notes, LSTM has been merged; unfortunately there are no samples akin to LRCN under /models/ or /examples/ to quickly confirm against.

Cheers,
Pedja


$ caffe train -solver singleFrame_solver_RGB.prototxt -weights caffe_imagenet_hyb2_wr_rc_solver_sqrt_iter_310000
[libprotobuf ERROR google/protobuf/text_format.cc:274] Error parsing text-format caffe.NetParameter: 16:9: Message type "caffe.TransformationParameter" has no field named "flow".
F0902 20:15:40.932231 32269 upgrade_proto.cpp:88] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: train_test_singleFrame_RGB.prototxt
    @     0x7f460313b5cd  google::LogMessage::Fail()
    @     0x7f460313d433  google::LogMessage::SendToLog()
    @     0x7f460313b15b  google::LogMessage::Flush()
    @     0x7f460313de1e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f46038dd311  caffe::ReadNetParamsFromTextFileOrDie()
    @     0x7f46039064bc  caffe::Solver<>::InitTrainNet()
    @     0x7f46039077b5  caffe::Solver<>::Init()
    @     0x7f4603907adf  caffe::Solver<>::Solver()
    @     0x7f46037568e1  caffe::Creator_SGDSolver<>()
    @     0x40ada8  train()
    @     0x4075a8  main
    @     0x7f4601f71830  __libc_start_main
    @     0x407e79  _start

$ python/draw_net.py train_test_singleFrame_RGB.prototxt drawing.png
  File "../../python/draw_net.py", line 58, in <module>
    main()
  File "../../python/draw_net.py", line 44, in main
    text_format.Merge(open(args.input_net_proto_file).read(), net)
  File "/home/username/.local/lib/python2.7/site-packages/google/protobuf/text_format.py", line 265, in Merge
    return MergeLines(text.split('\n'), message)
  File "/home/username/.local/lib/python2.7/site-packages/google/protobuf/text_format.py", line 298, in MergeLines
    _ParseOrMerge(lines, message, True)
  File "/home/username/.local/lib/python2.7/site-packages/google/protobuf/text_format.py", line 229, in _ParseOrMerge
    _MergeField(tokenizer, message, allow_multiple_scalars)
  File "/home/username/.local/lib/python2.7/site-packages/google/protobuf/text_format.py", line 382, in _MergeField
    _MergeField(tokenizer, sub_message, allow_multiple_scalars)
  File "/home/username/.local/lib/python2.7/site-packages/google/protobuf/text_format.py", line 382, in _MergeField
    _MergeField(tokenizer, sub_message, allow_multiple_scalars)
  File "/home/username/.local/lib/python2.7/site-packages/google/protobuf/text_format.py", line 356, in _MergeField
    message_descriptor.full_name, name))
google.protobuf.text_format.ParseError: 16:5 : Message type "caffe.TransformationParameter" has no field named "flow".

Sayeh Sharifi

Oct 23, 2017, 3:33:58 PM
to Caffe Users
Hi,

I have exactly the same problem. Could you find any solution?
 
Sayeh

Don Novkov

Sep 1, 2018, 2:43:53 PM
to Caffe Users
I had to tweak some code to get the tutorial https://people.eecs.berkeley.edu/~lisa_anne/LRCN_video to work on my system. It worked fine after I did this:

1. run_singleFrame_RGB.sh:
    modify second line: TOOLS=/caffe/build/tools

2. train_test_singleFrame_RGB.prototxt:
    image_data_param (2 places):  root_folder: "/home/don/Documents/CNN-LSTM/UCF101/ExtractedFrames/"  # replace this root folder path with your own

3. /caffe/src/caffe/proto/caffe.proto (add to your caffe.proto the lines present in lisa-caffe-public-lstm_video_deploy/src/caffe/proto/caffe.proto):
    cd /caffe/src/caffe/proto
    sudo gedit caffe.proto
    add to 'message TransformationParameter':
        // Will flip x flow if the input is a flow image.
        optional bool flow = 9 [default = false]; // change the field number as needed
    add to 'message ImageDataParameter':
      // Enforce a minimum height and width; reshape if smaller.
      optional uint32 min_height = 13 [default = 0]; // change the field number as needed
      optional uint32 min_width = 14 [default = 0];  // change the field number as needed
    save/exit
    cd /caffe/build  # Caffe must be rebuilt after caffe.proto changes; the next two lines do that:
    sudo cmake ../ -DCUDA_USE_STATIC_CUDA_RUNTIME=OFF
    sudo make --jobs=4

4. singleFrame_solver_flow.prototxt:
    change 'device_id: 1' to 'device_id: 0'  # device_id: 1 assumes a second GPU; with a single GPU it fails with 'invalid device ordinal'
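Pulling step 3 together, the two patched messages in caffe.proto would read roughly as follows (the field numbers 9, 13, and 14 are just the ones used above; any numbers unused in your tree work):

```protobuf
message TransformationParameter {
  // ... existing fields unchanged ...

  // Will flip x flow if the input is a flow image.
  optional bool flow = 9 [default = false];
}

message ImageDataParameter {
  // ... existing fields unchanged ...

  // Enforce a minimum height and width; reshape if smaller.
  optional uint32 min_height = 13 [default = 0];
  optional uint32 min_width = 14 [default = 0];
}
```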

Don Novkov

Sep 3, 2018, 12:40:36 PM
to Caffe Users
Also, do this for flow:

5. Modify run_singleFrame_flow.sh and train_test_singleFrame_flow.prototxt as in steps 1 and 2 above

6. Make additional modifications to train_test_singleFrame_flow.prototxt:
    image_data_param (2 places):    change min_height 227 to new_height 240
                                    change min_width 227 to new_width 320
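After step 6, each image_data_param block in train_test_singleFrame_flow.prototxt would contain roughly the following (source and root_folder paths omitted; set them as in step 2):

```protobuf
image_data_param {
  # source: and root_folder: as configured in step 2
  new_height: 240  # was min_height: 227
  new_width: 320   # was min_width: 227
}
```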

./run_singleFrame_flow.sh