Sabako,
I am doing something similar in my research.
When you have multiple inputs into a single network (three in your case), you need to make sure the images are in the same order in each LMDB. For example, if you have three images of an object (say front, side, and top views), they need to be stored at the same entry number in each database. Then you can use the label from just one of the input sources (since they should all have the same label) to calculate the error through the network. Otherwise you end up having to pass all three labels through to the end of the network (I have not tried that method, but it may work).
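Here is a minimal Python sketch of that bookkeeping (the function names are just illustrative, and plain dicts stand in for the three LMDBs; real code would write serialized Datum protos through the lmdb package, but the key convention is the important part):

```python
# Sketch: keep the three LMDBs aligned by writing each view of sample i
# under the same zero-padded key in its respective database.

def make_key(index):
    # LMDB iterates keys in lexicographic order, so zero-pad the index
    # to keep entry order identical across all three databases.
    return "{:08d}".format(index).encode("ascii")

def write_aligned(samples):
    """samples: list of (front, side, top, label) tuples.
    Returns three dicts standing in for the three LMDBs."""
    db_front, db_side, db_top = {}, {}, {}
    for i, (front, side, top, label) in enumerate(samples):
        key = make_key(i)
        # In real code each value would be a serialized caffe Datum
        # holding the image plus the (shared) label.
        db_front[key] = (front, label)
        db_side[key] = (side, label)
        db_top[key] = (top, label)
    return db_front, db_side, db_top
```

Because entry i carries the same key and the same label in all three databases, the single "label" top from the first Data layer is enough to train the whole network.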
Here is the basic method I have used:
layer {
  name: "MOD1_data"
  type: "Data"
  top: "MOD1_data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mean_file: "/data/models/image_mean.binaryproto"
  }
  data_param {
    source: "/ssd2/final/output/dataset/MOD3/training/MOD3-clean-Array1"
    batch_size: 4
    backend: LMDB
  }
}
<Rest of the network for the first input>
layer {
  name: "MOD2_data"
  type: "Data"
  top: "MOD2_data"
  include {
    phase: TRAIN
  }
  transform_param {
    mean_file: "/data/models/image_mean.binaryproto"
  }
  data_param {
    source: "/ssd2/final/output/dataset/MOD3/training/MOD3-clean-Array2"
    batch_size: 4
    backend: LMDB
  }
}
<Rest of the network for the second input>
layer {
  name: "MOD3_data"
  type: "Data"
  top: "MOD3_data"
  include {
    phase: TRAIN
  }
  transform_param {
    mean_file: "/data/models/image_mean.binaryproto"
  }
  data_param {
    source: "/ssd2/final/output/dataset/MOD3/training/MOD3-clean-Array3"
    batch_size: 4
    backend: LMDB
  }
}
<Rest of the network for the third input>
layer {
  name: "FINAL_CONCAT"
  type: "Concat"
  bottom: "MOD1_output"
  bottom: "MOD2_output"
  bottom: "MOD3_output"
  top: "FINAL_CONCAT"
}
<Put your loss calculations here>
I would recommend looking at your network architecture and considering whether concatenation at the output is really the best choice. From an efficiency standpoint, you want to extract the relevant features from each of your inputs and then use those extracted features to determine the final output. If you take a standard model (GoogLeNet, AlexNet, etc.) and just concatenate the final outputs, you may not capture the correlations between your inputs. Look at where the relevant features are extracted, combine them at that earlier point, and continue your architecture from there.
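As a sketch of that early-fusion idea (the layer names here are hypothetical, not from my network): concatenate the per-view feature maps after the convolutional stages of each branch, then run shared fully connected layers on the fused features.

```
layer {
  name: "FEATURE_CONCAT"
  type: "Concat"
  bottom: "MOD1_conv_features"   # e.g. the pool5 output of the first branch
  bottom: "MOD2_conv_features"
  bottom: "MOD3_conv_features"
  top: "FEATURE_CONCAT"
  concat_param {
    axis: 1   # concatenate along the channel axis
  }
}
# <Shared fully connected layers and the loss go here>
```

The layers after FEATURE_CONCAT then learn weights over all three views jointly, which is where the cross-view correlation comes from.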
Here is a good reference of what I am trying to describe:
Ngiam, J., Khosla, A., Kim, M., Nam, J., Lee, H., & Ng, A. Y. (2011). Multimodal deep learning. Proceedings of the 28th International Conference on Machine Learning (ICML-11), 689-696.
Keeping everything else the same, I got almost 10% better recognition performance by combining the extracted features instead of combining the final outputs.
Patrick