Can Caffe simultaneously load 2 models in a finetuning network?


Sathisha Basavaraju

Jan 5, 2017, 10:02:03 AM
to Caffe Users

How can I load two different pre-trained models into a new model in order to fine-tune it?

Patrick McNeil

Jan 6, 2017, 9:29:02 AM
to Caffe Users
I am not sure exactly what you mean by "load two different pre-trained models into a new model."

If you mean you have two pre-trained versions of a model and want to combine them into a new model, you would need to load each network and then figure out how to combine the weights.  This would involve taking the weights from each network and somehow combining them.  Maybe you could use a process similar to what Caffe does when averaging updates across multiple GPUs.  I am not sure there is a good theoretical method for doing this.  Within Caffe it would be relatively easy to perform mathematical operations (add/subtract/etc.) on the weights, but I am not sure how well the resulting model would perform.
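For example, if both models share the same architecture (same prototxt), a rough sketch of element-wise averaging might look something like this (the file names are just placeholders, not anything from my setup):

import caffe

caffe.set_mode_cpu()
# Two snapshots of the same architecture (hypothetical paths)
net_a = caffe.Net('model.prototxt', 'snapshot_a.caffemodel', caffe.TEST)
net_b = caffe.Net('model.prototxt', 'snapshot_b.caffemodel', caffe.TEST)
# Average the weights and biases of every parameterized layer into net_a
for layer in net_a.params:
  for i in range(len(net_a.params[layer])):  # 0 = weights, 1 = biases
    net_a.params[layer][i].data[...] = 0.5 * (net_a.params[layer][i].data + net_b.params[layer][i].data)
net_a.save('averaged.caffemodel')

Whether the averaged model is actually any good is a separate question.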

If you mean you have two different networks that you want to combine into a single network, that is relatively straightforward using the Net Surgery (http://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/net_surgery.ipynb) method.  This process basically requires the following:
  1. Create and train each of the separate networks you want to use as the source (NET_A and NET_B for reference)
  2. Create the combined network (NET_C for reference) - This network should contain the parts of the first two networks you want to use as a source
  3. Load NET_C (read the PROTOTXT file or create the network programmatically)
  4. For each source network (NET_A and NET_B):
    1. Load the network (PROTOTXT file) and the pre-trained weights (CAFFEMODEL file)
    2. Iterate through the source network layers and update the associated NET_C layer
  5. Save the new NET_C network
I used this method for my research and it worked pretty well, just following the basic Net Surgery page as a reference.

The following is an excerpt of the basic setup I used (I am kind of a newbie at Python, so I am sure there are better ways to do this):

#!/usr/bin/python
import sys
# Import and set up the caffe environment
caffe_root = '/usr/local/src/caffe/'
sys.path.insert(0, caffe_root + 'python')
import caffe

# Copy the parameters of each source layer into the destination layer whose
# name is the layer name prefixed with the modality (e.g. 'ARR1_conv1')
def updateModality(modality, source, destination):
  MOD = modality + '_'
  print('Performing update for modality: ' + MOD)
  for layer in source.blobs:
    # Only touch layers that have a prefixed counterpart in the destination net
    if any(MOD + layer in s for s in destination.blobs):
      try:
        params = source.params[layer][WEIGHTS].data, source.params[layer][BIASES].data
        destination.params[MOD + layer][WEIGHTS].data.flat = params[WEIGHTS].flat
        destination.params[MOD + layer][BIASES].data.flat = params[BIASES].flat
        print('Updated layer: ' + MOD + layer)
      except KeyError:
        # Blobs without learnable parameters (pooling, ReLU, etc.) end up here
        print('\tNo weights for layer: ' + layer)
      except Exception as e:
        print('\tCaught a different exception: ' + str(e))


# Set up the Caffe environment
caffe.set_device(0)
caffe.set_mode_gpu()
#caffe.set_mode_cpu()
WEIGHTS = 0  # index of the weight blob in net.params[layer]
BIASES = 1   # index of the bias blob in net.params[layer]
if len(sys.argv) != 4:
   print('Error! Invalid number of command line arguments: ' + str(len(sys.argv)))
   sys.exit(1)
SPROTO = sys.argv[1]  # source network prototxt
SMODEL = sys.argv[2]  # source pre-trained weights (.caffemodel)
DEST   = sys.argv[3]  # destination (combined) network prototxt

# Read in the networks
trained_net = caffe.Net(SPROTO, SMODEL, caffe.TRAIN)
update_net = caffe.Net(DEST, caffe.TRAIN)
modalities = ['ARR1', 'ARR2', 'ARRAY1', 'ARRAY2', 'HEAD0', 'HEAD1', 'HEAD2', 'HEAD3', 'LAPEL0', 'LAPEL1', 'LAPEL2', 'LAPEL3']
for mod in modalities:
  updateModality(mod, trained_net, update_net)
print('Saving the model...')
update_net.save('updated-' + DEST + '.caffemodel')

In my case, I was using a single source network for the pre-training component, but I used multiple copies of it to update a single network with duplicates of certain layer parameters.  My application has multiple input sources, and I wanted to use a pre-trained model for each input modality type.  I did this across the different modalities defined in the "modalities" list.  For the source, I used GoogLeNet as the reference network.  For the NET_C version (update_net in the code), I prepended the names in the "modalities" list to the layer names I wanted to update.

Using the "updateModality" function, the update_net (NET_C) layers get updated for the given modality.

This method could also be used to combine the layers (mathematically) if that is what you need to do.  Just find the layers you are looking to combine (in NET_A and NET_B), perform your operations on them, and then save the results in the appropriate layer in NET_C - something like the sketch below.
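As a rough illustration (the layer name 'conv1' and the file names here are just placeholders, and this assumes that layer has the same shape in all three networks), averaging one layer from NET_A and NET_B into the matching NET_C layer could look like:

# Hypothetical sketch: combine the 'conv1' parameters of NET_A and NET_B into NET_C
net_a = caffe.Net('net_a.prototxt', 'net_a.caffemodel', caffe.TEST)
net_b = caffe.Net('net_b.prototxt', 'net_b.caffemodel', caffe.TEST)
net_c = caffe.Net('net_c.prototxt', caffe.TEST)
for i in range(len(net_c.params['conv1'])):  # 0 = weights, 1 = biases
  net_c.params['conv1'][i].data[...] = 0.5 * (net_a.params['conv1'][i].data + net_b.params['conv1'][i].data)
net_c.save('net_c_combined.caffemodel')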

After you are finished with your layer updates, you would just need to run the fine-tuning operations.
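For reference, one way to kick off the fine-tuning from Python (assuming you have a solver prototxt set up for NET_C; the file names here are placeholders) is:

# Hypothetical sketch: fine-tune NET_C starting from the transplanted weights
solver = caffe.SGDSolver('net_c_solver.prototxt')
solver.net.copy_from('updated-NET_C.caffemodel')  # load the transplanted weights
solver.solve()                                    # run the fine-tuning

The equivalent from the command line would be the usual caffe train call with the -weights option pointing at the new caffemodel.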

Hopefully, this helps.

Patrick

niraj goel

Apr 17, 2018, 8:06:12 PM
to Caffe Users
Hi Patrick,

First, thanks for this great answer and the code snippet.
I am new to Caffe and was trying to combine two (different) models. After creating the prototxt file for the combined network, I was trying to copy the weights from the original trained models (.caffemodel). I was wondering what the reason is for iterating over "source.blobs" and "destination.blobs" - can't (shouldn't) it be "source.params" and "destination.params"? I tried to find the difference and only vaguely understood it; it would be great if you could clarify.

When I iterate over blobs, those exceptions are raised because the names (blobs) are not present in the params attribute. Also, in my case I observe that blobs does not contain the names of all the layers in the prototxt file (not sure if it is expected to).

Also, after the new network (.caffemodel) is created with the weights initialized from the other models, is there a way to verify that it was actually initialized?


Thanks in advance

Xun Victor

Apr 19, 2018, 7:41:14 AM
to Caffe Users
Hi,

You may want to have a look at this tutorial on Net surgery:
https://github.com/BVLC/caffe/blob/master/examples/net_surgery.ipynb

If you want to check the correctness of your transplant, you can proceed as follows:
Suppose you have loaded a net with:
net = caffe.Net('net_surgery/conv.prototxt', caffe.TEST)
You can then look at the weights with net.params["layer_name"][i].data, where layer_name is the Caffe name of the weight layer and i is 0 for the weights and 1 for the biases.
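For example, a quick sanity check could look like this (the prototxt/caffemodel names and the layer name 'conv1' are just placeholders for your own files):

import numpy as np
import caffe

# Freshly initialized combined net (no transplanted weights)
raw_net = caffe.Net('net_c.prototxt', caffe.TEST)
# Combined net with the transplanted weights loaded
new_net = caffe.Net('net_c.prototxt', 'updated-net_c.caffemodel', caffe.TEST)
# Source net that the weights were copied from
src_net = caffe.Net('net_a.prototxt', 'net_a.caffemodel', caffe.TEST)

# If the transplant worked, the layer should differ from the fresh initialization...
print(np.array_equal(raw_net.params['conv1'][0].data, new_net.params['conv1'][0].data))  # normally False
# ...and match the source layer it was copied from (True if it was copied verbatim)
print(np.array_equal(src_net.params['conv1'][0].data, new_net.params['conv1'][0].data))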

Patrick McNeil

Apr 19, 2018, 9:50:06 AM
to Caffe Users
In my case, I was using multiple modalities (audio spectrograms) from different audio sources. I used a pre-trained model to jump-start the training process. In my model, I had the network split into three different network modules based on the Inception network architecture.

As I understand it, a blob is the actual data (http://caffe.berkeleyvision.org/tutorial/net_layer_blob.html) being processed by Caffe. The learnable parameters are held in their own blobs in a separate structure (net.params), which makes it easier to identify and manipulate the layers.

Can you print out the list of blobs? I had some initial issues getting the names correctly defined because when I created the model I didn't have a solid naming convention in place.
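For example, something along these lines (assuming "net" is your loaded caffe.Net) will show the difference between the two:

# Data blobs (layer outputs) vs. parameter blobs (learnable weights/biases)
print('Blobs:  ' + ', '.join(net.blobs.keys()))
print('Params: ' + ', '.join(net.params.keys()))

Layers without learnable parameters (pooling, ReLU, etc.) will show up in blobs but not in params, which is why the KeyError handler in the earlier script gets triggered for them.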

You should be able to review the initialized weights before and after the copy in the .caffemodel and see the difference.

Patrick

niraj goel

Apr 23, 2018, 2:40:34 PM
to Caffe Users
Hi 
Thanks a lot for the responses. I was able to copy the weights. 

However, I still have some doubts; I will post them shortly with some output snippets.
Thanks
Niraj