Hi,
I have a neural network with the VGG16 architecture, and I used to initialize its parameters from a file called 'vgg16.pkl' (obtained from Lasagne's
model zoo). Since there was a one-to-one mapping between the network
parameters (as returned by the layers.get_all_params
function) and the parameters in the pkl file (the elements of the 'param values' entry of the object returned by pickle.load(open('vgg16.pkl', 'rb'), encoding='latin1')), it
was easy to assign the right values to the right network parameters – just
zip the network parameters with the parameters from the pkl file and
loop over the resulting pairs.
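In code, that pairing looked roughly like this (a minimal sketch with stand-in objects for the Theano shared variables, since only the set_value interface matters for the illustration):

```python
# Stand-in for a Theano shared variable; only set_value() matters here.
class Param:
    def __init__(self):
        self.v = None
    def set_value(self, v):
        self.v = v

params = [Param() for _ in range(4)]   # as returned by layers.get_all_params(net)
values = [10, 20, 30, 40]              # as found in model['param values']

# One-to-one assignment: zip the two sequences and copy each value over.
for p, v in zip(params, values):
    p.set_value(v)
```

(Lasagne's lasagne.layers.set_all_param_values does the same assignment in one call, as long as the ordering matches.)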
However, after adding a batch normalization layer after each convolution layer and
each dense layer in the network (by wrapping those layers with the batch_norm function), the number of network parameters
increased from 32 to 80, so there no longer is such a one-to-one mapping.
So, how do I make sure I assign the right network parameters the right values,
after having inserted batch normalization layers?
I guess the values of the bias parameters of the wrapped layers won't
have any effect anymore, since the addition of the bias terms is effectively
undone by the batch normalization anyway. But since the batch normalization
layers also have bias terms (the beta parameters), maybe those can be initialized with the bias parameters obtained from the pkl file instead? (Would that be a good idea?)
Regards
Kristofer
So, how do I make sure I assign the right network parameters the right values, after having inserted batch normalization layers?
But since the batch normalization layers also have bias terms, maybe those can be initialized with the bias parameters obtained from the pkl file instead? (Would that be a good idea?)
The easiest approach is to simply load the network parameters before adding the batch normalization layers. That is, instead of changing the model definition in your Python code, keep the original model definition, load the parameters into it, and then write a general-purpose function that modifies/rebuilds the network to use batch normalization.
If you want the pretrained weights to be of any use, you may want to ensure that the network with batch normalization initially computes approximately the same function as the network without it. This entails not only retaining the biases, but also countering the effect of batch normalization by scaling the weights or gamma, which in turn requires knowing the expected batch normalization statistics at each layer for your training data. It's not an entirely easy job, but it is doable.
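To illustrate both points numerically: batch normalization removes any constant bias, and choosing gamma and beta from the batch statistics makes it the identity again. A NumPy sketch (mu and sigma here are the per-feature statistics of one batch):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=(256, 8))          # pre-activations W x + b for one batch

def bn(x, gamma, beta):
    # batch normalization over the batch axis, followed by scale and shift
    return gamma * (x - x.mean(axis=0)) / x.std(axis=0) + beta

# Adding a constant bias changes nothing after normalization, so the
# wrapped layer's bias is indeed "undone"...
assert np.allclose(bn(a, 1.0, 0.0), bn(a + 3.0, 1.0, 0.0))

# ...and with gamma = sigma, beta = mu, batch normalization is the
# identity on this batch, so the network computes its original function.
mu, sigma = a.mean(axis=0), a.std(axis=0)
assert np.allclose(bn(a, sigma, mu), a)
```

This is why you need the per-layer statistics: the gamma and beta that undo the normalization are exactly mu and sigma of the pre-activations on your training data.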
How do I write such a function most easily?
I'm starting to think that maybe I don't need batch normalization for the pretrained layers as much as I do for the randomly initialized layers.
By the way, is there some easy way to get a layer by name for a certain network?
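Regarding layer-by-name lookup: if the network is a name -> layer dict (as in the model-zoo recipes), net['conv1_1'] already does it; otherwise, every Lasagne layer carries an optional name attribute, so a lookup table can be built from the layer list. A minimal sketch with a stand-in Layer class (only the name attribute matters; with Lasagne you would iterate lasagne.layers.get_all_layers(output_layer) instead):

```python
class Layer:                    # stand-in; Lasagne layers carry .name the same way
    def __init__(self, name=None):
        self.name = name

layers = [Layer('input'), Layer('conv1_1'), Layer()]   # unnamed layers are common
byName = {l.name: l for l in layers if l.name is not None}
```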
def batchNormalizedNetwork(net, exceptionHeads=None):
    import lasagne.layers
    from lasagne.layers import DenseLayer, batch_norm
    from lasagne.layers.dnn import Conv2DDNNLayer as ConvLayer
    # Make a list of layers not to normalize
    exceptionLayers = lasagne.layers.get_all_layers(exceptionHeads) if exceptionHeads else []
    # Create new layers with batch normalization, keyed by id of the original layer
    newLayers = {}
    for name, layer in net.items():
        key = id(layer)
        newLayers[key] = layer
        if layer not in exceptionLayers and isinstance(layer, (DenseLayer, ConvLayer)):
            newLayers[key] = batch_norm(layer)
    # Rewire the original layers so that every input pointing at a wrapped layer
    # now points at its batch-normalized replacement instead (iterating over the
    # originals, not newLayers, also covers wrapped layers whose own input was wrapped)
    for layer in net.values():
        for attributeName, attributeValue in list(layer.__dict__.items()):
            if id(attributeValue) in newLayers:
                replacement = newLayers[id(attributeValue)]
                if replacement is not attributeValue:
                    layer.__dict__[attributeName] = replacement
            else:
                try:
                    # Handle iterable attributes, e.g. input_layers in InverseLayer
                    for idx, maybeLayer in enumerate(attributeValue):
                        if id(maybeLayer) in newLayers:
                            attributeValue[idx] = newLayers[id(maybeLayer)]
                except TypeError:
                    pass
    # Build a new name -> layer mapping with the batch-normalized layers
    newNet = {}
    for name, layer in net.items():
        newNet[name] = newLayers[id(layer)]
    return newNet
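The rewiring step in the middle can be illustrated on a toy graph (stand-in Layer objects; the wrapper layer here fakes what batch_norm returns):

```python
class Layer:
    def __init__(self, input_layer=None):
        self.input_layer = input_layer

a = Layer()          # original layer that gets "batch-normalized"
b = Layer(a)         # downstream layer whose input is a
wrappedA = Layer(a)  # stands in for batch_norm(a), which keeps pointing at a

newLayers = {id(a): wrappedA, id(b): b}

# Rewire the original layers: any attribute that points at a replaced
# layer is redirected to the replacement, found via the id-keyed map.
for layer in (a, b):
    for attrName, attrValue in list(layer.__dict__.items()):
        if id(attrValue) in newLayers and newLayers[id(attrValue)] is not attrValue:
            layer.__dict__[attrName] = newLayers[id(attrValue)]
```

After the loop, b takes its input from the wrapper instead of from a directly, while the wrapper still feeds from a, so the graph stays consistent.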