How to load weights from a pkl file after inserting batch-normalization layers?


Kristofer Krus

Nov 29, 2016, 11:59:54 AM11/29/16
to lasagne-users

Hi,

I have a neural network with the VGG16 architecture, and I used to initialize its parameters from a file called 'vgg16.pkl' (obtained from Lasagne's model zoo). Since there was a one-to-one mapping between the network parameters (as obtained with the layers.get_all_params function) and the parameters in the pkl file (the elements of the 'param values' entry of the object returned by pickle.load(open('vgg16.pkl', 'rb'), encoding='latin1')), it was easy to assign the right values to the right network parameters: just zip the network parameters with the parameters from the pkl file and loop over the resulting pairs.
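In code, the loading I describe looks roughly like this (the 'param values' key and the latin1 encoding are as above; the function name and everything else is just illustrative):

```python
import pickle

def load_pretrained(params, pkl_path='vgg16.pkl'):
    # params would come from lasagne.layers.get_all_params(network);
    # each is a shared variable with a set_value() method
    with open(pkl_path, 'rb') as f:
        values = pickle.load(f, encoding='latin1')['param values']
    # the one-to-one mapping only works if the counts agree
    assert len(params) == len(values), 'parameter count mismatch'
    for param, value in zip(params, values):
        param.set_value(value)
```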

However, after adding a batch normalization layer to each convolution layer and each dense layer in the network (by wrapping those layers with the batch_norm function), the number of network parameters increased from 32 to 80, so there is no longer such a one-to-one mapping.

So, how do I make sure I assign the right network parameters the right values, after having inserted batch normalization layers?

I guess the values of the bias parameters of the wrapped layers won't have any effect anymore, since the addition of the bias terms is effectively undone by the batch normalization anyway. But since the batch normalization layers also have bias terms, maybe those could be initialized with the bias parameters obtained from the pkl file instead? (Would that be a good idea?)

Regards

Kristofer

Jan Schlüter

Nov 29, 2016, 2:17:52 PM11/29/16
to lasagne-users

So, how do I make sure I assign the right network parameters the right values, after having inserted batch normalization layers?


The easiest will be to simply load the network parameters before adding the batch normalization layers. I.e., instead of changing the model definition in your Python code, use the original model definition and then write a general-purpose function that modifies/rebuilds the network to use batch normalization.

But since the batch normalization layers also have bias terms, maybe those can be initialized with the bias parameters obtained from the pkl file instead? (Would that be a good idea?)


If you want the pretrained weights to be of any use, you may want to ensure that the network with batch normalization initially computes approximately the same function as the network without it. This entails not only retaining the biases, but also countering the effect of batch normalization by scaling the weights or gamma, which in turn requires knowing the expected batch normalization statistics at each layer for your training data. It's not an entirely easy job, but doable.
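To illustrate the idea: assuming the usual formulation y = gamma * (x - mean) / sqrt(var + eps) + beta, and given the expected statistics of a layer's input over the training data, choosing gamma = sqrt(var + eps) and beta = mean makes the normalization initially compute the identity, so the pretrained weights and biases keep their meaning (a sketch, not Lasagne-specific code):

```python
import numpy as np

def identity_bn_init(mean, var, eps=1e-4):
    # With these choices, gamma * (x - mean) / sqrt(var + eps) + beta
    # reduces to x, i.e. batch normalization starts out as a no-op
    gamma = np.sqrt(var + eps)
    beta = mean
    return gamma, beta
```

The hard part remains estimating mean and var per layer, e.g. by pushing a few training batches through the unnormalized network first.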

Cheers, Jan

Kristofer Krus

Nov 30, 2016, 10:25:34 AM11/30/16
to lasagne-users
Hi Jan, thanks for your answer.
 
The easiest will be to simply load the network parameters before adding the batch normalization layers. I.e., instead of changing the model definition in your Python code, use the original model definition and then write a general-purpose function that modifies/rebuilds the network to use batch normalization.

How do I write such a function most easily? I guess I can still use batch_norm to add a BatchNormLayer and a NonlinearityLayer on top of a convolution layer or a dense layer in the network, but I also need to find all layers that use that layer (either as the 'incoming' argument, which most layer types take, or as the 'layer' argument of InverseLayer) in order to modify them to use the NonlinearityLayer instead. How can I find all those layers? Or is there some easier way to rebuild the network to use batch normalization?
 
If you want the pretrained weights to be of any use, you may want to ensure that the network with batch normalization initially computes about the same function as the network without batch normalization. This does not only entail retaining the biases, but also countering the effect of batch normalization by scaling the weights or gamma, which in turn requires you to know the expected batch normalization statistics at each layer for your training data. It's not an entirely easy job, but doable.

I'm starting to think that maybe I don't need batch normalization for the pretrained layers as much as I do for the randomly initialized ones. (The VGG16 network I use is actually just part of a larger network, SegNet, which has additional layers for which I don't have any pretrained weights and therefore do want batch normalization.) Maybe the pretraining removes the need for batch normalization, since those layers shouldn't have to be trained as much anyway; on the other hand, I'm training the network on a specific kind of images and for a different task than the one the weights in the weight file were trained for. Do you have any idea whether batch normalization is necessary for pretrained weights as well?

Regards

Kristofer

Jan Schlüter

Nov 30, 2016, 11:34:08 AM11/30/16
to lasagne-users
How do I write such a function most easily?

It would need to traverse the network and modify it. Something like:

from lasagne.layers import InputLayer, DenseLayer, Conv2DLayer, batch_norm

def insert_batchnorm(network):
    # Walk from the output towards the input, wrapping every dense or
    # convolutional layer with batch normalization and reconnecting
    # its parent to the wrapper
    layer = network
    parent = None
    while not isinstance(layer, InputLayer):
        if isinstance(layer, (DenseLayer, Conv2DLayer)):
            if parent is not None:
                parent.input_layer = batch_norm(layer)
            else:
                # the output layer itself was wrapped
                network = batch_norm(layer)
        parent = layer
        layer = layer.input_layer
    return network

This only handles a stack of layers, not arbitrary graphs, but it should get the idea across. You can add whatever code you need to retain the biases (probably by creating a custom batch_norm() function called from here).
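For retaining the biases, such a custom batch_norm() could look something like the following sketch. The helper name is hypothetical; the attribute names b and beta follow Lasagne's conventions, and batch_norm_fn is just an injection point (you'd normally leave it at the default, which falls back to lasagne.layers.batch_norm):

```python
def batch_norm_keep_bias(layer, batch_norm_fn=None, **kwargs):
    # Wrap `layer` in batch normalization, but seed the BatchNormLayer's
    # beta with the layer's pretrained bias, which batch_norm() would
    # otherwise discard along with the b parameter
    if batch_norm_fn is None:
        from lasagne.layers import batch_norm as batch_norm_fn
    bias = layer.b.get_value() if getattr(layer, 'b', None) is not None else None
    wrapped = batch_norm_fn(layer, **kwargs)
    if bias is not None:
        # batch_norm() returns a NonlinearityLayer stacked on the
        # BatchNormLayer (or the BatchNormLayer itself if the wrapped
        # layer had no nonlinearity), so find the layer carrying beta
        bn = wrapped if hasattr(wrapped, 'beta') else wrapped.input_layer
        bn.beta.set_value(bias)
    return wrapped
```

Note that this only preserves the bias values themselves; making the whole layer initially compute the same function still requires the statistics-based rescaling discussed above.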


I'm starting to think that maybe I don't need batch normalization for the pretrained layers as much as I do for the randomly initialized layers.

Yes, you can try adding batch normalization only for the layers you add on top (but also *before* the first layer you add on top, so its input gets normalized).

Best, Jan

Kristofer Krus

Dec 1, 2016, 12:14:15 PM12/1/16
to lasagne-users
Thanks Jan.

By the way, is there some easy way to get a layer by name for a certain network?

Regards

Kristofer

Jan Schlüter

Dec 2, 2016, 8:49:41 AM12/2/16
to lasagne-users
By the way, is there some easy way to get a layer by name for a certain network?

I think Python is easy enough:
thelayer = next(layer for layer in lasagne.layers.get_all_layers(network) if layer.name == 'thelayer')
This will raise a StopIteration if no such layer is found; pass a default as a second argument to next() to get None instead.

Cheers, Jan

Kristofer Krus

Dec 22, 2016, 12:19:11 PM12/22/16
to lasagne-users
I wrote a function that seems to work for batch normalizing all Conv2DDNNLayers and DenseLayers in a network.

def batchNormalizedNetwork(net, exceptionHeads=None):
    import lasagne.layers
    from lasagne.layers import DenseLayer, batch_norm
    from lasagne.layers.dnn import Conv2DDNNLayer as ConvLayer

    # Make a list of layers not to normalize
    exceptionLayers = lasagne.layers.get_all_layers(exceptionHeads)

    # Create new layers with batch normalization
    newLayers = {}
    for name, layer in net.items():
        key = id(layer)
        newLayers[key] = layer
        if layer not in exceptionLayers and isinstance(layer, (DenseLayer, ConvLayer)):
            newLayers[key] = batch_norm(layer)

    # Update the new (and normalized) layers to take other normalized
    # layers as inputs instead of the non-normalized originals
    for layer in newLayers.values():
        for attributeName, attributeValue in layer.__dict__.items():
            if id(attributeValue) in newLayers:
                layer.__dict__[attributeName] = newLayers[id(attributeValue)]
            else:
                try:
                    # Handle iterable attributes, for example the
                    # input_layers attribute in InverseLayer
                    for idx, layerHopefully in enumerate(attributeValue):
                        if id(layerHopefully) in newLayers:
                            attributeValue[idx] = newLayers[id(layerHopefully)]
                except TypeError:
                    # attribute was not iterable (or not index-assignable)
                    pass

    # Build a new network with the batch normalized layers
    newNet = {}
    for name, layer in net.items():
        newNet[name] = newLayers[id(layer)]

    return newNet


net is a dictionary that maps keys (layer names) to layers in the network, and the function will output a new dictionary mapping the same keys to the corresponding layers in a network with all convolution layers and fully connected layers batch normalized.

I have assumed that, for a layer A, every layer B that A is connected to is either an attribute of A (hence an element of A's __dict__ attribute) or an element of an iterable attribute of A (although, for some reason, it doesn't seem to be possible to iterate over every attribute that the iter function accepts). I don't know whether that assumption always holds, but if it does, I think this function should work. Also, I don't know whether there are any layer types other than convolution layers and fully connected layers that need to be batch normalized, but the list of layers to batch normalize could definitely be extended beyond DenseLayer and Conv2DDNNLayer.

What do you think about this solution? Can you spot any errors in the implementation or pitfalls I have fallen into, or would you perhaps suggest a completely different approach? How would you go about finding all layers a specific layer depends upon? I have tried to write this function to be robust to different network architectures. Also, I don't know whether a dictionary is the best way to store a network, but that is what I have used so far and it has worked for me.
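Regarding my own question about finding the layers a specific layer depends upon: lasagne.layers.get_all_layers(layer) should already give those. For the reverse direction (finding all layers that use a given layer, which is what I needed above), one idea I've considered is to rely on connections going through the standard input_layer / input_layers attributes instead of scanning __dict__, and build a reverse map:

```python
def build_consumers(all_layers):
    # Map each layer to the list of layers that take it as input,
    # assuming all connections go through the standard input_layer /
    # input_layers attributes (all_layers would come from
    # lasagne.layers.get_all_layers(network))
    consumers = {}
    for layer in all_layers:
        inputs = getattr(layer, 'input_layers', None)
        if inputs is None:
            single = getattr(layer, 'input_layer', None)
            inputs = [] if single is None else [single]
        for inp in inputs:
            consumers.setdefault(inp, []).append(layer)
    return consumers
```

Whether that assumption covers every layer type would need checking, but it avoids poking at arbitrary attributes.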

Regards

Kristofer