L2 normalization layer

Raúl Gombru

Jan 15, 2018, 6:48:29 AM
to Caffe Users
Hello,

I need to apply L2 normalization to a CNN fc output before computing a ranking loss. I've tried different implementations; training works fine without the normalization, but with it the training crashes at some point.
Is there any Caffe or PyCaffe implementation of an L2 normalization layer?

I have seen the forks posted in https://github.com/BVLC/caffe/issues/1224, but I can't manage to compile them with my Caffe.

Raúl

Przemek D

Jan 15, 2018, 7:06:06 AM
to Caffe Users
How about using a Reduction layer? It reduces a whole blob to a single scalar value. Among its modes of operation there is SUMSQ, which should achieve what you want.
Add loss_weight: 1.0 to its prototxt definition (change the value if needed) to make it act as a loss function.
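
In pycaffe this could look roughly like the following (a sketch only; fc_blob stands for whatever top holds your fc output):

from caffe import layers as L, params as P

def sumsq_as_loss(fc_blob):
    # Reduce the whole blob to one scalar (the sum of squares) and let Caffe
    # treat that scalar as a loss term via loss_weight.
    return L.Reduction(fc_blob, operation=P.Reduction.SUMSQ, loss_weight=1.0)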

Raúl Gombru

Jan 15, 2018, 8:10:59 AM
to Caffe Users
But this would reduce my vector to a scalar. I need a layer that L2-normalizes the vector, not a layer that outputs the L2 norm.

Przemek D

Jan 15, 2018, 9:11:02 AM
to Caffe Users
That is what the loss_weight bit is for: it makes your layer act as a loss function. Its output is treated as a loss value and is used during backpropagation to generate gradients.

Raúl Gombru

Jan 15, 2018, 9:26:39 AM
to Caffe Users
I found a solution that chains simpler layers to L2-normalize a vector:

from caffe import layers as L, params as P

def l2normed(vec, dim):
    """Returns L2-normalized instances of vec; i.e., for each instance x in vec,
    computes x / ((x ** 2).sum() ** 0.5). Assumes vec has shape N x dim."""
    # Sum of squares over the feature axis: N x dim -> N
    denom = L.Reduction(vec, axis=1, operation=P.Reduction.SUMSQ)
    # Raise to the power -0.5, giving 1 / ||x||_2 per instance
    denom = L.Power(denom, power=(-0.5))
    # Reshape to N x 1, then tile back to N x dim so it can be multiplied elementwise
    denom = L.Reshape(denom, num_axes=0, axis=-1, shape=dict(dim=[1]))
    denom = L.Tile(denom, axis=1, tiles=dim)
    return L.Eltwise(vec, denom, operation=P.Eltwise.PROD)
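
For completeness, here is a rough sketch of how the function can be wired into a NetSpec (the data/fc5 layer names and the 1000-d output are only illustrative, not taken from my actual net):

import caffe
from caffe import layers as L

n = caffe.NetSpec()
n.data = L.Input(shape=dict(dim=[32, 4096]))       # placeholder N x 4096 input
n.fc5 = L.InnerProduct(n.data, num_output=1000)    # the fc output to be normalized
n.fc5_l2 = l2normed(n.fc5, 1000)                   # l2normed as defined above
print(n.to_proto())                                # n.fc5_l2 then feeds the ranking loss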

The l2normed function above expands to the following prototxt definition:

layer {
  name: "denom"
  type: "Reduction"
  bottom: "fc5"
  top: "denom"
  reduction_param {
    operation: SUMSQ
    axis: 1
  }
}
layer {
  name: "power"
  type: "Power"
  bottom: "denom"
  top: "power"
  power_param {
    power: -0.5
    shift: 9.99999996004e-13
  }
}
layer {
  name: "reshape"
  type: "Reshape"
  bottom: "power"
  top: "reshape"
  reshape_param {
    shape {
      dim: 1
    }
    axis: -1
    num_axes: 0
  }
}
layer {
  name: "tile"
  type: "Tile"
  bottom: "reshape"
  top: "tile"
  tile_param {
    axis: 1
    tiles: 1000
  }
}
layer {
  name: "elwise"
  type: "Eltwise"
  bottom: "loss3/classifierCustom"
  bottom: "tile"
  top: "elwise"
  eltwise_param {
    operation: PROD
  }
}
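
As a sanity check on what this chain computes, here is the same arithmetic in plain NumPy (not part of the original post; eps plays the role of the shift in the Power layer):

import numpy as np

def l2normed_np(x, eps=1e-12):
    """Same math as the layer chain above, for an N x dim array x."""
    sumsq = (x ** 2).sum(axis=1, keepdims=True)    # Reduction: SUMSQ over axis 1
    inv_norm = (sumsq + eps) ** -0.5               # Power: power=-0.5 with a small shift
    return x * inv_norm                            # Reshape/Tile/Eltwise PROD via broadcasting

x = np.random.randn(4, 1000)
print(np.linalg.norm(l2normed_np(x), axis=1))      # each row now has (approximately) unit L2 norm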

Przemek D

Jan 18, 2018, 10:17:36 AM
to Caffe Users
I'm sorry, I misunderstood you at first. Glad you found the solution.

For future reference, this code was originally written by Jeff Donahue in a discussion on GitHub.