My model, which is based on 3 convolutional layers, converges without problems when using dropout and LRN. After I replaced both of those layers with a batch norm layer, it no longer converges. I believe I'm using it correctly, but training fails no matter how small I set the learning rate. Can anyone shed some light on why?
Original Model:

layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 48
pad: 0
kernel_size: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "drop1"
type: "Dropout"
bottom: "conv1"
top: "conv1"
dropout_param {
dropout_ratio: 0.5
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "norm1"
type: "LRN"
bottom: "conv1"
top: "norm1"
lrn_param {
local_size: 3
alpha: 0.0001
beta: 0.75
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "norm1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
... (repeated x3)
New Model:

layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 48
pad: 0
kernel_size: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "conv1"
top: "conv1"
}
layer {
name: "conv1_BN"
type: "BatchNorm"
include { phase: TRAIN }
bottom: "conv1"
top: "conv1_BN"
param {
lr_mult: 0
decay_mult: 0
}
param {
lr_mult: 0
decay_mult: 0
}
param {
lr_mult: 0
decay_mult: 0
}
batch_norm_param {
use_global_stats: false
moving_average_fraction: 0.95
}
}
layer {
name: "conv1_BN"
type: "BatchNorm"
include { phase: TEST }
bottom: "conv1"
top: "conv1_BN"
param {
lr_mult: 0
decay_mult: 0
}
param {
lr_mult: 0
decay_mult: 0
}
param {
lr_mult: 0
decay_mult: 0
}
batch_norm_param {
use_global_stats: true
moving_average_fraction: 0.95
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1_BN"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
... (repeated x3)
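
To make explicit what I expect the two BatchNorm layers in the new model to compute, here is a minimal NumPy sketch. It is only an illustration under assumptions: the function name `batch_norm`, the `EPS` value, and the plain exponential moving average are mine, and Caffe's internal bookkeeping of the running statistics differs in detail. Like the config above, the sketch only normalizes each channel and applies no learned scale or shift afterwards.

```python
import numpy as np

EPS = 1e-5  # assumed small constant for numerical stability


def batch_norm(x, running_mean, running_var, use_global_stats,
               moving_average_fraction=0.95):
    """Normalize x of shape (N, C, H, W) per channel.

    use_global_stats=False (TRAIN): use the statistics of the current batch
    and update the running estimates.
    use_global_stats=True (TEST): use the stored running estimates.
    Returns (y, running_mean, running_var).
    """
    if use_global_stats:
        mean, var = running_mean, running_var
    else:
        mean = x.mean(axis=(0, 2, 3))
        var = x.var(axis=(0, 2, 3))
        # Simplified exponential moving average (an assumption, not
        # Caffe's exact accumulation scheme).
        f = moving_average_fraction
        running_mean = f * running_mean + (1.0 - f) * mean
        running_var = f * running_var + (1.0 - f) * var
    # Broadcast the per-channel statistics over N, H and W.
    y = (x - mean[None, :, None, None]) / np.sqrt(var[None, :, None, None] + EPS)
    return y, running_mean, running_var


# Hypothetical input: a batch of 8 feature maps with 48 channels, as in conv1.
rng = np.random.default_rng(0)
x = rng.normal(3.0, 2.0, size=(8, 48, 4, 4))
running_mean = np.zeros(48)
running_var = np.ones(48)

# TRAIN phase: batch statistics, running estimates updated once.
y_train, running_mean, running_var = batch_norm(
    x, running_mean, running_var, use_global_stats=False)

# TEST phase: the same input normalized with the running estimates.
y_test, _, _ = batch_norm(
    x, running_mean, running_var, use_global_stats=True)
```

After a single update the running estimates are still far from the batch statistics, so `y_train` and `y_test` differ noticeably for the same input in this sketch.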