bias regularization keras

cheta...@gmail.com

Mar 3, 2016, 1:12:48 AM
to Keras-users
Hi,

I'm attempting to train a toy model on the MNIST dataset. I've successfully incorporated weight regularization, but adding any bias regularization causes both loss and val_loss to become NaN.

Here is code to reproduce the issue:
from keras.datasets import mnist
from keras.utils import np_utils
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.regularizers import l2

(x_train, y_train), (x_test, y_test) = mnist.load_data()
num_classes = len(np.unique(y_test))
y_train = np_utils.to_categorical(y_train, num_classes)
y_test = np_utils.to_categorical(y_test, num_classes)

w_regularizer = l2(10**-5)
b_regularizer = None
b_regularizer = l2(10**-5)

model = Sequential()
model.add(Flatten(input_shape=(28, 28)))
model.add(Dense(50, activation='relu',
                W_regularizer=w_regularizer, b_regularizer=b_regularizer))
model.add(Dense(50, activation='relu',
                W_regularizer=w_regularizer, b_regularizer=b_regularizer))
model.add(Dense(10, activation='softmax',
                W_regularizer=w_regularizer, b_regularizer=b_regularizer))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

# fit on a small subset to reproduce quickly
nb_epoch = 10
hist = model.fit(x_train[:100], y_train[:100], batch_size=256, nb_epoch=nb_epoch,
                 validation_split=0.2)

If you comment out the line 'b_regularizer = l2(10**-5)', the code runs successfully and Keras reports finite loss values. With it in, every loss value becomes NaN once the training loss is computed on the first epoch.
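For what it's worth, the L2 penalty itself is well-behaved at zero, so the regularization term alone shouldn't explain the NaN. Here is a minimal numpy sketch of what an l2(10**-5) term computes for a zero bias vector (my own illustration, not the actual Keras internals):

```python
import numpy as np

lam = 10**-5
b = np.zeros(50)                 # biases as Keras initializes them
penalty = lam * np.sum(b ** 2)   # L2 penalty: lam * sum(b_i^2)
grad = 2 * lam * b               # its gradient w.r.t. b
print(penalty)                   # 0.0 -- finite
print(np.abs(grad).max())        # 0.0 -- gradient is also finite
```

So whatever produces the NaN, it isn't simply the penalty blowing up at b = 0.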

I assume the cause has something to do with the biases being initialized at 0 while the weights are not. I have two questions related to this:
  • Is there a reason to regularize the bias? Unless I did something wrong, it seems like anyone attempting this would have run into the same error. I'm relatively new to the deep learning world (I worked through Hinton's Coursera course over winter break and am now working on a couple of application projects).
  • Is there a way to non-zero initialize the biases within Keras?
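On the second question, one workaround I can think of is to overwrite the bias vectors by hand after building the model, using layer.get_weights()/layer.set_weights(). A hedged sketch below; the helper is my own and just operates on plain numpy arrays in the [W, b] order that Dense uses:

```python
import numpy as np

def with_constant_biases(params, value=0.01):
    """Given a layer's [W, b] parameter list, return a copy with b set to a constant."""
    W, b = params
    return [W, np.full_like(b, value)]

# Usage against a built model (sketch):
# for layer in model.layers:
#     params = layer.get_weights()
#     if len(params) == 2:                     # Dense layers carry [W, b]
#         layer.set_weights(with_constant_biases(params))

# Standalone check with dummy arrays:
W = np.random.randn(784, 50)
b = np.zeros(50)
W2, b2 = with_constant_biases([W, b])
print(b2[:3])   # [0.01 0.01 0.01]
```

I haven't verified whether non-zero biases actually avoid the NaN, but this would at least let me test the hypothesis.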