L2 regularization on hidden layer returns NaN from loss function


Matthew

Jan 12, 2017, 12:47:54 PM
to lasagne-users
Hi Everyone,

I recently posted this on StackOverflow, but this is probably a better place for it:

I'm trying to put together a really simple three-layer neural network in Lasagne: 30 input neurons, a 10-neuron hidden layer, and a 1-neuron output layer. I'm using the binary_crossentropy loss function and sigmoid nonlinearities. I want to put l1 regularization on the edges entering the output layer and l2 regularization on the edges from the input to the hidden layer. I'm using code very close to the example code on the regularization page of the Lasagne documentation and in the MLP example.

The L1 regularization seems to work fine, but whenever I add the L2 regularization's penalty term to the loss function, it returns NaN. Everything works fine when I remove the term l2_penalty * l2_reg_param from the last line below. Additionally, I'm able to perform L1 regularization on the hidden layer l_hid1 without any issues.

This is my first foray into Theano and Lasagne, so I feel like the error is probably something pretty simple, but I just don't know enough to see it.

 Here's the net setup code:

from lasagne.regularization import regularize_layer_params, l1, l2

# input_var, target_var, l1_reg_param and l2_reg_param are defined elsewhere
l_in = lasagne.layers.InputLayer(shape=(942, 1, 1, 30), input_var=input_var)
l_hid1 = lasagne.layers.DenseLayer(l_in, num_units=10,
                                   nonlinearity=lasagne.nonlinearities.sigmoid,
                                   W=lasagne.init.GlorotUniform())
network = lasagne.layers.DenseLayer(l_hid1, num_units=1,
                                    nonlinearity=lasagne.nonlinearities.sigmoid)

prediction = lasagne.layers.get_output(network)
l2_penalty = regularize_layer_params(l_hid1, l2)
l1_penalty = regularize_layer_params(network, l1)
loss = lasagne.objectives.binary_crossentropy(prediction, target_var)
loss = loss.mean()
loss = loss + l1_penalty * l1_reg_param + l2_penalty * l2_reg_param

I've been testing the l1 and l2 functions as shown below: l2 works fine when I pass it a numpy array, but not when it gets the layer's weight matrix.

print(l2(l_hid1.W).eval())
>> nan


debug_arr = np.ones((30,10),dtype=np.float32)
print(l2(debug_arr).eval())
>> 300.0
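In case it helps anyone reading along: a quick way to tell whether the weights themselves are already broken, rather than the l2 op, is to inspect the host-side copy of the matrix directly. This is a plain-numpy sketch (the report_nans helper is invented for illustration; in the thread's setting you would pass it l_hid1.W.get_value()):

```python
import numpy as np

def report_nans(w, name="W"):
    """Print whether an array contains NaN/Inf and its finite value range."""
    w = np.asarray(w)
    n_nan = int(np.isnan(w).sum())
    n_inf = int(np.isinf(w).sum())
    finite = w[np.isfinite(w)]
    lo, hi = (finite.min(), finite.max()) if finite.size else (None, None)
    print("%s: %d NaN, %d Inf, finite range [%s, %s]" % (name, n_nan, n_inf, lo, hi))
    return n_nan == 0 and n_inf == 0

# A small random matrix stands in for the weights here.
w = np.random.randn(30, 10).astype(np.float32)
print(report_nans(w))   # a healthy matrix: no NaN/Inf
w[3, 7] = np.nan
print(report_nans(w))   # one poisoned entry is enough to make l2() NaN
```

If this reports NaNs before any l2 call, the regularizer is just the messenger.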

I'm on Theano 0.9.dev2 and Lasagne 0.2.dev1, if that helps.  

Thanks for any advice!


Jan Schlüter

Jan 19, 2017, 11:22:58 AM
to lasagne-users
 Here's the net setup code:

Looks good (i.e., it fits your description of what you want to do).

I've been testing the l1 and l2 functions by doing the below: l2 works fine when I pass it a numpy array but not when it gets a weight matrix.

print(l2(l_hid1.W).eval())
>> nan 

I think your weight matrix already contains NaNs at this point. If I just execute the few lines you posted, I get:
In [11]: lasagne.regularization.l2(l_hid1.W).eval()
Out[11]: array(14.514272689819336, dtype=float32)

Try including the l2 penalty in the output of your training function, and see how it evolves -- it shouldn't be NaN from the start. If it's NaN after the first update, reduce the l2_reg_param and/or learning rate. I don't see a problem with your code.
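To make this diagnostic concrete: here is a plain-numpy stand-in for the monitoring loop (the toy data, the train helper, and the learning rates are invented for illustration; this is not the thread's actual train_fn). It records the l2 penalty after every update, so you can see whether it starts finite and then blows up:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 30)).astype(np.float32)
y = (rng.random(100) < 0.5).astype(np.float32)

def train(lr, l2_reg, steps=10):
    """Toy logistic regression in numpy; returns the l2 penalty after each update."""
    w = rng.normal(scale=0.1, size=30).astype(np.float32)
    history = []
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))               # sigmoid predictions
        grad = X.T @ (p - y) / len(y) + 2 * l2_reg * w  # loss grad + l2 grad
        w -= lr * grad
        history.append(float(np.sum(w ** 2)))           # the l2 penalty term
    return history

print(train(lr=0.1, l2_reg=0.01))   # penalty stays small and finite
print(train(lr=1e6, l2_reg=0.01))   # diverges: penalty overflows to inf/NaN
```

A finite penalty that later explodes points at the learning rate or regularization strength; a penalty that is NaN before the first update points at the weights themselves.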

Best, Jan

Matthew

Jan 19, 2017, 2:31:12 PM
to lasagne-users
Hi Jan!

I really appreciate that you took the time to look at this. Including the l2 penalty in the output of the training function was a really smart idea; I didn't even think of that. I just tried it, and strangely the l2 penalty is NaN from the start. I'm wondering if it's something with the data type of the weight matrix (or my CUDA/Theano install?) -- is x ** 2 (from the l2 source code) defined when type(x) is CudaNdarraySharedVariable?

I modified my program, adding the following print statements (between the commented lines) to the training loop, which is mostly taken from your MLP example:

for batch in iterate_minibatches(X_train, y_train, 100, shuffle=True):
    inputs, targets = batch
    targets = targets.reshape(len(targets), 1)
    tr_err, l2_pen = train_fn(inputs, targets)
    # ------
    print(type(l_hid1.W))
    print(l2(l_hid1.W).eval())
    print(type(l_hid1.W.get_value()))
    print(l2(l_hid1.W.get_value()).eval())
    # ------
    train_err += tr_err
    train_batches += 1
    train_loss_by_epoch[epoch] = train_err

Here's what it spit out:

<class 'theano.sandbox.cuda.var.CudaNdarraySharedVariable'>
nan
<type 'numpy.ndarray'>
15.033241272
<class 'theano.sandbox.cuda.var.CudaNdarraySharedVariable'>
nan
<type 'numpy.ndarray'>
15.033241272
<class 'theano.sandbox.cuda.var.CudaNdarraySharedVariable'>
nan
<type 'numpy.ndarray'>
15.0333986282
<class 'theano.sandbox.cuda.var.CudaNdarraySharedVariable'>
nan
<type 'numpy.ndarray'>
15.033074379

So it looks like l2 runs fine on numpy arrays but not on CudaNdarraySharedVariables. That seems strange, though, because this code apparently works fine for other people (including you?). I wonder if it's something with my CUDA or Theano installations?

And again, really appreciate the help!!

Best,
Matthew

Matthew

Jan 19, 2017, 2:34:53 PM
to lasagne-users
(I forgot to mention that I did try reducing the learning rate and l2_reg_param by varying amounts spanning orders of magnitude, same results.)

Thanks again!

Jan Schlüter

Jan 26, 2017, 9:17:27 AM
to lasagne-users
Still works for me:

>>> import lasagne
>>> l = lasagne.layers.DenseLayer((942,1,1,30), 10)
>>> from lasagne.regularization import l2
>>> l2(l.W).eval()
array(14.75477409362793, dtype=float32)
>>> l2(l.W.get_value()).eval()
array(14.754775047302246, dtype=float32)
>>> type(l.W)
theano.sandbox.cuda.var.CudaNdarraySharedVariable

Can you put your print statements *before* the first call to train_fn()? A difference between the direct l2(l.W) and l2(l.W.get_value()) is that the latter will compute the square on CPU rather than GPU, because **2 is executed directly on the numpy array. For comparison, you can also try T.sum(T.square(l.W.get_value())).eval(); this will compute the square on GPU.
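(A side note on why the two finite values in the session above agree only in the first digits: the CPU and GPU accumulate the sum of squares in different orders and precisions, so a small rounding gap in the last digits is expected and harmless. The numpy sketch below illustrates the same effect by accumulating in float32 versus float64; the random matrix just stands in for l.W, and the exact numbers are not from the thread. A NaN, by contrast, is never a rounding artifact.)

```python
import numpy as np

rng = np.random.default_rng(42)
w = rng.normal(scale=0.2, size=(30, 10)).astype(np.float32)

# Same sum of squares, accumulated in float32 vs float64: the results
# agree only up to float32 rounding, but both are finite.
s32 = np.square(w).sum(dtype=np.float32)
s64 = np.square(w.astype(np.float64)).sum()
print(s32, s64, abs(float(s32) - s64))
```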

Best, Jan