nn.L1Penalty—add before or after network to be regularized?


Gregory Gundersen

Aug 16, 2017, 5:39:30 PM
to torch7
Which is the correct way to apply L1 regularization to a single-layer neural network:

model = nn.Sequential()
model:add(nn.L1Penalty(1))
model:add(nn.Linear(3,2))

Or

model = nn.Sequential()
model:add(nn.Linear(3,2))
model:add(nn.L1Penalty(1))


?

A small test suggests that it is the former:

require 'nn'

torch.manualSeed(0)
math.randomseed(0)

model = nn.Sequential()
model:add(nn.L1Penalty(500))
model:add(nn.Linear(3,2))

input  = torch.Tensor{1,2,-3}
target = torch.Tensor{1,0}
crit   = nn.MSECriterion()

output  = model:forward(input)
crit:forward(output, target)
gradOut = crit:backward(output, target)
model:backward(input, gradOut)

print(model.gradInput)

If the L1Penalty line is commented out, model.gradInput is:

-0.1613
-0.1768
-0.4379


If it is not commented out, then model.gradInput is:

 499.8387
 499.8232
-500.4379

This is consistent with 500 times the derivative of the absolute value being added to the gradient. Is my understanding correct that the L1Penalty layer should be added before the layer that it regularizes?
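As a quick hand check, adding 500 * sign(input) to the unpenalized gradInput reproduces the penalized numbers (the literals below are just copied from the two printouts above):

require 'torch'

-- penalized gradInput should equal unpenalized gradInput + 500 * sign(input)
base    = torch.Tensor{-0.1613, -0.1768, -0.4379}   -- gradInput without L1Penalty
penalty = torch.sign(torch.Tensor{1, 2, -3}):mul(500)
print(base + penalty)   --  499.8387   499.8232  -500.4379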

Thanks.

Tasty Minerals

Aug 17, 2017, 11:50:12 AM
to torch7
From Torch docs:

L1Penalty is an inline module that in its forward propagation copies the input Tensor directly to
the output, and computes an L1 loss of the latent state (input) and stores it in the module’s loss field.
During backward propagation: gradInput = gradOutput + gradLoss.
This module can be used in autoencoder architectures to apply L1 losses to internal latent state
without having to use Identity and parallel containers to carry the internal code to an output
criterion.

Add it after the network.
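For example, the use case the docs describe looks roughly like this (a sketch with made-up layer sizes), with the penalty placed right after the latent code it is meant to regularize:

require 'nn'

-- sketch: L1 penalty on the latent code of a small autoencoder (sizes are made up)
model = nn.Sequential()
model:add(nn.Linear(10, 4))     -- encoder
model:add(nn.L1Penalty(0.1))    -- L1 loss on the 4-dimensional latent activations
model:add(nn.Linear(4, 10))     -- decoder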

Gregory Gundersen

Aug 17, 2017, 3:07:12 PM
to torch7 on behalf of Tasty Minerals
I read the docs and also thought that the L1Penalty layer should be added after the network. But can you explain why my toy example only has the correct gradInput when the layer is added before the network?

The derivative of |x| is 1 or -1 depending on the sign of x. So if the penalty weight is 500, the gradient should be modified by +500 or -500, which is what we see in my example.

If the L1Penalty layer is added after the network, gradInput doesn't make sense to me:

-227.1058
-113.1111
-331.3376




Tasty Minerals

Aug 17, 2017, 3:46:54 PM
to torch7
Again, according to the docs it is applied before the criterion.

I applied it to my ffnn before and after the hidden layers, and before the criterion, but in all cases it only flattened out the train/validation error.
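Applied before the criterion, it would look something like this (a sketch; the sizes and penalty weight are made up):

require 'nn'

-- sketch: penalty at the end of a small ffnn, right before the criterion
model = nn.Sequential()
model:add(nn.Linear(3, 5))
model:add(nn.Tanh())
model:add(nn.Linear(5, 2))
model:add(nn.L1Penalty(0.001))  -- L1 loss on the final activations
crit = nn.MSECriterion()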

Gregory Gundersen

Aug 17, 2017, 3:53:34 PM
to torch7 on behalf of Tasty Minerals
Thank you for your help, but part of my question is specifically about why my example seems to work with the penalty before the linear layer.


Tasty Minerals

Aug 17, 2017, 4:22:35 PM
to torch7
Sorry, I don't know the answer to your question but...

Judging by the source code, L1Penalty does not do anything special. In the forward pass it passes your input through unchanged, but it also computes and stores a self.loss along the way.
This self.loss field is not accessed by any of the parent torch modules. It is simply there, isolated.
During the backward pass, however, L1Penalty does the following with its input: self.gradInput:resizeAs(input):copy(input):sign():mul(m)
It takes the sign of the tensor and multiplies it by the penalty weight, so if it is applied after some layer, it is that layer's output that gets the L1 penalty.
Now look at your example. Placed before Linear, L1Penalty does not touch your weights at all; it only adds sign(input) times the penalty weight to the gradient coming back from Linear. With L1Penalty(1) that added term is just ±1, so most of what you see in gradInput comes from Linear itself.
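Put differently, the backward step amounts to roughly this (a paraphrase of the line quoted above, with the sizeAverage option ignored):

require 'torch'

-- rough paraphrase of L1Penalty's backward step; m is the l1weight passed to
-- the constructor (sizeAverage handling omitted)
local function l1PenaltyGradInput(input, gradOutput, m)
   local gradInput = torch.sign(input):mul(m)  -- gradient of m * |x| is m * sign(x)
   gradInput:add(gradOutput)                   -- the upstream gradient is added on top
   return gradInput
end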


Gregory Gundersen

Aug 19, 2017, 7:01:41 PM
to torch7 on behalf of Tasty Minerals
This helped a lot. Thanks.
