Dear Dan,
I have implemented a simple component (L1NormComponent) which divides the input to the layer by its L1 norm. I am doubtful about my implementation of the Backprop function for this component.
I simply multiply the gradients by out_value and divide by in_value, which just scales the gradients and does not properly account for the L1-norm division performed in the Propagate function.
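To make the question concrete, here is a rough standalone sketch in plain C++ of what I currently do (this is not the actual Kaldi Component/CuMatrix interface; the function names and vector types here are just placeholders):

    #include <cmath>
    #include <vector>

    // Forward pass: divide each element of the input by the L1 norm of the input.
    std::vector<double> Propagate(const std::vector<double> &in) {
      double l1 = 0.0;
      for (double v : in) l1 += std::fabs(v);   // L1 norm; assumes l1 != 0
      std::vector<double> out(in.size());
      for (size_t i = 0; i < in.size(); ++i)
        out[i] = in[i] / l1;
      return out;
    }

    // Backprop as currently implemented: scale each output gradient by
    // out_value / in_value (i.e. in_deriv[i] = out_deriv[i] * out[i] / in[i],
    // assuming in[i] != 0). This only rescales the gradients and ignores the
    // dependence of the L1 norm itself on the input, which is what I am
    // doubtful about.
    std::vector<double> Backprop(const std::vector<double> &in,
                                 const std::vector<double> &out,
                                 const std::vector<double> &out_deriv) {
      std::vector<double> in_deriv(in.size());
      for (size_t i = 0; i < in.size(); ++i)
        in_deriv[i] = out_deriv[i] * out[i] / in[i];
      return in_deriv;
    }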
Another thought is to take the partial derivative of the operation directly: let X be the input, then \partial (X/|X|) = \partial sgn(X), which is discontinuous at 0.
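Writing the vector case out fully (this is my own working, so please correct me if it is wrong): with y = X / |X|_1 and s = |X|_1 = \sum_k |x_k|, away from zero entries the Jacobian would be

\partial y_i / \partial x_j = \delta_{ij} / s - x_i sgn(x_j) / s^2,

so the backpropagated gradient would be

\partial E / \partial x_j = (1/s) ( \partial E / \partial y_j - sgn(x_j) \sum_i (\partial E / \partial y_i) y_i ).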
But perhaps the derivative of sign can be computed using the Heaviside operation present in Kaldi. Could you please advise on how to use this for the gradient computation?
Thanks,
Brij