How to add activity priors for output units across samples/epochs?

timher...@gmail.com

Mar 16, 2018, 6:55:44 AM
to Keras-users
Hi everyone,

What I would like to add to my model is a regularization of my output units that forces each of them to follow a certain distribution across samples.
For example, let's say I have two ReLU output units. Besides minimizing an MSE, I also want each unit's activity histogram (across the entire epoch, or maybe large batches) to be similar to a given histogram/PDF.
I know about the existence of activity regularizers (and KL divergence losses). However, as far as I understand, they are based on the activity across the output units, which is not what I want. Am I mistaken?

Thanks in advance and best regards
Tim

Daπid

Mar 16, 2018, 12:48:34 PM
to timher...@gmail.com, Keras-users
Add the appropriate activity regularisation after a batch norm layer.

For example, if you want a Gaussian distribution, use L2: up to constants, a squared penalty is the negative log-density of a Gaussian with mean 0 and standard deviation 1.

Tim Herfurth

Mar 16, 2018, 12:58:06 PM
to Daπid, Keras-users
Thanks.
But doesn't this still push the activity across units towards Gaussianity (rather than the activity of a single unit across samples)?

Best

Daπid

Mar 16, 2018, 3:45:25 PM
to Tim Herfurth, Keras-users
It depends on how you make batch normalisation compute the statistics. If you compute them per unit, each unit will be pushed towards Gaussianity; if you compute them across, say, channels, the collective activity of each channel will follow a Gaussian distribution, but not necessarily each individual activation.
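
To make the per-unit versus pooled distinction concrete, here is a minimal pure-Python sketch of the two ways of computing the statistics. The batch values and the helper names (`columns`, `normalise_per_unit`, `normalise_pooled`) are made up for illustration and are not Keras API. For reference, the Keras `BatchNormalization` layer takes an `axis` argument naming the feature axis and computes its statistics over the remaining axes, so for a dense output of shape (samples, units) the default `axis=-1` already keeps one mean and variance per unit.

```python
from statistics import fmean, pstdev

EPS = 1e-8  # small constant that guards against division by zero


def columns(batch):
    """Return the activity of each unit across samples (one list per unit)."""
    return [list(col) for col in zip(*batch)]


def normalise_per_unit(batch):
    """Standardise each unit with its own mean/std computed across samples."""
    stats = [(fmean(c), pstdev(c)) for c in columns(batch)]
    return [[(v - m) / (s + EPS) for v, (m, s) in zip(row, stats)]
            for row in batch]


def normalise_pooled(batch):
    """Standardise with one mean/std pooled over all samples and units."""
    flat = [v for row in batch for v in row]
    m, s = fmean(flat), pstdev(flat)
    return [[(v - m) / (s + EPS) for v in row] for row in batch]


# Hypothetical batch: 4 samples x 2 output units on very different scales.
batch = [[0.0, 10.0], [1.0, 30.0], [2.0, 50.0], [3.0, 70.0]]

per_unit = columns(normalise_per_unit(batch))
pooled = columns(normalise_pooled(batch))

# Per-unit statistics: every unit ends up with mean ~0 and std ~1.
print([(round(fmean(c), 3), round(pstdev(c), 3)) for c in per_unit])
# Pooled statistics: only the collective is standardised, not each unit.
print([(round(fmean(c), 3), round(pstdev(c), 3)) for c in pooled])
```

With per-unit statistics each column is standardised on its own, whereas with pooled statistics the two units keep different means and spreads even though their union has mean 0 and standard deviation 1.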

Tim Herfurth

Mar 17, 2018, 11:32:54 AM
to Daπid, Keras-users
Ok, cool. That makes sense.
As I understand it, the standard implementation of batch normalization does not do it this way. Could you give me a hint on how to design such a regularizer function (maybe in code)? That would be excellent. Thanks a lot!
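
One possible shape for such a penalty, shown only as a rough sketch and not as an answer given in this thread: instead of comparing full histograms, it matches the first two moments (mean and variance) of each unit across the samples of a batch to target values. That is a deliberate simplification of the histogram/PDF matching asked about above. The function name `moment_matching_penalty` and the target numbers are hypothetical, and the code is plain Python to show the arithmetic; inside a Keras model the same reductions over the batch axis would have to be written with differentiable tensor operations, for example in a custom loss.

```python
from statistics import fmean, pvariance


def moment_matching_penalty(batch, target_means, target_vars):
    """Sum over units of squared gaps between batch and target moments.

    batch: list of samples, each a list with one activation per output unit.
    target_means, target_vars: one target value per output unit (assumed).
    """
    penalty = 0.0
    for j, (m_t, v_t) in enumerate(zip(target_means, target_vars)):
        column = [row[j] for row in batch]  # unit j across samples
        penalty += (fmean(column) - m_t) ** 2
        penalty += (pvariance(column) - v_t) ** 2
    return penalty


# Hypothetical batch of 2 samples x 2 output units.
batch = [[0.0, 1.0], [2.0, 3.0]]

# Targets equal to the batch moments give no penalty.
matched = moment_matching_penalty(batch, [1.0, 2.0], [1.0, 1.0])
# Shifting the target mean of unit 0 by one adds a squared gap of one.
shifted = moment_matching_penalty(batch, [0.0, 2.0], [1.0, 1.0])
print(matched, shifted)
```

Because the statistics are taken down each column, the penalty constrains a single unit across samples rather than the activity across units, and it would be added to the MSE with a weighting factor of your choice.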
