Sparse auto encoder with KL-divergence


Ori Tal

Jul 29, 2015, 1:51:26 AM
to Keras-users
I'm trying to build a sparse auto-encoder. I saw there is an implementation of the KL-divergence, but I don't see any code using it. Am I missing something? Is there any option other than implementing it myself?


Thanks

François Chollet

Jul 29, 2015, 2:16:55 AM
to Ori Tal, Keras-users
You can use an activity regularizer to implement activity sparsity. See regularizers.py.

Indeed, no part of Keras is currently using KL divergence.
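For example, with the built-in L1 activity penalty this is a one-liner on the encoding layer. A minimal sketch (written against the newer Keras functional API; in the Keras version current at the time of this post the corresponding class in regularizers.py was ActivityRegularizer(l1=..., l2=...), and the layer sizes here are hypothetical):

from keras import regularizers
from keras.layers import Input, Dense
from keras.models import Model

# hypothetical sizes: 784-dim input (e.g. flattened MNIST), 64 hidden units
inputs = Input(shape=(784,))
# L1 penalty on the hidden activations pushes most of them toward zero
encoded = Dense(64, activation='relu',
                activity_regularizer=regularizers.l1(1e-5))(inputs)
decoded = Dense(784, activation='sigmoid')(encoded)

autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer='rmsprop', loss='binary_crossentropy')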


Ori Tal

Jul 29, 2015, 3:05:24 AM
to Keras-users, francois...@gmail.com

I just implemented it below. I would be glad for some comments, and if it is useful I'll be happy to add it to the git repository.

from theano import tensor as T
from keras.regularizers import Regularizer
from keras.optimizers import kl_divergence

class SparseActivityRegularizer(Regularizer):
    def __init__(self, l1=0., l2=0., p=0.05):
        # l1 and l2 are kept only for interface compatibility; they are unused
        self.p = p  # target average activation

    def set_layer(self, layer):
        self.layer = layer

    def __call__(self, loss):
        # mean squared activation of each hidden unit over the batch, summed over units
        p_hat = T.sum(T.mean(self.layer.get_output(True) ** 2, axis=0))
        loss += kl_divergence(self.p, p_hat)
        return loss

    def get_config(self):
        return {"name": self.__class__.__name__,
                "p": self.p}

scott....@gmail.com

Oct 23, 2015, 11:08:56 AM
to Keras-users, francois...@gmail.com
François,
I am very interested in implementing a sparse auto-encoder in Keras. Will this be merged into Keras?

Thanks
P.S. I'm very impressed with the quality of Keras and its ability to accept modules such as this. I recently implemented cosine similarity as a cost and it was remarkably easy to plug into the framework.

Nick Frosst

Nov 6, 2015, 12:17:45 AM
to Keras-users, francois...@gmail.com, scott....@gmail.com
hey! would you mind sharing the cosine similarity cost? I would like to take a look at it 

cheers
and thanks 
_nick
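(For reference, and not the original poster's code: a cosine-proximity cost in Keras generally boils down to a few backend ops. The function name here is just an example.)

from keras import backend as K

def cosine_proximity_loss(y_true, y_pred):
    # L2-normalize both vectors so their dot product is the cosine similarity,
    # then negate it so that minimizing the loss maximizes the similarity
    y_true = K.l2_normalize(y_true, axis=-1)
    y_pred = K.l2_normalize(y_pred, axis=-1)
    return -K.sum(y_true * y_pred, axis=-1)

# usage: model.compile(loss=cosine_proximity_loss, optimizer='rmsprop')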

tszum...@gmail.com

Apr 14, 2016, 6:33:17 PM
to Keras-users, francois...@gmail.com
Ori,

I was wondering if you (or anyone else on the board) had success with the KL divergence code you posted. I'm able to get it to execute, but it doesn't converge very well on MNIST data (using p=0.05).
In particular, I was curious about the math of the KL divergence as well as your class.

The KL divergence code in Keras has:
k = p_hat - p + p * np.log(p / p_hat)

whereas Andrew Ng's equation from his Sparse Autoencoder notes (bottom of page 14) has the following:
k = p * K.log(p / p_hat) + (1-p) * K.log((1-p) / (1-p_hat))

Second, I was wondering if you can describe the math in the class you posted. How does it compare to p_hat in the notes above, middle of page 14?

Thank you.
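For reference, the quantities from Ng's notes, restated: rho is the target sparsity, rho_hat_j is the average activation of hidden unit j over the m training examples, s is the number of hidden units, and beta is the penalty weight.

\hat{\rho}_j = \frac{1}{m} \sum_{i=1}^{m} a_j\left(x^{(i)}\right),
\qquad
\text{penalty} \;=\; \beta \sum_{j=1}^{s} \mathrm{KL}\left(\rho \,\|\, \hat{\rho}_j\right)
\;=\; \beta \sum_{j=1}^{s} \left[\, \rho \log\frac{\rho}{\hat{\rho}_j}
  + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j} \,\right]

Note that the built-in Keras expression quoted above, p_hat - p + p*log(p/p_hat), is the generalized (unnormalized) KL divergence between two non-negative intensities, not the KL divergence between two Bernoulli distributions, so it is not the same penalty as in Ng's notes.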



pava...@gmail.com

Apr 21, 2016, 2:25:02 AM
to Keras-users, francois...@gmail.com, tszum...@gmail.com
Has anyone had success implementing a Sparse Autoencoder with the activity regularizer proposed here? 

pava...@gmail.com

Apr 28, 2016, 6:21:06 AM
to Keras-users, francois...@gmail.com, tszum...@gmail.com
I have the same doubts about implementing a sparse autoencoder in Keras. Did you figure out how p_hat was calculated? Any help will be greatly appreciated.



Vishnu

Mar 7, 2017, 9:48:59 PM
to Keras-users, francois...@gmail.com
Hi, but isn't it just averaging across the nodes in the layer for a given input?

Shouldn't it be the average across all training examples? The p_hat corresponding to a node (across all inputs) should be close to p, right?

mathieu...@gmail.com

Apr 3, 2017, 6:53:37 AM
to Keras-users

Hi,

I believe the function below does what you need.
To find out which axis was the batch and which was the hidden units, I ran fit() with a very large multiplier on the values I wanted to extract from the function call (see the DEBUG comment line).

from keras import backend as K

def sparse_reg(activ_matrix):
    rho = 0.01  # desired average activation of the hidden units
    beta = 3    # weight of the sparsity penalty term
    # return 1000000*K.shape(activ_matrix)[0]  # useful to DEBUG
    # axis 0 is the batch dimension, axis 1 is the layer (hidden units) dimension
    rho_bar = K.mean(activ_matrix, axis=0)  # average over the batch samples
    KLs = rho*K.log(rho/rho_bar) + (1-rho)*K.log((1-rho)/(1-rho_bar))
    return beta * K.sum(KLs)  # sum over the layer units


Dense(activity_regularizer=sparse_reg, ......)

Let me know if you think there is an error with this function
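To make the wiring concrete, here is a minimal sketch of an autoencoder using sparse_reg on the encoding layer (hypothetical sizes, assuming Keras 2, where any callable mapping a tensor to a scalar is accepted as a regularizer):

from keras.layers import Input, Dense
from keras.models import Model

# hypothetical sizes: 784-dim inputs (e.g. flattened MNIST), 64 hidden units
inputs = Input(shape=(784,))
encoded = Dense(64, activation='sigmoid', activity_regularizer=sparse_reg)(inputs)
decoded = Dense(784, activation='sigmoid')(encoded)

autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=256)

A sigmoid activation on the encoding layer keeps the activations in (0, 1), which the KL term assumes.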

mattia....@gmail.com

Apr 28, 2017, 3:21:00 PM
to Keras-users, mathieu...@gmail.com
Hey, I'm trying your function, but I always have a problem getting the activation values of a hidden layer.
Have you maybe experienced this issue and come up with a solution? It's driving me crazy.
Thanks!
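(For anyone else stuck on this: one common way to read out hidden-layer activations is an intermediate Model. A sketch, where autoencoder is the trained model and 'encoded' is a hypothetical name given to the hidden layer:)

from keras.models import Model

# assumes the hidden layer was created with name='encoded'
encoder = Model(inputs=autoencoder.input,
                outputs=autoencoder.get_layer('encoded').output)
hidden_activations = encoder.predict(x_test)  # shape: (n_samples, layer_size)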

shaifa...@gmail.com

Oct 6, 2018, 11:29:49 AM
to Keras-users
Hi,

I tried your code for sparsity. There is no error in the code, but the training loss is always nan. Can you please suggest why?

Sergey O.

Oct 6, 2018, 12:13:49 PM
to shaifa...@gmail.com, Keras-users
Hmmm, I wonder if rho_bar drops to zero? If so, it could be a division-by-zero issue!
Every time you divide, add a small number (e.g. 0.0001):

for example:
K.log(rho/(rho_bar+0.0001))
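In other words, something like the following inside sparse_reg, replacing the KLs line (a sketch; the constant is arbitrary, and as discussed further down it will not help if rho_bar can reach or exceed 1):

eps = 1e-4  # small constant keeping the log arguments away from zero
KLs = rho*K.log(rho/(rho_bar + eps)) + (1-rho)*K.log((1-rho)/(1 - rho_bar + eps))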


shaifa...@gmail.com

Oct 6, 2018, 12:17:22 PM
to Keras-users
It is still the same. The loss is nan

Sergey O.

Oct 6, 2018, 12:40:34 PM
to shaifa...@gmail.com, Keras-users
log(0) can also cause nan

shaifa...@gmail.com

Oct 6, 2018, 1:00:05 PM
to Keras-users
As a check, I ran K.log(rho+1000/(rho_bar+0.01)). This should become log a - log b. Clearly log a does not seem to be zero here, so why does it still give nan?

Ted Yu

Oct 6, 2018, 1:41:51 PM
to shaifa...@gmail.com, Keras-users
In the expression,  did you mean (rho+1000)?

shaifa...@gmail.com

Oct 6, 2018, 1:51:35 PM
to Keras-users
To verify whether the nan comes from a log term becoming 0, I changed the original expression K.log(rho/rho_bar) to K.log(rho+1000/(rho_bar+0.01)). The expression log(rho+1000) cannot be zero, so why nan? That is my doubt.

Sergey O.

Oct 6, 2018, 2:04:52 PM
to Ted Yu, shaifa...@gmail.com, Keras-users
Try KL = 1
(to confirm it has something to do with the sparsity function).
Next, you have two log() and division operations; make sure both are "zero-proof" :)

shaifa...@gmail.com

Oct 6, 2018, 2:27:14 PM
to Keras-users
I tried KL=1 and it ran normally. I also observed that rho_bar is perhaps very, very small, so adding a small number like 0.0001 did not help. The nan vanished only after I added 1 to rho_bar. Now the expression looks like: rho*K.log(rho/(rho_bar+1)) + (1-rho)*K.log((1-rho)/(1-rho_bar+1)). I am not sure it is mathematically correct to add 1 to rho_bar. Also, this expression gives a negative loss. Why?

Sergey O.

Oct 6, 2018, 2:59:33 PM
to shaifa...@gmail.com, Keras-users
(1-rho_bar+1)
If rho_bar = 2, you are dividing by zero again!

I'm not sure what the original question is and what this function is intended to achieve. But right now if rho_bar reaches 0.0 or 1.0 it's undefined (because you have division by zero).
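A sketch of one way to keep both log arguments valid is to clip rho_bar into the open interval (0, 1) rather than shifting it (the epsilon value is arbitrary):

# inside the regularizer, after computing rho_bar = K.mean(activ_matrix, axis=0)
eps = 1e-7
rho_bar = K.clip(rho_bar, eps, 1 - eps)  # keep rho_bar strictly inside (0, 1)
KLs = rho*K.log(rho/rho_bar) + (1-rho)*K.log((1-rho)/(1-rho_bar))

With rho_bar constrained to (0, 1), each KL term is the divergence between two Bernoulli distributions and is therefore non-negative, which also addresses the negative-loss observation above.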


Jude TCHAYE

Nov 17, 2020, 3:53:55 AM
to Keras-users
Hi! Here is a fix for the nan error:


class SparseRegularizer(keras.regularizers.Regularizer):
    
    def __init__(self, rho = 0.01,beta = 3):
        """
        rho  : Desired average activation of the hidden units
        beta : Weight of sparsity penalty term
        """
        self.rho = rho
        self.beta = beta
        

    def __call__(self, activation):
        # sigmoid because we need the probability distributions
        activation = tf.nn.sigmoid(activation)
        # average over the batch samples
        rho_bar = K.mean(activation, axis=0)
        # Avoid division by 0
        rho_bar = K.maximum(rho_bar,1e-10) 
        KLs = rho*K.log(rho/rho_bar) + (1-rho)*K.log((1-rho)/(1-rho_bar))
        return beta * K.sum(KLs) # sum over the layer units

    def get_config(self):
        return {
            'rho': self.rho,
            'beta': self.beta
        } 
 

Jude TCHAYE

Nov 17, 2020, 4:17:32 AM
to Keras-users
I fixed a small error:

class SparseRegularizer(keras.regularizers.Regularizer):

    def __init__(self, rho = 0.01, beta = 1):
        """
        rho  : Desired average activation of the hidden units
        beta : Weight of sparsity penalty term
        """
        self.rho = rho
        self.beta = beta

    def __call__(self, activation):
        rho = self.rho
        beta = self.beta
        # sigmoid because we need the probability distributions
        activation = tf.nn.sigmoid(activation)
        # average over the batch samples
        rho_bar = K.mean(activation, axis=0)
        # Avoid division by 0
        rho_bar = K.maximum(rho_bar, 1e-10)
        KLs = rho*K.log(rho/rho_bar) + (1-rho)*K.log((1-rho)/(1-rho_bar))
        return beta * K.sum(KLs)  # sum over the layer units

    def get_config(self):
        return {
            'rho': self.rho,
            'beta': self.beta
        }
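For completeness, a usage sketch with the imports the class above relies on (tf.keras assumed, hypothetical layer sizes):

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Dense

# hypothetical sizes: 784-dim inputs, 64 hidden units
inputs = Input(shape=(784,))
encoded = Dense(64, activation='relu',
                activity_regularizer=SparseRegularizer(rho=0.01, beta=1))(inputs)
decoded = Dense(784, activation='sigmoid')(encoded)

autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=256)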