Sparse auto encoder with KL-divergence


Ori Tal

Jul 29, 2015, 1:51:26 AM
to Keras-users
I'm trying to build a sparse auto-encoder. I saw there is an implementation of the KL-divergence, but I don't see any code using it. Am I missing something? Is there any option other than implementing it myself?


Thanks

François Chollet

Jul 29, 2015, 2:16:55 AM
to Ori Tal, Keras-users
You can use an activity regularizer to implement activity sparsity. See regularizers.py.

Indeed, no part of Keras is currently using KL divergence.
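For example, with the built-in L1 activity penalty this is a one-liner on the encoding layer. A minimal sketch (written against the newer Keras functional API; in the Keras version current at the time of this post the corresponding class in regularizers.py was ActivityRegularizer(l1=..., l2=...), and the layer sizes here are hypothetical):

from keras import regularizers
from keras.layers import Input, Dense
from keras.models import Model

# hypothetical sizes: 784-dim input (e.g. flattened MNIST), 64 hidden units
inputs = Input(shape=(784,))
# L1 penalty on the hidden activations pushes most of them toward zero
encoded = Dense(64, activation='relu',
                activity_regularizer=regularizers.l1(1e-5))(inputs)
decoded = Dense(784, activation='sigmoid')(encoded)

autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer='rmsprop', loss='binary_crossentropy')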


Ori Tal

Jul 29, 2015, 3:05:24 AM
to Keras-users, francois...@gmail.com

I just implemented it below. I would be glad for some comments, and if it is useful I'll be happy to add it to the git repository.

from theano import tensor as T
from keras.regularizers import Regularizer
from keras.optimizers import kl_divergence

class SparseActivityRegularizer(Regularizer):
    def __init__(self, l1=0., l2=0., p=0.05):
        # l1 and l2 are kept only for interface compatibility; they are unused
        self.p = p  # target average activation

    def set_layer(self, layer):
        self.layer = layer

    def __call__(self, loss):
        # mean squared activation of each hidden unit over the batch, summed over units
        p_hat = T.sum(T.mean(self.layer.get_output(True) ** 2, axis=0))
        loss += kl_divergence(self.p, p_hat)
        return loss

    def get_config(self):
        return {"name": self.__class__.__name__,
                "p": self.p}

scott....@gmail.com

Oct 23, 2015, 11:08:56 AM
to Keras-users, francois...@gmail.com
François,
I am very interested in implementing a sparse auto-encoder in Keras. Will this be merged into Keras?

Thanks
P.S. I'm very impressed with the quality of Keras and its ability to accept modules such as this. I recently implemented cosine similarity as a cost and it was remarkably easy to plug into the framework.

Nick Frosst

Nov 6, 2015, 12:17:45 AM
to Keras-users, francois...@gmail.com, scott....@gmail.com
hey! would you mind sharing the cosine similarity cost? I would like to take a look at it 

cheers
and thanks 
_nick
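(For reference, and not the original poster's code: a cosine-proximity cost in Keras generally boils down to a few backend ops. The function name here is just an example.)

from keras import backend as K

def cosine_proximity_loss(y_true, y_pred):
    # L2-normalize both vectors so their dot product is the cosine similarity,
    # then negate it so that minimizing the loss maximizes the similarity
    y_true = K.l2_normalize(y_true, axis=-1)
    y_pred = K.l2_normalize(y_pred, axis=-1)
    return -K.sum(y_true * y_pred, axis=-1)

# usage: model.compile(loss=cosine_proximity_loss, optimizer='rmsprop')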

tszum...@gmail.com

Apr 14, 2016, 6:33:17 PM
to Keras-users, francois...@gmail.com
Ori,

I was wondering if you (or anyone else on the board) had success with the KL divergence code you posted. I'm able to get it to execute, but it doesn't converge very well on MNIST data (using p=0.05).
In particular, I was curious about the math of the KL divergence as well as your class.

The KL divergence code in Keras has:
k = p_hat - p + p * np.log(p / p_hat)

whereas Andrew Ng's equation from his Sparse Autoencoder notes (bottom of page 14) has the following:
k = p * K.log(p / p_hat) + (1-p) * K.log((1-p) / (1-p_hat))

Second, I was wondering if you can describe the math in the class you posted. How does it compare to p_hat in the notes above, middle of page 14?

Thank you.
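For reference, the quantities from Ng's notes, restated: rho is the target sparsity, rho_hat_j is the average activation of hidden unit j over the m training examples, s is the number of hidden units, and beta is the penalty weight.

\hat{\rho}_j = \frac{1}{m} \sum_{i=1}^{m} a_j\left(x^{(i)}\right),
\qquad
\text{penalty} \;=\; \beta \sum_{j=1}^{s} \mathrm{KL}\left(\rho \,\|\, \hat{\rho}_j\right)
\;=\; \beta \sum_{j=1}^{s} \left[\, \rho \log\frac{\rho}{\hat{\rho}_j}
  + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j} \,\right]

Note that the built-in Keras expression quoted above, p_hat - p + p*log(p/p_hat), is the generalized (unnormalized) KL divergence between two non-negative intensities, not the KL divergence between two Bernoulli distributions, so it is not the same penalty as in Ng's notes.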



pava...@gmail.com

Apr 21, 2016, 2:25:02 AM
to Keras-users, francois...@gmail.com, tszum...@gmail.com
Has anyone had success implementing a Sparse Autoencoder with the activity regularizer proposed here? 

pava...@gmail.com

Apr 28, 2016, 6:21:06 AM
to Keras-users, francois...@gmail.com, tszum...@gmail.com
I have the same doubts about implementing a sparse autoencoder in Keras. Did you figure out how p_hat was calculated? Any help will be greatly appreciated.



Vishnu

Mar 7, 2017, 9:48:59 PM
to Keras-users, francois...@gmail.com
Hi, but isn't it just averaging across the nodes in the layer for a given input?

Shouldn't it be the average across all training examples? The p_hat corresponding to a node (across all inputs) should be close to p, right?

mathieu...@gmail.com

Apr 3, 2017, 6:53:37 AM
to Keras-users

Hi,

I believe the function below does what you need.
To find out which axis was the batch and which was the hidden units, I ran fit() with a very large multiplier on the values I wanted to extract from the function call (see the DEBUG comment line).

from keras import backend as K

def sparse_reg(activ_matrix):
    rho = 0.01  # desired average activation of the hidden units
    beta = 3    # weight of the sparsity penalty term
    # return 1000000*K.shape(activ_matrix)[0]  # useful to DEBUG
    # axis 0 is the batch dimension, axis 1 is the layer (hidden units) dimension
    rho_bar = K.mean(activ_matrix, axis=0)  # average over the batch samples
    KLs = rho*K.log(rho/rho_bar) + (1-rho)*K.log((1-rho)/(1-rho_bar))
    return beta * K.sum(KLs)  # sum over the layer units


Dense(activity_regularizer=sparse_reg, ......)

Let me know if you think there is an error with this function
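To make the wiring concrete, here is a minimal sketch of an autoencoder using sparse_reg on the encoding layer (hypothetical sizes, assuming Keras 2, where any callable mapping a tensor to a scalar is accepted as a regularizer):

from keras.layers import Input, Dense
from keras.models import Model

# hypothetical sizes: 784-dim inputs (e.g. flattened MNIST), 64 hidden units
inputs = Input(shape=(784,))
encoded = Dense(64, activation='sigmoid', activity_regularizer=sparse_reg)(inputs)
decoded = Dense(784, activation='sigmoid')(encoded)

autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=256)

A sigmoid activation on the encoding layer keeps the activations in (0, 1), which the KL term assumes.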

mattia....@gmail.com

Apr 28, 2017, 3:21:00 PM
to Keras-users, mathieu...@gmail.com
Hey, I'm trying your function, but I always have a problem getting the activation values of a hidden layer.
Have you maybe experienced this issue and come up with a solution? It's driving me crazy.
Thanks!
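(For anyone else stuck on this: one common way to read out hidden-layer activations is an intermediate Model. A sketch, where autoencoder is the trained model and 'encoded' is a hypothetical name given to the hidden layer:)

from keras.models import Model

# assumes the hidden layer was created with name='encoded'
encoder = Model(inputs=autoencoder.input,
                outputs=autoencoder.get_layer('encoded').output)
hidden_activations = encoder.predict(x_test)  # shape: (n_samples, layer_size)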

shaifa...@gmail.com

Oct 6, 2018, 11:29:49 AM
to Keras-users
Hi,

I tried your code for sparsity. There is no error in the code, but the training loss is always nan. Can you please suggest why?

Sergey O.

Oct 6, 2018, 12:13:49 PM
to shaifa...@gmail.com, Keras-users
Hmmm, I wonder if rho_bar drops to zero? If so, it could be a division-by-zero issue!
Every time you divide, add a small number (e.g. 0.0001):

for example:
K.log(rho/(rho_bar+0.0001))
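In other words, something like the following inside sparse_reg, replacing the KLs line (a sketch; the constant is arbitrary, and as discussed further down it will not help if rho_bar can reach or exceed 1):

eps = 1e-4  # small constant keeping the log arguments away from zero
KLs = rho*K.log(rho/(rho_bar + eps)) + (1-rho)*K.log((1-rho)/(1 - rho_bar + eps))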


shaifa...@gmail.com

Oct 6, 2018, 12:17:22 PM
to Keras-users
It is still the same. The loss is nan

Sergey O.

Oct 6, 2018, 12:40:34 PM
to shaifa...@gmail.com, Keras-users
log(0) can also cause nan

shaifa...@gmail.com

Oct 6, 2018, 1:00:05 PM
to Keras-users
As a check, I ran K.log(rho+1000/(rho_bar+0.01)). This should become log a - log b. Clearly log a does not seem to be zero here, so why does it still give nan?

Ted Yu

Oct 6, 2018, 1:41:51 PM
to shaifa...@gmail.com, Keras-users
In the expression,  did you mean (rho+1000)?

shaifa...@gmail.com

Oct 6, 2018, 1:51:35 PM
to Keras-users
To verify whether the nan comes from a log term becoming 0, I changed the original expression K.log(rho/rho_bar) to K.log(rho+1000/(rho_bar+0.01)). The expression log(rho+1000) cannot be zero, so why nan? That is my doubt.

Sergey O.

Oct 6, 2018, 2:04:52 PM
to Ted Yu, shaifa...@gmail.com, Keras-users
Try KL = 1
(to confirm it has something to do with the sparsity function).
Next, you have two log() and division operations; make sure both are "zero-proof" :)

shaifa...@gmail.com

Oct 6, 2018, 2:27:14 PM
to Keras-users
I tried KL=1 and it ran normally. I also observed that rho_bar is perhaps very, very small, so adding a small number like 0.0001 did not help. The nan vanished only after I added 1 to rho_bar. Now the expression looks like: rho*K.log(rho/(rho_bar+1)) + (1-rho)*K.log((1-rho)/(1-rho_bar+1)). I am not sure it is mathematically correct to add 1 to rho_bar. Also, this expression gives a negative loss. Why?

Sergey O.

Oct 6, 2018, 2:59:33 PM
to shaifa...@gmail.com, Keras-users
(1-rho_bar+1)
If rho_bar = 2, you are dividing by zero again!

I'm not sure what the original question is and what this function is intended to achieve. But right now if rho_bar reaches 0.0 or 1.0 it's undefined (because you have division by zero).
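A sketch of one way to keep both log arguments valid is to clip rho_bar into the open interval (0, 1) rather than shifting it (the epsilon value is arbitrary):

# inside the regularizer, after computing rho_bar = K.mean(activ_matrix, axis=0)
eps = 1e-7
rho_bar = K.clip(rho_bar, eps, 1 - eps)  # keep rho_bar strictly inside (0, 1)
KLs = rho*K.log(rho/rho_bar) + (1-rho)*K.log((1-rho)/(1-rho_bar))

With rho_bar constrained to (0, 1), each KL term is the divergence between two Bernoulli distributions and is therefore non-negative, which also addresses the negative-loss observation above.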


Jude TCHAYE

Nov 17, 2020, 3:53:55 AM
to Keras-users
Hi! Here is a fix for the nan error:


class SparseRegularizer(keras.regularizers.Regularizer):
    
    def __init__(self, rho = 0.01,beta = 3):
        """
        rho  : Desired average activation of the hidden units
        beta : Weight of sparsity penalty term
        """
        self.rho = rho
        self.beta = beta
        

    def __call__(self, activation):
        # sigmoid because we need the probability distributions
        activation = tf.nn.sigmoid(activation)
        # average over the batch samples
        rho_bar = K.mean(activation, axis=0)
        # Avoid division by 0
        rho_bar = K.maximum(rho_bar,1e-10) 
        KLs = rho*K.log(rho/rho_bar) + (1-rho)*K.log((1-rho)/(1-rho_bar))
        return beta * K.sum(KLs) # sum over the layer units

    def get_config(self):
        return {
            'rho': self.rho,
            'beta': self.beta
        } 
 

Jude TCHAYE

Nov 17, 2020, 4:17:32 AM
to Keras-users
I fixed a small error:

class SparseRegularizer(keras.regularizers.Regularizer):

    def __init__(self, rho = 0.01, beta = 1):
        """
        rho  : Desired average activation of the hidden units
        beta : Weight of sparsity penalty term
        """
        self.rho = rho
        self.beta = beta

    def __call__(self, activation):
        rho = self.rho
        beta = self.beta
        # sigmoid because we need the probability distributions
        activation = tf.nn.sigmoid(activation)
        # average over the batch samples
        rho_bar = K.mean(activation, axis=0)
        # Avoid division by 0
        rho_bar = K.maximum(rho_bar, 1e-10)
        KLs = rho*K.log(rho/rho_bar) + (1-rho)*K.log((1-rho)/(1-rho_bar))
        return beta * K.sum(KLs)  # sum over the layer units

    def get_config(self):
        return {
            'rho': self.rho,
            'beta': self.beta
        }
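For completeness, a usage sketch with the imports the class above relies on (tf.keras assumed, hypothetical layer sizes):

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Dense

# hypothetical sizes: 784-dim inputs, 64 hidden units
inputs = Input(shape=(784,))
encoded = Dense(64, activation='relu',
                activity_regularizer=SparseRegularizer(rho=0.01, beta=1))(inputs)
decoded = Dense(784, activation='sigmoid')(encoded)

autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
# autoencoder.fit(x_train, x_train, epochs=10, batch_size=256)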