Variable importance Deep Learning


b.benjam...@gmail.com

Jun 7, 2016, 2:18:41 PM6/7/16
to H2O Open Source Scalable Machine Learning - h2ostream

Hi,
There is something I don't get in the way you define variable importance for the Deep Learning algorithm.
First, can you define what relative_importance, absolute_importance, and percentage are in the varimp output? Which one follows the Gedeon definition?
Second, you say that "Variable Importance
Whether to compute variable importances for input features. The implemented method (by Gedeon) considers the weights connecting the input features to the first two hidden layers."
Why do you only consider the first two layers for deeper networks? How does that make sense with respect to the Gedeon article, which defines variable importance for a network of any depth and is based on every layer?
Thank you for your help.
Ben

arno....@gmail.com

Jun 7, 2016, 5:45:02 PM6/7/16
to H2O Open Source Scalable Machine Learning - h2ostream, b.benjam...@gmail.com

Hi Ben,

Yes, we only look at the first 2 layers (for simplicity and speed). The actual implementation is here:
https://github.com/h2oai/h2o-3/blob/master/h2o-algos/src/main/java/hex/deeplearning/DeepLearningModelInfo.java#L610-L610

This gives one number per input neuron, normalized such that the max value is 1.
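To make the description above concrete, here is a minimal sketch of a Gedeon-style computation over the first two weight matrices, ending with the same max-to-1 normalization Arno describes. This is an illustration of the idea, not a transcription of the linked Java code; the function name and shapes are assumptions.

```python
import numpy as np

def gedeon_importance(w1, w2):
    """Gedeon-style variable importance from the first two weight matrices.

    w1: weights input -> hidden layer 1, shape (h1, n_inputs)
    w2: weights hidden layer 1 -> hidden layer 2, shape (h2, h1)
    Returns one score per input feature, normalized so the max value is 1.
    (Illustrative sketch, not the exact H2O implementation.)
    """
    # Share of each input in each first-layer neuron's total absolute weight.
    p1 = np.abs(w1) / np.abs(w1).sum(axis=1, keepdims=True)   # (h1, n_inputs)
    # Share of each first-layer neuron in each second-layer neuron.
    p2 = np.abs(w2) / np.abs(w2).sum(axis=1, keepdims=True)   # (h2, h1)
    # Propagate shares through two layers: contribution of each input
    # to each second-layer neuron, then sum over those neurons.
    q = p2 @ p1                         # (h2, n_inputs)
    imp = q.sum(axis=0)                 # one number per input neuron
    return imp / imp.max()              # normalize so the max value is 1

rng = np.random.default_rng(0)
w1 = rng.normal(size=(4, 3))            # 3 inputs, 4 hidden-1 neurons
w2 = rng.normal(size=(2, 4))            # 4 hidden-1, 2 hidden-2 neurons
scores = gedeon_importance(w1, w2)
print(scores)                           # length 3, max element is 1.0
```

Only `w1` and `w2` enter the computation, which is why weights in layers deeper than the second never affect the result.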

We then take those numbers and compute the relative importance and their percentages here:
https://github.com/h2oai/h2o-3/blob/master/h2o-core/src/main/java/hex/VarImp.java

For DL, there's no impact of scaling, because the numbers were already scaled.

Hence, the relative importance is the same as the absolute importance (we have the same infrastructure for other models where the importance numbers don't come normalized).
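The normalization step above can be sketched as follows. Assume the varimp table derives three columns from the raw per-feature scores (the column names mirror the varimp output discussed in the question; the real logic lives in hex/VarImp.java):

```python
import numpy as np

def varimp_table(raw):
    """Derive relative, scaled, and percentage importances from raw scores.
    Sketch of the generic normalization step; assumed, not the exact code.
    """
    raw = np.asarray(raw, dtype=float)
    relative = raw                  # relative importance = the raw score
    scaled = raw / raw.max()        # rescaled so the max is 1
    percentage = raw / raw.sum()    # shares that sum to 1
    return relative, scaled, percentage

# DL scores arrive already max-normalized, so scaling is a no-op for them:
rel, sc, pct = varimp_table([0.5, 1.0, 0.25])
```

For Deep Learning the raw scores already have a maximum of 1, so `relative` and `scaled` coincide; the shared infrastructure matters for models whose raw importances are not pre-normalized.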

Hope this helps.
Best,
Arno

b.benjam...@gmail.com

Jun 7, 2016, 8:40:40 PM6/7/16
to H2O Open Source Scalable Machine Learning - h2ostream, b.benjam...@gmail.com, arno....@gmail.com

Thanks a lot, but with respect to the paper I still don't see how using only the first two layers makes sense. If you don't use the weights of the deeper layers, then your measure can be totally wrong, don't you agree?

arno....@gmail.com

Jun 7, 2016, 9:14:22 PM6/7/16
to H2O Open Source Scalable Machine Learning - h2ostream, b.benjam...@gmail.com, arno....@gmail.com

No, I don't think so. The input is directly connected to the first two layers in a way that should be reflected in the variable importances: if the deeper layers decide that the early layers are not important, that should show up in the first two layers' weights during training. Yes, it's not perfect, but no feature importance is, often quite dramatically so. It's more often used as a warning sign that something is wrong than as a means to know what really matters.
