Get variable name from model object in R


vraman...@gmail.com

Sep 11, 2014, 6:59:58 PM
to h2os...@googlegroups.com
I have a simple question in R.
model = h2o.getModel(...)
model$varimp gives the variable names and importances.
How do I get the two (name and value) separately?
Thanks

ke...@0xdata.com

Sep 12, 2014, 11:55:22 AM
to h2os...@googlegroups.com, vraman...@gmail.com

Hi, vraman? (I didn't see your first name in the post, sorry.)
Someone with more expertise in the R part of h2o should respond shortly, but I was curious myself and looked at h2o/R/tests/testdir_demos/runit_demo_VI_all_algos.R

That's in the h2o GitHub repo. There are a number of demos in testdir_demos that are useful as examples.

For instance, this demo uses the variable importance from a GBM model result:
# Access Variable Importance from the built model
gbm.VI = my.gbm@model$varimp
print("Variable importance from GBM")
print(gbm.VI)

par(mfrow=c(2,2))
# Plot variable importance from GBM
barplot(t(gbm.VI[1]),las=2,main="VI from GBM")


The print produced the output below. It sounds like your question is just how to extract the right column or row from the VI result? I believe it's just a data frame, but maybe you can clarify your question so that someone more expert in R can give you the exact answer.

Here is the test output from the code above (without the barplot, although you can see how the barplot call extracts a column):


[1] "Variable importance from GBM"
               Relative importance Scaled.Values Percent.Influence
duration                165.803620    1.00000000        53.7546812
nr.employed             116.284200    0.70133692        37.7001425
pdays                    23.857216    0.14388839         7.7346746
euribor3m                 2.499952    0.01507779         0.8105017
age                       0.000000    0.00000000         0.0000000
job                       0.000000    0.00000000         0.0000000
marital                   0.000000    0.00000000         0.0000000
education                 0.000000    0.00000000         0.0000000
default                   0.000000    0.00000000         0.0000000
housing                   0.000000    0.00000000         0.0000000
loan                      0.000000    0.00000000         0.0000000
contact                   0.000000    0.00000000         0.0000000
month                     0.000000    0.00000000         0.0000000
day_of_week               0.000000    0.00000000         0.0000000
campaign                  0.000000    0.00000000         0.0000000
previous                  0.000000    0.00000000         0.0000000
poutcome                  0.000000    0.00000000         0.0000000
emp.var.rate              0.000000    0.00000000         0.0000000
cons.price.idx            0.000000    0.00000000         0.0000000
cons.conf.idx             0.000000    0.00000000         0.0000000


-kevin

vraman...@gmail.com

Sep 12, 2014, 1:47:29 PM
to h2os...@googlegroups.com, vraman...@gmail.com, ke...@0xdata.com
Thanks Kevin.
Yeah, model$varimp prints everything.
I want to separate the row names from the values.
I'm sure the R experts would know.
Venkatesh

spn...@gmail.com

Sep 12, 2014, 2:07:29 PM
to h2os...@googlegroups.com, vraman...@gmail.com

Hi,

The varimp component is just an R data frame, so you can access the names/values in the usual ways. Here's a complete script with iris:


library(h2o)
h <- h2o.init()
hex <- as.h2o(h, iris)
# Build a GBM on iris and ask for variable importances
m <- h2o.gbm(x=1:4, y=5, data=hex, importance=TRUE)

m@model$varimp
#              Relative importance Scaled.Values Percent.Influence
# Petal.Width          7.216290000  1.0000000000       51.22833426
# Petal.Length         6.851120500  0.9493965043       48.63600147
# Sepal.Length         0.013625654  0.0018881799        0.09672831
# Sepal.Width          0.005484723  0.0007600474        0.03893596

is.data.frame(m@model$varimp)
# [1] TRUE

# Column names (the importance measures)
names(m@model$varimp)
# [1] "Relative importance" "Scaled.Values" "Percent.Influence"

# Row names (the variable names)
rownames(m@model$varimp)
# [1] "Petal.Width" "Petal.Length" "Sepal.Length" "Sepal.Width"

# Values of a single column
m@model$varimp$"Relative importance"
# [1] 7.216290000 6.851120500 0.013625654 0.005484723

etc.

HTH,
Spencer
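
For anyone who wants to try the extraction without a running H2O instance, here is a minimal sketch in plain R. It assumes only what the post above shows: that varimp is an ordinary data frame with the variable names as row names. The data frame `vi` is a local stand-in built from the iris numbers printed above, and the names `var_names`, `var_values`, `importance`, `vi_long` and `top_vars` are made up for illustration.

```r
# Local stand-in for m@model$varimp, using the iris values printed above.
# check.names = FALSE keeps the space in "Relative importance".
vi <- data.frame(
  "Relative importance" = c(7.216290000, 6.851120500, 0.013625654, 0.005484723),
  "Scaled.Values"       = c(1.0000000000, 0.9493965043, 0.0018881799, 0.0007600474),
  "Percent.Influence"   = c(51.22833426, 48.63600147, 0.09672831, 0.03893596),
  row.names   = c("Petal.Width", "Petal.Length", "Sepal.Length", "Sepal.Width"),
  check.names = FALSE
)

# Names and values as two separate vectors.
var_names  <- rownames(vi)
var_values <- vi[["Relative importance"]]

# Or one named numeric vector, which keeps each name paired with its value.
importance <- setNames(var_values, var_names)

# Or a two-column data frame with the names as an ordinary column.
vi_long <- data.frame(variable = var_names,
                      importance = var_values,
                      stringsAsFactors = FALSE)

# Variables that contribute at least 1 percent influence.
top_vars <- var_names[vi[["Percent.Influence"]] >= 1]
print(top_vars)
```

With a real model, replacing the stand-in with `vi <- m@model$varimp` should give the same shapes, provided the component really is a data frame as in the GBM case.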

vraman...@gmail.com

Sep 12, 2014, 2:42:35 PM
to h2os...@googlegroups.com, vraman...@gmail.com, spn...@gmail.com
Thanks Spencer, that helps.
I built a model using DL (deep learning), and there is.data.frame(..) was FALSE.
I used as.data.frame(..) and could then extract the values.
Thanks
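
A rough sketch of the workaround described here, in plain R. The thread does not say what class the DL varimp actually had, so the object `dl_vi` below is purely hypothetical: it is assumed to be a numeric matrix with variable names as row names, filled with made-up numbers. The helper name `as_varimp_df` is also invented for this example.

```r
# Hypothetical stand-in: assumed to be a one-column numeric matrix with the
# variable names as row names. The values are invented, not from a real model.
dl_vi <- matrix(c(0.6, 0.3, 0.1), ncol = 1,
                dimnames = list(c("x1", "x2", "x3"), "importance"))

# Coerce only when needed, so the same code also works for objects that
# already are data frames (as in the GBM case).
as_varimp_df <- function(x) {
  if (is.data.frame(x)) x else as.data.frame(x)
}

vi <- as_varimp_df(dl_vi)

# After coercion the usual data frame accessors apply.
var_names  <- rownames(vi)
var_values <- vi$importance
print(data.frame(variable = var_names, importance = var_values))
```

If the real object has a different layout (for example, variables in columns rather than rows), the accessors would need to be adjusted after inspecting it with str().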

spn...@gmail.com

Sep 12, 2014, 2:45:29 PM
to h2os...@googlegroups.com, vraman...@gmail.com, spn...@gmail.com
Thanks for the feedback.

The variable importance should be a plain old R data frame. I've opened a JIRA ticket to track this (https://0xdata.atlassian.net/browse/PUB-1020).

Spencer

vraman...@gmail.com

Sep 15, 2014, 7:18:21 PM
to h2os...@googlegroups.com, vraman...@gmail.com, spn...@gmail.com
Hi Spencer:
Not sure if this is related, or whether it's an issue at all. Here's the use case:

Launch R
> library(h2o)
> conn <- ...
> train = h2o.getFrame(conn, '<previously_imported_key>')
> is.data.frame(train)
FALSE

spn...@gmail.com

Sep 15, 2014, 7:22:48 PM
to h2os...@googlegroups.com, vraman...@gmail.com, spn...@gmail.com
This is expected actually. The result of h2o.getFrame(...) is a pointer to the data sitting in H2O. In this case, the class of `train` is H2OParsedData. If you'd like to pull an H2O data frame locally, you could use

as.data.frame(train)

What should come back as plain R data frames are things like the variable importances stored inside models.

Thanks,
Spencer
