autoencoder, h2o.deepfeatures(), h2o.mse(), str() -- using and interpreting results


hvh

Jan 9, 2018, 12:27:35 PM
to H2O Open Source Scalable Machine Learning - h2ostream
1) What version of H2O are you using:
I got the version from here: http://h2o-release.s3.amazonaws.com/h2o/rel-weierstrass/2/R
2) Specify the type of machine you're using (i.e. OS X 10.11.4, Windows 10, etc).
Windows 10, but I am working on a Linux system (Red Hat)
3) Specify what language you are working in and what version (i.e. Python 2.7, Spark 1.6.1, etc)
I am working with R
4) The code you were executing when you received an error message (please provide a reproducible example if possible).
Please see the code at the end of this post.
5) Copy and paste in your error message.
Several messages -- see below.
6) Type of data you are using (if applicable).
numeric, binary data
7) Code is pasted below at the end


************ QUESTIONS ********************
I need help using h2o.deepfeatures() and other functions. I'm trying to understand the output of my autoencoder model (ae).

0) Is there a good tutorial on using and interpreting the autoencoder feature?
This tutorial: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/booklets/DeepLearningBooklet.pdf is not that helpful.

1) I was advised in a separate post to use h2o.deepfeatures() to get the output of the ae model. However, when I pass in my model variable, I receive the message "Error in chk.H2OFrame(x) : must be an H2OFrame". How do I convert my ae model variable to an H2OFrame? Currently, it is an autoencoder object:
> class(It2.dl.ae)
[1] "H2OAutoEncoderModel"
attr(,"package")
[1] "h2o"

2) I was also advised to pass this autoencoder object to h2o.mse() or str() to get metrics. For h2o.mse() I get the following. What does this mean?
> h2o.mse(It2.perf)
[1] "NaN"


3) For str(), I get the following. How do I interpret this?
> str(It2.perf)
Formal class 'H2OAutoEncoderMetrics' [package "h2o"] with 5 slots
..@ algorithm: chr "deeplearning"
..@ on_train : logi FALSE
..@ on_valid : logi FALSE
..@ on_xval : logi FALSE
..@ metrics :List of 10
.. ..$ model :List of 4
.. .. ..$ __meta:List of 3
.. .. .. ..$ schema_version: int 3
.. .. .. ..$ schema_name : chr "ModelKeyV3"
.. .. .. ..$ schema_type : chr "Key<Model>"
.. .. ..$ name : chr "DeepLearning_model_R_1514491686079_1"
.. .. ..$ type : chr "Key<Model>"
.. .. ..$ URL : chr "/3/Models/DeepLearning_model_R_1514491686079_1"
.. ..$ model_checksum: num 7.08e+18
.. ..$ frame :List of 1
.. .. ..$ name: chr "RTMP_sid_a42e_5"
.. ..$ frame_checksum: num -6.26e+18
.. ..$ description : NULL
.. ..$ scoring_time : num 1.51e+12
.. ..$ predictions : NULL
.. ..$ MSE : chr "NaN"
.. ..$ RMSE : chr "NaN"
.. ..$ nobs : int 0

4) Code:
library(h2o)
h2o.init()

# Upload the CSV and split it 85/15 into training and test frames
It2Path <- "/data/projects/IRAD/MultipleMyeloma/Myeloma_8yearsPreDx.csv"
It2.hex <- h2o.uploadFile(path = It2Path, destination_frame = "It2.hex")
It2.split <- h2o.splitFrame(data = It2.hex, ratios = 0.85)
It2.train <- It2.split[[1]]
It2.test <- It2.split[[2]]

# Train the autoencoder with hidden layers of 6, 5 and 6 neurons
It2.dl.ae <- h2o.deeplearning(x = names(It2.train), training_frame = It2.train, autoencoder = TRUE,
                              reproducible = TRUE, seed = 1234, hidden = c(6, 5, 6), epochs = 50)

#It2.perf <- h2o.performance(It2.dl.ae, It2.test)

# Per-row reconstruction error on the test frame, converted to an R data.frame for plotting
It2.ano <- h2o.anomaly(It2.dl.ae, It2.test)
It2.ano.df <- as.data.frame(It2.ano)
png("It2_recon_error.png")
plot.ts(It2.ano.df$Reconstruction.MSE)
dev.off()
It2.predict <- h2o.predict(It2.dl.ae, It2.test)
It2.perf <- h2o.performance(It2.dl.ae, It2.test)

Darren Cook

Jan 9, 2018, 1:09:46 PM
to h2os...@googlegroups.com
> 1) I was advised in a separate post to use h2o.deepfeatures() to get the output of the ae model. However, when I feed in my model variable, I receive a message that says "Error in chk.H2OFrame(x) : must be an H2OFrame". How do I convert my ae model variable to an H2o frame? Currently, it is an autoencoder object:

The code you showed does not show your call to h2o.deepfeatures(). Can
you give a reproducible example showing how you are using it?

(It takes two arguments: your model, and then the data frame to extract
features from; my guess would be the error you see is complaining about
the latter.)
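
A minimal sketch of that two-argument call, in case it helps. It assumes a running local H2O cluster and uses R's built-in iris data purely for illustration; the names demo.hex, demo.ae and demo.features, and the layer sizes, are made up for this example and are not from the original post.

```r
library(h2o)
h2o.init()

# Illustrative data only: the four numeric columns of R's built-in iris set.
demo.hex <- as.h2o(iris[, 1:4])

# Small autoencoder; the hidden layer sizes are arbitrary choices for the demo.
demo.ae <- h2o.deeplearning(x = names(demo.hex), training_frame = demo.hex,
                            autoencoder = TRUE, hidden = c(3, 2, 3),
                            epochs = 20, reproducible = TRUE, seed = 1234)

# First argument: the model. Second argument: an H2OFrame to extract features from.
# layer = 2 selects the second hidden layer, which has 2 neurons in this demo.
demo.features <- h2o.deepfeatures(demo.ae, demo.hex, layer = 2)
dim(demo.features)
```

Calling h2o.deepfeatures(demo.ae) with only the model leaves the frame argument missing, which is the situation the chk.H2OFrame error describes.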

Darren

manjula...@gmail.com

Jan 9, 2018, 1:24:58 PM
to H2O Open Source Scalable Machine Learning - h2ostream

Thank you, Darren. This is the code and following error message:
> h2o.deepfeatures(It2.dl.ae)


Error in chk.H2OFrame(x) : must be an H2OFrame

What is the other data frame that I should feed in? How do I create this data frame?

Many thanks!

manjula...@gmail.com

Jan 9, 2018, 1:40:01 PM
to H2O Open Source Scalable Machine Learning - h2ostream

Here is what I got after I added the data frame and the layer argument. How do I interpret these results?

h2o.deepfeatures(It2.dl.ae,It2.hex,layer=1)
|======================================================================| 100%
DF.L1.C1 DF.L1.C2 DF.L1.C3 DF.L1.C4 DF.L1.C5 DF.L1.C6
1 0 761.5329 795.7467 0.0000 9761.562 1283.2779
2 0 710.0958 757.4465 10.8993 7878.651 1211.2640
3 0 773.5230 800.6716 0.0000 8494.188 1287.6789
4 0 736.5125 772.1591 0.0000 8816.322 1299.2417
5 0 710.6893 760.2302 0.0000 9780.690 1222.6867
6 0 124.4696 215.8537 0.0000 0.000 172.7601

[43197 rows x 6 columns]
> h2o.deepfeatures(It2.dl.ae,It2.hex,layer=2)
|======================================================================| 100%
DF.L2.C1 DF.L2.C2 DF.L2.C3 DF.L2.C4 DF.L2.C5
1 643997.33 0 329785.6 1268433.1 0
2 533860.03 0 241803.6 1038752.3 0
3 574353.78 0 260476.4 1118815.6 0
4 591216.57 0 281142.8 1155803.8 0
5 639972.37 0 341162.4 1265156.6 0
6 17569.74 0 0.0 18817.4 0

Darren Cook

Jan 9, 2018, 4:51:09 PM
to h2os...@googlegroups.com
>> Thank you, Darren. This is the code and following error message:
>>> h2o.deepfeatures(It2.dl.ae)
>> Error in chk.H2OFrame(x) : must be an H2OFrame
>>
>> What is the other data frame that I should feed in? How do I create this data frame?

Often it is the data you trained on.

(It really depends on what you are trying to do.)

> h2o.deepfeatures(It2.dl.ae,It2.hex,layer=1)
> |======================================================================| 100%
> DF.L1.C1 DF.L1.C2 DF.L1.C3 DF.L1.C4 DF.L1.C5 DF.L1.C6
> 1 0 761.5329 795.7467 0.0000 9761.562 1283.2779
> 2 0 710.0958 757.4465 10.8993 7878.651 1211.2640
> 3 0 773.5230 800.6716 0.0000 8494.188 1287.6789

Those are weird numbers: the rectifier is unbounded above, so the
hidden-layer values can grow very large. This is why it is recommended
to use activation="Tanh" instead of the default rectifier when making
autoencoders:

http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/deep-learning.html?highlight=activation
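
As a rough sketch of what the switch looks like in R: the example below trains a toy autoencoder with activation = "Tanh" (the capitalised spelling listed in the R documentation for h2o.deeplearning) and checks the range of the extracted features. The iris data, the object names and the layer sizes are assumptions for illustration only, and a running local H2O cluster is assumed.

```r
library(h2o)
h2o.init()

# Illustrative data only: the four numeric columns of R's built-in iris set.
demo.hex <- as.h2o(iris[, 1:4])

# Same kind of autoencoder as before, but with the Tanh activation.
tanh.ae <- h2o.deeplearning(x = names(demo.hex), training_frame = demo.hex,
                            autoencoder = TRUE, activation = "Tanh",
                            hidden = c(3, 2, 3), epochs = 20,
                            reproducible = TRUE, seed = 1234)

# Pull the first hidden layer into R and look at the spread of values.
tanh.features <- as.data.frame(h2o.deepfeatures(tanh.ae, demo.hex, layer = 1))
range(as.matrix(tanh.features))
```

Because tanh maps every input into the interval from -1 to 1, the extracted features stay on a comparable scale instead of reaching into the thousands.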

>> h2o.deepfeatures(It2.dl.ae,It2.hex,layer=2)
> |======================================================================| 100%
> DF.L2.C1 DF.L2.C2 DF.L2.C3 DF.L2.C4 DF.L2.C5
> 1 643997.33 0 329785.6 1268433.1 0
> 2 533860.03 0 241803.6 1038752.3 0
> 3 574353.78 0 260476.4 1118815.6 0

Once you switch to tanh you will get 5 numbers that are an *abstract*
representation of your original data.

Each number does not mean anything by itself. Taken together, the 5 are
a dimension-reduced version of your input data.

You might try looking for rows whose values are close, e.g. using k-means.
Or you might feed these features into a supervised learning algorithm.
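
One possible way to try the clustering idea, sketched with h2o.kmeans() on the extracted features. Everything here is illustrative: the iris data, the object names, the layer sizes and the choice of k = 3 are assumptions for the demo, not values from this thread, and a running local H2O cluster is assumed.

```r
library(h2o)
h2o.init()

# Illustrative data only: the four numeric columns of R's built-in iris set.
demo.hex <- as.h2o(iris[, 1:4])

# Toy Tanh autoencoder with a 2-neuron middle layer.
tanh.ae <- h2o.deeplearning(x = names(demo.hex), training_frame = demo.hex,
                            autoencoder = TRUE, activation = "Tanh",
                            hidden = c(3, 2, 3), epochs = 20,
                            reproducible = TRUE, seed = 1234)

# Dimension-reduced representation: one row per input row, one column per neuron.
feats <- h2o.deepfeatures(tanh.ae, demo.hex, layer = 2)

# Cluster the rows in the reduced space; k = 3 is an arbitrary choice here.
km <- h2o.kmeans(training_frame = feats, x = names(feats), k = 3, seed = 1234)

# Assign each row to a cluster and count the rows per cluster.
clusters <- h2o.predict(km, feats)
h2o.table(clusters$predict)
```

For the supervised route, the same feats frame could be combined with a label column using h2o.cbind() and passed to any H2O supervised algorithm as the training frame.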

Darren


--
Darren Cook, Software Researcher/Developer
My New Book: Practical Machine Learning with H2O:
http://shop.oreilly.com/product/0636920053170.do