Is it possible to get a more verbose output of exactly what 'model.train()' is doing?


kev...@gmail.com

Jul 12, 2018, 2:20:48 AM
to H2O Open Source Scalable Machine Learning - h2ostream
I have just begun using H2O through the Python module. I am attempting to replicate, and hopefully replace, a Recommender System I built using a deep autoencoder in TensorFlow.

To ensure the model works correctly, I have initially set a single small hidden layer and trained it for just 1 epoch. My data set has ~700 examples, each with ~9000 features.

A single epoch using my TensorFlow model on a GPU takes a couple of seconds. The same for the H2O model (not on a GPU) takes about 6 minutes. However, I am not dissatisfied, as the loss after a single epoch on the H2O model is *significantly* lower. It is just a little frustrating seeing

deeplearning Model Build progress: |███████████████████████████| 100%

sitting static for a couple of minutes without knowing exactly what is happening behind the scenes - I can only imagine this would get far more frustrating when training for a large number of epochs.

Is there any way to get a more verbose printout of exactly what H2O and the model is doing throughout training?

Darren Cook

Jul 12, 2018, 3:25:11 AM
to h2os...@googlegroups.com
> ...a Recommender System I built using a deep autoencoder in
> TensorFlow.
>
> To ensure the model works correctly, initially I have just set a
> single small hidden layer and trained it for just 1 epoch. My data
> set has ~700 examples each with ~9000 features.
>
> A single epoch using my TensorFlow model on a GPU takes a couple of
> seconds. The same for the H2O model (not on a GPU) takes about 6
> minutes.

Are you able to show your code? (It'd be great to see the TensorFlow
code, too, for comparison.) And maybe you can show something about
the size of the model that has been built?

I'm wondering if some of your 9000 features are categorical types?
One-hot encoding could turn that into a very large network (which might
explain the long training time). Similarly, if you are not manually doing
something with them in TensorFlow, it might explain both the quick
training time and the poor performance.
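
A quick way to check, assuming your frame is already loaded as data
(a sketch - types and nlevels() are the standard H2OFrame accessors):

# Count how many columns H2O parsed as categorical ("enum"); each enum
# column is one-hot encoded into one input neuron per level.
enum_cols = [col for col, typ in data.types.items() if typ == "enum"]
print(len(enum_cols), "of", data.ncols, "columns were parsed as categorical")
if enum_cols:
    print(data[enum_cols].nlevels())  # number of levels per enum column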

> deeplearning Model Build progress: |███████████████████████████|
> 100%

There was a bug that H2O would show 100% while it was working on the
last epoch/tree of a model. (Or was it the last model of a grid?)

> Is there any way to get a more verbose printout of exactly what H2O
> and the model is doing throughout training?

The best way is to go to Flow ( 127.0.0.1:54321 ). You will be able to
monitor model performance, as well as check the Water Meter (CPU meter)
to see that the CPUs are all being kept busy.
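
If you would rather stay in Python, here is a rough sketch (names are
illustrative, and I am assuming the usual DeepLearning scoring parameters
also apply to the autoencoder estimator): ask H2O to score on every
iteration, then read the scoring history off the model afterwards.

from h2o.estimators.deeplearning import H2OAutoEncoderEstimator

# score_each_iteration makes H2O record metrics at every iteration
# (at some cost in speed); scoring_history() returns them as a table.
model = H2OAutoEncoderEstimator(hidden=[500], epochs=10,
                                score_each_iteration=True)
model.train(x=feature_cols, training_frame=frame)
print(model.scoring_history())

Some versions of the Python client also accept verbose=True on train(),
which is supposed to stream the scoring history to stdout as training
runs - check your version's train() signature.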

Darren

kev...@gmail.com

Jul 12, 2018, 4:06:30 AM
to H2O Open Source Scalable Machine Learning - h2ostream
On Thursday, 12 July 2018 11:25:11 UTC+4, Darren Cook wrote:
> > ...a Recommender System I built using a deep autoencoder in
> > TensorFlow.
> >
> > To ensure the model works correctly, initially I have just set a
> > single small hidden layer and trained it for just 1 epoch. My data
> > set has ~700 examples each with ~9000 features.
> >
> > A single epoch using my TensorFlow model on a GPU takes a couple of
> > seconds. The same for the H2O model (not on a GPU) takes about 6
> > minutes.
>
> Are you able to show your code? (it'd be great to see the Tensorflow
> code, too, for comparison). And maybe if you can show something about
> the size of the model that has been built?

Here is the simplest version of the TensorFlow model:

https://github.com/KevOBrien/Autoencoder/blob/master/Model.py

What do you mean by showing something about the size of the model? The physical sizes of the hidden layers? I am testing it initially with just a single layer of 500 nodes.

> I'm wondering if some of your 9000 features are categorical types?
> One-hot encoding could turn that into a very large network (which might
> explain the long training time). Similarly if you are not manually doing
> something with them in Tensorflow it might explain both the quick
> training time and poor performance.

All the features are real-valued numbers from 0-5. Again, for simple initial testing, I am using the well-known MovieLens data set of user-movie ratings to build an experimental recommender system before using it on my own data set.

> > deeplearning Model Build progress: |███████████████████████████|
> > 100%
>
> There was a bug that H2O would show 100% while it was working on the
> last epoch/tree of a model. (Or was it the last model of a grid?)

I haven't set the H2O model to do any sort of grid search - does it do this by default?

> > Is there any way to get a more verbose printout of exactly what H2O
> > and the model is doing throughout training?
>
> The best way is to go to Flow ( 127.0.0.1:54321 ). You will be able to
> monitor model performance, as well as check the flow meter to see the
> CPUs are all being kept busy.

I have tried Flow but still don't think it gives enough interpretability or insight into what it's doing. The H2O model's loss is very, very suspiciously low after a single epoch - I'd like to know what kind of tricks and experimenting it is doing behind the scenes.
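
One thing I should probably rule out myself: as far as I can tell, H2O's
deep learning standardizes the inputs by default (standardize=True), so
its reported MSE may simply be on a different scale than my TensorFlow
loss, which is computed on the raw 0-5 ratings. Something like this
should make the two comparable (an assumption, not verified):

ae = H2OAutoEncoderEstimator(activation="Tanh", hidden=[500], epochs=1,
                             standardize=False)  # keep the raw 0-5 scale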

Here is the movie ratings data set if you would like to experiment for yourself:

https://drive.google.com/file/d/12e1kUjhTVR7NzzyvLXWY5Gkqs6MBS9WF/view?usp=sharing

Many thanks,

Kevin

Darren Cook

Jul 12, 2018, 4:39:56 AM
to H2O Open Source Scalable Machine Learning - h2ostream
> Here is the most simple version of the TensorFlow model:
>
> https://github.com/KevOBrien/Autoencoder/blob/master/Model.py

Do you have the H2O code too? Someone might spot something you are
doing/not doing that is important.

> What do you mean by showing something about the size of the model? The physical sizes of the hidden layers? I am testing it initially with just a single layer of 500 nodes.

But your input layer will be 9000+ neurons, and the output layer the
same size. If you print the model, it will tell you that, along with how
many weights there are, etc.
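
Back-of-envelope, with your numbers (pure arithmetic):

n_in, n_hidden = 9000, 500
weights = 2 * n_in * n_hidden   # encoder + decoder weight matrices
biases = n_hidden + n_in        # hidden-layer + output-layer biases
print(weights + biases)         # 9,009,500 parameters

So "one small hidden layer" is still a roughly 9-million-parameter
model, which goes some way to explaining why a CPU epoch is not instant.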

Darren

kev...@gmail.com

Jul 12, 2018, 4:44:51 AM
to H2O Open Source Scalable Machine Learning - h2ostream
On Thursday, 12 July 2018 12:39:56 UTC+4, Darren Cook wrote:
> > Here is the most simple version of the TensorFlow model:
> >
> > https://github.com/KevOBrien/Autoencoder/blob/master/Model.py
>
> Do you have the H2O code too? Someone might spot something you are
> doing/not doing that is important.

Well, the H2O code is extremely straightforward:

import h2o
from h2o.estimators.deeplearning import H2OAutoEncoderEstimator

h2o.init()

data = h2o.H2OFrame(df)  # df is the same file I linked above, imported into pandas
colNums = list(range(0, df.shape[1]))  # use every column as an input feature

ae = H2OAutoEncoderEstimator(activation="Tanh", hidden=[500], epochs=1)
ae.train(x=colNums, training_frame=data)

print(ae.params)  # Here I just print everything I am aware is available to be printed
print(ae)
print(ae.model_performance(train=True))
print(ae.mse(train=True))

> > What do you mean by showing something about the size of the model? The physical sizes of the hidden layers? I am testing it initially with just a single layer of 500 nodes.

> But your input layer will be 9000+ neurons, and the output layer the
> same size. If you print the model you will get told that, how many
> weights, etc.

Yes, the input and output have ~9000 neurons, and then there is a single hidden layer of 500.
