Recurrent Neural Networks in H2O for time series prediction?


pablo...@gmail.com

Jan 26, 2016, 1:46:10 AM
to H2O Open Source Scalable Machine Learning - h2ostream
Hi guys,

I'm new to H2O and have heard great things about it.
I work predominantly on time-series forecasting. For this, it is well known that recurrent neural networks such as Jordan, Elman, and NARX networks, with memory capabilities (time windows), perform very well.
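(For readers unfamiliar with the setup: "memory capabilities (time windows)" usually amounts to framing the series as supervised learning over lagged windows. A minimal NumPy sketch, purely illustrative and not any H2O API:)

```python
import numpy as np

def make_windows(series, window):
    """Turn a 1-D series into (lagged-window, next-value) training pairs."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = np.array(series[window:])
    return X, y

series = [1, 2, 3, 4, 5, 6]
X, y = make_windows(series, window=3)
# Each row of X is a window of 3 past values; y holds the value that follows it.
```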

Question:
 - Are these architectures available in H2O.ai? If so, do you have examples of their use?
 - If not, can we conclude that H2O is not yet fine-tuned for time-series prediction?

Let me know your comments,

PM 

Erin LeDell

Jan 26, 2016, 9:56:12 PM
to pablo...@gmail.com, H2O Open Source Scalable Machine Learning - h2ostream
PM,

We do not support RNNs at this time.  You can read more about our deep learning implementation here: https://github.com/h2oai/h2o-3/blob/master/h2o-docs/src/booklets/v2_2015/PDFs/online/DeepLearning_Vignette.pdf

Feel free to email me off list about your use case.  I am interested in collecting time-series use-cases so that we can determine the best path for supporting time-series moving forward.  It's on our roadmap.

Best,
Erin
--

-- 
Erin LeDell Ph.D.
Statistician & Machine Learning Scientist | H2O.ai

pablo...@gmail.com

Jan 28, 2016, 12:41:21 AM
to H2O Open Source Scalable Machine Learning - h2ostream, pablo...@gmail.com
Erin, 

What a bummer! I am working with a hedge fund on financial time-series prediction. We have our own MapReduce implementation on AWS spot instances; however, with the advent of Spark, Databricks, and the optimization and elegance of H2O, we want to dramatically improve predictions and reduce costs and training times.

Our model uses an Elman network with Levenberg-Marquardt training. We are currently testing LSTM and GRU models using TensorFlow; however, it is still a single-node library.
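(For anyone unfamiliar with the architecture: the Elman recurrence itself is simple to sketch in NumPy. Weights here are random and untrained, purely illustrative; real training would use Levenberg-Marquardt or backpropagation through time:)

```python
import numpy as np

def elman_forward(xs, W_in, W_rec, W_out, h0):
    """Elman (simple) RNN: h_t = tanh(W_in x_t + W_rec h_{t-1}); y_t = W_out h_t."""
    h = h0
    ys = []
    for x in xs:
        h = np.tanh(W_in @ x + W_rec @ h)  # hidden state carries the "memory"
        ys.append(W_out @ h)
    return np.array(ys), h

rng = np.random.default_rng(0)
n_in, n_hid, n_out, T = 1, 4, 1, 10
xs = rng.normal(size=(T, n_in))
W_in = rng.normal(scale=0.5, size=(n_hid, n_in))
W_rec = rng.normal(scale=0.5, size=(n_hid, n_hid))
W_out = rng.normal(scale=0.5, size=(n_out, n_hid))
ys, h_T = elman_forward(xs, W_in, W_rec, W_out, np.zeros(n_hid))
```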

We were looking for a distributed (over Spark) library, and I love H2O, but RNNs are a must for time series.

I also have another customer in South America who has a lot of weather data and is looking to improve predictions. Without RNNs I cannot use H2O.

Can you please share any time frame for the RNN release? It would greatly help me create a plan here.

Pablo Marin

Sasha Goodman

Feb 13, 2016, 2:52:21 AM
to H2O Open Source Scalable Machine Learning - h2ostream, pablo...@gmail.com
Erin,

In Stanford's first course on deep learning for natural language processing, recurrent NNs are described as one of the most powerful methods, and also one of the simplest. Simplicity should matter for general sequences. I expect you know Socher was a student of Andrew Ng and Chris Manning.


A recent paper supports the case that they are among the best models for sequences, especially when they run both forward and backward over the sequence:


That paper also applies RNNs to clauses and sentences in a nested fashion, reminiscent of CNNs applied to text.

PS: I'm a big fan of confidence intervals via bootstrap-type methods. I noticed that you worked on this in your research, and I hope you or your team engineer some reasonably efficient tools for creating confidence intervals on predictions from machine learning methods.

Erin LeDell

Feb 16, 2016, 2:56:07 PM
to Sasha Goodman, H2O Open Source Scalable Machine Learning - h2ostream, pablo...@gmail.com
Hi Sasha,

RNNs are definitely useful and something we are looking at investing in.

Yes, computationally efficient confidence intervals are an area I have done research in.  The bootstrap is great because it's a general-purpose tool, but it can still be computationally painful because, no matter what, you have to train many models, even if each is fit only on a subsample of the data (as in the Bag of Little Bootstraps (BLB), for example).  The efficient confidence interval (CI) methods I work on are based on asymptotic theory (influence functions/curves) rather than resampling theory -- they require no retraining of models, which is why they are the most computationally efficient.  The only drawback I see in influence-curve-based methods is that a different algorithm/implementation must be created for each estimator you want a CI for.
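(To make the contrast concrete on the simplest possible estimator, the sample mean, whose influence function is x minus the mean: the bootstrap recomputes the estimator on many resamples, while the influence-curve CI needs only one pass over the data. A rough NumPy sketch, not H2O or cvAUC code:)

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.exponential(scale=2.0, size=500)

# Bootstrap CI: recompute the estimator on B resamples of the data.
B = 2000
boot_means = np.array([rng.choice(x, size=x.size, replace=True).mean()
                       for _ in range(B)])
boot_ci = np.percentile(boot_means, [2.5, 97.5])

# Influence-curve CI: the IC of the mean is (x_i - mean); no resampling needed.
ic = x - x.mean()
se = ic.std(ddof=1) / np.sqrt(x.size)
ic_ci = (x.mean() - 1.96 * se, x.mean() + 1.96 * se)
# The two intervals are similar; the IC version costs a single pass.
```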

In my applied research I worked a lot with imbalanced binary-response problems (where I used AUC to evaluate models), so I was particularly interested in generating CIs for cross-validated AUC estimates.  I have implemented that in the cvAUC R package (https://github.com/ledell/cvAUC), and yes, I do plan to integrate this work into H2O in the future (for AUC and other model performance metrics).  If you are interested, you can read more in the last chapter of my dissertation (http://www.stat.berkeley.edu/~ledell/papers/ledell-phd-thesis.pdf) or in the article here: http://projecteuclid.org/euclid.ejs/1437742107
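(For reference: the AUC itself is the Mann-Whitney statistic, and the cross-validated estimate averages the per-fold validation AUCs. A plain NumPy sketch of that estimator only -- not the cvAUC implementation, which additionally supplies influence-curve CIs:)

```python
import numpy as np

def auc(scores, labels):
    """AUC via the Mann-Whitney statistic: P(score_pos > score_neg)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()   # pairwise comparisons
    ties = (pos[:, None] == neg[None, :]).sum()     # ties count as 0.5
    return (greater + 0.5 * ties) / (pos.size * neg.size)

def cv_auc(scores, labels, folds):
    """Cross-validated AUC: mean of the per-fold validation AUCs."""
    return np.mean([auc(scores[folds == k], labels[folds == k])
                    for k in np.unique(folds)])
```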

-Erin