Importing an algorithm implemented in Python into H2O Flow


Helena Goldfarb

May 2, 2016, 1:39:20 PM
to H2O Open Source Scalable Machine Learning - h2ostream
Hello,
I have a Logistic Regression model implemented in Python, and I would like it to be available in H2O Flow. Is there a way to import this model into H2O? Is there documentation or an example that can help?

Thank you,
Helena

Lauren DiPerna

May 2, 2016, 2:13:14 PM
to Helena Goldfarb, H2O Open Source Scalable Machine Learning - h2ostream
Point your browser to localhost:54321 (paste the following into your browser: http://localhost:54321/flow/index.html).

To see the models you have already created in Python, click 'Models' in the toolbar and then select 'List All Models'.

You can read more about Flow in our docs (http://h2o-release.s3.amazonaws.com/h2o/rel-turchin/3/docs-website/h2o-docs/index.html#Flow).

cheers,

Lauren


Goldfarb, Helena (GE Global Research, US)

May 2, 2016, 2:30:55 PM
to Lauren DiPerna, H2O Open Source Scalable Machine Learning - h2ostream

Hi Lauren,

Thank you for the quick response. I think I didn’t phrase my question correctly. I have algorithms that I developed outside H2O. Some are in Java, some in R, and some in Python.

I am impressed with the H2O framework and would like to add my ML algorithms to it. These algorithms have not been trained on data yet. Some are classifiers/regressors; others are feature engineering algorithms.

What is the process for making these algorithms available in H2O Flow, so that when I click ‘buildModel’ they appear in the “Select an algorithm” drop-down?

Do I need to rewrite them using the H2O library? Is it possible to write a thin wrapper for each algorithm without rewriting it? What API should these algorithms support?

Thank you for your help,

Helena

rp...@0xdata.com

May 2, 2016, 6:56:33 PM
to H2O Open Source Scalable Machine Learning - h2ostream, lau...@h2o.ai, gold...@ge.com

On Monday, May 2, 2016 at 11:30:55 AM UTC-7, Goldfarb, Helena (GE Global Research, US) wrote:

Hi Lauren,

Thank you for the quick response. I think I didn’t phrase my question correctly. I have algorithms that I have developed outside H2O. Some of them are in Java, some of them are in R and some are in Python.

I am impressed with the H2O framework, but would like to add my ML algorithms to it.


Helena,

That's great!  We're really happy to support you in this.

Flow exposes algorithms from the H2O framework, which (I'm sure you know) run on the distributed compute cluster on a set of JVMs.  Currently all of those algorithms are written in Java, although other JVM languages should also work.  Flow dynamically discovers the algorithms from the server, so new algorithms do not require changes to Flow.


There's a set of three blog posts from November 2014 on adding algorithms to H2O.  You can find them here:


We haven't walked through the examples recently, so they might need a bit of tweaking.  If you take this on, please don't hesitate to ask for help, so we can update the posts.


We've talked about working up an example of building an algorithm in Jython, but haven't done so yet.  The main issue that I foresee is that the limitations of Jython (especially the inability to call C-based Python libraries like NumPy and Pandas) will make it less than desirable to Python folks.  For building new ML algos those obviously aren't required, though.  I'd be happy to work with you if you want to add an algorithm written in Jython inside H2O.


We have had a few customers add their own algorithms and they worked in Flow just fine, except for one current limitation: parameters that specify columns.  We are working on fixing this limitation in Flow, but in the meantime there's a workaround that I can help you with if you need it.


If you want to access your new algo from R or Python you'll need to add a bit of wrapper code.  We can work with you on that, and then document it for others.  


Regarding feature engineering functions, you should probably add them as Rapids functions rather than model builders.  Rapids is the feature engineering / munging language used by Flow, Python and R.  

A great example of adding a feature engineering function to Rapids is the following, which adds a string entropy function:


You can see the server function being added, the small functions to add it to R and Python, and a PyUnit test.


You can call Rapids expressions from Flow.  It's not super pretty right now since Rapids has been in flux, but it's certainly workable.  From Prithvi (the author of Flow):

flow.context.requestExec "expr goes here", (err, result) -> print if err then err else result


To make it easier to call, run this once:


callRapids = (expr) -> flow.context.requestExec expr, (err, result) -> print if err then err else result


and then you can do this several times:


callRapids 'the expr'
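
For reference, a rough Python counterpart to the callRapids helper above might look like the sketch below. It assumes the h2o Python package's h2o.rapids(expr) call; the helper name call_rapids, and passing the h2o module in as an argument, are illustrative choices rather than anything H2O provides. The expression itself is left to the caller.

```python
def call_rapids(h2o, expr):
    """Send a Rapids expression to the cluster; return the result or the error.

    `h2o` is the imported h2o module (after h2o.init()), passed in as an
    argument so this sketch can be read and tested without a running cluster.
    """
    try:
        # h2o.rapids() sends the expression string to the server for evaluation
        return h2o.rapids(expr)
    except Exception as err:
        # mirror the Flow helper: hand back the error instead of raising
        return err
```

Usage would be along the lines of `import h2o; h2o.init(); print(call_rapids(h2o, 'the expr'))`.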


Let me know if you have any more questions! 

Goldfarb, Helena (GE Global Research, US)

May 3, 2016, 9:02:57 AM
to rp...@0xdata.com, H2O Open Source Scalable Machine Learning - h2ostream, lau...@h2o.ai

Thank you, this helps. I will read the blog posts and let you know if I have follow up questions.

I have another related question. Is it possible to export a trained algorithm as a RESTful service to be deployed outside the H2O framework?

Thank you,

Helena

rp...@0xdata.com

May 3, 2016, 9:16:21 PM
to H2O Open Source Scalable Machine Learning - h2ostream, rp...@0xdata.com, lau...@h2o.ai, gold...@ge.com
On Tuesday, May 3, 2016 at 6:02:57 AM UTC-7, Goldfarb, Helena (GE Global Research, US) wrote:

Thank you. This helps, I will read the blogs and let you know if I have follow up questions.


Great!

I have another related question. Is it possible to export a trained algorithm as a RESTful service to be deployed outside H2O framework?


There are a couple of general approaches to getting predictions from H2O:
  1. calling the predictions REST API that's built into H2O (in-H2O scoring), or
  2. exporting your model as Java code (a POJO) and incorporating that into an application (which could, in turn, expose it as a raw REST API or as a higher-level service).
In-H2O scoring requires that you have an instance of H2O running, but it doesn't need to be the instance that you trained your model on or a cluster; it can be a very lightweight instance.  You can export the binary model and run it in another H2O of the same version at any time in the future.  

POJO scoring requires a thin application to sit in front of a standalone Java class.


For in-H2O scoring there are many examples in the codebase and demos.  The steps are:
  1. save the model to disk from R, Python or Flow, 
  2. reload it in the future into another H2O instance,
  3. call predict()
See loadModel() in the R docs here, save_model() and load_model() in the Python docs here, or exportModel and importModel in the Flow help system.  To do predictions you'll call the /3/Predictions REST API, documented here, possibly by using the supplied Java or C# REST API bindings.  Alternatively you can of course call predict from R or Python.
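
As a minimal sketch of the in-H2O scoring steps above from Python: it assumes the h2o package's save_model(), load_model() and predict() calls. The function names, the idea of passing the h2o module in as an argument, and the exact URL layout under /3/Predictions are assumptions for illustration and should be checked against the REST API docs for your H2O version.

```python
from urllib.parse import quote


def reload_and_score(h2o, model, frame, model_dir):
    """Save a trained model, load it back, and score a frame with it.

    `h2o` is the imported h2o module, passed in so the sketch does not
    need a running cluster just to be imported.
    """
    # 1. save the binary model to disk; save_model returns the saved path
    saved_path = h2o.save_model(model=model, path=model_dir, force=True)
    # 2. reload it (in practice, into another H2O instance of the same version)
    reloaded = h2o.load_model(saved_path)
    # 3. call predict() on the reloaded model
    return reloaded.predict(frame)


def predictions_url(base_url, model_id, frame_id):
    """Build a scoring URL under /3/Predictions (assumed path layout)."""
    return "{}/3/Predictions/models/{}/frames/{}".format(
        base_url.rstrip("/"),
        quote(model_id, safe=""),
        quote(frame_id, safe=""),
    )
```

The URL helper only builds the address; a REST client would then POST to it against a running H2O instance such as http://localhost:54321.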


For POJO scoring, the steps are:
  1. export the model POJO to disk,
  2. link it to your app and run it,
  3. call the REST API exposed by your app.
See download_pojo() in the R docs here, download_pojo() in the Python docs here, or the Download POJO button in Flow.  To do predictions you'll call the REST API from your wrapper web app.

There's a great example of this workflow here.  The code that accepts the REST API request and calls the standalone model POJO is here.
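
A rough Python-side sketch of steps 1 and 3 of the POJO workflow follows. It assumes the h2o package's download_pojo() call; the wrapper app itself (step 2) is Java and is not shown. The endpoint URL and the JSON row format in build_scoring_request are hypothetical, since they depend entirely on how your own wrapper app is written.

```python
import json
import urllib.request


def export_pojo(h2o, model, out_dir):
    """Step 1: write the model's generated Java source into out_dir.

    `h2o` is the imported h2o module. get_jar=True also fetches
    h2o-genmodel.jar, which the POJO needs in order to compile.
    """
    return h2o.download_pojo(model, path=out_dir, get_jar=True)


def build_scoring_request(app_url, row):
    """Step 3: prepare a POST to the wrapper app's scoring endpoint.

    app_url and the JSON payload shape are hypothetical; match them to
    whatever your wrapper app actually exposes.
    """
    body = json.dumps(row).encode("utf-8")
    return urllib.request.Request(
        app_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request object is only constructed here; passing it to urllib.request.urlopen() would send it once the wrapper app is running.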

Let me know if you have any questions!



Goldfarb, Helena (GE Global Research, US)

May 9, 2016, 9:53:03 PM
to rp...@0xdata.com, H2O Open Source Scalable Machine Learning - h2ostream, lau...@h2o.ai

I am following this example - http://blog.h2o.ai/2014/11/hacking-algorithms-into-h2o-kmeans/. I think a lot has changed since it was written. For example:

1. This paragraph refers to an ExampleHandler.java file, but there’s no such file – “Then I copied the three GUI/REST files in h2o-algos/src/main/java/hex/schemas with Example in the name (ExampleHandler.java, ExampleModelV2.java, ExampleV2) to their KMeans2* variants.”

2. This file doesn’t exist either – “I also copied the h2o-algos/src/main/java/hex/api/ExampleBuilderHandler.java file to its KMeans2 variant.”

3. The lines referred to in this sentence are not there – “I also dove into h2o-app/src/main/java/water/H2OApp.java and copied the two Example lines and made their KMeans2 variants.”

I put together a simple model example with a unit test. The unit test runs fine. Now I want the model to show up in the Flow web UI. What should I do?

Thank you,

Helena

rp...@0xdata.com

May 11, 2016, 7:06:11 PM
to H2O Open Source Scalable Machine Learning - h2ostream, rp...@0xdata.com, lau...@h2o.ai, gold...@ge.com


On Monday, May 9, 2016 at 6:53:03 PM UTC-7, Goldfarb, Helena (GE Global Research, US) wrote:

I am following this example - http://blog.h2o.ai/2014/11/hacking-algorithms-into-h2o-kmeans/. I think a lot has changed since it was written.


Yeah, I was expecting those blog posts to be somewhat out of date.  Sorry about that.  

I've created a ticket to update them: PUBDEV-2907

The individual model builder handler classes are no longer required.   

You'll want to register your algo similarly to how it's done here: hex/api/Register.java, but inside your H2OApp equivalent: water/H2OApp.java

vijay...@gmail.com

Jun 14, 2017, 8:47:02 AM
to H2O Open Source Scalable Machine Learning - h2ostream, rp...@0xdata.com, lau...@h2o.ai, gold...@ge.com
Hey, I am facing the same problem: I cannot make my algorithm visible in Flow. Did you find a solution? Are there any updated blog posts on adding an algorithm to the H2O library?

Also, it would be helpful if you could tell me how we can integrate it into R.

Thank you,
Vijay