H2O in Hadoop/Hive


anixa...@gmail.com

Jun 3, 2016, 6:56:06 AM
to H2O Open Source Scalable Machine Learning - h2ostream
Hi All,

I have built a classification model in R/H2O and exported the binary model. Now I want to deploy this model/binary file in a Hadoop production environment.

I have kept the binary model in HDFS, and my production data resides in Hive. I want to execute an R script from Hive, load/connect to H2O, load the H2O model, and score the production table (in Hive).

As a setup, I have installed R and H2O on all machines in the Hadoop cluster. My present status:

i) Able to execute R from Hive
ii) Able to load/connect to H2O

But I am not able to load the H2O model from HDFS. I used the following command:

model <- h2o.loadModel("hdfs://model_folder/DRF_model_ID")

DRF_model_ID is the model folder where all the model binaries are stored.
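For completeness, the fuller form of what I am running is roughly this (the H2O host, port, and namenode address below are placeholders for our actual values):

library(h2o)

# Connect to the running H2O cluster (host/port are placeholders)
h2o.init(ip = "localhost", port = 54321)

# Load the saved model from HDFS using a fully qualified URI
# (namenode host and port are placeholders)
model <- h2o.loadModel("hdfs://namenode:8020/model_folder/DRF_model_ID")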

Any suggestions/advice would be helpful.


Regards,
Anirban Ghosh

Tom Kraljevic

Jun 3, 2016, 12:24:17 PM
to anixa...@gmail.com, H2O Open Source Scalable Machine Learning - h2ostream

Hi,

Sorry, without the actual output of the failure there isn't much guidance we can give.

But here are a couple of tests that exercise saving and loading models, which you may want to take a look at:

https://github.com/h2oai/h2o-3/blob/master/h2o-r/tests/testdir_jira/runit_hex_1775_save_load.R
https://github.com/h2oai/h2o-3/blob/master/h2o-r/tests/testdir_hdfs/runit_INTERNAL_HDFS_model_export.R
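One thing worth double-checking: in an hdfs:// URI, the component right after hdfs:// is parsed as the namenode authority, so a fully qualified path looks like hdfs://namenode:8020/model_folder/DRF_model_ID rather than hdfs://model_folder/…. The round trip those tests exercise looks roughly like this (paths and data here are illustrative, not your cluster's values):

library(h2o)
h2o.init()

# Train a small model on an illustrative frame
iris_hex <- as.h2o(iris)
m <- h2o.randomForest(x = 1:4, y = 5, training_frame = iris_hex)

# Save to a directory (local or HDFS); h2o.saveModel returns the full model path
path <- h2o.saveModel(m, path = "hdfs://namenode:8020/models")

# Load it back using the returned path
m2 <- h2o.loadModel(path)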

Tom

anirban ghosh

Jun 6, 2016, 5:54:46 AM
to Tom Kraljevic, H2O Open Source Scalable Machine Learning - h2ostream
Hi All,

This is the error message I am getting:

Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"X1":99.10917,"X2":0.0,"X3":11.0,"X4":100.0,"X5":100.0,"X6":80.20368,"X7":0.0,"X8":0.0,"X9":0.0,"X10":0.66543907,"X11":100.0,"X12":47.20982,"X13":12.0,"X14":0.047311783,"X15":0.0,"X16":0.0,"X17":100.0,"X18":100.0,"X19":100.0,"X20":5.8290815,"X21":96.07143,"X22":0.0,"X23":7.0,"X24":50.0,"X25":20.85,"X26":18.0,"X27":null,"X28":"Phone","X29":"-1","X30":"MR","X31":"2G","X32":"2G","X33":null,"X34":null,"X35":null}
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:195)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"X1":99.10917,"X2":0.0,"X3":11.0,"X4":100.0,"X5":100.0,"X6":80.20368,"X7":0.0,"X8":0.0,"X9":0.0,"X10":0.66543907,"X11":100.0,"X12":47.20982,"X13":12.0,"X14":0.047311783,"X15":0.0,"X16":0.0,"X17":100.0,"X18":100.0,"X19":100.0,"X20":5.8290815,"X21":96.07143,"X22":0.0,"X23":7.0,"X24":50.0,"X25":20.85,"X26":18.0,"X27":null,"X28":"Phone","X29":"-1","X30":"MR","X31":"2G","X32":"2G","X33":null,"X34":null,"X35":null}
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:550)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
        ... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20001]: An error occurred while reading or writing to your custom script. It may have crashed with an error.
        at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:410)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
        ... 9 more
Caused by: java.io.IOException: Broken pipe
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:345)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:95)
        at java.io.DataOutputStream.write(DataOutputStream.java:88)
        at org.apache.hadoop.hive.ql.exec.TextRecordWriter.write(TextRecordWriter.java:54)
        at org.apache.hadoop.hive.ql.exec.ScriptOperator.processOp(ScriptOperator.java:378)
        ... 15 more


FAILED: Execution Error, return code 20001 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. An error occurred while reading or writing to your custom script. It may have crashed with an error.
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
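For context, the custom script that Hive invokes via TRANSFORM is an R script of roughly this shape: it reads tab-separated rows from stdin, scores them with the loaded model, and writes predictions to stdout (simplified here; if the script crashes or exits early, Hive reports the "Broken pipe" above):

#!/usr/bin/env Rscript
library(h2o)

# Connect and load the model (connection details and paths are placeholders)
h2o.init(ip = "localhost", port = 54321)
model <- h2o.loadModel("hdfs://namenode:8020/model_folder/DRF_model_ID")

# Read the rows Hive streams in on stdin (tab-separated, no header)
rows <- read.table(file("stdin"), sep = "\t", header = FALSE)

# Score and write predictions back to stdout for Hive to collect
preds <- as.data.frame(h2o.predict(model, as.h2o(rows)))
write.table(preds, stdout(), sep = "\t", row.names = FALSE, col.names = FALSE)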


Tom Kraljevic

Jun 6, 2016, 10:27:57 AM
to anirban ghosh, H2O Open Source Scalable Machine Learning - h2ostream

Hi,

This question should be asked in the Hive community rather than the H2O community.

Thanks,
Tom
