Exception in R on predict()

gekaprog

Dec 19, 2016, 2:52:10 AM
to H2O Open Source Scalable Machine Learning - h2ostream
I imported into R a model built using the H2O web interface (Flow). It is a distributed random forest (DRF) model, and I am scoring it against the data set loaded below.

H2O throws an exception on h2o.predict(). As far as I can tell, the data frame is identical
to the one used in the H2O UI: I downloaded the file from the source and inspected it.

Has anybody run into a similar issue or tried exporting/importing models?

library(h2o)
h2o.init(nthreads = 4)

# load previously exported model
model <- h2o.loadModel("user/mymodel")

# load public data set (no header row, so read.csv names the columns V1, V2, ...)
df.test <- read.csv("tmp/ad.data", header = FALSE)

# predict
df.test.h2o <- as.h2o(df.test, destination_frame = "ads.hex")
fit <- h2o.predict(object = model, newdata = df.test.h2o)


// R output
java.lang.IllegalArgumentException: Test/Validation dataset has no columns in common with the training set
java.lang.IllegalArgumentException: Test/Validation dataset has no columns in common with the training set
        at hex.Model.adaptTestForTrain(Model.java:915)
        at hex.Model.adaptTestForTrain(Model.java:747)
        at hex.Model.score(Model.java:959)
        at water.api.ModelMetricsHandler$1.compute2(ModelMetricsHandler.java:345)
        at water.H2O$H2OCountedCompleter.compute(H2O.java:1217)
        at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
        at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
        at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
        at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
        at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Error: java.lang.IllegalArgumentException: Test/Validation dataset has no columns in common ...

// H2O console output 
12-19 02:47:02.554 192.168.1.138:54321   12396  #36705-19 INFO: GET /3/Frames/ads.hex, parms: {row_count=10}
12-19 02:47:24.011 192.168.1.138:54321   12396  #36705-14 INFO: POST /4/Predictions/models/drf-32730a7d-385d-4667-8144-64f362af9857/frames/ads.hex, parms: {}
12-19 02:47:24.058 192.168.1.138:54321   12396  FJ-1-11   ERRR: java.lang.IllegalArgumentException: Test/Validation dataset has no columns in common with the training set
12-19 02:47:24.058 192.168.1.138:54321   12396  FJ-1-11   ERRR:         at hex.Model.adaptTestForTrain(Model.java:915)
12-19 02:47:24.058 192.168.1.138:54321   12396  FJ-1-11   ERRR:         at hex.Model.adaptTestForTrain(Model.java:747)
12-19 02:47:24.058 192.168.1.138:54321   12396  FJ-1-11   ERRR:         at hex.Model.score(Model.java:959)
12-19 02:47:24.058 192.168.1.138:54321   12396  FJ-1-11   ERRR:         at water.api.ModelMetricsHandler$1.compute2(ModelMetricsHandler.java:345)
12-19 02:47:24.058 192.168.1.138:54321   12396  FJ-1-11   ERRR:         at water.H2O$H2OCountedCompleter.compute(H2O.java:1217)
12-19 02:47:24.058 192.168.1.138:54321   12396  FJ-1-11   ERRR:         at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
12-19 02:47:24.058 192.168.1.138:54321   12396  FJ-1-11   ERRR:         at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
12-19 02:47:24.058 192.168.1.138:54321   12396  FJ-1-11   ERRR:         at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
12-19 02:47:24.058 192.168.1.138:54321   12396  FJ-1-11   ERRR:         at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
12-19 02:47:24.058 192.168.1.138:54321   12396  FJ-1-11   ERRR:         at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)


Erin LeDell

Dec 20, 2016, 6:58:57 PM
to gekaprog, H2O Open Source Scalable Machine Learning - h2ostream

Hi there,

Instead of using read.csv and then as.h2o, you can load the data directly into h2o using h2o.importFile.

df.test.h2o <- h2o.importFile("tmp/ad.data", destination_frame="ads.hex")

read.csv and h2o.importFile assign different default column names to a file without a header row ("V1", "V2", ... versus "C1", "C2", ...), which looks like the source of the issue. The error is about column names:

Error: java.lang.IllegalArgumentException: Test/Validation dataset has no columns in common ...
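
If the data has to go through read.csv for some reason, one possible workaround is to rename the columns to the "C1", "C2", ... pattern before calling as.h2o. The following is only a minimal sketch under some assumptions: the model was trained in Flow on a headerless import (so its columns are named C1, C2, ...), the columns are in the same order as at training time, and the tiny inline CSV is a made-up stand-in for tmp/ad.data so the snippet runs without the real file. Column types may still differ between the two import paths, so h2o.importFile remains the simpler route.

```r
# Hypothetical stand-in for read.csv("tmp/ad.data", header = FALSE):
# a tiny headerless CSV parsed from text instead of the real file.
df.test <- read.csv(text = "1,2,ad.\n3,4,nonad.", header = FALSE)
names(df.test)   # "V1" "V2" "V3"  (read.csv defaults)

# Rename to the C1, C2, ... pattern used for headerless files on the H2O side.
names(df.test) <- paste0("C", seq_along(df.test))
names(df.test)   # "C1" "C2" "C3"

# With a running cluster, the renamed frame could then be uploaded and scored
# as in the original post (not run here):
# df.test.h2o <- as.h2o(df.test, destination_frame = "ads.hex")
# fit <- h2o.predict(object = model, newdata = df.test.h2o)
```

Comparing names(df.test) against the column names shown for the training frame in Flow is a quick way to confirm whether the names line up before predicting.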

Best,
Erin

-- 
Erin LeDell Ph.D.
Statistician & Machine Learning Scientist | H2O.ai

Hashim Chaudhry

Nov 25, 2022, 4:04:18 AM
to H2O Open Source Scalable Machine Learning - h2ostream
Thanks a lot, Erin, this sorted out the issue for me :)
I was using pandas in Python instead of the native function.