12-dic 06:50:19.873 172.20.34.149:54321 2408 FJ-9-41 ERRR WATER:
+ java.lang.IllegalArgumentException: Incompatible column: 'hnd_price', expected (trained on) numeric, was passed a categorical
+     at water.Model.adapt(Model.java:195)
+     at water.Model.adapt(Model.java:232)
+     at water.Model.score(Model.java:99)
+     at hex.gbm.SharedTreeModelBuilder$Score.doIt(SharedTreeModelBuilder.java:383)
+     at hex.drf.DRF.doScoring(DRF.java:166)
+     at hex.drf.DRF.buildModel(DRF.java:149)
+     at hex.gbm.SharedTreeModelBuilder.buildModel(SharedTreeModelBuilder.java:147)
+     at hex.drf.DRF.exec(DRF.java:111)
+     at water.Job$4.compute2(Job.java:482)
+     at water.H2O$H2OCountedCompleter.compute(H2O.java:668)
+     at jsr166y.CountedCompleter.exec(CountedCompleter.java:429)
+     at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
+     at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
+     at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
+     at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Antonio Vidal Vidal | Innovation Manager
--
You received this message because you are subscribed to the Google Groups "H2O Users - h2ostream" group.
To unsubscribe from this group and stop receiving emails from it, send an email to h2ostream+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
Hi Antonio,
We are adding a procedure to do this in the UI early in the new year. Meanwhile, we will send you a recipe that allows you to do this from R.
A quick workaround is to turn the integers into strings by changing the data from '1', '2', '3', … to 'val1', 'val2', 'val3', … We automatically detect strings and turn them into categoricals.
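The prefixing workaround described above could be applied to a CSV file before upload with a short script. This is only a minimal sketch in Python and is not part of H2O: the function name `prefix_column`, the `val` prefix, and the sample data are assumptions made for illustration.

```python
import csv
import io


def prefix_column(src, dst, column, prefix="val"):
    """Copy CSV rows from src to dst, prepending `prefix` to every
    non-empty value in `column` so the values no longer look numeric."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row[column] != "":  # leave missing values empty
            row[column] = prefix + row[column]
        writer.writerow(row)


# Hypothetical in-memory sample; real use would pass open file handles.
source = io.StringIO("cylinders,mpg\n4,30.5\n8,14.0\n,22.0\n")
result = io.StringIO()
prefix_column(source, result, "cylinders")
print(result.getvalue())
```

Empty cells are left untouched so that missing values stay missing instead of becoming a "val" level of their own.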
setwd(normalizePath(dirname(R.utils::commandArgs(asValues=TRUE)$"-f")))
source('../../findNSourceUtils.R')

test.as.factor.basic <- function(conn) {
  hex <- h2o.uploadFile(conn, locate("../smalldata/cars.csv"), key = "cars.hex")
  hex[,"cylinders"] <- as.factor(hex[,"cylinders"])
  expect_true(is.factor(hex[,"cylinders"])[1])
  testEnd()
}

doTest("Test the as.factor unary operator", test.as.factor.basic)
In the example you gave, we strip the quotes and parse the values as ints. This is an issue (HEX-453) that we have not yet run into in the field. If you are trying the shortcut to enum, I'd try actually concatenating a single character to the front of the values, or use as.factor via R.
FYI, string handling with large cardinality is currently broken and being fixed (HEX-959).
Thanks, and let us know if you need more help,
Sri
- 65k unique strings counts as "large" at the moment.
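To see whether a column is anywhere near that size before uploading, the distinct values per column can be counted locally. The following is a rough sketch in Python, not an H2O API: the function name `column_cardinalities` and the constant `LARGE_CARDINALITY` are hypothetical, and the 65,000 figure is simply the approximate number quoted above.

```python
import csv
import io

# Approximate threshold quoted in the thread; an assumption, not an exact limit.
LARGE_CARDINALITY = 65_000


def column_cardinalities(src):
    """Return {column name: number of distinct non-empty values} for a CSV."""
    reader = csv.DictReader(src)
    seen = {name: set() for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row.get(name)
            if value:  # skip empty or absent cells
                seen[name].add(value)
    return {name: len(values) for name, values in seen.items()}


# Three rows in the shape of the names data discussed in this thread.
sample = io.StringIO(
    "Year,FirstName,Gender,Freq\n"
    "1880,Mary,F,7065\n"
    "1880,Anna,F,2604\n"
    "1880,Emma,F,2003\n"
)
counts = column_cardinalities(sample)
for name, n in counts.items():
    flag = " (large)" if n >= LARGE_CARDINALITY else ""
    print(f"{name}: {n}{flag}")
```

This keeps every distinct value in memory, which is fine for a quick check but may be heavy for very wide or very long files.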
I'm afraid we ran into something else here. Would it be possible to describe:
- Version you are using (see Admin > Inspect Logs for the git sha)
- Is this data something you can share so we can try & fix it from here?
> head(TrainingNames)
  Year FirstName Gender Freq
1 1880      Mary      F 7065
2 1880      Anna      F 2604
3 1880      Emma      F 2003
4 1880 Elizabeth      F 1939
5 1880    Minnie      F 1746
6 1880  Margaret      F 1578
> summary(TrainingNames)
      Year        FirstName          Gender            Freq
 Min.   :1880   Francis:    268   F:1062432   Min.   :    5.0
 1st Qu.:1948   James  :    268   M: 729659   1st Qu.:    7.0
 Median :1981   Jean   :    268               Median :   12.0
 Mean   :1972   Jesse  :    268               Mean   :  186.1
 3rd Qu.:2000   Jessie :    268               3rd Qu.:   32.0
 Max.   :2013   John   :    268               Max.   :99674.0
                (Other):1790483
Row#   Year  FirstName  Gender  Freq
Row 0  1880  Mary       F       7065
Row 1  1880  Anna       F       2604
Row 2  1880  Emma       F       2003
Row 3  1880  Elizabeth  F       1939
Row 4  1880  Minnie     F       1746
Row 5  1880  Margaret   F       1578
Row 6  1880  Ida        F       1472
Row 7  1880  Alice      F       1414
Row 8  1880  Bertha     F       1320
Row          Year       FirstName  Gender  Freq
Change Type  As Factor  As Factor          As Factor
Type         Int        Int        Enum    Int
Min          1880       -                  5
Max          2013       -                  99674
Mean         1971.852   -                  186.05
Std Dev      33.358     -                  1578.377
Cardinality                        2
Missing                 1792091
0            1880       -          F       7065
1            1880       -          F       2604
2            1880       -          F       2003
3            1880       -          F       1939
4            1880       -          F       1746
5            1880       -          F       1578
6            1880       -          F       1472
7            1880       -          F       1414
8            1880       -          F       1320
9            1880       -          F       1288
10           1880       -          F       1258
That's assuming that double quoting something makes it a string.
That's not true in h2o.
While not updated in a while, this parser spec for h2o is reasonable:
https://github.com/0xdata/h2o/wiki/Parser-Specification
"The whitespace and quote stripping rules, imply that a pure number can never be used or interpreted as a string."
You can't make a number into an enum by quoting it in h2o.
I think this might explain why the two files had separate type guessing.
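The effect of the quoted rule can be illustrated with a toy model. To be clear, this is not h2o's parser and makes no claim about how h2o implements type guessing; `strip_token` and `guess_type` are hypothetical names, and the sketch only shows why stripping quotes before the number test means a quoted number still counts as a number.

```python
def strip_token(token):
    """Strip surrounding whitespace, then one pair of matching quotes."""
    token = token.strip()
    if len(token) >= 2 and token[0] == token[-1] and token[0] in "\"'":
        token = token[1:-1].strip()
    return token


def guess_type(token):
    """Toy classifier: quotes are removed before the number test,
    so quoting alone cannot turn a number into a string."""
    value = strip_token(token)
    if value == "":
        return "missing"
    try:
        float(value)  # Python's float() is a stand-in for a real number test
        return "number"
    except ValueError:
        return "string"


for raw in ['1', '"1"', "'2'", ' 42 ', '"val1"', '""']:
    print(repr(raw), "->", guess_type(raw))
```

Under this model, only a value that still contains a non-numeric character after stripping, such as val1, ends up as a string.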
---------------------------
-kevin
I just noticed Chris Kuethe had posted this a year ago, and Zach et al. had updated this post recently, so my reply to Chris is probably too late now! In any case, the info might be useful to folks, even today! -kevin