In randomForest: Got exception 'class java.lang.AssertionError', with msg 'null'

james.h...@gmail.com

Oct 23, 2015, 10:30:59 AM
to H2O Open Source Scalable Machine Learning - h2ostream
Hello,

I'm new to H2O. When using R to make a random forest with max_depth=60, the fitting crashed halfway through:

Got exception 'class java.lang.AssertionError', with msg 'null'
java.lang.AssertionError
at hex.tree.DHistogram.scoreMSE(DHistogram.java:323)
at hex.tree.drf.DRF$DRFDecidedNode.bestCol(DRF.java:432)
at hex.tree.DTree$DecidedNode.<init>(DTree.java:357)
at hex.tree.drf.DRF$DRFDecidedNode.<init>(DRF.java:420)
at hex.tree.drf.DRF.makeDecided(DRF.java:412)
at hex.tree.SharedTree$ScoreBuildOneTree.onCompletion(SharedTree.java:367)
at jsr166y.CountedCompleter.tryComplete(CountedCompleter.java:386)
at water.MRTask.compute2(MRTask.java:683)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1017)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:429)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)


Error: 'null'

Any ideas how to avoid this?

Avni Wadhwa

Oct 30, 2015, 2:16:31 PM
to H2O Open Source Scalable Machine Learning - h2ostream
Hi,

If possible, could you please share how you imported your files and parsed your data in R?

This error can come up if model building starts before parsing has finished. So if you did not import and parse the data in the standard way, you might see this error.

Thanks,

Avni

johns...@gmail.com

Nov 6, 2015, 9:31:37 PM
to H2O Open Source Scalable Machine Learning - h2ostream
I was having the same issue with version h2o-3.2.*; I installed the earlier version h2o-3.0.* and I'm not having the issue there, despite using the same commands on the same machine, i.e.:

features <- colnames(train)[!(colnames(train) %in% c("y"))]
trainHex <- as.h2o(train)
rfHex <- h2o.randomForest(x = features,
                          y = "LogSales",
                          ntrees = 500,
                          max_depth = 20,
                          nbins_cats = 1115, ## allow it to fit store ID
                          training_frame = trainHex)
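
A side note on the predictor list: the snippet above excludes a column named "y" while the response is "LogSales". If those names are literal, the response may still be in the predictor list. Here is a minimal sketch of dropping columns by name with base R's setdiff(). The toy data frame and its column names are assumptions for illustration only, not the actual competition data.

```r
# Toy stand-in for the merged training data; column names are assumptions.
train <- data.frame(
  Store    = c(1L, 2L, 3L),
  Promo    = c(0L, 1L, 0L),
  Sales    = c(5263, 6064, 8314),
  LogSales = log(c(5263, 6064, 8314))
)

response <- "LogSales"
# Columns that should not be predictors: the response and its untransformed source.
excluded <- c(response, "Sales")

# setdiff() keeps the column order of its first argument.
features <- setdiff(colnames(train), excluded)
print(features)
```

The resulting vector can be passed as x to h2o.randomForest() in the same way as in the call above.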

James Hirschorn

Nov 7, 2015, 7:43:43 PM
to H2O Open Source Scalable Machine Learning - h2ostream
lol, it looks like we are both working on the same Kaggle competition :)

Here is my almost identical code:
trainHex<-as.h2o(train)
features<-colnames(train)[!(colnames(train) %in% c("Id","Date","Sales","logSales","Customers"))]
rfHex <- h2o.randomForest(x=features,
                          y="logSales",
                          ntrees = 200,
                          max_depth = 75,
                          nbins_cats = 1115,
                          training_frame=trainHex)

It seems to be related to how much memory is available on the system, and the error only occurs sometimes.

Avni, the only "non-standard" thing about the imported data is that data.table was used:

library(data.table)
train <- fread("../input/train.csv", stringsAsFactors = TRUE)
store <- fread("../input/store.csv", stringsAsFactors = TRUE)
train <- merge(train, store, by = "Store")

Cheers

ccl...@gmail.com

Nov 9, 2015, 12:12:17 AM
to H2O Open Source Scalable Machine Learning - h2ostream
Yeah, turn off asserts. No really.
It's an obscure edge-condition assert: floating-point (FP) error moves some row's decision from one side of the split to the other, leaving a split empty or below min-rows when it clearly was not that way in the prior pass. I pushed the error "backwards" up the food chain, so now it shows up as something being wrong with the MSE calc, but only rarely.

I've never had a really reliable test case, and I'd love one (needs dataset, script, etc).
In the meantime, you probably can ignore it, and everything will turn out fine.

Cliff
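
For readers unfamiliar with the floating-point effect described above, here is a minimal illustration. It is not H2O's histogram code. It only shows how a value that is mathematically equal to a split point can land on either side of it depending on how it was computed. The function and variable names are made up for the example.

```r
# Illustration only: not H2O's tree-building code.
split_point <- 0.3
value_pass1 <- 0.3         # the value as seen in one pass
value_pass2 <- 0.1 + 0.2   # mathematically equal, but not in floating point

# Hypothetical split rule: a row goes left when its value is <= the threshold.
goes_left <- function(x, threshold) x <= threshold

print(goes_left(value_pass1, split_point))  # TRUE
print(goes_left(value_pass2, split_point))  # FALSE
print(value_pass2 - value_pass1)            # tiny positive difference
```

If a split was chosen assuming the row goes left and a later pass sends it right, one side can end up with fewer rows than expected, which is the kind of inconsistency an assert would catch.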

On Friday, October 23, 2015 at 7:30:59 AM UTC-7, James Hirschorn wrote:

Spencer Aiello

Nov 9, 2015, 9:45:58 AM
to ccl...@gmail.com, H2O Open Source Scalable Machine Learning - h2ostream
If you started H2O from R, you can turn off asserts by setting the assertion argument to FALSE in the h2o.init call. Otherwise, avoid passing '-ea' on the command line.
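
A hedged sketch of the R side follows. The argument name appears to differ between h2o releases: this thread uses assertion, while newer releases seem to document enable_assertions, so check ?h2o.init for the installed version. The helper assertion_off_args is hypothetical and only picks whichever name is available.

```r
# Hypothetical helper: given the argument names of h2o.init, return the
# argument that switches assertions off (or an empty list if none is found).
assertion_off_args <- function(arg_names) {
  if ("enable_assertions" %in% arg_names) {
    list(enable_assertions = FALSE)
  } else if ("assertion" %in% arg_names) {
    list(assertion = FALSE)
  } else {
    list()
  }
}

# Example with the argument name used in this thread.
print(assertion_off_args(c("ip", "port", "assertion")))

# With the h2o package installed, the call could then look like this:
# library(h2o)
# init_args <- c(list(nthreads = -1, max_mem_size = "2G"),
#                assertion_off_args(names(formals(h2o.init))))
# localH2O <- do.call(h2o.init, init_args)
```

The setting only matters when h2o.init launches a new JVM. If an H2O instance is already running, it keeps whatever flags it was started with.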



Chaitra Kamath

Dec 5, 2015, 2:43:38 AM
to H2O Open Source Scalable Machine Learning - h2ostream, ccl...@gmail.com
Hey all, 

I am working on the same Kaggle competition. 

I made sure that all my columns are numeric / int (no character / factor columns). I also turned off 'assertion' in h2o.init(). There are no NAs in the data set. I used a data frame instead of a data table, though; I am not sure whether that makes a difference.

Here is my chunk of code:

set.seed(1)
indices <- sample(1:nrow(final), size = round(nrow(final) * 0.4))
localH2O <- h2o.init(nthreads = -1, max_mem_size = '2G', assertion = FALSE)
trainHex <- as.h2o(final[indices, ])
features <- colnames(final)[!(colnames(final) %in% c('Sales'))]
system.time(
  rfHex <- h2o.randomForest(x = features,
                            y = 'logSales',
                            training_frame = trainHex,
                            ntrees = 100,
                            max_depth = 30,
                            nbins_cats = 1115)
)

I have 4G of memory and am using h2o v 3.6.0.8. After executing the above command, I am still getting the same error:
Got exception 'class java.lang.AssertionError', with msg 'null'
java.lang.AssertionError
at hex.tree.DHistogram.scoreMSE(DHistogram.java:323)
at hex.tree.DTree$DecidedNode$FindSplits.compute2(DTree.java:441)
at hex.tree.DTree$DecidedNode.bestCol(DTree.java:421)
at hex.tree.DTree$DecidedNode.<init>(DTree.java:449)
at hex.tree.SharedTree.makeDecided(SharedTree.java:489)
at hex.tree.SharedTree$ScoreBuildOneTree.onCompletion(SharedTree.java:436)
at jsr166y.CountedCompleter.__tryComplete(CountedCompleter.java:425)
at jsr166y.CountedCompleter.tryComplete(CountedCompleter.java:383)
at water.MRTask.compute2(MRTask.java:683)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1069)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)

Can someone please provide some pointers?

Thanks, 
Chai

ccl...@gmail.com

Dec 5, 2015, 1:11:02 PM
to H2O Open Source Scalable Machine Learning - h2ostream, ccl...@gmail.com
Apparently there is a bug in how h2o.init passes "assertion=FALSE": you are still getting an assertion error.
You might try starting Java manually, being sure NOT to pass "-ea" (short for -enableassertions).

Cliff
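
A minimal sketch of what a manual start could look like, driven from R. It assumes the H2O jar has been downloaded as h2o.jar into the working directory (a hypothetical path) and that 2g of heap is wanted. The helper h2o_java_args is made up for this example. The equivalent shell command is java -Xmx2g -jar h2o.jar, with no -ea flag.

```r
# Hypothetical helper: JVM arguments for a manual H2O launch, without "-ea".
h2o_java_args <- function(jar_path, max_mem = "2g") {
  c(paste0("-Xmx", max_mem), "-jar", jar_path)
}

jar_path <- "h2o.jar"  # assumed location of the downloaded H2O jar
java_args <- h2o_java_args(jar_path)

# Show the command that would be run.
cat("java", java_args, "\n")

if (file.exists(jar_path)) {
  # Start the JVM in the background so the R session stays usable.
  system2("java", java_args, wait = FALSE)
  # Once the node is up, attach to it instead of launching a new one:
  # library(h2o)
  # h2o.init(startH2O = FALSE)
}
```

Because the JVM is started by hand here, assertions stay off regardless of what h2o.init would have passed.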



steg...@googlemail.com

Dec 8, 2015, 12:11:16 PM
to H2O Open Source Scalable Machine Learning - h2ostream, ccl...@gmail.com
Thanks for the info!
I'm afraid to ask, but what exactly would this command look like?

greets