Unable to modify data in H2O frame.


bloog...@gmail.com

Oct 5, 2015, 2:38:18 PM
to H2O Open Source Scalable Machine Learning - h2ostream
I'm trying to adjust weights used in my panel regression.
But I get a 'Can't unlock: Not locked!' error.

Are there other ways to modify data in H2O?

-Brian




    dts = as.data.frame(curDatH$Date)                                           ### h2o ->R
    tsWts = cal.bdayDiff( dts$Date, useUpTo, cal)
    tsWts = 2 ^ -( tsWts / 252 )                                                      ### Calc exponential weights
    tsWtsH = as.h2o( localH20, tsWts, destination_frame= "wts")   


    curDatH$Wts <- curDatH$Wts * tsWtsH                                    ### Adjust weights in h2o


Got exception 'class java.lang.AssertionError', with msg 'Can't unlock: Not locked!'
java.lang.AssertionError: Can't unlock: Not locked!
        at water.Lockable$Unlock.atomic(Lockable.java:181)
        at water.Lockable$Unlock.atomic(Lockable.java:176)
        at water.TAtomic.atomic(TAtomic.java:17)
        at water.Atomic.compute2(Atomic.java:55)
        at water.Atomic.fork(Atomic.java:39)
        at water.Atomic.invoke(Atomic.java:31)
        at water.Lockable.unlock(Lockable.java:171)
        at hex.Model$Parameters.read_unlock_frames(Model.java:170)
        at hex.tree.SharedTree$Driver.compute2(SharedTree.java:231)
        at water.H2O$H2OCountedCompleter.compute(H2O.java:1002)
        at jsr166y.CountedCompleter.exec(CountedCompleter.java:429)
        at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
        at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
        at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
        at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)





### if it worked I would then fit the model

    fitH = h2o.gbm(y = dependent, x = independent, weights_column = weights,  training_frame = curDatH,
                ntrees = 100, max_depth = 10, min_rows = 200, nfolds=3, learn_rate = 0.1,
                modelID = paste("GBM",fitDt,sep="_") )

    summary(fitH)
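
For reference, the weighting line above (`2 ^ -( tsWts / 252 )`) halves a row's weight for every 252 business days of age. Below is a minimal sketch in plain R with no H2O involved; `decay_weights` and `age_days` are made-up names for illustration. It assumes the "(weighted) rows" figure H2O reports is the sum of the weights column, which is an inference from the log wording rather than something confirmed in this thread.

```r
# Exponential decay weights with a half-life of 252 business days,
# mirroring tsWts = 2 ^ -( tsWts / 252 ) from the post.
# `decay_weights` and `age_days` are hypothetical names for illustration.
decay_weights <- function(age_days, half_life = 252) {
  2 ^ (-age_days / half_life)
}

age_days <- c(0, 126, 252, 504)   # row ages in business days (made up)
w <- decay_weights(age_days)
print(round(w, 4))

# Assumption: the weighted row count is the sum of the weights.
# It drops below the raw row count as soon as any row has age > 0.
cat("rows:", length(w), " weighted rows:", round(sum(w), 4), "\n")
```

Because every weight for a row older than zero days is below 1, multiplying an existing weights column by these values can only shrink the total weight available to the model.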

bloog...@gmail.com

Oct 13, 2015, 10:40:17 AM
to H2O Open Source Scalable Machine Learning - h2ostream
Still confused on this...

Spencer Aiello

Oct 13, 2015, 11:49:58 AM
to bloog...@gmail.com, H2O Open Source Scalable Machine Learning - h2ostream
Hi Brian,

Do you happen to have the logs available? Or would it be all right to share a snippet of your data so that we can attempt a reproduction? You may PM it to me if you like (spe...@h2o.ai).

Spencer

bloog...@gmail.com

Oct 15, 2015, 5:40:19 PM
to H2O Open Source Scalable Machine Learning - h2ostream, bloog...@gmail.com

Hi Spencer,
I found the logs stored in /tmp/

Interestingly, although the error it reported was 'Can't unlock: Not locked!', the logs show something different:
#######h2o_127.0.0.1_54321-3-info.log
10-14 22:35:40.818 127.0.0.1:54321       32041  FJ-0-39   ERRR: _min_rows: The dataset size is too small to split for min_rows=10.0: must have at least 20.0 (weighted) rows, but have only 19.38260765603209.

#######h2o_127.0.0.1_54321-2-debug.log
10-14 22:35:40.817 127.0.0.1:54321       32041  FJ-0-39   INFO: Building H2O GBM model with these parameters:
10-14 22:35:40.817 127.0.0.1:54321       32041  FJ-0-39   INFO: {"_model_id":null,"_train":{"name":"GBM_model_R_1444875645239_55_cv_1_subset_5422_train","type":"Key"},"_valid":{"name":"GBM_model_R_1444875645239_55_cv_1_subset_5422_valid","type":"Key"},"_nfolds":0,"_keep_cross_validation_predictions":false,"_fold_assignment":"AUTO","_distribution":"AUTO","_tweedie_power":1.5,"_ignored_columns":["Date","ID","Int","Ido"],"_ignore_const_cols":true,"_weights_column":"weights","_offset_column":null,"_fold_column":null,"_score_each_iteration":false,"_response_column":"sumRet","_balance_classes":false,"_max_after_balance_size":5.0,"_class_sampling_factors":null,"_max_hit_ratio_k":10,"_max_confusion_matrix_size":20,"_checkpoint":null,"_ntrees":100,"_max_depth":10,"_min_rows":10.0,"_nbins":20,"_nbins_cats":1024,"_r2_stopping":0.999999,"_seed":-2484894243866892320,"_nbins_top_level":1024,"_build_tree_one_node":false,"_initial_score_interval":4000,"_score_interval":4000,"_learn_rate":0.1}
10-14 22:35:40.818 127.0.0.1:54321       32041  FJ-0-39   INFO: Dropping ignored columns: [Date, ID, Intercept, Idio]
10-14 22:35:40.818 127.0.0.1:54321       32041  FJ-0-39   ERRR: _min_rows: The dataset size is too small to split for min_rows=10.0: must have at least 20.0 (weighted) rows, but have only 19.38260765603209.
10-14 22:35:40.822 127.0.0.1:54321       32041  FJ-0-39   DEBUG: unlock GBM_model_R_1444875645239_55_cv_1_subset_5422_train by job $03017f00000132d4ffffffff$_a24300cbb0eb4b05e64a9fd6b5e046bb_cv0



bloog...@gmail.com

Oct 15, 2015, 5:41:29 PM
to H2O Open Source Scalable Machine Learning - h2ostream, bloog...@gmail.com

I've removed the min row count threshold (min_rows) and it seems to work fine now.
I don't think it was ever a locking problem.
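
A rough pre-check along these lines could flag the problem before training. It is only a sketch: the rule `sum(weights) >= 2 * min_rows` is read off the log message above (20.0 required for min_rows=10.0), and `enough_weighted_rows`, `w`, and `fold` are hypothetical names, not part of the H2O API. The fold assignment here is a simple modulo split and may not match what H2O does internally.

```r
# Pre-check sketch: does each cross-validation training subset keep
# enough weighted rows for a given min_rows?
# Assumption (inferred from the log message, not from H2O docs):
#   sum(weights) >= 2 * min_rows
enough_weighted_rows <- function(w, fold, min_rows) {
  sapply(sort(unique(fold)), function(k) {
    sum(w[fold != k]) >= 2 * min_rows   # rows used to train fold k
  })
}

w    <- rep(0.5, 60)                # 60 rows, 30 weighted rows in total
fold <- rep(1:3, length.out = 60)   # made-up 3-fold assignment

# Each training part holds 40 rows, i.e. 20 weighted rows.
print(enough_weighted_rows(w, fold, min_rows = 10))
print(enough_weighted_rows(w, fold, min_rows = 200))
```

With these made-up numbers every fold passes at min_rows = 10 and every fold fails at min_rows = 200, which is the same kind of failure the log reports for the `cv_1` training subset.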

