gbm with tweedie offset

Giorgio Spedicato

Jun 16, 2016, 6:38:11 AM
to H2O Open Source Scalable Machine Learning - h2ostream
I am running a GBM with a Tweedie distribution to predict loss cost. I suspect that the GBM does not correctly handle the offset parameter: using different exposures as the offset does not alter the predicted outcome. In the example below I tried both the log of exposure and the raw (unlogged) exposure as the offset, and I obtained the same (unreliable) result in both cases.

I suggest that the documentation provide further details on how offsets and link functions are handled in GBM.

# first model: raw exposure as offset

gbm <- h2o.gbm(x = c(predictors.numeric, predictors.categorical),
               y = "losscost",
               learn_rate = 0.01,
               max_depth = 4,
               min_rows = 5,
               distribution = "tweedie",
               offset_column = "exposure",
               training_frame = trainSplit,
               validation_frame = validSplit,
               stopping_metric = "AUTO")


# second model: log of exposure as offset

gbm2 <- h2o.gbm(x = c(predictors.numeric, predictors.categorical),
                y = "losscost",
                learn_rate = 0.01,
                max_depth = 4,
                min_rows = 5,
                distribution = "tweedie",
                offset_column = "log_exposure",
                training_frame = trainSplit,
                validation_frame = validSplit,
                stopping_metric = "AUTO")
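
For reference, here is a minimal base-R sketch of why the two offsets would be expected to give different predictions. It assumes the offset is added on the link (log) scale before the inverse link is applied, which is how offsets usually work for log-link models; whether the GBM implementation does this is exactly the open question. The score `f` and the exposure values are made up for illustration.

```r
# Hypothetical ensemble score on the log scale for a single policy:
# exp(f) = 200 is the predicted loss per unit of exposure.
f <- log(200)
exposure <- c(0.25, 0.5, 1)

# Offset supplied as log(exposure): the prediction scales linearly with exposure.
pred_log_offset <- exp(f + log(exposure))

# Offset supplied as raw exposure: the prediction is multiplied by exp(exposure).
pred_raw_offset <- exp(f + exposure)

print(pred_log_offset)
print(pred_raw_offset)
```

Under that assumption, the two offsets cannot lead to the same fitted values unless the trees absorb the difference, so identical predictions from `gbm` and `gbm2` would suggest the offset column is not being used.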

mr.li...@gmail.com

Jun 16, 2016, 3:21:36 PM
to H2O Open Source Scalable Machine Learning - h2ostream
Giorgio,

It sounds like you're approaching this from the generalized linear model perspective. Keep in mind, though, that GBM is a tree ensemble method, so the traditional GLM approach of modelling loss with log(exposure) as an offset may not be as applicable here. Instead, I recommend using pure premium (loss divided by exposure) as your target variable, with no offset.

                    ~ Li
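
A minimal sketch of the suggestion above, in base R: build a pure premium column and use it as the response without any offset. The data frame `policies` and its columns `loss`, `exposure`, and `pure_premium` are hypothetical names with made-up values; the `h2o.gbm` call is left as a comment because it needs a running H2O cluster and the objects from the original post.

```r
# Hypothetical policy-level data: total loss and exposure (in policy-years).
policies <- data.frame(
  loss     = c(0, 500, 1200),
  exposure = c(0.5, 1, 0.75)
)

# Pure premium = loss per unit of exposure; this becomes the response.
policies$pure_premium <- policies$loss / policies$exposure

print(policies)

# The model would then be fitted without offset_column, for example:
# gbm3 <- h2o.gbm(x = c(predictors.numeric, predictors.categorical),
#                 y = "pure_premium",
#                 distribution = "tweedie",
#                 training_frame = trainSplit,
#                 validation_frame = validSplit)
```

If the existing `losscost` column is already loss divided by exposure, it can serve as the response directly and this step is unnecessary.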