Cross validation with two fold columns

Jul 28, 2021, 5:11:05 AM
to H2O Open Source Scalable Machine Learning - h2ostream
Dear H2O,

Thank you for a great product!

We would like to use H2O AutoML to predict harvest yields in field trials that are divided into sub-areas (blocks) according to soil variation. We have data for four blocks over 10 years. We would like to investigate how well we can predict the yield in an unseen year on an unseen soil, using MAE as the quality metric.

Is it possible to cross-validate so that, when a given block in a given year is used for validation, the training is performed without any data from that year or from that block?

To me, the problem resembles specifying two fold columns. I have considered combining block and year into a single variable that could then be specified as fold_column, but I do not think this fully meets the criteria: the training set would still contain data from the same year (in other blocks) and from the same block (in other years) as the validation data.
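To make the requirement concrete, here is a minimal sketch in plain Python (not H2O code, and not in R) of the splits described above. For each held-out (block, year) cell, the training rows are those that share neither the block nor the year. The function name `leave_block_and_year_out`, the field names, and the toy data are hypothetical and only illustrate the idea. Rows from the same year in other blocks, or from the same block in other years, end up in neither set for that split, which is why one fold column alone does not seem to express it.

```python
from itertools import product


def leave_block_and_year_out(rows):
    """Yield ((block, year), train_idx, valid_idx) for every held-out cell.

    Validation rows: same block AND same year as the held-out cell.
    Training rows:   different block AND different year.
    Rows matching only one of the two are left out of that split entirely.
    """
    blocks = sorted({r["block"] for r in rows})
    years = sorted({r["year"] for r in rows})
    for b, y in product(blocks, years):
        valid = [i for i, r in enumerate(rows)
                 if r["block"] == b and r["year"] == y]
        train = [i for i, r in enumerate(rows)
                 if r["block"] != b and r["year"] != y]
        if valid and train:
            yield (b, y), train, valid


def mean_absolute_error(actual, predicted):
    """MAE over paired sequences of observed and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy data: four blocks, ten years, one row per cell.
rows = [{"block": b, "year": y, "harvest": 0.0}
        for b in "ABCD" for y in range(2011, 2021)]

splits = list(leave_block_and_year_out(rows))
```

Each split could then be used to fit one model on the training rows and score it on the validation rows, with the per-split MAE values averaged afterwards. This is a sketch under the stated assumptions, not a built-in H2O cross-validation mode.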

I hope the above is understandable and hope you can provide a hint :)

Kind regards,
Søren

1) What version of H2O are you using?
2) Specify the type of machine you're using (e.g. OS X 10.11.4, Windows 10, etc.).
MacOS 10.14.4
3) Specify what language you are working in and what version (e.g. Python 2.7, Spark 1.6.1, etc.).
R version 4.0.2
4) The code you were executing when you received an error message (please provide a reproducible example if possible).
5) Copy and paste in your error message.
6) Type of data you are using (if applicable).
See above