Hello there,
While checking how scikit-learn performs cross-validation, in particular with
GridSearchCV, I found the description on your official cross-validation page (in the model tuning section) somewhat ambiguous:
"
A solution to this problem is a procedure called
cross-validation (CV for short). A test set
should still be held out for final evaluation, but the validation set is no longer needed when doing CV. In the basic approach, called
k-fold CV, the training set is split into
k smaller sets (other approaches are described below, but generally follow the same principles). The following procedure is followed for each of the
k “folds”:
-
A model is trained using 𝑘−1
of the folds as training data; -
the resulting model is validated on the remaining part of the data (i.e., it is used as a test set to compute a performance measure such as accuracy).
"( cited from the link at very beginning of this mail, I hope that was your official description)
The phrase "trained using k-1 of the folds as training data", applied to each of the k folds, can be read in two ways: either
(a) fitting is done k-1 times for each fold (perhaps even with the results averaged as some aggregation), or
(b) fitting is done once on the combined data of those k-1 folds.
I try to make the two readings concrete in the sketch below.
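Here is a minimal sketch of what I mean. The CountingClassifier wrapper is my own scaffolding, not part of scikit-learn; it just counts how often fit() is called. With the default sequential execution (n_jobs=None), the module-level counter should reflect the total number of fits:

from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

FIT_CALLS = 0  # global counter of how many times fit() runs

class CountingClassifier(BaseEstimator, ClassifierMixin):
    """Wrapper around LogisticRegression that counts calls to fit()."""

    def __init__(self, C=1.0):
        self.C = C

    def fit(self, X, y):
        global FIT_CALLS
        FIT_CALLS += 1  # one increment per fit, whatever data it receives
        self.model_ = LogisticRegression(C=self.C, max_iter=1000).fit(X, y)
        return self

    def predict(self, X):
        return self.model_.predict(X)

X, y = make_classification(n_samples=200, random_state=0)
cross_val_score(CountingClassifier(), X, y, cv=5)
# Reading (a) predicts 5 * 4 = 20 fit calls; reading (b) predicts 5.
print(FIT_CALLS)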
So which understanding should be taken? Also, the number of fittings does not seem to be directly retrievable when using
GridSearchCV; does number(fittings) = number(candidates) * cv hold?
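For reference, here is the same counting applied to GridSearchCV, continuing the sketch above. If I understand correctly, the default refit=True adds one extra fit on the whole training set after the search, and with verbose=1 the printed "totalling N fits" line appears to report candidates * cv without that refit:

from sklearn.model_selection import GridSearchCV

FIT_CALLS = 0  # reset the counter from the sketch above
grid = GridSearchCV(CountingClassifier(), {"C": [0.1, 1.0, 10.0]},
                    cv=5, verbose=1)
grid.fit(X, y)
# If number(fittings) = number(candidates) * cv, this should be
# 3 * 5 = 15, plus 1 for the final refit (the default refit=True).
print(FIT_CALLS)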
With regards,
Y. Gao