Hi Sebastian,
I had a question regarding the implementation of the SFS() method under the cross-validation setting in mlxtend.
The features chosen, and the order in which they are chosen, may differ from one set of training folds to another (based on the book An Introduction to Statistical Learning).
Say there are 4 features (a, b, c, d) and 3 CV folds, and Forward Subset Selection is used:
Fold 1 (as cv), Folds 2,3 (as train) may choose features: a, ab, abc, abcd
Fold 2 (as cv), Folds 1,3 (as train) may choose features: b, bc, bcd, bcda
Fold 3 (as cv), Folds 1,2 (as train) may choose features: c, ca, cab, cabd
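To make the per-fold scenario concrete, here is a minimal sketch of what I mean. It is plain Python, not mlxtend code: the toy data, the nearest-centroid scorer, and the helper names (`make_data`, `accuracy`, `forward_path`, `per_fold_selection`) are all made up for illustration. Each fold runs its own forward selection on its training folds only, so the paths are free to differ between folds.

```python
import random

FEATURES = "abcd"  # hypothetical feature names, matching the example above


def make_data(n=90, seed=0):
    """Toy data: four Gaussian features; the label depends only on a and b."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in FEATURES] for _ in range(n)]
    y = [int(row[0] + row[1] > 0) for row in X]
    return X, y


def accuracy(X_tr, y_tr, X_te, y_te, feats):
    """Fit a nearest-centroid classifier on `feats`; return test accuracy."""
    centroids = {}
    for label in set(y_tr):
        rows = [x for x, t in zip(X_tr, y_tr) if t == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in feats]
    hits = 0
    for x, t in zip(X_te, y_te):
        pred = min(
            centroids,
            key=lambda c: sum((x[f] - m) ** 2 for f, m in zip(feats, centroids[c])),
        )
        hits += pred == t
    return hits / len(X_te)


def forward_path(X_tr, y_tr):
    """Greedy forward selection that only ever sees the training folds."""
    chosen, path = [], []
    while len(chosen) < len(FEATURES):
        candidates = [f for f in range(len(FEATURES)) if f not in chosen]
        best = max(
            candidates,
            key=lambda f: accuracy(X_tr, y_tr, X_tr, y_tr, chosen + [f]),
        )
        chosen = chosen + [best]
        path.append("".join(FEATURES[f] for f in chosen))
    return path


def per_fold_selection(X, y, n_folds=3):
    """Run a separate forward selection per fold, then score each step
    of that fold's path on the held-out fold."""
    results = []
    for k in range(n_folds):
        test_idx = set(range(k, len(X), n_folds))
        X_tr = [x for i, x in enumerate(X) if i not in test_idx]
        y_tr = [t for i, t in enumerate(y) if i not in test_idx]
        X_te = [x for i, x in enumerate(X) if i in test_idx]
        y_te = [t for i, t in enumerate(y) if i in test_idx]
        path = forward_path(X_tr, y_tr)
        scores = [
            accuracy(X_tr, y_tr, X_te, y_te, [FEATURES.index(ch) for ch in subset])
            for subset in path
        ]
        results.append((path, scores))
    return results


if __name__ == "__main__":
    X, y = make_data()
    for k, (path, scores) in enumerate(per_fold_selection(X, y), start=1):
        print(f"fold {k} held out:", list(zip(path, scores)))
```

With this scheme there is one path per fold, so a subset of a given size need not contain the same features in every fold.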
However, the final results from the SFS() method report only a single sequence of feature subsets and their corresponding CV scores.
For example: b, bc, bcd, bcda, with the CV scores for each fold as shown below:
b: cv_score for {fold1, fold2, fold3}
bc: cv_score for {fold1, fold2, fold3}
bcd: cv_score for {fold1, fold2, fold3}
bcda: cv_score for {fold1, fold2, fold3}
Could you please comment on the CV-based implementation of SFS(), specifically how a single sequence of features is chosen and how the CV score is calculated for each feature subset?
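In case it helps frame the question, here is my guess at one procedure that would produce a single sequence with one score per fold. I have not confirmed that this is what mlxtend does. The sketch is plain Python with made-up helpers (`make_data`, `accuracy`, `cv_scores`, `single_sequence`) and toy data: at each step every candidate subset is cross-validated on all folds, and the candidate with the best mean score is added.

```python
import random

FEATURES = "abcd"  # hypothetical feature names, matching the example above


def make_data(n=90, seed=0):
    """Toy data: four Gaussian features; the label depends only on a and b."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in FEATURES] for _ in range(n)]
    y = [int(row[0] + row[1] > 0) for row in X]
    return X, y


def accuracy(X_tr, y_tr, X_te, y_te, feats):
    """Fit a nearest-centroid classifier on `feats`; return test accuracy."""
    centroids = {}
    for label in set(y_tr):
        rows = [x for x, t in zip(X_tr, y_tr) if t == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in feats]
    hits = 0
    for x, t in zip(X_te, y_te):
        pred = min(
            centroids,
            key=lambda c: sum((x[f] - m) ** 2 for f, m in zip(feats, centroids[c])),
        )
        hits += pred == t
    return hits / len(X_te)


def cv_scores(X, y, feats, n_folds=3):
    """Held-out accuracy of the fixed subset `feats`, one value per fold."""
    scores = []
    for k in range(n_folds):
        test_idx = set(range(k, len(X), n_folds))
        X_tr = [x for i, x in enumerate(X) if i not in test_idx]
        y_tr = [t for i, t in enumerate(y) if i not in test_idx]
        X_te = [x for i, x in enumerate(X) if i in test_idx]
        y_te = [t for i, t in enumerate(y) if i in test_idx]
        scores.append(accuracy(X_tr, y_tr, X_te, y_te, feats))
    return scores


def single_sequence(X, y, n_folds=3):
    """One forward pass over the full data; candidates are ranked by mean CV score."""
    chosen, report = [], []
    while len(chosen) < len(FEATURES):
        candidates = [f for f in range(len(FEATURES)) if f not in chosen]
        scored = {f: cv_scores(X, y, chosen + [f], n_folds) for f in candidates}
        best = max(scored, key=lambda f: sum(scored[f]) / n_folds)
        chosen = chosen + [best]
        report.append(("".join(FEATURES[f] for f in chosen), scored[best]))
    return report


if __name__ == "__main__":
    X, y = make_data()
    for subset, scores in single_sequence(X, y):
        print(subset, "cv_scores:", scores)
```

If this guess is right, the folds are only used to score a given subset, and the choice of the next feature is made once, from the averaged score.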
Thanks,
Arun