Grid search regressor using max_depth, k_features, and two different scoring methods.


Nicholas Reichel

Feb 11, 2020, 8:08:13 PM
to mlxtend

Hello, I have the following grid-search code, which is pretty much a copy of the example in the documentation, but I have a couple of questions.

I'd like to have max_depth scale with k_features: if there are only two features, I want a max depth of two. Is this possible with this pipeline, or should I just write my own for loops around this code?

Also, a question about the scoring metric: if I put two scoring methods in the param_grid, as shown, how will it be able to compare the results? r2 scores are not apples-to-apples comparable with neg_mean_squared_error scores. Again, should I just for-loop this on the outside?

Thanks!!!
from sklearn import tree
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

clf = tree.DecisionTreeRegressor()

print('finding best features')

# sequential forward floating selection wrapped around the tree
sfs1 = SFS(estimator=clf,
           k_features=3,
           forward=True,
           floating=True,
           verbose=0,
           scoring='neg_mean_squared_error',
           n_jobs=1,
           cv=3)

pipe = Pipeline([('sfs', sfs1),
                 ('tree', clf)])

# the grid overrides k_features above and tunes the inner scoring
# metric and the tree depth jointly
param_grid = [
    {'sfs__k_features': list(range(4, 15)),
     'sfs__scoring': ['r2', 'neg_mean_squared_error'],
     'sfs__estimator__max_depth': list(range(3, 13))}
]

gs = GridSearchCV(estimator=pipe,
                  param_grid=param_grid,
                  scoring='neg_mean_squared_error',
                  n_jobs=13,
                  verbose=2,
                  cv=3,
                  iid=True,  # deprecated in scikit-learn 0.22, removed in 0.24
                  refit=True)


Sebastian Raschka

Feb 12, 2020, 1:31:24 PM
to Nicholas Reichel, mlxtend
Hi Nicholas,

Based on how scikit-learn's grid search works, it would be a good idea to write your own for loops here, because that will save you some unnecessary computation (otherwise, all combinations will be evaluated exhaustively).
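
For example, something like this rough, untested sketch (the synthetic dataset and the rule max_depth = k_features are just placeholders for your own data and depth rule):

import numpy as np
from sklearn import tree
from sklearn.datasets import make_regression
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

# placeholder data; substitute your own X and y
X, y = make_regression(n_samples=200, n_features=20, random_state=0)

best_score, best_params = -np.inf, None

for k in range(4, 15):
    depth = k  # couple the tree depth to the number of features
    clf = tree.DecisionTreeRegressor(max_depth=depth)
    sfs = SFS(estimator=clf,
              k_features=k,
              forward=True,
              floating=True,
              scoring='neg_mean_squared_error',
              cv=3)
    sfs = sfs.fit(X, y)
    # k_score_ is the cross-validation score of the selected subset
    if sfs.k_score_ > best_score:
        best_score = sfs.k_score_
        best_params = {'k_features': k, 'max_depth': depth}

print(best_params, best_score)

This evaluates one max_depth per k_features instead of the full cross-product, which is where the savings come from.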

> Also a question about the scoring metric. If I put two scoring methods in the param_grid, as shown, how will it be able to compare the results? r2 ranges are not apples to apples comparable to neg_mean_squared_error ranges. Again, should I just for loop this on the outside?

I think that's not an issue, because you are not comparing the scoring metrics between the runs; rather, you use the scoring metrics to select features in the inner loop. You could also run it separately, though, and see whether a different metric gives you different feature subset suggestions.
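
For instance (again an untested sketch, reusing X, y, and the imports from the sketch above; the fixed k_features=5 and max_depth=3 are just placeholders), you could fit one selector per metric and compare the chosen indices:

for metric in ('r2', 'neg_mean_squared_error'):
    sfs = SFS(estimator=tree.DecisionTreeRegressor(max_depth=3),
              k_features=5,
              forward=True,
              floating=True,
              scoring=metric,
              cv=3)
    sfs = sfs.fit(X, y)
    print(metric, '->', sfs.k_feature_idx_)

If both metrics return the same k_feature_idx_, the choice of inner scoring metric doesn't matter for the selection.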

Best,
Sebastian
