StackingClassifier: Pipeline & GridSearchCV


Rustam Vosilov

Dec 26, 2016, 5:45:43 PM
to mlxtend
Hi everybody,

I'm trying to use a pipeline as one of the classifiers in StackingClassifier with a grid search. However, I'm not sure how to refer to the parameters of the model within the pipeline when constructing the parameter grid.
For instance, suppose I want to stack three classifiers (a logistic regression, a random forest and an SVM model), with some pre-processing (using StandardScaler()) for the logistic regression and the SVM model:

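import numpy as np
from mlxtend.classifier import StackingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Scale the features only for the models that need it
pipe_lr = make_pipeline(StandardScaler(), LogisticRegression())
forest = RandomForestClassifier()
pipe_svm = make_pipeline(StandardScaler(), SVC())

stacked_model = StackingClassifier(classifiers=[pipe_lr, forest, pipe_svm], meta_classifier=pipe_lr, use_probas=True)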

params = {'randomforestclassifier__max_features': ['sqrt', 'log2', 0.1],
          'pipe_lr__logisticregression__C': np.logspace(-3, 2, 6),        # <--- This is where I'm puzzled!
          'pipe_svm__svc_gamma': np.logspace(-3, 2, 6),                   # <--- This is where I'm puzzled!
          'meta-pipe_lr__logisticregression__C': np.logspace(-3, 2, 6)}   # <--- This is where I'm puzzled!

stacked_grid = GridSearchCV(estimator=stacked_model, param_grid=params,
                            cv=3, refit=True, verbose=1, n_jobs=-1)


I don't know what the syntax is supposed to be for the parameter of a model that is in a pipeline object.
I would appreciate any suggestions. Thank you!

/R

Sebastian Raschka

Dec 26, 2016, 9:24:36 PM
to mlxtend
Hi, Rustam,
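The parameter names are built from the lowercased class names, not from your variable names, and scikit-learn's make_pipeline returns a Pipeline object. Both of your pipelines are therefore named "pipeline" and get numbered to tell them apart.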
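
To make the naming rule concrete, here is a minimal sketch in plain Python. It is not mlxtend's actual code, just an illustration under the assumption that names are the lowercased class names, with a -1, -2, ... suffix added when the same class appears more than once. `name_estimators` and the two empty stand-in classes are hypothetical and exist only for this example.

```python
from collections import Counter


def name_estimators(estimators):
    """Hypothetical helper: lowercased class names, numbered when repeated."""
    names = [type(est).__name__.lower() for est in estimators]
    totals = Counter(names)  # how often each class name occurs overall
    seen = Counter()         # how often each name has been handed out so far
    result = []
    for name in names:
        if totals[name] > 1:
            seen[name] += 1
            name = f"{name}-{seen[name]}"
        result.append(name)
    return result


# Empty stand-ins so the sketch runs without scikit-learn installed
class Pipeline:
    pass


class RandomForestClassifier:
    pass


names = name_estimators([Pipeline(), RandomForestClassifier(), Pipeline()])
print(names)
```

With the three classifiers from the question in the same order, this gives pipeline-1 for pipe_lr, randomforestclassifier for forest, and pipeline-2 for pipe_svm.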

So, changing

params = {'randomforestclassifier__max_features': ['sqrt', 'log2', 0.1],
          'pipe_lr__logisticregression__C': np.logspace(-3, 2, 6),        # <--- This is where I'm puzzled!
          'pipe_svm__svc_gamma': np.logspace(-3, 2, 6),                   # <--- This is where I'm puzzled!
          'meta-pipe_lr__logisticregression__C': np.logspace(-3, 2, 6)}   # <--- This is where I'm puzzled!

to

params = {'randomforestclassifier__max_features': ['sqrt', 'log2', 0.1],
          'pipeline-1__logisticregression__C': np.logspace(-3, 2, 6),
          'pipeline-2__svc__gamma': np.logspace(-3, 2, 6),
          'meta-pipeline__logisticregression__C': np.logspace(-3, 2, 6)}

should technically work. If you run `stacked_model.get_params()`, you should see the full list of parameter names that the grid can refer to.
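
As a rough illustration of reading the keys off get_params(), the sketch below builds the two pipelines from the question and filters their parameter names. `tunable_keys` is a hypothetical helper, not part of scikit-learn or mlxtend. The same filter can be pointed at stacked_model, where each key should additionally carry the prefix of the enclosing estimator.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipe_lr = make_pipeline(StandardScaler(), LogisticRegression())
pipe_svm = make_pipeline(StandardScaler(), SVC())


def tunable_keys(estimator, suffixes=("__C", "__gamma")):
    """Hypothetical helper: get_params() keys ending in one of the suffixes."""
    return sorted(k for k in estimator.get_params() if k.endswith(suffixes))


# make_pipeline names each step after its lowercased class name,
# and nested parameters are joined with a double underscore.
lr_keys = tunable_keys(pipe_lr)
svm_keys = tunable_keys(pipe_svm)
print(lr_keys)
print(svm_keys)
```

The exact prefixes on the stacked model may differ between mlxtend versions, so it is worth checking the output of get_params() on your installed version before building the grid.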

Hope that helps!
Best,
Sebastian