Can you use a multiclass classifier on feature(s) in column processing, feeding a stacked classifier?

Jonathan Gough

Apr 11, 2019, 2:13:23 PM
to mlxtend
(Just found your Repo - and it is AWESOME, thank you all!) 

Question is this: 

Can you use a multiclass classifier on a feature (or features) as part of preprocessing/feature engineering, on top of a stacked classifier?

Use-case: 
You have 10 features to use in a multi-class classification problem. One of those features is text; the others are categorical, numerical, and time-based.

The other 9 features get put through typical pipeline steps, similar to the sklearn "Column Transformer with mixed types" example:
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

numeric_features = ['age', 'fare']
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])

categorical_features = ['embarked', 'sex', 'pclass']
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])



The text feature is "preprocessed"/engineered by passing it through a text pipeline that contains a domain-specific trained vector model, whose output is a 100-dimensional vector/array. That vector is passed into a multi-class classifier that outputs the classification probabilities ("predict_proba").

These probabilities would then be combined with the features from the above preprocessor before being passed into a classifier/stacking classifier.

text_features = ['domain text']
text_transformer = Pipeline(steps=[
    ('text_vectors', TextVectorizer()),  # domain-specific trained vector model
    ('predict_proba', DecisionTreeClassifier())])  # want predict_proba out of this step

from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression

preprocessor = ColumnTransformer(
    transformers=[
        ('text', text_transformer, text_features),
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', LogisticRegression(solver='lbfgs'))])

The real question is: 
Can you fit and then predict as part of a transformation on a subset of your data in a stacked ensemble?

In several different ways, I have used the sklearn mixins (BaseEstimator, TransformerMixin, ClassifierMixin) to create custom classes to do this, but I have failed miserably.
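
For reference, here is a simplified sketch of the kind of wrapper I have been attempting (the class name and the default estimator are just illustrative):

from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.tree import DecisionTreeClassifier

class ProbaTransformer(BaseEstimator, TransformerMixin):
    """Wrap a classifier so its predict_proba output becomes transform(),
    letting it sit inside a Pipeline or ColumnTransformer."""

    def __init__(self, estimator=None):
        self.estimator = estimator

    def fit(self, X, y=None):
        base = self.estimator if self.estimator is not None else DecisionTreeClassifier()
        # clone() keeps the wrapper compatible with sklearn estimator conventions
        self.estimator_ = clone(base)
        self.estimator_.fit(X, y)
        return self

    def transform(self, X):
        # the class probabilities become the engineered features
        return self.estimator_.predict_proba(X)

The idea being that ('predict_proba', ProbaTransformer(DecisionTreeClassifier())) could then replace the classifier step in the text pipeline above.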

Will this actually work? Has anyone ever seen anything like this, or am I just dreaming up crazy things?

Any insight or thoughts would be appreciated. 

Thanks!
Jonathan



Sebastian Raschka

Apr 11, 2019, 3:10:44 PM
to mlxtend
Hi Jonathan,

I believe what you have in mind can be achieved using the FeatureUnion class in scikit-learn:

https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.FeatureUnion.html
Then, you can define the preprocessor based on the FeatureUnion and add it as a pipeline step, replacing the preprocessor you have shown as an example at the bottom.
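
Roughly like the following untested sketch, reusing the transformer names from your post (note that each FeatureUnion branch receives the full input, so each branch has to select its own columns, here via a ColumnTransformer):

from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline

# each branch picks out its own columns from the full input frame
text_branch = ColumnTransformer(
    transformers=[('text', text_transformer, text_features)])

tabular_branch = ColumnTransformer(
    transformers=[('num', numeric_transformer, numeric_features),
                  ('cat', categorical_transformer, categorical_features)])

# concatenate the text-probability features with the tabular features
combined_features = FeatureUnion(
    transformer_list=[('text_probs', text_branch),
                      ('tabular', tabular_branch)])

clf = Pipeline(steps=[('features', combined_features),
                      ('classifier', LogisticRegression(solver='lbfgs'))])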

Best,
Sebastian
