Can you use a multiclass classifier on feature(s) in column processing, feeding a stacked classifier?

Jonathan Gough

Apr 11, 2019, 2:13:23 PM
to mlxtend
(Just found your Repo - and it is AWESOME, thank you all!) 

Question is this: 

Can you use a multiclass classifier on a feature (or features) as part of preprocessing/feature engineering, on top of a stacked classifier?

Use-case: 
You have 10 features to use in a multi-class classification problem. One of those features is text; the others are categorical, numerical, and time-based.

The other 9 features get put through typical pipeline steps, similar to the sklearn "Column Transformer with mixed types" example:
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

numeric_features = ['age', 'fare']
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])

categorical_features = ['embarked', 'sex', 'pclass']
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])



The text feature is "preprocessed"/engineered by passing it through a text pipeline that contains a domain-specific trained vector model, whose output is a 100-dimensional vector/array. That vector is passed into a multi-class classifier that outputs the classification probabilities ("predict_proba").

These probabilities would then be combined with the features from the above preprocessor before being passed into a classifier/stacking classifier.

text_features = ['domain text']
text_transformer = Pipeline(steps=[
    ('text_vectors', TextVectorizer()),  # domain-specific trained vector model
    ('predict_proba', DecisionTreeClassifier())])  # want predict_proba out of this step

from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression

preprocessor = ColumnTransformer(
    transformers=[
        ('text', text_transformer, text_features),
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', LogisticRegression(solver='lbfgs'))])

The real question is: 
Can you fit and then predict as part of a transformation on a subset of your data in a stacked ensemble?

In several different ways, I have used the sklearn mixins (BaseEstimator, TransformerMixin, ClassifierMixin) to create custom classes to do this, but I have failed miserably.
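
For reference, here is a simplified sketch of the kind of wrapper I have been attempting (the class name and the default estimator are just illustrative):

from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.tree import DecisionTreeClassifier

class ProbaTransformer(BaseEstimator, TransformerMixin):
    """Wrap a classifier so its predict_proba output becomes transform(),
    letting it sit inside a Pipeline or ColumnTransformer."""

    def __init__(self, estimator=None):
        self.estimator = estimator

    def fit(self, X, y=None):
        base = self.estimator if self.estimator is not None else DecisionTreeClassifier()
        # clone() keeps the wrapper compatible with sklearn estimator conventions
        self.estimator_ = clone(base)
        self.estimator_.fit(X, y)
        return self

    def transform(self, X):
        # the class probabilities become the engineered features
        return self.estimator_.predict_proba(X)

The idea being that ('predict_proba', ProbaTransformer(DecisionTreeClassifier())) could then replace the classifier step in the text pipeline above.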

Will this actually work? Has anyone ever seen anything like this, or am I just dreaming up crazy things?

Any insight or thoughts would be appreciated. 

Thanks!
Jonathan



Sebastian Raschka

Apr 11, 2019, 3:10:44 PM
to mlxtend
Hi Jonathan,

I believe what you have in mind can be achieved using the FeatureUnion class in scikit-learn:

https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.FeatureUnion.html
Then, you can define the preprocessor based on the FeatureUnion and add it as a pipeline step, replacing the preprocessor you have shown as an example at the bottom.
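
Roughly like the following untested sketch, reusing the transformer names from your post (note that each FeatureUnion branch receives the full input, so each branch has to select its own columns, here via a ColumnTransformer):

from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline

# each branch picks out its own columns from the full input frame
text_branch = ColumnTransformer(
    transformers=[('text', text_transformer, text_features)])

tabular_branch = ColumnTransformer(
    transformers=[('num', numeric_transformer, numeric_features),
                  ('cat', categorical_transformer, categorical_features)])

# concatenate the text-probability features with the tabular features
combined_features = FeatureUnion(
    transformer_list=[('text_probs', text_branch),
                      ('tabular', tabular_branch)])

clf = Pipeline(steps=[('features', combined_features),
                      ('classifier', LogisticRegression(solver='lbfgs'))])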

Best,
Sebastian
