IllegalArgumentException in LookupTransformer for sklearn2pmml...


Pratyush Banerjee

Jun 17, 2022, 11:11:23 AM
to Java PMML API
Hi,

I have a PMML pipeline with a transformer and a Random Forest classifier, which I coded in sklearn and am converting to a PMML file with sklearn2pmml.

The target column in my data is called 'Label' and it is a string.
As is the norm, I am using a LabelEncoder() to convert it to integers before calling fit.
I wanted to use a LookupTransformer to convert the predicted labels back to their string form.
So I was using this:

le = LabelEncoder()
fdf.label = le.fit(fdf.label).transform(fdf.label)
fdf.strings2 = fdf.strings2.apply(tokenize_strings)
pmml_pipe = pmml_pipeline()
pmml_pipe.fit(fdf, fdf.label)
label_to_cat_map = dict(zip(le.classes_, le.transform(le.classes_)))
pmml_pipe.apply_transformer = Alias(LookupTransformer(label_to_cat_map, default_value='unk'), 'predict_label', prefit=True)
sklearn2pmml(pmml_pipe, "event_classifier.pmml", with_repr=True)

However, I get an error at the last step:
Exception in thread "main" java.lang.IllegalArgumentException
        at org.jpmml.converter.TypeUtil.getDataType(TypeUtil.java:64)
        at org.jpmml.converter.TypeUtil.getDataType(TypeUtil.java:121)
        at sklearn2pmml.preprocessing.LookupTransformer.getDataType(LookupTransformer.java:59)


I looked at LookupTransformer.java:59 and found that the error occurs here:

TypeUtil.getDataType(inputValues, DataType.STRING);

So my question is: does the dict passed to the LookupTransformer need to map key (int) -> value (String), or the other way round?
I am assuming that the predict function of RandomForestClassifier returns the label as numpy.int64.

Thanks & Regards,

Pratyush

Pratyush Banerjee

Jun 17, 2022, 12:00:54 PM
to Java PMML API
Apologies, the actual exception is as follows:

Exception in thread "main" java.lang.IllegalArgumentException
        at org.jpmml.converter.TypeUtil.getDataType(TypeUtil.java:64)
        at org.jpmml.converter.TypeUtil.getDataType(TypeUtil.java:121)
        at sklearn2pmml.preprocessing.LookupTransformer.encodeFeatures(LookupTransformer.java:94)
        at sklearn2pmml.decoration.Alias.encodeFeatures(Alias.java:55)
        at sklearn.Transformer.encode(Transformer.java:69)
        at sklearn2pmml.pipeline.PMMLPipeline.encodeOutput(PMMLPipeline.java:447)
        at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:302)
        at com.sklearn2pmml.Main.run(Main.java:84)
        at com.sklearn2pmml.Main.main(Main.java:62)


The stack trace in my previous message was from when I was using an integer-to-string map.

Thanks & Regards,

Pratyush

Pratyush Banerjee

Jun 17, 2022, 12:42:46 PM
to Java PMML API
Hi,

Never mind, I seem to have got around this issue. It seems that both the keys and the values are expected to be strings.
With the following changes, my code now works fine:

le = LabelEncoder()
fdf.label = le.fit(fdf.label).transform(fdf.label)
pmml_pipe = pmml_pipeline()
pmml_pipe.fit(fdf, fdf.label)
label_to_cat_map = dict(zip(map(str, le.transform(le.classes_)), le.classes_))
pmml_pipe.predict_transformer = Alias(LookupTransformer(label_to_cat_map, default_value='unk'), 'predict_label', prefit=True)
sklearn2pmml(pmml_pipe, "event_classifier.pmml", with_repr=True)


When I use the resulting PMML file to predict, the predict_label output contains the string categories.
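
For readers who want to check the shape of such a mapping without running the converter, here is a minimal sketch. The sample labels are made up for illustration. It builds the mapping with plain Python str on both sides; converting the values with str() is an extra precaution that is assumed to be harmless, not something the code above needed.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample labels, for illustration only.
labels = ["login", "error", "logout", "login"]

le = LabelEncoder().fit(labels)
codes = le.transform(le.classes_)  # NumPy integers, not Python ints

# Key: the encoded class as a string; value: the original string label.
label_to_cat_map = {str(code): str(name) for code, name in zip(codes, le.classes_)}

print(type(codes[0]).__name__)  # a NumPy integer type
print(label_to_cat_map)
```

The printed type shows that the encoder output is a NumPy integer rather than a Python int, which is why the keys are converted explicitly.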

Thanks & Regards,

Pratyush

Villu Ruusmann

Jun 17, 2022, 2:39:15 PM
to Java PMML API, Pratyush Banerjee
Hi Pratyush,

Again, anything that involves exception stack traces should go
directly to GitHub issues.

>
> Exception in thread "main" java.lang.IllegalArgumentException
> at org.jpmml.converter.TypeUtil.getDataType(TypeUtil.java:64)
>
> The last one I reported was when I was using an integer to string map!
>

TypeUtil is trying to convert a Java type to a PMML data type there.
There are mappings for basic Java value types (think
java.lang.String, java.lang.Boolean, java.lang.Double). In your case,
a Java type outside that set is passed, which is not recognized.
Hence the IllegalArgumentException.

Agreed, the IllegalArgumentException should spell out the issue. If it
did, then you probably would be able to figure out a proper solution
yourself.

As for Scikit-Learn classifiers, you don't need to mess
with an external LabelEncoder pass. You can pass a categorical column
directly to the RandomForestClassifier.fit(X, y) method, and it will
work seamlessly. When you get rid of the leading LabelEncoder, you
will also get rid of the trailing PMMLPipeline.predict_transformer
attribute.

You're currently following the Apache Spark way of doing things - the
label goes through StringIndexer and IndexToString helper
transformers.

Scikit-Learn will be fine without external helpers.
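
A minimal sketch of that suggestion, using made-up toy data: the classifier is fitted on string labels directly and predicts strings, so no LabelEncoder or lookup step is involved. The feature values, label names and hyperparameters are assumptions chosen only to keep the example small.

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: two numeric features, string class labels.
X = [[0.0, 1.0], [1.0, 0.0], [0.1, 0.9], [0.9, 0.2]]
y = ["login", "error", "login", "error"]

# The classifier encodes string labels internally and keeps them in classes_.
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

print(clf.classes_)     # the original string labels
print(clf.predict(X))   # predictions are strings as well
```

The same estimator can serve as the final step of a PMMLPipeline, in which case the predict_transformer attribute is no longer needed.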


VR