Hi Pratyush,
Again, anything that involves exception stack traces should go
directly to GitHub issues.
>
> Exception in thread "main" java.lang.IllegalArgumentException
> at org.jpmml.converter.TypeUtil.getDataType(TypeUtil.java:64)
>
> The last one I reported was when I was using an integer to string map!
>
There, TypeUtil is trying to map a Java type to a PMML data type.
It has mappings for Java's basic value types (think
java.lang.String, java.lang.Boolean, java.lang.Double). In your case,
some other, non-basic Java type is passed in, and it is not recognized.
Hence the IllegalArgumentException.
Agreed, the IllegalArgumentException should spell out the issue (i.e.
name the unsupported type). If it did, you would probably have been
able to figure out a proper solution yourself.
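To illustrate the idea, here is a hypothetical sketch in Python. It is
NOT JPMML's actual Java code: the function name get_data_type and the
lookup table are assumptions made for illustration. It maps a few Java
class names to PMML data types and, unlike the current exception,
names the offending type when the lookup fails.

```python
# Hypothetical sketch, NOT the real org.jpmml.converter.TypeUtil source.
# Maps a Java class name to a PMML data type name.
JAVA_TO_PMML = {
    "java.lang.String": "string",
    "java.lang.Integer": "integer",
    "java.lang.Float": "float",
    "java.lang.Double": "double",
    "java.lang.Boolean": "boolean",
}


def get_data_type(java_class_name: str) -> str:
    """Return the PMML data type for a supported Java class name."""
    try:
        return JAVA_TO_PMML[java_class_name]
    except KeyError:
        # A descriptive message: say which type failed and what is supported.
        supported = ", ".join(sorted(JAVA_TO_PMML))
        raise ValueError(
            f"Java type {java_class_name} cannot be mapped to a PMML data type; "
            f"supported types are: {supported}"
        ) from None


print(get_data_type("java.lang.Double"))  # double
```

Passing something like "java.util.HashMap" would then fail with a
message that points straight at the unsupported type.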
As for Scikit-Learn classifiers, you don't need to mess with a
separate LabelEncoder step. You can pass a categorical (e.g.
string-valued) label column directly as y to the
RandomForestClassifier.fit(X, y) method, and it will work seamlessly.
When you get rid of the leading LabelEncoder, you will also get rid of
the trailing PMMLPipeline.predict_transformer attribute.
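A minimal sketch of fitting on string labels directly, using plain
Scikit-Learn. The toy data is made up for illustration; in a
sklearn2pmml setup the same estimator would simply be the final step of
a PMMLPipeline, with no LabelEncoder in front of it.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Made-up toy data: two numeric features, string-valued labels.
X = np.array([[0.0, 1.0], [0.2, 0.9], [1.0, 0.1],
              [0.9, 0.0], [0.5, 0.5], [0.1, 0.8]])
y = np.array(["cat", "cat", "dog", "dog", "dog", "cat"])

# No LabelEncoder: the string labels go straight into fit().
clf = RandomForestClassifier(n_estimators=10, random_state=0)
clf.fit(X, y)

print(clf.classes_)        # ['cat' 'dog']
print(clf.predict(X[:2]))  # string labels, not integer codes
```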
You're currently following the Apache Spark way of doing things: the
label goes through StringIndexer and IndexToString helper
transformers.
Scikit-Learn does fine without such external helpers.
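The reason no helpers are needed is that the classifier does the
indexing and un-indexing itself. The sketch below (same made-up toy
data as before) checks that predict() is equivalent to taking the most
probable column of predict_proba() and mapping it back through
classes_. The analogy to StringIndexer/IndexToString is loose: Spark's
StringIndexer orders labels by frequency by default, whereas classes_
is sorted.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Made-up toy data: two numeric features, string-valued labels.
X = np.array([[0.0, 1.0], [0.2, 0.9], [1.0, 0.1],
              [0.9, 0.0], [0.5, 0.5], [0.1, 0.8]])
y = np.array(["cat", "cat", "dog", "dog", "dog", "cat"])

clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# "Indexing": classes_ holds the sorted distinct labels, and the columns
# of predict_proba() follow that order.
proba = clf.predict_proba(X)

# "Un-indexing": pick the most probable column and map it back to a label.
manual = clf.classes_[np.argmax(proba, axis=1)]

assert np.array_equal(manual, clf.predict(X))
```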
VR