Lack of Prediction-class when using PMML model to predict...


Pratyush Banerjee

Jun 16, 2022, 5:51:40 AM
to Java PMML API
Hi,

I have been trying to generate PMML models from scikit-learn and then use them in Spark.
This is the code I am using to generate the PMML file:
```
import pandas
from sklearn.tree import DecisionTreeClassifier
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

# Load the Iris data; "variety" is the label column, everything else is a feature
iris_df = pandas.read_csv("data/iris.csv")
iris_X = iris_df[iris_df.columns.difference(["variety"])]
iris_y = iris_df["variety"]

# Wrap the classifier in a PMMLPipeline so that it can be converted to PMML
pipeline = PMMLPipeline([
    ("classifier", DecisionTreeClassifier())
])
pipeline.fit(iris_X, iris_y)

# Convert the fitted pipeline to a PMML file
sklearn2pmml(pipeline, "DecisionTreeIris.pmml", with_repr=True)
```

However, when I later use the same PMML file for prediction, I don't get a predicted class, only per-category probability scores.

Here is the simple code I am using for prediction (using pypmml):

```
import pandas as pd
from pypmml import Model

model = Model.load('DecisionTreeIris.pmml')
df = pd.read_csv('data/iris.csv')
results = model.predict(df)
print(results)
```
Here is the result I am getting:
```
     probability(Setosa)  probability(Versicolor)  probability(Virginica)
0                    1.0                      0.0                     0.0
1                    1.0                      0.0                     0.0
2                    1.0                      0.0                     0.0
3                    1.0                      0.0                     0.0
4                    1.0                      0.0                     0.0
..                   ...                      ...                     ...
145                  0.0                      0.0                     1.0
146                  0.0                      0.0                     1.0
147                  0.0                      0.0                     1.0
148                  0.0                      0.0                     1.0
149                  0.0                      0.0                     1.0
```
This example is in Python, but I am seeing similar behaviour in Spark using Scala.

The example here (https://github.com/autodeployai/pypmml) uses pretty much the same data, but reports results with predicted_class...

Any idea what I am doing wrong here? Or is my expectation about the prediction class wrong?
Still very new to PMML, so apologies if this is something obvious.

Thanks & Regards,

Pratyush Banerjee  

Villu Ruusmann

Jun 16, 2022, 6:04:54 AM
to Java PMML API, Pratyush Banerjee
Hi PB,

>
> Here is the simple code I am using for prediction (using pypmml):
>

Open the DecisionTreeIris.pmml file in a text editor, and verify that it
does contain the "variety" field as a target (aka prediction-class).

It's under the /PMML/TreeModel/MiningSchema element:
<MiningSchema>
<MiningField name="variety" usageType="target"/>
</MiningSchema>

Additionally, you're seeing three probability-type output fields, one
for each category level of the categorical target field.

Therefore, the (SkLearn2)PMML converter is doing its job correctly.
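
The check described above can also be scripted instead of done by eye. The following is a minimal sketch using only the Python standard library. The inline `SAMPLE_PMML` document and the helper names `local_name` and `summarize_fields` are hypothetical and exist only for illustration; a real file produced by the converter contains many more elements and attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed PMML document used only for illustration.
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <TreeModel functionName="classification">
    <MiningSchema>
      <MiningField name="variety" usageType="target"/>
      <MiningField name="petal.length"/>
    </MiningSchema>
    <Output>
      <OutputField name="probability(Setosa)" feature="probability" value="Setosa"/>
    </Output>
  </TreeModel>
</PMML>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize_fields(pmml_text):
    """Return (target field names, output field names) found in a PMML document."""
    root = ET.fromstring(pmml_text)
    targets, outputs = [], []
    for element in root.iter():
        name = local_name(element.tag)
        if name == "MiningField" and element.get("usageType") == "target":
            targets.append(element.get("name"))
        elif name == "OutputField":
            outputs.append(element.get("name"))
    return targets, outputs


targets, outputs = summarize_fields(SAMPLE_PMML)
print("targets:", targets)
print("outputs:", outputs)
```

To run the same check against the real model, the text of DecisionTreeIris.pmml could be read from disk and passed to `summarize_fields` in place of `SAMPLE_PMML`.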

> Any idea what I am doing wrong here?
> or is my expectation wrong about the prediction-class?
>

You're using the wrong tool for evaluating the PMML file.

Please switch to JPMML-Evaluator-Python, and everything will work as
advertised/expected:
https://github.com/jpmml/jpmml-evaluator-python

Please note that the main entry method is called "evaluate". It's so
by design, because by performing "evaluate" you shall obtain combined
"predict" and "predict_proba" results.

Append this to your demo script:
<python>
from jpmml_evaluator import make_evaluator
from jpmml_evaluator.pyjnius import jnius_configure_classpath, PyJNIusBackend

# Configure JVM
jnius_configure_classpath()

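# Construct a PyJNIus backend
backend = PyJNIusBackend()

# Load the PMML file and run its self-checks (model verification data, if any)
evaluator = make_evaluator(backend, "DecisionTreeIris.pmml") \
    .verify()

# iris_df is the DataFrame from the training script
# (pass df instead when appending to the pypmml prediction script)
results = evaluator.evaluateAll(iris_df)
print(results)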
</python>
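
For comparison, when only the probability columns are available, the class label can be recovered from them by picking the most probable category. The sketch below illustrates that idea in plain Python. It is not how JPMML-Evaluator works internally, and the `predicted_class` helper and the sample `rows` are hypothetical, shaped after the probability-only output shown in the question.

```python
import re

# Matches column names such as "probability(Setosa)" and captures the class label.
PROBABILITY_COLUMN = re.compile(r"^probability\((.+)\)$")


def predicted_class(row):
    """Return the label whose probability column has the highest value in this row.

    On ties, the first column encountered wins.
    """
    best_label, best_probability = None, float("-inf")
    for column, value in row.items():
        match = PROBABILITY_COLUMN.match(column)
        if match and value > best_probability:
            best_label, best_probability = match.group(1), value
    return best_label


# Hypothetical rows shaped like the probability-only output shown in the question.
rows = [
    {"probability(Setosa)": 1.0, "probability(Versicolor)": 0.0, "probability(Virginica)": 0.0},
    {"probability(Setosa)": 0.0, "probability(Versicolor)": 0.0, "probability(Virginica)": 1.0},
]

labels = [predicted_class(row) for row in rows]
print(labels)
```

With a pandas DataFrame of results, rows of this shape can be obtained via `results.to_dict("records")`. Using an evaluator that returns the target column directly, as in the script above, avoids this post-processing step altogether.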


VR

Pratyush Banerjee

Jun 16, 2022, 6:21:23 AM
to Java PMML API
Hi VR,

Thanks for the quick reply!

Indeed, when I switched to JPMML-Evaluator-Python, it worked as expected, and I can see the predicted column (called 'variety'):

       variety  probability(Setosa)  probability(Versicolor)  probability(Virginica)
0       Setosa                  1.0                      0.0                     0.0
1       Setosa                  1.0                      0.0                     0.0
2       Setosa                  1.0                      0.0                     0.0
3       Setosa                  1.0                      0.0                     0.0
4       Setosa                  1.0                      0.0                     0.0
..         ...                  ...                      ...                     ...
145  Virginica                  0.0                      0.0                     1.0
146  Virginica                  0.0                      0.0                     1.0
147  Virginica                  0.0                      0.0                     1.0
148  Virginica                  0.0                      0.0                     1.0
149  Virginica                  0.0                      0.0                     1.0

I suppose I should be using https://github.com/jpmml/jpmml-evaluator-spark for my Spark use case.
Considering that this is Java, it should technically work in Scala! I will check to see if this works.

Thanks again for your help!

Thanks & Regards,

Pratyush

Villu Ruusmann

Jun 16, 2022, 6:35:58 AM
to Java PMML API, Pratyush Banerjee
Hi PB,

> I suppose, I should be using
> https://github.com/jpmml/jpmml-evaluator-spark for my spark use-case.
>

Yes, it's a Spark-style frontend for the core JPMML-Evaluator library.

> Considering that this is Java, it should technically work in Scala! I will check to see if this works.
>

In principle, one and the same Java JAR file should work with any
Apache Spark/Scala/JVM combination.

The 1.3.0 release was targeting Apache Spark 2.X:
https://github.com/jpmml/jpmml-evaluator-spark/blob/1.3.0/pom.xml#L50-L54

However, the current development branch is 1.4-SNAPSHOT, which is
targeting Apache Spark 3.X:
https://github.com/jpmml/jpmml-evaluator-spark/blob/master/pom.xml#L50-L54

As you can see, the switch between Apache Spark major versions didn't
require any Java library code changes:
https://github.com/jpmml/jpmml-evaluator-spark/commit/111759e80f40f91910804e05b356d620a8399260

There's no 1.4.0 release yet, though. If you need it, please LMK.


VR


Pratyush Banerjee

Jun 16, 2022, 6:44:31 AM
to Java PMML API
Hi VR,

Thanks for the version compatibility info.
My setup is still on Spark 2.4.8, so I suppose the 1.3.0 release is the one I should be picking up.

Thanks & Regards,
PB