Batch execution with JPMML

BISHNU SHANKAR PANDEY

Oct 14, 2020, 5:34:57 AM
to Java PMML API
Hi All,

Is there any way to evaluate a PMML model with multiple rows as input? I tried a DecisionTreeClassifier model executor with a large load and found that Python execution is much faster than Java. One of the reasons is that the Python model loads all the data at once and then executes, whereas in Java we can only execute the model row by row. Is there any way to execute the model with multi-row input?

Thank you

Regards,
Bishnu

Villu Ruusmann

Oct 14, 2020, 8:54:33 AM
to Java PMML API
Hi Bishnu,

> Is there any way in which we can evaluate PMML model
> with multiple rows as an input to the model.
>
> One of the reasons is python model loads all the data at a
> time and then execute whereas in java we can execute
> the model row by row.

Terminologically, you probably mean "vectorized execution" here, which
is different from "batch execution".

Python/Scikit-Learn can do vectorized execution with linear models
such as LogisticRegression and SVM. However, it cannot do vectorized
execution with decision tree models.
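
To illustrate the distinction, here is a minimal Java sketch, not taken from
Scikit-Learn or JPMML. All names (VectorizedVsRowwise, linearScores,
treeScore, the parallel-array tree layout) are hypothetical. The assumption
behind it is that a linear model applies the same multiply-add sequence to
every row, which is the shape of computation that array libraries can run
over a whole matrix at once, while a decision tree follows a path that depends
on the values of each individual row.

```java
// Hypothetical sketch: why linear scoring vectorizes and tree scoring does not.
final class VectorizedVsRowwise {

    // Linear model: identical arithmetic for every row, so the whole
    // matrix can be scored as one matrix-vector product (X * w + b).
    static double[] linearScores(double[][] X, double[] w, double b) {
        double[] out = new double[X.length];
        for (int i = 0; i < X.length; i++) {
            double s = b;
            for (int j = 0; j < w.length; j++) {
                s += X[i][j] * w[j];
            }
            out[i] = s;
        }
        return out;
    }

    // Decision tree stored as parallel arrays (an assumed layout);
    // a node with feature[n] == -1 is a leaf. The next node visited
    // depends on the row's own values, so each row takes its own path.
    static double treeScore(double[] row, int[] feature, double[] threshold,
                            int[] left, int[] right, double[] value) {
        int n = 0;
        while (feature[n] >= 0) {
            n = row[feature[n]] <= threshold[n] ? left[n] : right[n];
        }
        return value[n];
    }

    public static void main(String[] args) {
        double[][] X = {{1.0, 2.0}, {3.0, 4.0}};
        double[] scores = linearScores(X, new double[] {0.5, -1.0}, 1.0);
        System.out.println(java.util.Arrays.toString(scores));

        // One split on feature 0 at 2.5, with two leaves.
        int[] feature = {0, -1, -1};
        double[] threshold = {2.5, 0.0, 0.0};
        int[] left = {1, -1, -1};
        int[] right = {2, -1, -1};
        double[] value = {0.0, 10.0, 20.0};
        for (double[] row : X) {
            System.out.println(treeScore(row, feature, threshold, left, right, value));
        }
    }
}
```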

The fact that you're passing a data matrix to
DecisionTreeClassifier.predict(X) is an "API facade". Behind the
scenes, Python/Scikit-Learn is still iterating over the dataset row by
row.
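
The same kind of facade is easy to build on the Java side. The following is a
minimal sketch under stated assumptions: BatchFacade, predictAll and toyTree
are hypothetical names, and the rowScorer function is only a stand-in for
whatever call scores a single record (with JPMML-Evaluator, that would be the
evaluator's per-record evaluate call, which is not shown here).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: a "batch" API that is a facade over per-row scoring.
final class BatchFacade {

    // Accepts many rows at once, but internally scores them one at a time.
    // rowScorer is a stand-in for any single-record evaluator (an assumption,
    // not the JPMML-Evaluator API).
    static <R> List<R> predictAll(List<Map<String, Object>> rows,
                                  Function<Map<String, Object>, R> rowScorer) {
        List<R> results = new ArrayList<>(rows.size());
        for (Map<String, Object> row : rows) {
            results.add(rowScorer.apply(row));
        }
        return results;
    }

    // Toy one-split "decision tree", used only to make the sketch runnable.
    static String toyTree(Map<String, Object> row) {
        double x = ((Number) row.get("x")).doubleValue();
        return x <= 2.5 ? "A" : "B";
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (double x : new double[] {1.0, 3.0, 2.5}) {
            Map<String, Object> row = new HashMap<>();
            row.put("x", x);
            rows.add(row);
        }
        System.out.println(predictAll(rows, BatchFacade::toyTree));
    }
}
```

The caller sees a multi-row API, but the amount of work per row is unchanged.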

> I tried DecisionTreeClassifier model executor with some
> large load and found python execution is way faster than java.
>

This is to be expected for the general case, especially if you haven't
configured the (J)PMML engine properly.

If execution performance is critical for your use case, then you may
try transpiling models from PMML representation to Java bytecode
representation using the JPMML-Transpiler library
(https://github.com/jpmml/jpmml-transpiler). Please do so, and report
back on Python vs. Java performance numbers again.
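
When collecting the Java numbers, keep in mind that the JVM compiles hot code
at run time, so the first pass over the data is usually slower than later
ones. A minimal timing sketch, assuming a warm-up pass before the measured
pass: ScoringTimer, timeNanos and the toy scorer are hypothetical stand-ins,
not part of JPMML; replace the scorer with the real per-row evaluation call.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch: time row-by-row scoring after a warm-up pass.
final class ScoringTimer {

    // Holds the accumulated result so the scoring loop has a visible effect.
    static double sink;

    // Scores every input `rounds` times and returns the elapsed nanoseconds.
    static long timeNanos(DoubleUnaryOperator scorer, double[] inputs, int rounds) {
        double acc = 0.0;
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (double x : inputs) {
                acc += scorer.applyAsDouble(x);
            }
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return elapsed;
    }

    public static void main(String[] args) {
        double[] inputs = new double[100_000];
        for (int i = 0; i < inputs.length; i++) {
            inputs[i] = i % 10;
        }
        // Stand-in for a real model: a single threshold test.
        DoubleUnaryOperator toyScorer = x -> x <= 4.5 ? 0.0 : 1.0;

        int rounds = 20;
        timeNanos(toyScorer, inputs, rounds);              // warm-up, result discarded
        long nanos = timeNanos(toyScorer, inputs, rounds); // measured pass
        double rowsPerSecond = (inputs.length * (double) rounds) / (nanos / 1e9);
        System.out.printf("%.0f rows/s%n", rowsPerSecond);
    }
}
```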


VR