Long conversion (unmarshal) time of PMML files


Lior Ben Zvi

Mar 10, 2021, 11:38:42 AM
to Java PMML API

Hi,

We are dealing with long conversion (unmarshal) times of PMML files (between 2.6 and 3.5 minutes). The issue seems to be related to the “MapValues” element in the PMML files. Large files without this element are converted in less than 1.9 seconds, while files five times smaller that contain it (with a small number of values) take more than 2.6 minutes.
Could you please advise how we should deal with this issue and reduce our conversion time?

Additional information:

- JPMML version: 1.5.14
- We create the files with Python sklearn2pmml 0.64.0
- PMML version 4.4
  [attachment: Capture.PNG]
- Example of a MapValues element:
  [attachment: Capture.PNG]
- Our PMML conversion code:
  [attachment: Capture.PNG]

Thank you!
Lior

 

Villu Ruusmann

Mar 10, 2021, 1:40:49 PM
to Java PMML API
Hi Lior,

> We are dealing with long conversion (unmarshal)
> time of PMML files (between 2.6-3.5 minutes).
>

Let's be clear here - by saying "conversion (unmarshal)" you actually
mean "loading a Java class model object from a PMML file", right?

> Could you please advise how should we deal with
> the issue and optimize our conversion time?
>

Attach a profiler to your Java application while loading a PMML file.

I can only make educated guesses here. You have access to a fully
operational system.

Alternatively, I can profile it for you, but I'd need a representative
test case. Screenshots won't do the trick.

> Additional information:
>
> - JPMML version: 1.5.14
>

Right now, you're using a pre-1.5.X API for loading PMML files.

See here:
https://github.com/jpmml/jpmml-evaluator#basic-usage

The following three-liner gives you a properly initialized, optimized
and interned o.j.e.Evaluator instance:
Evaluator evaluator = new LoadingModelEvaluatorBuilder()
    .load(new File(...))
    .build();

Switch to this 1.5.X-style builder API and re-run the experiment. The
timings should improve significantly.
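
To make "re-run the experiment" concrete, the load step can be wrapped in a small wall-clock timer. The following is only a sketch: LoadTimer and Timed are hypothetical helper names that are not part of JPMML, and the builder call shown in the comment is the three-liner from above with its file argument still elided.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of JPMML: times an arbitrary loading step.
final class LoadTimer {

    /** The loaded object together with the elapsed wall-clock time. */
    static final class Timed<T> {
        final T value;
        final long elapsedNanos;

        Timed(T value, long elapsedNanos) {
            this.value = value;
            this.elapsedNanos = elapsedNanos;
        }

        double elapsedSeconds() {
            return elapsedNanos / 1_000_000_000.0;
        }
    }

    /**
     * Runs the loader once and measures it with System.nanoTime().
     * Checked exceptions are wrapped so that call sites stay short.
     */
    static <T> Timed<T> time(Callable<T> loader) {
        long start = System.nanoTime();
        T value;
        try {
            value = loader.call();
        } catch (Exception e) {
            throw new IllegalStateException("Loading failed", e);
        }
        return new Timed<>(value, System.nanoTime() - start);
    }

    private LoadTimer() {
    }
}

// Assumed usage with the builder API quoted above:
//
//   LoadTimer.Timed<Evaluator> timed = LoadTimer.time(() ->
//       new LoadingModelEvaluatorBuilder()
//           .load(new File(...))
//           .build());
//   System.out.println("Loaded in " + timed.elapsedSeconds() + " s");
```

A single measurement includes JVM warm-up and class loading, so it is worth timing the same file a few times and comparing a file with and without the MapValues element in the same run.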

> We create the files by Python sklearn2pmml 0.64.0
>

This 'MapValues' element that is displayed on one of your screenshots
is definitely NOT generated by SkLearn2PMML - somebody must have
inserted it manually afterwards.

Your 'MapValues' element is borderline invalid, because it does not
specify correct XML namespace information (there are no 'input' and
'output' tags in the PMML specification). It is likely that the
unmarshaller gets confused by those elements, and is perhaps
performing some extensive error reporting/error recovery (that slows
down the entire operation).
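
One way to check the namespace guess without a profiler is to list which elements actually occur inside the 'row' elements of the file, together with their XML namespace. The sketch below uses only the JDK's StAX parser; RowCellNamespaces is a hypothetical helper name, and it assumes the lookup table is stored as 'row' elements (as in an InlineTable). It only reports what is in the file and does not decide what is valid.

```java
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical diagnostic, not part of JPMML.
final class RowCellNamespaces {

    /**
     * Counts the direct children of every <row> element,
     * keyed as "{namespaceURI}localName".
     */
    static Map<String, Integer> count(Reader pmml) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newFactory().createXMLStreamReader(pmml);
            int depth = 0;
            int rowDepth = -1; // depth of the currently open <row>, or -1 if none
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    if (rowDepth >= 0 && depth == rowDepth + 1) {
                        // A cell element directly below the open <row>
                        String ns = reader.getNamespaceURI();
                        String key = "{" + (ns == null ? "" : ns) + "}" + reader.getLocalName();
                        counts.merge(key, 1, Integer::sum);
                    } else if (rowDepth < 0 && "row".equals(reader.getLocalName())) {
                        rowDepth = depth;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (depth == rowDepth) {
                        rowDepth = -1; // leaving the <row>
                    }
                    depth--;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Not well-formed XML", e);
        }
        return counts;
    }

    private RowCellNamespaces() {
    }
}
```

Calling RowCellNamespaces.count(...) with a reader over the PMML file and printing the returned map shows at a glance whether the cell elements ('input', 'output') sit in the PMML namespace itself (http://www.dmg.org/PMML-4_4 for PMML 4.4) or in a namespace of their own. Running it on a file produced by one of the SkLearn2PMML lookup transformers gives a reference to compare against.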

The SkLearn2PMML package provides
'sklearn2pmml.preprocessing.LookupTransformer',
's.p.FilterLookupTransformer' and 's.p.MultiLookupTransformer' for
generating MapValues element-based lookup tables for most conceivable
use cases. Again, use these transformers for generating valid PMML
markup, and see if the timings improve or not.


VR

Amichay Doitch

Mar 29, 2021, 7:38:23 AM
to Java PMML API

Hi Villu,

As you said, we are talking about loading a Java class model object from a PMML file.

We tried what you suggested and switched to the 1.5.X-style builder API (we simply used the code sample you gave).

Regarding the MapValues element, we switched to MultiLookupTransformer; a screenshot is attached below.

We also tried different dataTypes (double, string, integer) for the MapValues element.

The reason we did not originally use MultiLookupTransformer is that we want the lookup only in a late post-processing step, not across the entire pipeline. (This is not the main question here, but is it possible to use a MultiLookupTransformer, or any other transformer, only for post-processing?)

At the moment we see only a minor improvement in conversion time; it is still above 2 minutes.

One thing we do change is the Header, but we have no problem with conversion time when we don't use the MapValues element.


Thank you in advance,
Amichay

mapValues.png
header.png