Refocusing: Long loading (unmarshal) time of PMML files


Amichay Doitch

Apr 7, 2021, 11:16:48 AM
to Java PMML API

Hi Villu,

We are seeing long loading (unmarshalling) times for PMML files. The issue seems to be related to the “MapValues” elements in the PMML files.

A link to pmml and csv (input) files: googleDriveLink

Loading a very similar PMML file without the “MapValues” elements is significantly faster (~2 seconds vs. ~2 minutes).

Here is a snippet showing how we load the PMML file:

    File file = new File("pmml." + pmmlFile.getName());
    FileUtils.writeByteArrayToFile(file, pmmlFile.getContent());
    Evaluator evaluator = new LoadingModelEvaluatorBuilder().load(file).build();
    file.delete();

Additional information:

  • JPMML version: 1.5.14

  • The files are created with Python sklearn2pmml 0.64.0

  • PMML version 4.4

    Thank you!
    Amichay

Villu Ruusmann

Apr 7, 2021, 5:35:52 PM
to Java PMML API
Hi Amichay,

>
> A link to pmml and csv (input) files: googleDriveLink
>

I've got your files - you should probably delete them now, as they
might be leaking alpha.

> We are dealing with long loading (unmarshal) time of PMML files.
> The issue seems to be related to “MapValues” object in the PMML files.
>

I inspected the model loading process in a Java profiler, and the
rate-limiting step appears to be located deep inside the JAXB stack.
Specifically, there's an XML namespace context switch happening for
each row element (the parent row element belongs to the PMML ns, but
its child data elements belong to the JPMML-InlineTable ns), which
involves instantiating a new DocumentBuilder(Factory) object.
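
To get a feel for why a per-row DocumentBuilder(Factory) instantiation adds up, here is a rough timing sketch. It does not reproduce what the JAXB runtime does internally; it only compares creating a fresh factory and builder for every simulated row against creating them once. The class name BuilderCostSketch and its method names are made up for illustration.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical helper, not part of JPMML or JAXB.
final class BuilderCostSketch {

    // Creates a new factory and builder for every simulated row element.
    // Returns the elapsed time in nanoseconds.
    static long perRowNanos(int rows) throws ParserConfigurationException {
        long start = System.nanoTime();
        for (int i = 0; i < rows; i++) {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.newDocument();
        }
        return System.nanoTime() - start;
    }

    // Creates the factory and builder once, then reuses the builder for every row.
    // Returns the elapsed time in nanoseconds.
    static long sharedNanos(int rows) throws ParserConfigurationException {
        long start = System.nanoTime();
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        for (int i = 0; i < rows; i++) {
            builder.newDocument();
        }
        return System.nanoTime() - start;
    }
}
```

Calling both methods with a row count close to the size of your MapValues tables should give a rough idea of how much of the loading time could be attributed to this overhead on your JVM.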

The same behaviour plagues both Glassfish Metro and EclipseLink MOXy
JAXB runtimes. Well, the latter appears to be ~30-50% faster, but that
doesn't help you to get from 2 mins to sub 2 sec loading times.

I could advise you to reorganize your data transformations to minimize
the number of row elements inside MapValues elements, but you've
probably already tried that.

The most viable Java application side workaround would be to implement
an XML filter, which replaces JPMML-InlineTable ns prefixes with PMML
ns prefixes. This way the JAXB stack stays within the same XML
namespace context all the time.

Create an org.xml.sax.helpers.XMLFilterImpl subclass (for inspiration
see the contents of org.jpmml.model.filters package), and register it
with your Java application using the
LoadingModelEvaluatorBuilder#setFilters(List<? extends XMLFilter>)
method.

Something like this:
Evaluator evaluator = new LoadingModelEvaluatorBuilder()
    .setFilters(Arrays.asList(
        new org.jpmml.model.filters.ImportFilter(),
        new com.mycompany.UpdateRowContentNamespacesFilter()))
    .load(new File("3401.xml"))
    .build();
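
The UpdateRowContentNamespacesFilter class named above does not exist anywhere; it is a placeholder for the filter you would write yourself. A minimal sketch of it could look like the following. The two namespace URIs are assumptions (the PMML 4.4 namespace that ImportFilter should normalize to, and the namespace that the "data" prefix is typically bound to in sklearn2pmml output), so please check them against your own PMML file before relying on this.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: reports JPMML-InlineTable elements as if they were PMML elements.
final class UpdateRowContentNamespacesFilter extends XMLFilterImpl {

    // Assumed namespace URIs; verify against the xmlns declarations in your file.
    static final String INLINE_TABLE_NS = "http://jpmml.org/jpmml-model/InlineTable";
    static final String PMML_NS = "http://www.dmg.org/PMML-4_4";

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException {
        if (INLINE_TABLE_NS.equals(uri)) {
            // Drop the prefix and move the element into the PMML namespace.
            super.startElement(PMML_NS, localName, localName, atts);
        } else {
            super.startElement(uri, localName, qName, atts);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (INLINE_TABLE_NS.equals(uri)) {
            super.endElement(PMML_NS, localName, localName);
        } else {
            super.endElement(uri, localName, qName);
        }
    }
}
```

This sketch only rewrites element names. Attribute namespaces and prefix mapping events are passed through unchanged, and it requires a namespace-aware parser upstream. Whether the unmarshalled row content is still evaluated correctly after the rewrite should be checked against a few known predictions.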


VR