Yet another Long loading (unmarshal) time of PMML files

Amichay Doitch

Apr 11, 2021, 9:35:38 AM
to Java PMML API

Hi Villu,

Following our last conversation about long loading times, another PMML file is causing a long loading time.

There are no large two-dimensional map values within, so I wonder what the reason for the long loading time is.

A link to the PMML and text (input) files: linkToDrive

Here is a snippet showing how we load the PMML file:

  • File file = new File("pmml." + pmmlFile.getName());

  • FileUtils.writeByteArrayToFile(file, pmmlFile.getContent());

  • Evaluator evaluator = new LoadingModelEvaluatorBuilder().load(file).build();

  • file.delete();

Additional information:

  • JPMML version: 1.5.14

  • PMML version 4.4

  • We are not using sklearn2pmml, but our own code


Thank you!
Amichay

Villu Ruusmann

Apr 11, 2021, 10:55:02 AM
to Java PMML API
Hi Amichay,

>
> There are no large-two-dimensional map values within,
> so I wonder what’s the reason for the long loading time.
>

When I scroll through your PMML file, more than 80% of its lines
appear to be MapValues content.

This loading issue is not about the size of individual MapValues elements
("long MapValues elements bad, short MapValues elements good"). It
affects all MapValues elements.

> We are not using sklearn2pmml, but our own code

Yes, I can see that your PMML file is missing proper XML namespace
declarations again.

When I run the example model and input in a Java profiler, this time
most of the execution time is spent inside the
com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl class
(on the GlassFish Metro JAXB runtime).

It means that the Java application is now trying to figure out XML
namespace information. You should fix this by improving your in-house
PMML conversion application. Alternatively, you could write another
SAX filter, which corrects missing XML namespace information during
model loading time.
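
A minimal sketch of the SAX filter idea, using only JDK classes. It
assumes that every element without a namespace belongs to the PMML 4.4
namespace http://www.dmg.org/PMML-4_4. That assumption is not confirmed
in this thread and may be wrong for InlineTable cell elements, which can
carry their own namespace. The class name PmmlNamespaceFilter and the
helper namespacesOf are hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: reports un-namespaced elements as PMML 4.4 elements.
class PmmlNamespaceFilter extends XMLFilterImpl {

    // Assumption: the target document is PMML 4.4.
    static final String PMML_NS = "http://www.dmg.org/PMML-4_4";

    PmmlNamespaceFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        super.startElement(fix(uri), localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(fix(uri), localName, qName);
    }

    // Elements that already declare a namespace are passed through unchanged.
    private static String fix(String uri) {
        return (uri == null || uri.isEmpty()) ? PMML_NS : uri;
    }

    // Helper for inspection: lists "localName=namespaceURI" as seen after the filter.
    static List<String> namespacesOf(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        PmmlNamespaceFilter filter = new PmmlNamespaceFilter(reader);
        List<String> seen = new ArrayList<>();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                seen.add(localName + "=" + uri);
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<PMML><row><a>1</a></row></PMML>";
        for (String entry : namespacesOf(xml)) {
            System.out.println(entry);
        }
    }
}
```

The filter would sit between the XML parser and whatever consumes the
SAX events during model loading. How it is wired into the JPMML loading
code depends on the library version and is not shown here.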

In summary, when loading MapValues elements (or any other element that
relies heavily on embedded data tables using the InlineTable element),
there are two performance choke points:

1) XML namespace resolution. If table cell elements are missing XML
namespace identifiers, then the XML parser must add them itself.
Fix: always add the XML namespace yourself.

2) XML namespace context switches. If table cell elements use custom
XML namespace identifiers, then the XML parser must switch DOM factories.
Fix: keep the number of different XML namespaces to a minimum. Reorder
cells within a row so that cells with the same XML namespace stay
together (as opposed to alternating them).

Your previous "loading issue" suffered from #2. Your current "loading
issue" suffers from #1 and #2 combined.
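
To check a PMML file for both symptoms before loading it, a rough
diagnostic along the following lines may help. It is only a sketch: the
class NamespaceStats and its fields are hypothetical, and counting
namespace changes between consecutive start tags is an assumed proxy for
the context switches described above, not a measurement of what the
JAXB runtime actually does.

```java
import java.io.File;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical diagnostic: counts un-namespaced elements and namespace changes.
class NamespaceStats extends DefaultHandler {

    int elements;  // total number of start tags
    int missing;   // start tags with no namespace URI (symptom #1)
    int switches;  // namespace URI differs from the previous start tag (proxy for #2)

    private String previous = null;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elements++;
        if (uri.isEmpty()) {
            missing++;
        }
        if (previous != null && !previous.equals(uri)) {
            switches++;
        }
        previous = uri;
    }

    static NamespaceStats scan(InputSource source) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);  // without this, namespace URIs are not reported
        NamespaceStats stats = new NamespaceStats();
        factory.newSAXParser().parse(source, stats);
        return stats;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java NamespaceStats path/to/model.pmml
        InputSource source = new InputSource(new File(args[0]).toURI().toString());
        NamespaceStats stats = scan(source);
        System.out.println("elements=" + stats.elements
            + " missing=" + stats.missing
            + " switches=" + stats.switches);
    }
}
```

A file where missing is a large share of elements points at #1. A
switches count close to the number of table cells suggests that cells
with different namespaces alternate within rows, which is the pattern
described in #2.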


VR

Amichay Doitch

Apr 11, 2021, 11:45:56 AM
to Java PMML API
Thank you very much for your quick and detailed response!
Best,
Amichay