Hi Amichay,
>
> There are no large-two-dimensional map values within,
> so I wonder what’s the reason for the long loading time.
>
When I scroll through your PMML file, more than 80% of its lines
appear to be MapValues content.
This loading issue is not about the size of individual MapValues elements
("long MapValues elements bad, short MapValues elements good"). It
affects all MapValues elements.
> We are not using sklearn2pmml, but our own code
Yes, I can see that your PMML file is missing proper XML namespace
declarations again.
When I run the example model + input in a Java profiler, this time
most of the execution time is spent inside the
com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl class
(on the GlassFish Metro JAXB runtime).
It means that the Java application is now trying to figure out XML
namespace information. You should fix this by improving your in-house
PMML conversion application. Alternatively, you could write another
SAX filter, which corrects missing XML namespace information during
model loading time.
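
A minimal sketch of such a SAX filter, using only the JDK's XMLFilterImpl.
The class name NamespaceFixFilter and the namespacesOf helper are
hypothetical, and the PMML 4.4 namespace URI is an assumption (use the one
that matches your schema version). It only rewrites the namespace URI on
element events before they reach the next handler. Whether that removes the
hotspot seen in the profiler would still need to be measured.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class NamespaceFixFilter extends XMLFilterImpl {

    // Assumption: the document targets PMML 4.4; adjust to your schema version.
    public static final String PMML_NS = "http://www.dmg.org/PMML-4_4";

    public NamespaceFixFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException {
        super.startElement(fix(uri), localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(fix(uri), localName, qName);
    }

    // Elements without a namespace get the PMML namespace; others pass through unchanged.
    private static String fix(String uri) {
        return (uri == null || uri.isEmpty()) ? PMML_NS : uri;
    }

    // Hypothetical demo helper: lists "localName=namespaceURI" for every element after filtering.
    public static List<String> namespacesOf(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            NamespaceFixFilter filter = new NamespaceFixFilter(reader);
            List<String> seen = new ArrayList<>();
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    seen.add(localName + "=" + uri);
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
            return seen;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In a real loader the filter would sit between the XML reader and the JAXB
unmarshaller (for example via a SAXSource), instead of feeding the demo
handler used here.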
In summary, when loading MapValues elements (or any other element that
relies heavily on embedded data tables using the InlineTable element),
there are two performance choke points:
1) XML namespace resolution. If table cell elements are missing XML
namespace identifiers, then the XML parser must add them itself.
Fix: always declare the XML namespace yourself.
2) XML namespace context switches. If table cell elements use custom
XML namespace identifiers, then the XML parser must switch DOM factories.
Fix: keep the number of different XML namespaces to a minimum. Reorder cells
within a row so that cells with the same XML namespace stay together (as
opposed to alternating them).
Your previous "loading issue" suffered from #2. Your current "loading
issue" suffers from #1 and #2 combined.
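
The cell reordering from #2 could be sketched as a DOM post-processing step,
assuming your converter can hand over each InlineTable row as an
org.w3c.dom.Element. The class RowCellGrouper and both of its methods are
hypothetical names, and the example namespace URIs are placeholders.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class RowCellGrouper {

    // Moves the cells of one row so that cells sharing a namespace URI become adjacent.
    // Groups appear in order of first occurrence; order within a group is preserved.
    public static void groupCellsByNamespace(Element row) {
        Map<String, List<Element>> groups = new LinkedHashMap<>();
        for (Node child = row.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                String ns = String.valueOf(child.getNamespaceURI());
                groups.computeIfAbsent(ns, key -> new ArrayList<>()).add((Element) child);
            }
        }
        for (List<Element> cells : groups.values()) {
            for (Element cell : cells) {
                row.appendChild(cell); // appendChild moves an existing child to the end
            }
        }
    }

    // Hypothetical demo helper: parses one row and returns the cell names after grouping.
    public static String cellOrder(String rowXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Element row = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(rowXml)))
                .getDocumentElement();
            groupCellsByNamespace(row);
            StringBuilder names = new StringBuilder();
            for (Node child = row.getFirstChild(); child != null; child = child.getNextSibling()) {
                if (child instanceof Element) {
                    names.append(child.getNodeName()).append(' ');
                }
            }
            return names.toString().trim();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A row whose cells alternate between two namespaces comes out with each
namespace in one contiguous run, so the parser would need a single switch
per row instead of one per cell.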
VR