Hi Amichay,
>
> A link to pmml and csv (input) files: googleDriveLink
>
I've got your files - you should probably delete them now, as they
might be leaking alpha.
> We are dealing with long loading (unmarshal) time of PMML files.
> The issue seems to be related to “MapValues” object in the PMML files.
>
I inspected the model loading process in a Java profiler, and the
rate-limiting step appears to be located deep inside the JAXB stack.
Specifically, there's an XML namespace context switch happening for
each row element (the parent row element belongs to the PMML ns, but
its child data elements belong to the JPMML-InlineTable ns), and each
switch involves instantiating a new DocumentBuilder(Factory) object.
The same behaviour plagues both Glassfish Metro and EclipseLink MOXy
JAXB runtimes. Well, the latter appears to be ~30-50% faster, but that
doesn't get you from 2 min to sub-2 sec loading times.
I could advise you to reorganize your data transformations to minimize
the number of row elements inside MapValues elements, but you've
probably already tried that.
The most viable Java application side workaround would be to implement
an XML filter, which replaces JPMML-InlineTable ns prefixes with PMML
ns prefixes. This way the JAXB stack stays within the same XML
namespace context all the time.
Create an org.xml.sax.helpers.XMLFilterImpl subclass (for inspiration
see the contents of the org.jpmml.model.filters package), and register
it with your Java application using the
LoadingModelEvaluatorBuilder#setFilters(List<? extends XMLFilter>)
method.
Something like this:

Evaluator evaluator = new LoadingModelEvaluatorBuilder()
    .setFilters(Arrays.asList(
        new org.jpmml.model.filters.ImportFilter(),
        new com.mycompany.UpdateRowContentNamespacesFilter()))
    .load(new java.io.File("3401.xml"))
    .build();
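
The filter class itself is not part of the JPMML libraries. Below is a
minimal sketch of what UpdateRowContentNamespacesFilter could look
like, using only the JDK SAX API. Both namespace URIs are assumptions
(the PMML one depends on the schema version you end up with after
ImportFilter), so check them against your own file. The sketch also
assumes that the PMML ns is the default (unprefixed) namespace, and it
rewrites every element in the source ns, not only children of row
elements.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter; declare it public if it is used from another package.
class UpdateRowContentNamespacesFilter extends XMLFilterImpl {

    // Assumed namespace URIs, verify both against your PMML file.
    static final String INLINE_TABLE_NS = "http://jpmml.org/jpmml-model/InlineTable";
    static final String PMML_NS = "http://www.dmg.org/PMML-4_4";

    private final String sourceNamespaceURI;
    private final String targetNamespaceURI;

    public UpdateRowContentNamespacesFilter(){
        this(INLINE_TABLE_NS, PMML_NS);
    }

    public UpdateRowContentNamespacesFilter(String sourceNamespaceURI, String targetNamespaceURI){
        this.sourceNamespaceURI = sourceNamespaceURI;
        this.targetNamespaceURI = targetNamespaceURI;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException {
        if(sourceNamespaceURI.equals(uri)){
            // Move the element to the target ns; the prefix is dropped from the qualified name
            super.startElement(targetNamespaceURI, localName, localName, atts);
        } else {
            super.startElement(uri, localName, qName, atts);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if(sourceNamespaceURI.equals(uri)){
            // Must mirror the rewrite done in startElement
            super.endElement(targetNamespaceURI, localName, localName);
        } else {
            super.endElement(uri, localName, qName);
        }
    }
}
```

The underlying SAX parser needs to be namespace-aware for the uri and
localName arguments to be populated. Whether the JAXB side is happy
with the relocated cell elements is something to verify with your
actual model before relying on it.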
VR