Hi AS,
First, one copy of your question is enough. And the less text styling,
the better.
>
> Here is the scenario that I am in: I am working on a project
> in which I am using `java-spark` to perform the predictions.
>
Are you using JPMML-Evaluator-Spark for making predictions, or are you
using it to extract linear model coefficients and then make
predictions using your own code?
If it's the latter, then you could work with the base JPMML-Model library:
https://github.com/jpmml/jpmml-model
> And my program has to be such automated that —
> it should be able to look for model coefficients ***only***
> if the model type is linear regression *(suppose)*.
>
There is a special-purpose Visitor API inside the JPMML-Model library.
It's designed for traversing PMML data structures of arbitrary
complexity, and querying and/or modifying them as needed.
Simply create a subclass of org.jpmml.model.visitors.AbstractVisitor,
and override the appropriate AbstractVisitor#visit(<PMML element>)
methods.
For example, looking up coefficients for continuous features
(NumericPredictor element), and category level contributions for
categorical features:
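// Package names as found in recent JPMML-Model versions
import org.dmg.pmml.PMML;
import org.dmg.pmml.Visitor;
import org.dmg.pmml.VisitorAction;
import org.dmg.pmml.regression.CategoricalPredictor;
import org.dmg.pmml.regression.NumericPredictor;
import org.jpmml.model.visitors.AbstractVisitor;

Visitor coefficientPrinter = new AbstractVisitor(){

    // Continuous feature: one coefficient per field
    @Override
    public VisitorAction visit(NumericPredictor numericPredictor){
        System.out.println(numericPredictor.getField() + " -> " +
            numericPredictor.getCoefficient());
        return super.visit(numericPredictor);
    }

    // Categorical feature: one coefficient per (field, category level) pair
    @Override
    public VisitorAction visit(CategoricalPredictor categoricalPredictor){
        System.out.println(categoricalPredictor.getField() + "/" +
            categoricalPredictor.getValue() + " -> " +
            categoricalPredictor.getCoefficient());
        return super.visit(categoricalPredictor);
    }
};

PMML pmml = loadPMML(..);
coefficientPrinter.applyTo(pmml);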
> And in case of tree model I might want to fetch the feature importance etc.
Feature importances are available as
MiningSchema/MiningField@importance attributes:
https://dmg.org/pmml/v4-4-1/MiningSchema.html
Create an AbstractVisitor subclass that visits MiningField elements,
and then prints out the result of MiningField#getImportance() method.
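To make it concrete which attribute is being read, here is a minimal
sketch that pulls MiningField@importance values out of a PMML document.
Note that it does not use the JPMML-Model Visitor API; it uses the JDK's
built-in DOM parser so that it runs without extra dependencies. The class
name, method name and the inline PMML fragment are made up for
illustration, and the fragment is not a complete, schema-valid document.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of any JPMML library
class MiningFieldImportances {

    // Hand-written, abbreviated PMML fragment (assumption: real files carry much more content)
    static final String SAMPLE_PMML =
        "<PMML xmlns=\"http://www.dmg.org/PMML-4_4\" version=\"4.4\">"
        + "<TreeModel functionName=\"regression\">"
        + "<MiningSchema>"
        + "<MiningField name=\"y\" usageType=\"target\"/>"
        + "<MiningField name=\"x1\" importance=\"0.75\"/>"
        + "<MiningField name=\"x2\" importance=\"0.25\"/>"
        + "</MiningSchema>"
        + "</TreeModel>"
        + "</PMML>";

    // Collects field name -> importance for every MiningField that declares one
    static Map<String, Double> read(String pmml) {
        Document document;
        try {
            document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(pmml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parseable XML document", e);
        }

        Map<String, Double> result = new LinkedHashMap<>();
        NodeList miningFields = document.getElementsByTagName("MiningField");
        for (int i = 0; i < miningFields.getLength(); i++) {
            Element miningField = (Element) miningFields.item(i);
            // The importance attribute is optional, so skip fields without it
            if (miningField.hasAttribute("importance")) {
                result.put(miningField.getAttribute("name"),
                    Double.parseDouble(miningField.getAttribute("importance")));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        read(SAMPLE_PMML).forEach((name, importance) ->
            System.out.println(name + " -> " + importance));
    }
}
```

Keep in mind that this sketch also picks up the MiningField elements of
nested models (e.g. the members of an ensemble), not only those of the
top-level model.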
> **Clearly**, here we also need to look through the
> PMML file and fetch the model type and name.
The model type is reflected in the name of the top-level Model element.
For example, all linear regression models are represented using
the RegressionModel element (irrespective of their native ML framework
representation):
https://dmg.org/pmml/v4-4-1/Regression.html
During Visitor API traversal, you can distinguish between top-level
and nested models by checking the status of the current element stack,
as available via the (Abstract)Visitor#getParents() method.
By definition, for a top-level model, the stack of parent elements
contains a single PMML element.
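The top-level vs nested distinction can also be seen at the plain XML
level. The following is a rough sketch, again using the JDK's DOM parser
rather than the Visitor API: it treats every direct child of the PMML
root element that is not one of the known non-model elements as a
top-level model. The class name, method name and the inline PMML fragment
are hypothetical, and the fragment is abbreviated.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of any JPMML library
class TopLevelModelTypes {

    // Direct children of the PMML root element that are not model elements
    static final Set<String> NON_MODEL_ELEMENTS = Set.of(
        "Header", "MiningBuildTask", "DataDictionary",
        "TransformationDictionary", "Extension");

    // Hand-written, abbreviated fragment: an ensemble wrapping one regression model
    static final String SAMPLE_PMML =
        "<PMML xmlns=\"http://www.dmg.org/PMML-4_4\" version=\"4.4\">"
        + "<Header/>"
        + "<DataDictionary/>"
        + "<MiningModel functionName=\"regression\">"
        + "<Segmentation multipleModelMethod=\"sum\">"
        + "<Segment><True/><RegressionModel functionName=\"regression\"/></Segment>"
        + "</Segmentation>"
        + "</MiningModel>"
        + "</PMML>";

    // Returns the element names of top-level models, in document order
    static List<String> find(String pmml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(pmml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parseable XML document", e);
        }

        List<String> result = new ArrayList<>();
        // Only direct children are inspected, so nested models are ignored
        for (Node child = root.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && !NON_MODEL_ELEMENTS.contains(child.getNodeName())) {
                result.add(child.getNodeName());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(find(SAMPLE_PMML));
    }
}
```

For the sample above, only the MiningModel element is reported; the
RegressionModel element sits deeper in the tree and is skipped. An
application could branch on the reported name, and look for coefficients
only when it equals "RegressionModel".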
> So, do we have any library in java by which we can
> get the information that we require on the fly on any pmml file?
>
TLDR: See the JPMML-Model library:
https://github.com/jpmml/jpmml-model
For Visitor API code examples, please use GitHub search.
VR