Hi Ian,
> I am using LightGBM to produce a tree model, converting
> the LightGBM model to a PMML model using the jpmml-lightgbm
> converter. Then using the pmml-evaluator to evaluate the model.
>
A general warning - LightGBM is evolving quite rapidly, and the
JPMML-LightGBM library might be more or less outdated. The current
version of JPMML-LightGBM is targeting LightGBM v2.0.7.
> Initially I had some issues converting LGBM models with
> large numeric values in the cat_threshold section.
>
Well spotted!
This is a major issue, and I've propagated it to GitHub:
https://github.com/jpmml/jpmml-lightgbm/issues/9
Would you mind attaching your patch there?
> However, I am having issues using the evaluator
> (getting an InvalidResultException) which I have tracked
> down to my arguments being outside the range
> determined during training.
>
Possible solutions:
1) Add more (dummy) data records to your dataset so that the complete
"applicability domain" is covered.
2) Manually edit the "feature_infos" attribute (in the header section)
of the LightGBM model text file.
3) You're probably using LightGBM standalone and converting models
with the JPMML-LightGBM command-line application. However, if you
switched to the Scikit-Learn framework, then you'd be able to customize
feature bounds with the help of the
sklearn2pmml.decoration.ContinuousDomain meta-transformation.
4) Post-process PMML documents using the JPMML-Model library. If you
remove DataField/Interval elements, then all input values will be
considered to be valid.
5) Enhance the JPMML-LightGBM command-line application with a
"--no-domain" command-line switch, which would apply solution #4
automatically.
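For solution #2, here is a minimal Python sketch of rewriting the "feature_infos" line. It assumes that the line is a space-separated list in which each continuous feature appears as "[min:max]" and anything else (e.g. a categorical feature) has some other form; please check this against your own model file first. The helper name widen_feature_infos and its new_bounds argument are made up for illustration and are not part of LightGBM or JPMML-LightGBM.

```python
import re

# Matches one continuous feature entry of the assumed form "[min:max]".
_RANGE = re.compile(r"\[([^:\[\]]+):([^:\[\]]+)\]")


def widen_feature_infos(line, new_bounds):
    """Return a feature_infos line with selected [min:max] entries replaced.

    new_bounds maps a zero-based feature index to a (min, max) tuple.
    Entries that are not in [min:max] form are left untouched, and any
    line that is not a feature_infos line is returned unchanged.
    """
    body = line.rstrip("\r\n")
    ending = line[len(body):]  # preserve the original line terminator
    key, sep, value = body.partition("=")
    if key.strip() != "feature_infos" or not sep:
        return line
    entries = value.split(" ")
    for index, (lo, hi) in new_bounds.items():
        if _RANGE.fullmatch(entries[index]):
            entries[index] = "[{!r}:{!r}]".format(lo, hi)
    return key + sep + " ".join(entries) + ending


if __name__ == "__main__":
    header_line = "feature_infos=[0:10] 1:2:3 [1.5:3.5]\n"
    # Widen the bounds of the first feature only.
    print(widen_feature_infos(header_line, {0: (-100.0, 100.0)}), end="")
```

You would apply this to the single "feature_infos" line while copying the rest of the model file verbatim, and then run the converter on the edited copy.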
Here's an example of exporting "domain-less" LightGBM models using
Scikit-Learn:
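from lightgbm import LGBMRegressor
from sklearn_pandas import DataFrameMapper
from sklearn2pmml.decoration import ContinuousDomain
from sklearn2pmml.pipeline import PMMLPipeline

pipeline = PMMLPipeline([
    ("mapper", DataFrameMapper([
        ("x", ContinuousDomain(with_data = False)) # THIS!
    ])),
    ("estimator", LGBMRegressor())
])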
Here's an example using the Visitor API to get rid of all
DataField/Interval elements:
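import java.util.List;

import org.dmg.pmml.DataField;
import org.dmg.pmml.Interval;
import org.dmg.pmml.Visitor;
import org.dmg.pmml.VisitorAction;

class NoDomainVisitor extends org.jpmml.model.visitors.AbstractVisitor {
    @Override
    public VisitorAction visit(DataField dataField){
        // Dropping all Interval children removes the valid-range restriction
        if(dataField.hasIntervals()){
            List<Interval> intervals = dataField.getIntervals();
            intervals.clear();
        }
        return super.visit(dataField);
    }
}

// loadPMML() stands for whatever code loads your PMML document
org.dmg.pmml.PMML pmml = loadPMML();
Visitor visitor = new NoDomainVisitor();
visitor.applyTo(pmml);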
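If you'd rather not bring in a Java post-processing step, the same idea as solution #4 can be sketched in Python using only the standard library. This is plain XML editing and not the JPMML-Model API. The function name strip_intervals is made up for illustration, and the sketch assumes that Interval elements appear as direct children of DataField elements.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    # "{namespace}Name" -> "Name"; tags without a namespace pass through.
    return tag.rsplit("}", 1)[-1]


def strip_intervals(pmml_xml):
    """Remove every DataField/Interval element from a PMML document.

    Takes the document as a string and returns a tuple of
    (modified document as a string, number of Interval elements removed).
    """
    root = ET.fromstring(pmml_xml)
    if root.tag.startswith("{"):
        # Keep the PMML namespace as the default one when serializing,
        # so that the output does not gain "ns0:" prefixes.
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])
    removed = 0
    # Materialize the element list first so that removal is safe.
    for elem in list(root.iter()):
        if _local(elem.tag) != "DataField":
            continue
        for child in list(elem):
            if _local(child.tag) == "Interval":
                elem.remove(child)
                removed += 1
    return ET.tostring(root, encoding="unicode"), removed


if __name__ == "__main__":
    doc = (
        '<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">'
        '<DataDictionary>'
        '<DataField name="x" optype="continuous" dataType="double">'
        '<Interval closure="closedClosed" leftMargin="0" rightMargin="10"/>'
        '</DataField>'
        '</DataDictionary>'
        '</PMML>'
    )
    cleaned, count = strip_intervals(doc)
    print(count)
    print(cleaned)
```

As with the visitor, once the Interval elements are gone all input values will be considered to be valid, so the evaluator will no longer flag out-of-range arguments.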
VR