Hi Ben,
> However with JPMML, it gives a strange error, an InvalidFeatureException:
>
> java -jar evaluator-1.2.jar --input twoNumbers.csv --model addTwoNumbers.pmml --output validate.csv
> Exception in thread "main" org.jpmml.evaluator.InvalidFeatureException (at or around line 1): PMML
> at org.jpmml.evaluator.ModelManagerFactory.newModelManager(ModelManagerFactory.java:59)
> at org.jpmml.evaluator.ModelManagerFactory.newModelManager(ModelManagerFactory.java:46)
> at org.jpmml.evaluator.EvaluationExample.execute(EvaluationExample.java:181)
> at org.jpmml.evaluator.Example.execute(Example.java:60)
> at org.jpmml.evaluator.EvaluationExample.main(EvaluationExample.java:110)
The ModelManagerFactory raises this exception because the PMML (root)
element does not contain any model child elements. It therefore treats
the document as invalid, since it is impossible to instantiate any
org.jpmml.evaluator.ModelEvaluator subclass based on it.
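To check a document for this condition up front, something along the following lines could work. This is only a rough sketch using the JDK's DOM parser, not how JPMML implements the check; the class name ModelChildCheck is made up for illustration, and the set of element names is my reading of the PMML 4.2 schema.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of JPMML.
class ModelChildCheck {

    // Model element names, assumed from the PMML 4.2 schema.
    static final Set<String> MODEL_ELEMENTS = new HashSet<>(Arrays.asList(
        "AssociationModel", "BaselineModel", "ClusteringModel",
        "GeneralRegressionModel", "MiningModel", "NaiveBayesModel",
        "NearestNeighborModel", "NeuralNetwork", "RegressionModel",
        "RuleSetModel", "SequenceModel", "Scorecard",
        "SupportVectorMachineModel", "TextModel", "TimeSeriesModel",
        "TreeModel"));

    // Returns true if the root element has at least one direct model child.
    static boolean hasModel(String pmmlXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(pmmlXml)));
            NodeList children = document.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                if (child.getNodeType() == Node.ELEMENT_NODE
                        && MODEL_ELEMENTS.contains(child.getLocalName())) {
                    return true;
                }
            }
            return false;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not a parseable XML document", e);
        }
    }
}
```

For a document that holds only a DataDictionary and a TransformationDictionary, hasModel returns false, which corresponds to the situation that triggers the exception above.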
Conceptually, PMML gives you three types of building blocks:
1) Built-in and user-defined functions
(http://dmg.org/pmml/v4-2-1/BuiltinFunctions.html and
http://dmg.org/pmml/v4-2-1/Functions.html)
2) Transformation elements
(http://dmg.org/pmml/v4-2-1/Transformations.html)
3) Model elements.
You should always try to express your "computation problem" at the
highest conceptual level. For example, at the lowest level you can
build the equivalent of a Java/C/C++ switch statement by nesting
multiple <Apply function="if"> elements into one another. However, it
is much more elegant to move to the next conceptual level and use the
MapValues transformation
(http://dmg.org/pmml/v4-2-1/Transformations.html#xsdElement_MapValues)
instead.
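In plain Java terms, the difference between the two levels looks roughly like this. It is an analogy only: the codes and labels are invented for illustration, and nothing here uses the PMML or JPMML APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example; the codes and labels are made up.
class LookupLevels {

    // Low level: a chain of conditionals, comparable to nesting
    // <Apply function="if"> elements into one another.
    static String nestedIf(String code) {
        if ("A".equals(code)) {
            return "alpha";
        } else if ("B".equals(code)) {
            return "beta";
        } else if ("C".equals(code)) {
            return "gamma";
        } else {
            return "unknown";
        }
    }

    // Higher level: the mapping is data, comparable to a MapValues
    // element backed by a table of input/output pairs.
    static final Map<String, String> TABLE = new HashMap<>();
    static {
        TABLE.put("A", "alpha");
        TABLE.put("B", "beta");
        TABLE.put("C", "gamma");
    }

    static String lookup(String code) {
        // The fallback plays the role of MapValues' defaultValue attribute.
        return TABLE.getOrDefault(code, "unknown");
    }
}
```

Both methods return the same results, but with the table-driven form, adding a new mapping means adding a row rather than another level of nesting.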
The computational problem "scale the field value 'a' by constant $X
and add constant $Y" (i.e. z = X * a + Y) could be expressed as the
following RegressionModel element (you need to substitute $X and $Y
with actual numeric values):
<RegressionModel functionName="regression">
  <RegressionTable intercept="$Y">
    <NumericPredictor name="a" coefficient="$X"/>
  </RegressionTable>
</RegressionModel>
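The arithmetic that such a regression table encodes can be written out in a few lines of plain Java. This is a sketch of the formula only, not of the evaluator; the class name LinearScaling and the sample values X = 2, Y = 3 are made up for illustration.

```java
// Hypothetical illustration of z = X * a + Y.
class LinearScaling {

    // intercept corresponds to $Y, coefficient to $X.
    static double evaluate(double intercept, double coefficient, double a) {
        return intercept + coefficient * a;
    }

    public static void main(String[] args) {
        // With X = 2, Y = 3 and a = 5: 3 + 2 * 5
        System.out.println(evaluate(3.0, 2.0, 5.0)); // prints 13.0
    }
}
```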
> I was looking at the simplest possible PMMLs to see how very basic functionality
> can be implemented using PMMLs, such as data transformations or calculating
> derived attributes.
>
> The PMML code is this:
>
> <PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
>   <Header/>
>   <DataDictionary>
>     <DataField name="x" dataType="double" optype="continuous"/>
>     <DataField name="y" dataType="double" optype="continuous"/>
>   </DataDictionary>
>   <TransformationDictionary>
>     <DerivedField name="z" dataType="double" optype="continuous">
>       <Apply function="+">
>         <FieldRef field="x"/>
>         <FieldRef field="y"/>
>       </Apply>
>     </DerivedField>
>   </TransformationDictionary>
> </PMML>
The simplest possible PMML document should contain at least one model.
Otherwise it is just a collection of data processing
functions/transformations.
The JPMML-Evaluator project does not provide a command-line
application for evaluating standalone functions/transformations.
However, you can use the Java API to build such a thing quite easily.
Here's the idea:
// First, some boilerplate
org.dmg.pmml.PMML pmml = loadPmmlDocumentFromFile();
org.jpmml.evaluator.PMMLManager pmmlManager = new PMMLManager(pmml);
// Then, create the request object and define argument values
org.jpmml.evaluator.PMMLEvaluationContext context = new PMMLEvaluationContext(pmmlManager);
context.declare(FieldName.create("x"), 1d);
context.declare(FieldName.create("y"), 1d);
// Finally, invoke the function by name
org.jpmml.evaluator.FieldValue result = context.evaluate(FieldName.create("z"));
System.out.println(result);
The trouble with working with low(er)-level functions/transformations
is that, unlike Model elements, they do not specify a mining schema.
In other words, it is programmatically rather difficult to find out
what kind of argument values must be provided, and what the result is.
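One partial workaround is to walk the DerivedField element and collect the FieldRef names it mentions. Below is a rough sketch with the JDK's DOM parser; the class and method names are made up for illustration. It does not follow references to other derived fields, and it tells you nothing about the expected data types, so it is no substitute for a real mining schema.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of JPMML.
class DerivedFieldInputs {

    // Lists the "field" attributes of all FieldRef elements nested
    // inside the DerivedField with the given name, in document order.
    static List<String> referencedFields(String pmmlXml, String derivedFieldName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(pmmlXml)));
            // "*" matches the element regardless of the PMML namespace version.
            NodeList derivedFields = document.getElementsByTagNameNS("*", "DerivedField");
            for (int i = 0; i < derivedFields.getLength(); i++) {
                Element derivedField = (Element) derivedFields.item(i);
                if (derivedFieldName.equals(derivedField.getAttribute("name"))) {
                    List<String> names = new ArrayList<>();
                    NodeList refs = derivedField.getElementsByTagNameNS("*", "FieldRef");
                    for (int j = 0; j < refs.getLength(); j++) {
                        names.add(((Element) refs.item(j)).getAttribute("field"));
                    }
                    return names;
                }
            }
            throw new IllegalArgumentException("No DerivedField named " + derivedFieldName);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not a parseable XML document", e);
        }
    }
}
```

For the document in your message, asking for the derived field "z" yields the names "x" and "y", which are the values that have to be declared before "z" can be evaluated.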
Anyway, this idea can be quite easily expanded to mimic the PMML
Preprocessing service of Google's Prediction API:
https://cloud.google.com/prediction/docs/pmml-schema?hl=en
VR