Hi Peter,
>
> It certainly seems that the PMML spec allows missing values and has many options
> for dealing with them when they are encountered during model evaluation.
> However, I couldn't find a way to represent missing values for use in the JPMML evaluator.
>
The JPMML-Evaluator does not cut corners here. Everything is
implemented according to the PMML specification.
There are two scenarios:
1) "The field does not exist". That is, java.util.Map#containsKey(K)
returns false;
2) "The field exists, but is mapped to the null value". That is,
java.util.Map#containsKey(K) returns true, but java.util.Map#get(K)
returns a null reference.
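The difference between the two scenarios can be reproduced with a plain java.util.Map. The following is only a minimal sketch: it uses String keys as a stand-in for FieldName so that it runs without the JPMML-Evaluator library, and the class name MissingFieldDemo and the method classify are hypothetical, not part of any API.

```java
import java.util.HashMap;
import java.util.Map;

class MissingFieldDemo {

    // Tells apart the two scenarios described above, plus the ordinary case.
    // String keys are an assumption made for illustration (FieldName in real code).
    static String classify(Map<String, ?> arguments, String name){
        if(!arguments.containsKey(name)){
            return "does not exist";
        }
        if(arguments.get(name) == null){
            return "exists, mapped to null";
        }
        return "exists, has a value";
    }

    public static void main(String[] args){
        Map<String, Object> arguments = new HashMap<>();
        arguments.put("optional_field", null); // HashMap permits null values
        arguments.put("required_field", 1.5d);

        System.out.println(classify(arguments, "absent_field"));
        System.out.println(classify(arguments, "optional_field"));
        System.out.println(classify(arguments, "required_field"));
    }
}
```

Note that java.util.Map#get(K) alone cannot tell the first two cases apart, because it returns a null reference in both; the java.util.Map#containsKey(K) check is what distinguishes them.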
If you want to pass a null reference as a field value, simply do the following:
Map<FieldName, Object> arguments = ...
arguments.put(new FieldName("optional_field"), null);
Let's assume that you are scoring data records from a CSV file and
encounter a cell whose value is "N/A". In that case you should still
insert a null reference as the value into the arguments map.
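The CSV case could be sketched roughly as follows. This is an illustration under stated assumptions, not code from the JPMML-Evaluator: the class CsvRowDemo, the method toArguments and the MISSING constant are hypothetical, String keys stand in for FieldName, and the row is assumed to be already split into cells.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CsvRowDemo {

    // Assumed missing value marker; adjust to whatever your CSV file uses
    static final String MISSING = "N/A";

    // Builds an arguments map from one CSV row. Every column gets a key,
    // and a cell that equals the marker is mapped to a null reference.
    static Map<String, String> toArguments(String[] header, String[] cells){
        if(header.length != cells.length){
            throw new IllegalArgumentException("Expected " + header.length + " cells, got " + cells.length);
        }

        Map<String, String> arguments = new LinkedHashMap<>();
        for(int i = 0; i < header.length; i++){
            String cell = cells[i];
            arguments.put(header[i], MISSING.equals(cell) ? null : cell);
        }

        return arguments;
    }

    public static void main(String[] args){
        String[] header = {"age", "optional_field"};
        String[] cells = {"42", "N/A"};

        Map<String, String> arguments = toArguments(header, cells);
        System.out.println(arguments.containsKey("optional_field")); // true: the key is present
        System.out.println(arguments.get("optional_field") == null); // true: the value is null
    }
}
```

The point of the sketch is that the "N/A" cell is not skipped: its key stays in the map, so the field falls under scenario 2 rather than scenario 1.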
> The FieldValue class seems to be pretty locked down in terms of null values and will throw
> an exception if null is used as the value. However, when fields are missed from the context
> instead (as in the CSVEvaluationExample) MissingFieldExceptions are thrown.
>
Class org.jpmml.evaluator.FieldValue (not to be confused with
org.dmg.pmml.FieldValue) is a wrapper around a user-provided Java
primitive value. This kind of wrapper is necessary, because a PMML
value has a data type and an operational type (i.e. continuous,
categorical, ordinal). For example, an ordinal String could be created
in Java application code as follows:
OrdinalValue ordinalString = new OrdinalValue(DataType.STRING, "medium");
ordinalString.setOrdering(Arrays.asList("low", "medium", "high"));
However, you should never instantiate the classes ContinuousValue,
CategoricalValue or OrdinalValue directly. Please use the appropriate
methods from the utility class org.jpmml.evaluator.FieldValueUtil.
This utility class contains two types of methods. First, there are
ordinary object creation methods (#create(...)). Second, there are
methods that act as a kind of object cast (#refine(...)). For example,
you can cast the above ordinal String to a categorical String as
follows:
CategoricalValue categoricalString =
(CategoricalValue)FieldValueUtil.refine(DataType.STRING,
OpType.CATEGORICAL, ordinalString);
A null FieldValue is simply represented by a null reference. For
example, when you invoke FieldValueUtil#create(DataType.STRING,
OpType.CATEGORICAL, null) you will get back a null reference. As of
PMML schema versions 3.X and 4.X, this is an optimal solution. Maybe
PMML schema version 5.0 will introduce operations on null values
(let's hope not!), and in that case it will be necessary to devise a
new solution.
Anyway, in a typical use scenario, there is no need to engage with
FieldValues in your Java application code. In the JPMML-Evaluator
1.1.X series the whole interaction with the PMML scoring engine can be
fit into a single line of Java source code (see
http://openscoring.io/blog/2014/05/15/jpmml_evaluator_api_prepare_evaluate/
chapter "Option 2: lazy preparation"). You simply pass your arguments
as a Map<FieldName, String>, and you'll get back a Map<FieldName, ?>
(where the value type is either an instance of
org.jpmml.evaluator.Computable or a Java primitive type).
Did that answer your question? If you have a PMML model that throws a
MissingFieldException, then it simply "declares" that it requires
all fields to be mapped to non-missing values. I could probably give you
a more detailed answer if you sent me (privately) the PMML file
together with a few problematic lines from the CSV file.
VR