Hi Nima,
>
> However, I'm wondering if it is possible to do so without having to specify
> a field for each individual tree score. I spent some time reading the PMML
> and found that for multiple models (i.e., ensemble models such as the RF)
> there is an attribute called multipleModelMethod (default "average") that
> may be set to "selectAll" (http://www.dmg.org/v4-2/MultipleModels.html). If
> I get it right, this makes the standard output field carrying all
> predictions (?).
>
That's a terrific idea! You seem to know the PMML specification extremely well.
Indeed, if you are looking to do the post-processing yourself, then it
makes sense to use multipleModelMethod "selectAll" instead of
"average". This feature is not advertised much because, unlike all
the other methods, it returns a collection-valued result, not a
single-valued result. For example, in Java terminology,
"selectAll" gives you java.util.List<Double>, whereas "average" gives
you Double.
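To make the difference concrete, here is a minimal sketch of what the
post-processing could look like once you have the per-tree scores as a
java.util.List<Double>. The class TreeScoreSummary and its method names
are hypothetical (they are not part of JPMML-Evaluator), and I am
assuming that you want the sample standard deviation (dividing by n - 1):

```java
import java.util.List;

/** Hypothetical helper, not part of JPMML-Evaluator: summarizes per-tree scores. */
final class TreeScoreSummary {

    private TreeScoreSummary() {
    }

    /** Arithmetic mean of the individual tree scores. */
    static double mean(List<Double> scores) {
        if (scores.isEmpty()) {
            throw new IllegalArgumentException("At least one score is required");
        }
        double sum = 0d;
        for (double score : scores) {
            sum += score;
        }
        return sum / scores.size();
    }

    /** Sample standard deviation (assumption: divide by n - 1, not n). */
    static double sampleStandardDeviation(List<Double> scores) {
        if (scores.size() < 2) {
            throw new IllegalArgumentException("At least two scores are required");
        }
        double mean = mean(scores);
        double sumOfSquares = 0d;
        for (double score : scores) {
            double deviation = score - mean;
            sumOfSquares += deviation * deviation;
        }
        return Math.sqrt(sumOfSquares / (scores.size() - 1));
    }
}
```

How you obtain the list from the evaluation result depends on the
library version, so that part is deliberately left out.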
At the moment you cannot use "selectAll" with the JPMML-Evaluator library,
because it will always throw an UnsupportedFeatureException. But that
would be very easy to fix. In fact, your request comes at a very
opportune time, because I was working with class MiningModelEvaluator
just yesterday (per Alex's feature request). So, if everything goes
well, we could have an updated JPMML-Evaluator library ready and
released by Monday morning.
> Now, how does the output element have to be specified to get the statistics
> mentioned above? There is an aggregate element described in the
> transformations section of the documentation
> (http://www.dmg.org/v4-2/Transformations.html), but I couldn't figure out
> how to calculate the statistics other than the mean:
>
> <Output>
> <OutputField name="Mean" optype="continuous" dataType="string"
> targetField="response" feature="transformedValue">
> <Aggregate field="Predicted_DV" function="mean"/>
> </OutputField>
> </Output>
>
> Is this even correct? Do you have a hint on how to calculate standard
> deviation, and quantiles?
>
That's exactly the way to do it. The only small detail is that you
cannot specify optype="continuous" and dataType="string" together. You
probably meant dataType="double" anyway, because you are dealing with
mean values. I would recommend omitting the optype and dataType attributes
unless you really intend to cast the value from one data type to
another (e.g. converting from int to double).
As for the mean, standard deviation and percentile functions, the PMML
specification only gives you "average". However, when
using the JPMML-Evaluator library, you can define
additional "user-defined Java-backed functions". Simply implement
interface org.jpmml.evaluator.Function and register the instance with
method org.jpmml.evaluator.FunctionRegistry#putFunction(String,
Function). Please remember that this is JPMML-specific functionality
and you will lose portability with other PMML consumer applications if
you do so.
For example, you could create class PercentileFunction that takes two
arguments: first, the FieldValue instance that contains the
result of the "selectAll" function, and second, the FieldValue instance
that contains the percentile value. That way you could use the same class
for calculating both the 5% and 95% percentiles, by only changing the
second argument of the Apply element. I would recommend
registering this class PercentileFunction under its fully qualified class
name, so that it is plainly obvious to third parties that it is
not a PMML built-in function. Something like this:
<Apply function="de.uni-hamburg.pmml.PercentileFunction">
<FieldRef field="rf_selectAll"/>
<Constant>5</Constant>
</Apply>
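The computation behind such a function could look roughly like the
sketch below. It is plain Java on a java.util.List<Double> and does not
show the org.jpmml.evaluator.Function wiring, because the exact
interface methods depend on the library version. The class name
PercentileSketch is hypothetical, and linear interpolation between the
two closest ranks is my assumption; other percentile definitions exist
and give slightly different numbers:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of the percentile computation; not part of JPMML-Evaluator. */
final class PercentileSketch {

    private PercentileSketch() {
    }

    /**
     * Returns the requested percentile (0..100) of the values, using
     * linear interpolation between the two closest ranks (an assumption).
     */
    static double percentile(List<Double> values, double percent) {
        if (values.isEmpty()) {
            throw new IllegalArgumentException("At least one value is required");
        }
        // Written this way so that NaN is rejected as well
        if (!(percent >= 0d && percent <= 100d)) {
            throw new IllegalArgumentException("Percent must be between 0 and 100");
        }
        // Sort a copy so that the caller's list is left untouched
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        // Fractional zero-based position of the percentile in the sorted list
        double rank = (percent / 100d) * (sorted.size() - 1);
        int lower = (int) Math.floor(rank);
        int upper = (int) Math.ceil(rank);
        double fraction = rank - lower;
        return sorted.get(lower) + fraction * (sorted.get(upper) - sorted.get(lower));
    }
}
```

Calling percentile(scores, 5) and percentile(scores, 95) on the same
list corresponds to changing only the Constant argument in the Apply
element above.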
Of course, it would be possible to collect the more common functions in
the JPMML-Evaluator library (e.g. in package
org.jpmml.evaluator.extensions) and deploy them automatically to
FunctionRegistry.
> I think a blog about PMML would definitely be appreciated by an increasing
> number of people, since the use of PMML seems to be taking off!
>
You just gave me an idea for the second blog post :-)
VR