How is the output field recognized from the model constructed?


A Nyugen

May 27, 2018, 8:42:16 AM
to Java PMML API
Hi Villu,

I have been trying to understand how the "output" fields are recognized from the model and translated to represent the "output tag" in the PMML file.

I have been using an RDS file (Linear Regression) for some analysis and converted it to PMML format via the JPMML-R command line tool. In this generated PMML document, I do not see the "output tag" at all. The data dictionary, mining schema and regression table are however present in the PMML. So, I was curious to know how JPMML-R or R2PMML (which was not used in this case) identifies the output fields?
PS: My testing* and scoring functionality live in separate files. Hence, adding the new empty output column and calling predict() happen in a different R file altogether, one that JPMML-R does not know about.

I have been trying to understand how r2pmml is able to recognize the output fields in a previous example I tried. [Screenshot attached]
Here, two output fields are created as part of the writeAudit function, i.e. "Probability_0" and "Probability_1".
However, when one looks at the sequence of execution inside the generateGeneralRegressionAudit function, glm is called first, r2pmml is called next (and a PMML file is generated at this step), and ONLY then is the predict() function called, after which writeAudit() is triggered. So how does r2pmml know about the output fields in advance, given that writeAudit() runs after the r2pmml statement?

(1) Does it mean that I need to declare my new empty output column somewhere before I construct my model in the same testing* R script for JPMML-R to recognize the output column?
(2) Are there any other reasons you have encountered to date for the "output" not being shown in the PMML file?

Any insight on this would be really helpful.

Thanks.

example_audit.png

Villu Ruusmann

May 27, 2018, 6:17:10 PM
to Java PMML API
Hello,

>
> I have been trying to understand how the "output" fields
> are recognized from the model and translated to represent
> the "output tag" in the PMML file.
>

The Output element (http://dmg.org/pmml/v4-3/Output.html; see the
"Outputs Per Model Type" table at the very end) exposes "supporting
information" about a prediction.

The three most common kinds of outputs:
1) Classification: probability distribution ("probability" result feature).
2) Clustering: distances to centroids ("affinity" result feature).
3) Any function: the identifier of the winning element ("entityId"
result feature). For example, the identifier of the terminal Node
element in decision tree models.

If some model object is capable of making such predictions, then JPMML
libraries generate the Output element automatically. For example, if
some R model object supports 'predict::<model>(.., type = "proba")'
function invocation, then the JPMML-R library creates an Output
element, and populates it with OutputField elements, one for each
target category.

> I have been using an RDS file (Linear Regression) for some
> analysis. In this generated PMML document, I do not see the
> "output tag" at all.
>

The 'stats::lm()' function produces regression-type lm objects.
This is a very simple model object, which can only predict a numeric
response, and nothing else. It would only be possible to create an
empty Output element (i.e. "<Output/>") in this case.

> So, I was curious to know how JPMML-R or R2PMML
> (which was not used in this case) identifies the output fields?

It's all hard-coded in the Java application code.

For example:
*) glm() probabilities:
https://github.com/jpmml/jpmml-r/blob/master/src/main/java/org/jpmml/rexp/GLMConverter.java#L118
*) randomForest() probabilities:
https://github.com/jpmml/jpmml-r/blob/master/src/main/java/org/jpmml/rexp/RandomForestConverter.java#L272
*) kmeans() affinities:
https://github.com/jpmml/jpmml-r/blob/master/src/main/java/org/jpmml/rexp/KMeansConverter.java#L93

You cannot turn the generation of the Output element off, or change
the names of OutputField elements from your R script.

If you really dislike current setup/conventions, then you can
post-process the PMML document after it has been generated - read it
from file, apply changes, and save back to file. From the application
perspective, it's standard XML manipulation work.
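
A minimal sketch of such post-processing, using only the XML APIs that
ship with the JDK. The class and method names (PmmlPostProcessor,
renameOutputField) are hypothetical and not part of any JPMML library.
It works on strings for brevity; reading from and writing to a file
follows the same pattern. Note that renaming a field is only safe if
nothing else in the document refers to the old name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of JPMML-R or R2PMML.
class PmmlPostProcessor {

    // Returns a copy of the PMML markup in which every OutputField element
    // whose "name" attribute equals oldName carries newName instead.
    static String renameOutputField(String pmml, String oldName, String newName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(pmml)));

            // Match on the local name only, so that any PMML schema version
            // (ie. any namespace URI) is accepted.
            NodeList outputFields = document.getElementsByTagNameNS("*", "OutputField");
            for (int i = 0; i < outputFields.getLength(); i++) {
                Element outputField = (Element) outputFields.item(i);
                if (oldName.equals(outputField.getAttribute("name"))) {
                    outputField.setAttribute("name", newName);
                }
            }

            // Serialize the modified DOM tree back to text.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Failed to process PMML markup", e);
        }
    }
}
```

For example, renameOutputField(pmml, "probability(1)", "Probability_1")
would give the second probability field the column name used in the
R script.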

> Here, there are 2 output fields created as a part of
> the writeAudit function i.e "Probability_0" and "Probability_1".
>

All JPMML conversion libraries use the following convention for
formatting OutputField names: "<function>(<arg1>, <arg2>, ..,
<argn>)".

For example, it doesn't matter if you use JPMML-R, JPMML-SkLearn or
JPMML-SparkML, the probability fields for Iris classification models
are always called "probability(setosa)", "probability(versicolor)" and
"probability(virginica)".


VR

A Nyugen

May 28, 2018, 1:56:51 AM
to Java PMML API
Thank you for the detailed explanation!

I am going through the code. In my case, it is not that the <Output> </Output> element is empty; the "<Output> </Output>" tags themselves seem to be missing. That makes me wonder whether this is because I used step() with direction "both" after my lm() call, and saved my "step" object as RDS?

Does JPMML support step() yet?

A Nyugen

May 28, 2018, 2:07:04 AM
to Java PMML API
And could that be the reason for the missing <output>..</output> in the PMML generated?

Villu Ruusmann

May 28, 2018, 2:25:15 AM
to Java PMML API
Hello,

>
> In my case the <Output> </Output> isn't empty but the
> " <Output> </Output>" tags also seem to be missing
>

The Output element is optional. Therefore, a model element with an
empty Output element (case A below), and a model element without an
Output element (case B below), are functionally equivalent:

Case A:
<RegressionModel>
<Output/>
</RegressionModel>

Case B:
<RegressionModel/>

It's just a matter of style. And the JPMML style is not to generate
unnecessary markup (eg. default attributes, empty elements).
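
The equivalence of the two cases can be checked with plain XML
tooling. The sketch below uses only JDK classes; the OutputFieldLister
class is hypothetical, and the two fragments are the abbreviated
markup from above (well-formed XML, but not complete PMML documents).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of any JPMML library.
class OutputFieldLister {

    // Returns the "name" attribute of every OutputField element
    // found anywhere in the given markup, in document order.
    static List<String> outputFieldNames(String modelXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document document = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(modelXml)));

            NodeList nodes = document.getElementsByTagNameNS("*", "OutputField");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(((Element) nodes.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Failed to parse model markup", e);
        }
    }
}
```

Both outputFieldNames("<RegressionModel><Output/></RegressionModel>")
and outputFieldNames("<RegressionModel/>") return an empty list.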

>
> Does JPMML support step() yet?
>

The current 'recipes' implementation is very limited. Basically, it's
about capturing the name of the target field (when using
'caret::train()' without 'recipes', then the name of the target field
is typically missing from the RDS representation, and the JPMML-R
library calls it "_target" by default).

It should be possible to add support for most common step functions.
If you need something, then you should open a feature request in
GitHub, and please be sure to provide an example R script that
demonstrates the intended use.


VR

Nyugen

May 30, 2018, 11:53:27 PM
to Java PMML API
Hi VR,

Thank you for the explanation. I am experimenting with a few related things at the moment and will open a Feature Request on GitHub when I get some clarity on what I'm trying to build.

From what you have stated above, it looks like JPMML will still somehow be aware of which target field to "compute" from the model in "Case B" as well, even though there is no explicit output tag. Am I understanding this right?

Villu Ruusmann

May 31, 2018, 2:47:57 AM
to Java PMML API
Hello,

>
> From what you have stated above, looks like JPMML will
> still somehow be aware of what is the target field to "compute"
> from the model in "Case B" as well, even though there is
> no explicit output tag?
>

This must be some terminological issue.

A supervised learning model typically has one target field ("primary
result"), which can be accessed via
org.jpmml.evaluator.Evaluator#getTargetFields().

A supervised or unsupervised learning model can have any number of
output fields ("secondary results"), which can be accessed via
org.jpmml.evaluator.Evaluator#getOutputFields().

The Output element does not affect target field(s); it only affects
output field(s). From the JPMML-Evaluator API perspective, an empty
Output element (case A) and a missing Output element (case B) are
indistinguishable, because they both return an empty list of output
field declarations:

Evaluator evaluator = ...;
assertEquals(Collections.emptyList(), evaluator.getOutputFields());


VR